When companies release AI tools for science, they typically emphasize capabilities: what the system can do, which problems it solves, and how accurate it is. How scientists actually use these tools in their daily work often goes unexamined. Researchers at Allen AI set out to investigate this and published an analysis of over 250,000 real queries submitted to their scientific AI tools.
Where the Data Comes From and Why It Matters
Allen AI is a non-profit research laboratory that develops AI tools specifically for the scientific community. Among their offerings are Semantic Scholar (a search engine for scientific papers) and several specialized services that assist researchers in working with literature.
The dataset, which they named ASTA (Academic Search and Task Analysis), comprises queries from real users: scientists, students, and analysts. In other words, this is not synthetic data or lab testing; it is exactly what people typed when they needed help with scientific texts.
Why is this valuable? Because developers of AI tools often build systems based on assumptions about how they will be used. Reality, however, frequently proves otherwise. Analyzing real queries is a way to validate expectations against actual usage.
What People Are Actually Asking
The first striking observation is that most queries don't resemble simple search phrases like "find an article about X". People formulate tasks in detail, with context, sometimes almost like an email to a colleague.
The researchers identified several main types of interaction:
- Literature Search – finding papers on a topic, often with specific conditions such as time period, methodology, or area of application.
- Comprehension and Explanation – "explain what this term means", "what's the difference between these approaches?" or "briefly summarize the article".
- Comparison and Synthesis – "how do different researchers approach this problem?" or "what does the literature say about this issue overall?"
- Writing Assistance – phrasing, structure, and finding suitable citations.
This is an important observation: people perceive scientific AI tools not as mere search engines, but rather as thinking assistants to whom they can explain a task and receive a meaningful answer. The distinction is fundamental, and it influences how such systems should be designed.
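To make these categories concrete, here is a minimal tagging sketch in Python. It is purely illustrative: the `Interaction` enum mirrors the four interaction types listed above, but the cue phrases in `CUES` and the `tag_query` helper are hypothetical, and nothing here reflects how the Allen AI researchers actually labeled queries.

```python
from enum import Enum


class Interaction(Enum):
    """The four interaction types named in the article."""
    LITERATURE_SEARCH = "literature search"
    COMPREHENSION = "comprehension and explanation"
    SYNTHESIS = "comparison and synthesis"
    WRITING = "writing assistance"


# Hypothetical cue phrases, chosen for illustration only (not from the dataset).
CUES = {
    Interaction.LITERATURE_SEARCH: ("find", "papers on", "articles about"),
    Interaction.COMPREHENSION: ("explain", "summarize", "difference between", "what is"),
    Interaction.SYNTHESIS: ("compare", "how do different", "literature say"),
    Interaction.WRITING: ("rephrase", "rewrite", "citation for"),
}


def tag_query(query: str) -> set[Interaction]:
    """Return every interaction type whose cue phrase occurs in the query.

    Matching is naive, case-insensitive substring search, so a query
    can receive several tags, or none at all.
    """
    text = query.lower()
    return {
        kind
        for kind, cues in CUES.items()
        if any(cue in text for cue in cues)
    }
```

The function returns a set rather than a single label because, as the queries described above suggest, one request can belong to several types at once. A real system would need something far more robust than substring matching.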
Queries Are More Complex Than They Seem
Another key finding is that a significant portion of queries are multifaceted. A user isn't just looking for an article – they want to find an article, understand its place within the context of the field, and get a brief summary. All in one go.
This presents a real challenge for AI systems. Processing a single, clear query is a straightforward task. But when a person formulates something like, "show me the latest papers on topic X, explain the main disagreements in this field, and help me understand if I should read paper Y", it becomes a complex set of subtasks requiring different capabilities.
According to the data, these complex queries constitute a significant portion of actual use. This implies that tools designed solely for simple searches do not meet the real needs of scientists.
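The idea of one query hiding several subtasks can be sketched in a few lines of Python. This is an assumption-laden illustration, not a description of how any Allen AI tool works: the splitting rule, the `CAPABILITY_BY_CUE` table, and the capability names are all hypothetical.

```python
import re

# Hypothetical mapping from a leading cue to the capability a subtask needs.
CAPABILITY_BY_CUE = {
    "show": "retrieval",
    "find": "retrieval",
    "explain": "synthesis",
    "compare": "synthesis",
    "help me understand": "paper assessment",
}


def split_subtasks(query: str) -> list[str]:
    """Naively split a query on commas (optionally followed by 'and')
    and on a standalone 'and' surrounded by whitespace."""
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", query)
    return [part.strip() for part in parts if part.strip()]


def plan(query: str) -> list[tuple[str, str]]:
    """Pair each subtask with the capability suggested by its opening words."""
    steps = []
    for subtask in split_subtasks(query):
        lowered = subtask.lower()
        capability = next(
            (cap for cue, cap in CAPABILITY_BY_CUE.items() if lowered.startswith(cue)),
            "unknown",  # no cue matched; a real system would need a fallback
        )
        steps.append((capability, subtask))
    return steps
```

Applied to the example query above, `plan` yields three steps: a retrieval step for the latest papers, a synthesis step for the disagreements in the field, and an assessment step for paper Y. Real queries are rarely this cleanly punctuated, which is part of why they are hard to handle.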
Who Uses Them and What It Looks Like in Practice
The audience for these tools turned out to be broader than one might expect. Alongside experienced researchers, the system is actively used by students and individuals just entering a new field of knowledge. For them, an AI tool often becomes the first point of entry – a way to quickly orient themselves on an unfamiliar topic before diving deep into reading.
This changes the perspective on who these tools are actually made for. While one might have previously assumed they were primarily used by experts needing to quickly find a specific paper, the reality is more nuanced. A significant portion of users are in the process of learning, and they require more than just a list of relevant documents; they need assistance with understanding.
What This Means for AI Tool Developers
Allen AI is publishing this dataset with open access, and this is perhaps the main practical value of the publication. Any team developing tools for working with scientific texts can now rely on real usage patterns instead of untested hypotheses.
The conclusions that suggest themselves here are quite specific:
- Tools must be able to handle complex, multifaceted queries – not just simple search phrases.
- Explanation and synthesis are not optional features but fundamental user needs.
- A large part of the audience consists of people who are just getting to grips with a topic, rather than established experts. The interface and the logic of the responses must take this into account.
To put it simply: if you build a scientific AI assistant based on how scientists really work with it, you get one kind of product. If you base it on assumptions, you'll likely get another – and it's not guaranteed to be useful.
Open Questions
For all its value, this research has its natural limitations. The queries were collected on specific Allen AI platforms, which means the sample reflects this particular audience and these specific tools. The behavior of users on other systems may differ.
Furthermore, the analysis reveals what people ask but doesn't always explain why they ask it that way. Why do some prefer detailed queries while others use short ones? To what extent does the phrasing depend on habit, the interface, or previous experience with AI? These questions remain open.
But even with these caveats, having more than 250,000 real examples of how scientists interact with AI is significantly better than building systems in an information vacuum. Such data gradually shifts development from intuition to an evidence-based approach, and this is, perhaps, exactly the direction we should be moving in.