The longer the text, the worse language models tend to handle it. This might seem counterintuitive, as vendors constantly boast about the ever-expanding “context windows” of their systems. However, a large context window and effective processing of a long text are not the same thing.
This very contradiction is the foundation of an idea researchers have dubbed “Divide and Conquer.” Simply put, rather than feeding the entire document to the model at once and hoping for the best, the framework breaks the document into parts and has multiple instances of the model process them in parallel.
Why a Large Context Isn't Always a Good Thing
When a model receives a very long text, it doesn't always “remember” everything written in it equally well. A well-documented problem is that information in the middle of a document is processed less effectively than information at the beginning or end. This phenomenon is known as the “lost in the middle” effect.
Moreover, processing a long context inherently requires significantly more computational resources. The longer the text, the slower and more expensive the model is to run.
As a result, even a very powerful model, when given a long document “head-on,” can make mistakes or miss important details, simply because reading everything in a single pass is a fundamentally difficult task.
Three Roles Instead of One
The core idea behind this framework is that the task isn't solved by a single model in one pass. Instead, a small “team” of three types of participants is assembled.
- The Planner – studies the document and decides how to break it into logical parts. It also formulates a specific question or task for each part.
- The Workers – each receives its fragment and answers the assigned question independently of the others. They all work in parallel.
- The Manager – gathers the answers from all the workers and formulates the final conclusion.
This setup is reminiscent of how any team works on a large project: one person breaks the task down, others take on their respective pieces, and a third brings it all together.
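To make the three roles concrete, here is a minimal Python sketch of such a pipeline. It is an illustration under stated assumptions, not the researchers' implementation: `call_model` is a hypothetical stand-in for a real model call, the planner splits the text into fixed-size chunks instead of finding logical boundaries, and all names and the `chunk_size` value are invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Subtask:
    fragment: str  # the piece of the document a worker will read
    question: str  # the question the planner assigned to that piece


def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns a dummy answer."""
    return f"[answer based on {len(prompt)} chars]"


def plan(document: str, question: str, chunk_size: int = 200) -> list[Subtask]:
    # Assumption: fixed-size chunks with the same question for each one.
    # A real planner would ask a model to pick logical boundaries and
    # to write a tailored question per part.
    return [
        Subtask(document[i:i + chunk_size], question)
        for i in range(0, len(document), chunk_size)
    ]


def work(task: Subtask) -> str:
    # Each worker sees only its own fragment and question.
    return call_model(f"{task.question}\n\n{task.fragment}")


def manage(question: str, answers: list[str]) -> str:
    # The manager sees only the workers' answers, not the full document.
    joined = "\n".join(answers)
    return call_model(f"Combine these partial answers to: {question}\n{joined}")


def answer_long_document(document: str, question: str) -> str:
    subtasks = plan(document, question)
    with ThreadPoolExecutor(max_workers=8) as pool:
        partial = list(pool.map(work, subtasks))  # map keeps input order
    return manage(question, partial)


if __name__ == "__main__":
    print(answer_long_document("lorem ipsum " * 100, "What is the text about?"))
```

Swapping `call_model` for a real client is the only change needed to try the idea out; the quality of the result would then depend mostly on how well `plan` chooses its boundaries and questions.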
What the Experiments Showed
Researchers tested this approach on several tasks where models need to work with long documents: answering questions about large texts, finding specific information across multiple files, and drawing conclusions from disparate data.
The results were surprising. Relatively small models, such as Llama-3-70B and Qwen-72B, outperformed GPT-4o operating in the standard “single query, single answer” mode when they used this framework. The advantage did not appear in every test and was not always large, but it was systematic and reproducible.
In short: a less powerful model with a proper workflow can outperform a more powerful one working alone.
Parallelism Is the Key
An important detail that's easy to miss: the worker agents process their fragments simultaneously, not one after another. This means the total response time doesn't grow proportionally to the document's length. It remains roughly constant as long as there are enough computational resources to run all the workers at once.
This fundamentally distinguishes the approach from a naive “summarize-by-parts” method, where the model simply reads the text in chunks sequentially. In that case, the time increases linearly. Here, it does not.
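The difference between the two modes is easy to see in a toy timing experiment. The sketch below replaces the model call with a fixed `time.sleep` delay (the 0.1-second `DELAY` and the function names are assumptions made for illustration) and runs the same five fragments first one by one, then all at once.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.1  # hypothetical per-fragment latency in seconds


def fake_worker(fragment: str) -> str:
    time.sleep(DELAY)  # stands in for waiting on a model response
    return fragment.upper()


def run_sequential(fragments: list[str]) -> tuple[list[str], float]:
    start = time.perf_counter()
    results = [fake_worker(f) for f in fragments]
    return results, time.perf_counter() - start


def run_parallel(fragments: list[str]) -> tuple[list[str], float]:
    start = time.perf_counter()
    # One thread per fragment, so every worker starts immediately.
    with ThreadPoolExecutor(max_workers=max(1, len(fragments))) as pool:
        results = list(pool.map(fake_worker, fragments))
    return results, time.perf_counter() - start


fragments = [f"part {i}" for i in range(5)]
seq_results, seq_time = run_sequential(fragments)
par_results, par_time = run_parallel(fragments)
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

The sequential run takes at least five delays, while the parallel run takes roughly one. With a real model the gap depends on how many requests the serving infrastructure can actually handle at the same time.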
This Isn't Just an Experiment for Experiment's Sake
The practical significance of this work becomes clear when you consider real-world scenarios. Imagine needing to analyze a hundred-page contract, find contradictions in a large set of documents, or answer questions based on extensive technical documentation.
Currently, most systems either truncate such texts to an acceptable size (losing some information) or use expensive models with large context windows. The “Divide and Conquer” framework offers a third way: employing a simpler model, but organizing its work more intelligently.
This, by the way, fits well into a broader trend now being actively discussed in the industry: the move toward agentic systems. The idea is that instead of one super-intelligent model, it's sometimes better to use several more modest ones, each solving its own part of the task. ai-stat.ru sees this confirmed in practice: Andrej Karpathy's autoresearch project follows a similar logic, with an agent that divides the work into iterations and achieves results not through raw power, but through proper process organization.
Where Questions Still Remain
The approach is not without its limitations, and the paper's authors are transparent about them.
First, not all tasks are easily divisible. If an answer requires keeping the entire document in mind at once – for example, to track the overarching logic of an argument – splitting it into parts can be detrimental.
Second, the planner's performance is critical. If it incorrectly defines the fragment boundaries or formulates the task for the worker agents unclearly, the final result will suffer, regardless of how well the others perform.
Third, this scheme involves multiple calls to the model, which can cost more than a single large query, even if it finishes sooner.
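A rough back-of-the-envelope calculation shows where the extra cost can come from. The sketch below counts tokens only, and every number in it is hypothetical; it also assumes the planner reads the whole document once and that each call carries its own instructions and produces its own answer. Actual prices depend on the model, so a cheaper model may still come out ahead despite handling more tokens.

```python
def single_call_tokens(doc_tokens: int, prompt_tokens: int, answer_tokens: int) -> int:
    # One call: the whole document plus instructions plus the answer.
    return doc_tokens + prompt_tokens + answer_tokens


def divided_tokens(
    doc_tokens: int,
    n_workers: int,
    prompt_tokens: int,
    answer_tokens: int,
) -> int:
    # Assumption: the planner reads the full document once.
    planner = doc_tokens + prompt_tokens + answer_tokens
    # Together the workers read the document once more, but each call
    # repeats the instructions and produces its own answer.
    workers = doc_tokens + n_workers * (prompt_tokens + answer_tokens)
    # The manager reads every worker's answer and writes the final one.
    manager = n_workers * answer_tokens + prompt_tokens + answer_tokens
    return planner + workers + manager


# Hypothetical sizes: a 100,000-token document split across 10 workers.
single = single_call_tokens(doc_tokens=100_000, prompt_tokens=200, answer_tokens=300)
divided = divided_tokens(doc_tokens=100_000, n_workers=10, prompt_tokens=200, answer_tokens=300)
print(single)   # 100500
print(divided)  # 209000
```

Under these assumptions the divided scheme handles roughly twice as many tokens, mostly because the document is read twice: once by the planner and once by the workers.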
Finally, how well this scales to documents with millions of tokens remains an open question. The experiments were conducted on documents of limited size, and the system's behavior beyond those limits is unknown.
In Conclusion
The main conclusion of this work can be summarized as follows: model size is not the only variable. How the work on a task is organized is just as important as how powerful the model itself is.
For those who work with large volumes of text or are considering automating document workflows, this is a practically significant result. You don't need to chase the most expensive model – sometimes, it's enough to build the right interaction architecture.