Published on March 27, 2026

How Small Language Models Outperform Large Ones with Long Texts

Researchers have demonstrated that small language models can outperform GPT-4o when processing long texts by breaking down tasks and distributing the work among multiple agents.

Research · 5–7 min read

Event Source: Together.ai

The longer the text, the worse language models tend to handle it. This might seem counterintuitive, since developers constantly boast about the ever-expanding “context windows” of their systems. However, a large context window and effective processing of a long text are not the same thing.

This very contradiction is the foundation of an idea researchers have dubbed “Divide and Conquer.” Simply put, instead of feeding the entire document to the model at once and hoping for the best, you break the document into parts and have several instances of the model process them in parallel.

Why a Large Context Isn't Always a Good Thing

When a model receives a very long text, it doesn't always “remember” everything written in it equally well. A well-documented problem is that information in the middle of a document is processed less effectively than information at the beginning or end. This phenomenon is known as the “lost in the middle” effect.

Moreover, processing a long context inherently requires significantly more computation. The longer the text, the slower and more expensive each request becomes.
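A rough calculation shows why length hurts so much. The sketch below assumes standard full self-attention, where every token attends to every other token, so the work grows with the square of the text length. It ignores the parts of a model whose cost grows only linearly, and optimized attention implementations change the constants. The function names are illustrative, and the numbers are not from the paper.

```python
def attention_pairs(num_tokens: int) -> int:
    """Token pairs compared in one full self-attention pass (assumes vanilla attention)."""
    return num_tokens * num_tokens


def split_attention_pairs(num_tokens: int, num_chunks: int) -> int:
    """Total token pairs when the text is cut into equal chunks processed separately."""
    chunk_tokens = -(-num_tokens // num_chunks)  # ceiling division without floats
    return num_chunks * attention_pairs(chunk_tokens)


if __name__ == "__main__":
    n = 128_000
    whole = attention_pairs(n)
    split = split_attention_pairs(n, 8)
    print(f"whole document: {whole:,} pairs")  # 16,384,000,000
    print(f"8 chunks:       {split:,} pairs ({whole // split}x fewer)")  # 2,048,000,000 (8x)
```

Under this simplified model, splitting a text into k equal chunks divides the attention work by roughly k. That is one intuition for the cost side only. It says nothing about answer quality, and it does not count the extra planner and manager calls.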

As a result, even a very powerful model, when given a long document “head-on,” can make mistakes or miss important details simply because the task, framed that way, is fundamentally hard.

Three Roles Instead of One

The core idea behind this framework is that the task isn't solved by a single model in one pass. Instead, a small “team” of three types of participants is assembled.

The Planner – studies the document and decides how to break it into logical parts. It also formulates a specific question or task for each part.
The Workers – each receives its fragment and answers the assigned question independently of the others. They all work in parallel.
The Manager – gathers the answers from all the workers and formulates the final conclusion.

This setup is reminiscent of how any team works on a large project: one person breaks the task down, others take on their respective pieces, and a third brings it all together.
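The division of labor described above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' implementation. `Model` is a hypothetical stand-in for any function that sends a prompt to a language model and returns text. The planner is replaced by a simple rule that packs whole paragraphs into fragments, whereas the paper's planner is itself a model that chooses boundaries and writes a sub-question per part. `toy_model` is a keyword-matching fake so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: prompt in, text out. In practice this would wrap an LLM API.
Model = Callable[[str], str]


@dataclass
class SubTask:
    fragment: str
    question: str


def plan(document: str, question: str, max_chars: int = 2000) -> list[SubTask]:
    """Planner stand-in: pack whole paragraphs into fragments of about max_chars.

    A paragraph longer than max_chars becomes its own (oversized) fragment.
    Every fragment gets the same question; a model-based planner would tailor it.
    """
    fragments: list[str] = []
    current = ""
    for para in document.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            fragments.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        fragments.append(current)
    return [SubTask(fragment, question) for fragment in fragments]


def work(model: Model, task: SubTask) -> str:
    """Worker: answer using only its own fragment, or reply NONE."""
    prompt = (f"Context:\n{task.fragment}\n\nQuestion: {task.question}\n"
              "Answer briefly, or reply NONE if the context does not help.")
    return model(prompt)


def manage(model: Model, question: str, answers: list[str]) -> str:
    """Manager: combine the workers' non-NONE answers into a final answer."""
    notes = "\n".join(f"- {a}" for a in answers if a.strip() != "NONE")
    return model(f"Partial answers:\n{notes}\n\nCombine them to answer: {question}")


def divide_and_conquer(model: Model, document: str, question: str,
                       max_chars: int = 2000, max_workers: int = 8) -> str:
    tasks = plan(document, question, max_chars)
    # Workers run concurrently; pool.map returns results in fragment order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        answers = list(pool.map(lambda task: work(model, task), tasks))
    return manage(model, question, answers)


def toy_model(prompt: str) -> str:
    """Offline fake 'model' for the demo: finds lines mentioning a deadline."""
    if prompt.startswith("Partial answers:"):
        return " | ".join(line[2:] for line in prompt.splitlines() if line.startswith("- "))
    context = prompt.split("Context:\n", 1)[1].split("\n\nQuestion:", 1)[0]
    hits = [line for line in context.splitlines() if "deadline" in line.lower()]
    return "; ".join(hits) if hits else "NONE"


if __name__ == "__main__":
    doc = "Intro text.\n\nThe deadline is May 1.\n\nFiller.\n\nFinal deadline: June 30."
    print(divide_and_conquer(toy_model, doc, "What are the deadlines?", max_chars=20))
    # The deadline is May 1. | Final deadline: June 30.
```

Each worker sees only its own fragment. That is what makes the work parallel, and it is also why the quality of the planner's boundaries matters so much.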

What the Experiments Showed

Researchers tested this approach on several tasks where models need to work with long documents: answering questions about large texts, finding specific information across multiple files, and drawing conclusions from disparate data.

The results were surprising. Relatively small models, such as Llama-3-70B and Qwen-72B, outperformed GPT-4o running in the standard “single query, single answer” mode when they used this framework. The advantage did not hold in every test, nor was it huge, but it was systematic and reproducible.

In short: a less powerful model with a proper workflow can outperform a more powerful one working alone.

Parallelism Is the Key

An important detail that's easy to miss: the worker agents process their fragments simultaneously, not one after another. This means the total response time doesn't grow proportionally to the document's length – it remains roughly constant as long as there are enough computational resources.

This fundamentally distinguishes the approach from a naive “summarize-by-parts” method, where the model simply reads the text in chunks sequentially. In that case, the time increases linearly. Here, it does not.
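A toy latency model makes the contrast explicit. It assumes every fragment takes the same time to process, that up to `workers` calls can run at once, and that the planner and manager add fixed overheads. All function names and numbers are hypothetical and illustrative, not measurements from the paper.

```python
def sequential_latency(num_chunks: int, seconds_per_chunk: float) -> float:
    """Naive summarize-by-parts: one chunk after another, so time grows linearly."""
    return num_chunks * seconds_per_chunk


def parallel_latency(num_chunks: int, seconds_per_chunk: float, workers: int,
                     plan_seconds: float = 0.0, merge_seconds: float = 0.0) -> float:
    """Planner, then workers in waves of up to `workers` concurrent calls, then manager."""
    waves = -(-num_chunks // workers)  # ceiling division
    return plan_seconds + waves * seconds_per_chunk + merge_seconds


if __name__ == "__main__":
    for n in (4, 16, 64):
        seq = sequential_latency(n, 10.0)
        par = parallel_latency(n, 10.0, workers=64, plan_seconds=5.0, merge_seconds=5.0)
        print(f"{n:>3} chunks: sequential {seq:6.1f} s, parallel {par:5.1f} s")
```

With 64 parallel slots the parallel time stays at 20 s while the sequential time grows from 40 s to 640 s. If `workers` is capped below the number of chunks, the waves, and the time, start growing again. This toy model also treats planner time as fixed, although a planner that reads the whole document will take longer on longer texts.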

This Isn't Just an Experiment for Experiment's Sake

The practical significance of this work becomes clear when you consider real-world scenarios. Imagine needing to analyze a hundred-page contract, find contradictions in a large set of documents, or answer questions based on extensive technical documentation.

Currently, most systems either truncate such texts to an acceptable size (losing some information) or use expensive models with large context windows. The “Divide and Conquer” framework offers a third way: employing a simpler model, but organizing its work more intelligently.

This, by the way, fits well into a broader trend now being actively discussed in the industry: the move toward agentic systems. The idea is that instead of one super-intelligent model, it is sometimes better to use several more modest ones, each solving its own part of the task. A similar logic, covered earlier on ai-stat.ru, shows up in Andrej Karpathy's autoresearch project: an agent divides the work into iterations and gets its results not through raw power but through how the process is organized.

Where Questions Still Remain

The approach is not without its limitations, and the paper's authors are transparent about them.

First, not all tasks are easily divisible. If an answer requires keeping the entire document in mind at once – for example, to track the overarching logic of an argument – splitting it into parts can be detrimental.

Second, the planner's performance is critical. If it incorrectly defines the fragment boundaries or formulates the task for the worker agents unclearly, the final result will suffer, regardless of how well the others perform.
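The boundary half of this problem is easy to reproduce with a toy example. The sketch below uses hypothetical helper names to compare a blind fixed-size split with an overlapping sliding window, a common mitigation in chunking pipelines. It is not taken from the paper. Overlap only guarantees that a passage up to `overlap + 1` characters long lands whole in some fragment, and it does nothing about a badly worded sub-question.

```python
def fixed_size_split(text: str, size: int) -> list[str]:
    """Blind split every `size` characters, ignoring sentences and paragraphs."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def overlapping_split(text: str, size: int, overlap: int) -> list[str]:
    """Sliding windows of `size` characters; neighbors share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def fact_survives(chunks: list[str], fact: str) -> bool:
    """True if at least one worker would see the whole fact inside its fragment."""
    return any(fact in chunk for chunk in chunks)


if __name__ == "__main__":
    text = ("Section 1 covers scope. The penalty for late delivery is 5%. "
            "Section 3 covers payment.")
    fact = "The penalty for late delivery is 5%."
    print(fact_survives(fixed_size_split(text, 48), fact))       # False: the sentence is cut in two
    print(fact_survives(overlapping_split(text, 48, 36), fact))  # True: one window holds it whole
```

Overlap is a blunt tool. Here it produces 8 windows instead of 2, multiplying worker calls, and it cannot help when a passage only makes sense in light of the whole document.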

Third, this scheme involves multiple calls to the model, so it can cost more than a single large query, even when it finishes faster.
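A back-of-the-envelope token count shows where the extra cost comes from. The sketch assumes three things: the planner reads the whole document once, each worker reads its fragment plus a fixed prompt overhead, and the manager reads only the workers' short answers. All parameter names and numbers are hypothetical. Per-token prices are left out, and they differ between small and large models.

```python
def single_call_tokens(doc_tokens: int, prompt_overhead: int, answer_tokens: int) -> int:
    """One large model reads the whole document and answers once."""
    return doc_tokens + prompt_overhead + answer_tokens


def divide_and_conquer_tokens(doc_tokens: int, num_workers: int, prompt_overhead: int,
                              plan_tokens: int, worker_answer_tokens: int,
                              answer_tokens: int) -> int:
    """Planner + parallel workers + manager, counting input and output tokens."""
    chunk_tokens = -(-doc_tokens // num_workers)  # ceiling division
    planner = doc_tokens + prompt_overhead + plan_tokens
    workers = num_workers * (chunk_tokens + prompt_overhead + worker_answer_tokens)
    manager = num_workers * worker_answer_tokens + prompt_overhead + answer_tokens
    return planner + workers + manager


if __name__ == "__main__":
    single = single_call_tokens(100_000, prompt_overhead=500, answer_tokens=500)
    multi = divide_and_conquer_tokens(100_000, num_workers=10, prompt_overhead=500,
                                      plan_tokens=400, worker_answer_tokens=300,
                                      answer_tokens=500)
    print(single, multi)  # 101000 212900
```

Under these assumptions the pipeline processes roughly twice as many tokens, mostly because the document is read once by the planner and once more, in pieces, by the workers. Whether that costs more in money depends on the price gap between the small model and the large one.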

Finally, how well this scales to documents with millions of tokens remains an open question. The experiments covered documents of limited length, and the system's behavior beyond that range is unknown.

In Conclusion

The main conclusion of this work can be summarized as follows: model size is not the only variable. How the work on a task is organized is just as important as how powerful the model itself is.

For those who work with large volumes of text or are considering automating document workflows, this is a practically significant result. You don't need to chase the most expensive model – sometimes, it's enough to build the right interaction architecture.

#analysis #machine learning #ai development #engineering #scaling #multi-agent systems #large language model optimization

Link to Original: https://www.together.ai/blog/plan-divide-conquer

Original Title: Plan, divide, and conquer: How weak models excel at long context tasks

Publication Date: Mar 26, 2026

Together.ai (www.together.ai): a U.S.-based platform for running and scaling open AI models.

From Source to Analysis

Neural Networks Involved in the Process

1. Analyzing the Original Publication and Writing the Text

2. Translation into English

3. Text Review and Editing

4. Preparing the Illustration Description

5. Creating the Illustration