Published January 29, 2026

Chunk Size Depends on the Query: How AI21 Labs Proposes Solving a Major RAG System Challenge

AI21 Labs demonstrated that a single chunk size in RAG systems is a compromise and proposed a simple way to adapt text segmentation to the user's query type.

Development
Source: AI21 Labs · Reading time: 4–6 minutes

When working with RAG systems (those that retrieve specific information from documents and feed it to a language model for an answer), you will eventually face a choice: into what size pieces should the text be broken before indexing? This is called "chunking", and the size of these pieces determines how accurately the system finds what it needs.

Usually, people select one size and apply it to all documents. The problem is that different queries require different approaches. AI21 Labs investigated this issue and proposed a solution that doesn't require overhauling the entire system.

Why Size Matters

Imagine you have a long document. If you break it into tiny pieces, a couple of sentences each, then every piece will be very precise and specific. This is good for narrow, detailed questions: "What date is listed in section 3.2?" The system will quickly find the necessary sentence.

However, if the question is broad, for example, "What are the key findings of this report?", tiny pieces won't help. The information will likely be scattered across dozens of fragments, and the system will either miss the context or retrieve something secondary.

Large chunks work the other way around. They cover more context and are better suited for summarization questions. But if the question is specific, a large piece of text can add noise to the result: relevant information could get lost amid general reasoning.

Simply put: there is no universal size. It's always a compromise.

What AI21 Labs Proposes

The AI21 team decided not to choose a single size but to use several simultaneously. The idea is simple: index documents at several levels of granularity, say, small chunks of 128 tokens, medium ones of 512, and large ones of 2048. Then, at query time, determine which scale fits the question best.
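The article doesn't include code, but the multi-scale indexing step can be sketched in a few lines of Python. Here token counts are approximated by whitespace-separated words, and the function names are our own; a real pipeline would count tokens with the embedding model's tokenizer.

```python
# A minimal sketch of multi-scale indexing. Token counts are
# approximated by whitespace words; a real system would use the
# embedding model's tokenizer.

def chunk(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into pieces of roughly `size` tokens."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

# Index the same document at the three scales mentioned above.
SCALES = (128, 512, 2048)

def build_multiscale_index(doc: str) -> dict[int, list[str]]:
    return {size: chunk(doc, size) for size in SCALES}

index = build_multiscale_index("word " * 5000)  # a 5,000-word dummy document
for size, chunks in index.items():
    print(f"{size} tokens -> {len(chunks)} chunks")  # 40, 10, and 3 chunks
```

Each scale is simply a separate view of the same document; nothing about the source text changes between them.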

To do this, they use a classifier: a small model that analyzes the user's query and decides whether the question requires a detailed search or general understanding. Depending on the answer, the system turns to the appropriate set of chunks.

The classifier is trained on examples of queries labeled by humans. It doesn't try to guess the "correct" size in an absolute sense; it simply selects the one that is more likely to yield a useful result for that type of question.
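AI21's classifier itself isn't published with the article, so the following is only a toy stand-in that routes queries by surface keywords; a production system would train a small supervised model on human-labeled queries, as described above. The cue lists and scale choices are illustrative.

```python
# Toy query router: pick a chunk scale from surface cues in the query.
# This stands in for a trained classifier; the cue lists are made up.

BROAD_CUES = ("summarize", "summary", "overall", "key findings", "themes")
NARROW_CUES = ("date", "which", "when", "where", "section", "figure")

def route_query(query: str) -> int:
    q = query.lower()
    if any(cue in q for cue in BROAD_CUES):
        return 2048   # broad question -> large chunks
    if any(cue in q for cue in NARROW_CUES):
        return 128    # detail question -> small chunks
    return 512        # unclear -> fall back to the medium scale

print(route_query("What are the key findings of this report?"))  # 2048
print(route_query("What date is listed in section 3.2?"))        # 128
```

The fallback to the medium scale mirrors the article's point that some queries don't divide cleanly into detailed and general.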

How It Works in Practice

AI21 tested the approach on several popular benchmarks for RAG systems. The results showed that adaptive chunk-size selection improves retrieval quality compared to a fixed size, regardless of which fixed size was chosen.

The effect is especially noticeable for queries that explicitly require either a very detailed or a very general answer. In such cases, a single-chunk-size system often misses the mark. The multi-scale approach covers both scenarios without sacrificing quality.

An important point: this method doesn't require complex infrastructure. You don't need to rewrite the data processing pipeline or implement any exotic algorithms. It's enough to index the documents several times with different settings and add a lightweight classifier at the input.
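To make that concrete, here is a minimal, self-contained sketch of the whole loop: a bag-of-words "embedding" stands in for a real embedding model, and a caller-supplied scale stands in for the classifier's decision. Every name here is illustrative, not AI21's API.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Bag-of-words stand-in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def split_words(words: list[str], size: int) -> list[str]:
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# A tiny document indexed at three (toy) scales.
doc = ("The report covers revenue growth in detail. "
       "Section 3.2 lists the audit date of March 4. "
       "Overall the key findings point to rising costs.").split()
index = {size: split_words(doc, size) for size in (4, 8, 16)}

def retrieve(query: str, scale: int) -> str:
    # Search only the index matching the classifier-chosen scale.
    q = embed(query)
    return max(index[scale], key=lambda c: cosine(q, embed(c)))

print(retrieve("audit date section 3.2", scale=4))   # narrow query, small chunks
print(retrieve("key findings overall", scale=16))    # broad query, large chunks
```

The pipeline itself is unchanged from a standard RAG setup; the only additions are the extra indexes and the routing decision before retrieval.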

Where the Pitfalls Might Be

Of course, there are nuances. First, storage. If you index documents at three different scales, the volume of data in the vector database will grow roughly threefold. For large corpora, this can be significant.
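A quick back-of-envelope calculation illustrates the cost. The corpus size below is hypothetical; note that while the stored chunk text roughly triples, the number of embedding vectors is dominated by the smallest scale.

```python
# Back-of-envelope storage check for a hypothetical 10M-token corpus
# indexed at the three scales discussed above.
CORPUS_TOKENS = 10_000_000

def vectors_for(chunk_size: int) -> int:
    # Ceiling division: number of chunks (and embeddings) at this scale.
    return -(-CORPUS_TOKENS // chunk_size)

for size in (128, 512, 2048):
    print(f"{size:>4}-token chunks: {vectors_for(size):,} vectors")

# The small scale contributes the vast majority of the vectors.
print("total vectors:", sum(vectors_for(s) for s in (128, 512, 2048)))  # 102540
```

So the raw text is stored three times over, but the vector count (and hence the vector-database footprint) depends mostly on how fine your smallest scale is.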

Second, the classifier is yet another component that needs training and maintenance. If your queries differ significantly from those the model was trained on, accuracy may drop. Fine-tuning on your own data might be required.

Third, not all queries divide cleanly into "detailed" and "general". There are intermediate cases where the choice of scale isn't obvious. In such situations, the system might make a mistake, and the result could be worse than with a universal medium size.

Why It Matters

Chunking is one of those things that seems like a technical detail until you encounter a real task. When a RAG system runs in production and users depend on it to find the information they need, retrieval quality becomes critical.

AI21's approach doesn't aim for a revolution. It is rather a practical improvement that can be implemented without major overhauls. If you already have a RAG system and observe that it works well for some queries but not for others, the issue might be precisely that the chunk size doesn't fit all question types.

The multi-scale approach provides a way to fix this. It's not perfect, but it's better than guessing which single size to pick and hoping it works for everyone.

#applied analysis #methodology #neural networks #engineering #data #human–machine interaction #model optimization
Original Title: Chunk size is query-dependent: a simple multi-scale approach to RAG retrieval
Publication Date: Jan 29, 2026
AI21 Labs (www.ai21.com): an Israeli company building large language models and AI tools for working with text.

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role: analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Claude Sonnet 4.5 (Anthropic): analyzing the original publication and writing the text. The model studies the source material and generates a coherent text.
2. Gemini 3 Pro Preview (Google DeepMind): translation into English.
3. Gemini 2.5 Flash (Google DeepMind): text review and editing. Correction of errors, inaccuracies, and ambiguous phrasing.
4. DeepSeek-V3.2 (DeepSeek): preparing the illustration description. Generating a textual prompt for the visual model.
5. FLUX.2 Pro (Black Forest Labs): creating the illustration. Generating an image based on the prepared prompt.

