Published January 23, 2026

How “Snoozing” Data Helps Save on AI Training Costs

Researchers at AI21 Labs have devised a method to reduce online reinforcement learning costs by sending some data to “take a nap” until a better time.

Research · Event Source: AI21 Labs · Reading Time: 4–6 minutes

Online reinforcement learning is an expensive endeavor. The model constantly interacts with its environment, gathers new data, learns from it, and then continues the cycle. This process is continuous, and every step requires computational resources, especially when it involves large language models or complex tasks.

A team from AI21 Labs proposed an unexpected solution: what if we temporarily “put to sleep” part of the data? Not delete it, not ignore it completely, but simply postpone its use until later – when it is truly needed. The idea, called Dynamic Data Snoozing, allows for noticeably cutting training costs without compromising quality.

Why “Snooze” Anything at All?

In classic online reinforcement learning, the agent collects experience and immediately uses it to update the model. The problem is that not all data is equally useful at every moment. Some examples are critical right now, while others can wait – they will become relevant later, when the model has developed enough to understand more complex patterns.

Typically, this problem is addressed using a replay buffer – a memory buffer where all data is stored, and the model periodically revisits old examples. However, this requires significant memory and computation. Dynamic Data Snoozing goes further: instead of indiscriminately storing everything and constantly sifting through it, the system decides for itself which data to set aside and when to “wake it up.”

How It Works in Practice

The essence of the method is simple. When the model receives a new example, it evaluates how useful it is at that moment. If its utility is low, the example is sent “to sleep” for a certain period. When this period expires, the data returns to the active sample, and the model can use it again.

The key aspect is dynamics. The system doesn't just put data aside for a fixed period; instead, it adaptively adjusts the “sleep” time to the current state of training. If the model develops quickly, data might “wake up” sooner. If it gets “stuck” at one level, it might wake up later.

All of this happens automatically, without manual tuning. The algorithm determines on its own when an example will become maximally useful.
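The paper's exact algorithm is not reproduced in this article, so the sketch below is only a hypothetical Python illustration of the mechanism described above: each incoming example gets a utility score, low-utility examples are pushed onto a "snooze" heap with a wake-up step, and the sleep period adapts to training progress (faster progress means shorter sleep). The class and method names (`SnoozeBuffer`, `admit`, `wake`) and the specific formulas are assumptions, not AI21's implementation.

```python
import heapq
import itertools


class SnoozeBuffer:
    """Hypothetical sketch of a dynamic data-snoozing buffer.

    Names and the sleep-time formula are illustrative assumptions,
    not the authors' actual algorithm.
    """

    def __init__(self, utility_threshold=0.5):
        self.utility_threshold = utility_threshold
        self._seq = itertools.count()  # tie-breaker so the heap never compares examples
        self.snoozed = []  # min-heap of (wake_step, seq, example)

    def admit(self, example, utility, step, progress_rate):
        """Return the example if it is useful now; otherwise snooze it and return None."""
        if utility >= self.utility_threshold:
            return example
        # Lower utility -> longer sleep; faster training progress -> shorter sleep.
        sleep = int(100 * (1.0 - utility) / max(progress_rate, 1e-3))
        heapq.heappush(self.snoozed, (step + sleep, next(self._seq), example))
        return None

    def wake(self, step):
        """Pop every example whose sleep period has expired by `step`."""
        woken = []
        while self.snoozed and self.snoozed[0][0] <= step:
            woken.append(heapq.heappop(self.snoozed)[2])
        return woken
```

In this sketch an example with utility 0.1 and progress rate 1.0 sleeps for 90 steps before rejoining the active sample; doubling the progress rate would halve that sleep, which captures the "dynamics" the authors emphasize.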

What This Yields

The researchers tested the method on several tasks and saw promising results: on average, the amount of data processed at each step dropped by 30–50%, while training quality stayed at the same level and in some cases even improved, because the model focused on the truly important examples.

Simply put, instead of processing everything, the system learned to work selectively. This reduces training time and lowers the load on computational resources.

The advantage is particularly noticeable in tasks where data varies greatly in complexity. For example, if at early stages the model isn't yet ready to understand complex patterns, it defers them until it can extract benefit from them. This helps avoid overload and allows for a focus on gradual development.

Where the Limits Lie

The method works well when data is heterogeneous and the model progresses through distinct training stages. If the task is simple and all examples are of roughly equal complexity, the effect will be smaller – simply because there isn't much to set aside.

Another nuance is the necessity of a correct metric for evaluating data utility. If the system incorrectly identifies which examples are important, it might “snooze” something necessary or, conversely, keep active something that is not yet meaningful. In the AI21 Labs study, they used several heuristics but acknowledge that each task might require its own specific setup.
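To illustrate how such a utility metric might look (and why it is fragile), here is one hypothetical heuristic, which is an assumption of this article rather than AI21's actual metric: compare an example's loss to the model's running average loss, and assign low utility to examples whose loss is far above what the model can currently handle.

```python
def utility_from_loss(example_loss, running_mean_loss, band=2.0):
    """Hypothetical utility heuristic (an assumption, not AI21's metric).

    Examples at or below the model's current average loss are treated as
    fully usable; utility decays linearly as the loss exceeds the average,
    reaching zero at `band` times the average (the model "is not ready").
    """
    ratio = example_loss / max(running_mean_loss, 1e-8)
    if ratio <= 1.0:
        return 1.0  # within current capability: use it now
    return max(0.0, (band - ratio) / (band - 1.0))
```

A mis-set `band` here would snooze useful examples or keep hopeless ones active, which is exactly the failure mode the authors acknowledge: each task may need its own tuning.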

It is also worth noting that the method is oriented toward online learning. For offline scenarios, where all data is known in advance, the approach might be less relevant – there, it is simpler to pre-sort examples by complexity.

Why This Matters

Online reinforcement learning is increasingly being used in real-world applications: recommendation systems, robot control, adaptive interfaces. In these domains, the model must learn on the fly, rather than from a pre-collected dataset.

The problem is that such training is expensive. Every new example requires computation, and if there is a lot of data, costs quickly escalate. Dynamic Data Snoozing demonstrates that one can train more efficiently without sacrificing quality. This is especially important for companies working with large models and limited budgets.

Furthermore, the method opens the door to more flexible data management strategies. If the system can decide for itself when and how to use examples, this reduces the need for manual tuning and makes learning more autonomous.

What's Next

For now, Dynamic Data Snoozing is a research endeavor, and it is uncertain how quickly the method will make its way into production. However, the idea is logical and practical, so its chances of adoption look good.

It will also be interesting to see how this approach combines with other optimization techniques – for example, with curriculum learning (learning by increasing complexity) or data compression methods. Perhaps a combination of several strategies will yield an even greater effect.

In any case, this is another step toward making AI training not only powerful but also accessible. When you can save 30–50% of resources simply by teaching the system to defer data for later, that is a significant achievement.

#analysis #research-review #machine-learning #ai-training #engineering #scaling #data #model-scaling
Original Title: When sleeping in saves you money: dynamic data snoozing for efficient online RL
Publication Date: Jan 22, 2026
AI21 Labs (www.ai21.com): an Israeli company building large language models and AI tools for working with text.

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Claude Sonnet 4.5 (Anthropic) — Analyzing the Original Publication and Writing the Text: the neural network studies the original material and generates a coherent text.

2. Gemini 3 Pro Preview (Google DeepMind) — Translation into English.

3. Gemini 2.5 Flash (Google DeepMind) — Text Review and Editing: correction of errors, inaccuracies, and ambiguous phrasing.

4. DeepSeek-V3.2 (DeepSeek) — Preparing the Illustration Description: generating a textual prompt for the visual model.

5. FLUX.2 Pro (Black Forest Labs) — Creating the Illustration: generating an image based on the prepared prompt.
