Published February 27, 2026

Mercury 2: Diffusion Language Models Get a Major Upgrade

Inception Labs has released Mercury 2, a new generation of diffusion language models that generate text in a fundamentally different way than the AI assistants we are accustomed to.

Source: Inception

Most modern language models operate on a single principle: they generate text word by word, from left to right. This approach is called autoregressive: the model predicts the next token each time, based on everything that came before. It works well, but this method has its limitations: generation speed is bottlenecked by the fact that each step depends on the previous one, making it impossible to perform them in parallel.
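The left-to-right loop described above can be sketched in a few lines. This is a toy illustration, not any real model's API: `toy_model` stands in for a network's next-token prediction.

```python
def greedy_decode(model, prompt_tokens, max_new_tokens=8, eos_id=0):
    """Generate tokens strictly left to right, one per step."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # The next token depends on the whole sequence so far, so the
        # steps form a chain and cannot be executed in parallel.
        next_token = model(tokens)
        tokens.append(next_token)
        if next_token == eos_id:
            break
    return tokens

def toy_model(tokens):
    """Hypothetical stand-in for a real forward pass: counts down to 0 (eos)."""
    return max(tokens[-1] - 1, 0)
```

Each pass through the loop must wait for the previous one to finish, which is exactly the bottleneck the diffusion approach tries to remove.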

Inception Labs has taken a different path. Their Mercury series models use a diffusion approach to text generation – the approach popularized by image generators like Stable Diffusion, applied here to text rather than images. Simply put, the model doesn't write text sequentially but gradually “clarifies” it from a noisy state, much like a photographer developing a picture in a darkroom.
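One common way to make diffusion work on discrete text is iterative unmasking: start from a fully masked sequence and commit several positions per step. The sketch below is only a schematic of that general idea, not Inception's actual method; `toy_denoiser` is a hypothetical stand-in for a trained model.

```python
import random

MASK = "_"

def toy_denoiser(seq, target):
    """Stand-in for a trained denoiser: proposes the target token for
    every still-masked position (a real model would predict these)."""
    return [target[i] if tok == MASK else tok for i, tok in enumerate(seq)]

def diffusion_generate(target, steps=3, seed=0):
    """Start fully masked and 'clarify' the text over a few steps."""
    rng = random.Random(seed)
    seq = [MASK] * len(target)
    masked = list(range(len(target)))
    per_step = max(1, len(target) // steps)
    while masked:
        proposal = toy_denoiser(seq, target)
        # Commit a batch of positions at once -- unlike autoregressive
        # decoding, many tokens are finalized in a single step.
        commit = rng.sample(masked, min(per_step, len(masked)))
        for i in commit:
            seq[i] = proposal[i]
            masked.remove(i)
    return "".join(seq)
```

The key contrast with the autoregressive loop is the inner `for` over `commit`: multiple positions are finalized per denoising step, which is where the parallelism and speed come from.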


What Is Mercury and Why Is It Needed?

The first generation of Mercury already demonstrated that the diffusion approach to text models is viable. The main advantage of such models is speed: they can generate text significantly faster than their classic autoregressive counterparts because they can process multiple parts of the text in parallel.

Mercury 2 is the next step. Inception Labs describes it as a significant leap in quality while maintaining the same high-speed performance. In short: the model has become smarter without sacrificing speed.


What's New in Mercury 2?

Mercury 2 comes in two versions: Mercury Coder 2 and Mercury Nova.

Mercury Coder 2 is a specialized model for writing and editing code. According to Inception Labs, it achieves results on par with the best models in its class on standard programming benchmarks – while operating noticeably faster than its competitors. We're talking about generation speeds of around 1,000 tokens per second or more, which is roughly 5 to 10 times faster than autoregressive models of comparable quality.
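To put those throughput figures in perspective, here is a back-of-the-envelope calculation. The 150 tokens/second baseline is an assumed figure consistent with the article's “5 to 10 times” claim, not a measured number.

```python
def generation_time_s(num_tokens, tokens_per_second):
    """Wall-clock seconds to stream num_tokens at a constant rate."""
    return num_tokens / tokens_per_second

# Roughly a 2,000-token source file (all figures illustrative):
diffusion_time = generation_time_s(2000, 1000)       # ~2 seconds
autoregressive_time = generation_time_s(2000, 150)   # ~13 seconds
```

A two-second wait feels interactive; a thirteen-second wait feels like a loading screen – which is the difference the next paragraph describes.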

For developers, this isn't just an abstract number. When a model generates code quickly, tools built on it – such as autocompletion, refactoring, and code explanation – start to feel truly responsive, rather than like waiting at a loading screen.

Mercury Nova is a versatile, general-purpose model. It is designed for a broader range of tasks: working with text, answering questions, and assisting with writing and editing materials. According to its stated performance metrics, Mercury Nova competes with models on the level of GPT-4o mini and Gemini Flash, all while retaining the speed advantage of the diffusion approach.


Why Speed Is More Than Just a Convenience

One might think that generation speed is a nice bonus but not a crucial feature. In reality, that's not the case.

First, fast models enable a new class of applications. For example, systems that operate in real time: live subtitles, interactive training simulators, and dynamic suggestions while typing. In situations where a delay of just a few seconds ruins the user experience, high speed becomes a prerequisite for functionality, not just a matter of comfort.

Second, speed directly impacts cost. The faster a model processes requests, the fewer computational resources are needed to serve the same number of users. This benefits both product developers and end-users.

Third, for tasks like code writing or autocompletion, speed is literally part of the functionality. If a suggestion appears three seconds after you've finished typing, it's already useless.


The Diffusion Approach to Text: Is It Here to Stay?

Diffusion models for images have already proven their worth – they've changed an entire industry. Applying the same principle to text has proven to be much more complex because text is discrete: words don't blur as smoothly as pixels. Inception Labs has worked for several years to make this approach practically applicable.

Mercury 2 is, in essence, a demonstration that diffusion language models have matured to a point where they can be seriously compared to their autoregressive counterparts in terms of quality. Previously, the main argument for such models was speed, while quality remained noticeably lower. Now, that gap has significantly narrowed.

This is important not just for Inception Labs. If the diffusion approach continues to evolve at this pace, developers will have a real alternative to the dominant architecture – and competition in this field, as a rule, benefits everyone.

Availability and What's Next

Both models – Mercury Coder 2 and Mercury Nova – are available via the Inception Labs API. The company has also provided access to demos where you can evaluate the speed and quality of generation for yourself.

For now, Mercury 2 is positioned primarily as a tool for developers and teams integrating language models into their products. But if the speed advantage of the diffusion approach can be maintained with further improvements in quality, the range of applications for such models will only expand.

An open question remains about how well diffusion models handle tasks that require sequential reasoning – where it's important to build a logical chain step by step. The autoregressive approach has a structural advantage here: each subsequent token builds upon all the previous ones. How diffusion models will tackle this class of tasks as they scale is one of the interesting questions that only time and practice will answer.

Original Title: Introducing Mercury 2
Publication Date: Feb 24, 2026
Inception (www.inceptionlabs.ai): a U.S.-based AI company developing next-generation diffusion large language models (dLLMs) and language technologies designed for high-speed text generation and multimodal tasks.


How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Claude Sonnet 4.6 (Anthropic) – Analyzing the Original Publication and Writing the Text. The neural network studies the original material and generates a coherent text.

2. Gemini 2.5 Pro (Google DeepMind) – Translation into English.

3. Gemini 2.5 Flash (Google DeepMind) – Text Review and Editing. Correction of errors, inaccuracies, and ambiguous phrasing.

4. DeepSeek-V3.2 (DeepSeek) – Preparing the Illustration Description. Generating a textual prompt for the visual model.

5. FLUX.2 Pro (Black Forest Labs) – Creating the Illustration. Generating an image based on the prepared prompt.
