Most modern language models operate on a single principle: they generate text word by word, from left to right. This approach is called autoregressive: the model predicts each next token based on everything that came before. It works well, but it has a built-in limitation: generation speed is bottlenecked because each step depends on the previous one, so the steps cannot be performed in parallel.
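The sequential dependency can be seen in a few lines of code. This is a toy sketch, not a real model: `next_token` stands in for a neural network, and the only point is that each call must wait for the previous one.

```python
# Minimal sketch of autoregressive decoding with a toy "model".
# next_token is a stand-in for a neural network: it only sees the
# tokens generated so far, which is exactly the dependency that
# prevents the steps from running in parallel.

def next_token(context):
    # Toy rule: the next token is the length of the context so far.
    return len(context)

def generate(prompt_tokens, n_steps):
    tokens = list(prompt_tokens)
    for _ in range(n_steps):
        # Each iteration must wait for the previous one to finish:
        # the new token depends on the full sequence so far.
        tokens.append(next_token(tokens))
    return tokens

print(generate([101, 102], 3))  # [101, 102, 2, 3, 4]
```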
Inception Labs has taken a different path. Their Mercury series models use a diffusion approach to text generation – the same technique behind image generators like Stable Diffusion. Here, though, it is applied not to images but to text. Simply put, the model doesn't write text sequentially but gradually “clarifies” it from a noisy state, much like a photographer developing a picture in a darkroom.
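The control flow of that "clarifying" process can be sketched as iterative demasking, a common formulation of discrete text diffusion. This is a toy illustration of the loop structure only, not Mercury's actual algorithm: a real model would use a neural network to fill positions, whereas here we sample from a tiny vocabulary.

```python
import random

MASK = "[MASK]"

def denoise_step(tokens, vocab, fill_fraction, rng):
    # One refinement pass: fill a fraction of the still-masked
    # positions *in parallel*. In a real diffusion LM, a network
    # predicts all these tokens at once; here we just sample.
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    k = max(1, int(len(masked) * fill_fraction))
    for i in rng.sample(masked, min(k, len(masked))):
        tokens[i] = rng.choice(vocab)
    return tokens

def generate(length, vocab, steps=4, seed=0):
    rng = random.Random(seed)
    tokens = [MASK] * length  # start from pure "noise"
    for _ in range(steps):
        tokens = denoise_step(tokens, vocab, 0.5, rng)
        if MASK not in tokens:
            break
    # Final pass: resolve anything still masked.
    return [rng.choice(vocab) if t == MASK else t for t in tokens]

print(generate(8, ["the", "cat", "sat", "mat"]))
```

The key structural difference from the autoregressive loop is that each pass updates many positions at once, which is where the parallelism – and the speed – comes from.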
Overview of Mercury Diffusion Models and Their Purpose
What Is Mercury and Why Is It Needed?
The first generation of Mercury already demonstrated that the diffusion approach to text models is viable. The main advantage of such models is speed: they can generate text significantly faster than their classic autoregressive counterparts because they can process multiple parts of the text in parallel.
Mercury 2 is the next step. Inception Labs describes it as a significant leap in quality while maintaining the same high-speed performance. In short: the model has become smarter without sacrificing speed.
Key Features and Updates in Mercury 2
What's New in Mercury 2?
Mercury 2 comes in two versions: Mercury Coder 2 and Mercury Nova.
Mercury Coder 2 is a specialized model for writing and editing code. According to Inception Labs, it achieves results on par with the best models in its class on standard programming benchmarks – while operating noticeably faster than its competitors. We're talking about generation speeds of around 1,000 tokens per second or more, which is roughly 5 to 10 times faster than autoregressive models of comparable quality.
For developers, this isn't just an abstract number. When a model generates code quickly, tools built on it – such as autocompletion, refactoring, and code explanation – start to feel truly responsive, rather than like waiting at a loading screen.
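A quick back-of-the-envelope calculation shows what the quoted figures mean in practice. The 150 tokens/s baseline below is an illustrative assumption for an autoregressive model of comparable quality, not a measured value.

```python
# Wall-clock time for a 500-token completion at different throughputs.
# The 1,000 tok/s figure is the one quoted above; 150 tok/s is an
# assumed baseline for illustration.

def time_to_generate(n_tokens, tokens_per_second):
    return n_tokens / tokens_per_second

n = 500
fast = time_to_generate(n, 1000)  # diffusion-style throughput
slow = time_to_generate(n, 150)   # assumed autoregressive baseline

print(f"fast: {fast:.2f}s, slow: {slow:.2f}s")  # fast: 0.50s, slow: 3.33s
```

Half a second versus three-plus seconds is the difference between an autocomplete that feels instant and one you notice waiting for.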
Mercury Nova is a versatile, general-purpose model. It is designed for a broader range of tasks: working with text, answering questions, and assisting with writing and editing materials. According to its stated performance metrics, Mercury Nova competes with models on the level of GPT-4o mini and Gemini Flash, all while retaining the speed advantage of the diffusion approach.
Benefits of High Generation Speed in LLMs
Why Speed Is More Than Just a Convenience
One might think that generation speed is a nice bonus but not a crucial feature. In reality, that's not the case.
First, fast models enable a new class of applications. For example, systems that operate in real time: live subtitles, interactive training simulators, and dynamic suggestions while typing. In situations where a delay of just a few seconds ruins the user experience, high speed becomes a prerequisite for functionality, not just a matter of comfort.
Second, speed directly impacts cost. The faster a model processes requests, the fewer computational resources are needed to serve the same number of users. This benefits both product developers and end-users.
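The same arithmetic scales to serving cost. All numbers below are made-up assumptions purely to illustrate the relationship between throughput and compute.

```python
# Rough illustration of why throughput drives serving cost:
# GPU-time per request is tokens divided by tokens-per-second,
# so a higher-throughput model needs proportionally less compute
# for the same traffic. All figures are illustrative assumptions.

def gpu_seconds_per_request(tokens_per_request, tokens_per_second):
    return tokens_per_request / tokens_per_second

requests = 1_000_000
tokens = 400

slow = requests * gpu_seconds_per_request(tokens, 150) / 3600   # GPU-hours
fast = requests * gpu_seconds_per_request(tokens, 1000) / 3600

print(f"slow: {slow:.0f} GPU-hours, fast: {fast:.0f} GPU-hours")
```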
Third, for tasks like code writing or autocompletion, speed is literally part of the functionality. If a suggestion appears three seconds after you've finished typing, it's already useless.
Future of Diffusion vs Autoregressive Architectures
The Diffusion Approach to Text: Is It Here to Stay?
Diffusion models for images have already proven their worth – they've changed an entire industry. Applying the same principle to text has turned out to be much harder, because text is discrete: words don't blur as smoothly as pixels. Inception Labs has worked for several years to make this approach practically applicable.
Mercury 2 is, in essence, a demonstration that diffusion language models have matured to a point where they can be seriously compared to their autoregressive counterparts in terms of quality. Previously, the main argument for such models was speed, while quality remained noticeably lower. Now, that gap has significantly narrowed.
This is important not just for Inception Labs. If the diffusion approach continues to evolve at this pace, developers will have a real alternative to the dominant architecture – and competition in this field, as a rule, benefits everyone.
Availability and What's Next
Both models – Mercury Coder 2 and Mercury Nova – are available via the Inception Labs API. The company has also provided access to demos where you can evaluate the speed and quality of generation for yourself.
For now, Mercury 2 is positioned primarily as a tool for developers and teams integrating language models into their products. But if the speed advantage of the diffusion approach can be maintained with further improvements in quality, the range of applications for such models will only expand.
An open question remains about how well diffusion models handle tasks that require sequential reasoning – where it's important to build a logical chain step by step. The autoregressive approach has a structural advantage here: each subsequent token builds upon all the previous ones. How diffusion models will tackle this class of tasks as they scale is one of the interesting questions that only time and practice will answer.