Liquid AI, a company known for its unconventional approach to building language models, has released LFM2-24B – the largest model in the LFM2 series to date. What makes it interesting isn't just its size, but how it competes with larger models while remaining noticeably more resource-efficient.
What is LFM2?
Most modern language models are based on the Transformer architecture – it's been the industry standard for the past few years. Liquid AI is taking a different path. Their LFM2 series models are based on a hybrid architecture that combines several different approaches to information processing. Simply put, instead of a single mechanism, the model uses several, allowing it to better handle long texts and put less strain on memory.
Previously, the series included compact models with 1.3 and 3.4 billion parameters. LFM2-24B is a significant step forward: 24 billion parameters with an important caveat. The model belongs to the mixture of experts (MoE) class: it contains 24 billion parameters in total, but only activates about 2 billion of them during operation. This is where the A2B designation in the name comes from – "active 2 billion".
This isn't a marketing gimmick but a working design principle: different parts of the model specialize in different tasks, and only the relevant part is activated at any given moment. The result is fewer computations with comparable or even better quality.
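The routing idea can be sketched in a few lines. This is a generic top-k MoE layer with made-up dimensions, not LFM2-24B's actual internals: a small gating network scores the experts, and only the top-scoring ones do any computation for a given input.

```python
# Minimal sketch of mixture-of-experts (MoE) routing: only top_k of the
# experts run per input, so most parameters stay idle on any given token.
# All sizes here are illustrative, not real LFM2-24B dimensions.
import numpy as np

def moe_forward(x, experts, gate_w, top_k=2):
    """Route input x through only the top_k highest-scoring experts."""
    scores = x @ gate_w                    # one gating score per expert
    chosen = np.argsort(scores)[-top_k:]   # indices of the top_k experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()               # softmax over the chosen experts
    # Only the chosen experts compute; the other experts contribute nothing.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
gate_w = rng.normal(size=(d, n_experts))
x = rng.normal(size=d)
y = moe_forward(x, experts, gate_w, top_k=2)  # 2 of 16 experts actually ran
```

With 2 of 16 experts active, roughly an eighth of the expert parameters participate per token – the same logic, at larger scale, behind 24B total but ~2B active.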
How does it perform in practice?
Liquid AI compared LFM2-24B with a number of other popular medium and large-sized models. And this is where things get interesting.
According to standard benchmark results, the model matches or surpasses models with far more active parameters – notably Gemma 3 27B and Mistral Small 3.1. All this while actively using only about 2 billion parameters, making it far less demanding on hardware.
To be more specific, LFM2-24B performs well in:
- reasoning and logical tasks;
- mathematics;
- coding;
- long texts – the model's context window is 32,000 tokens, which is roughly equivalent to a small book.
The generation speed is also worth noting. Because so few parameters are active, the model runs faster during inference – that is, when it's already trained and simply responding to requests. This is crucial for real-world applications where response speed matters.
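A back-of-envelope calculation shows why active parameters, not total parameters, drive inference cost. The common rule of thumb is that a forward pass costs roughly two floating-point operations per parameter that actually participates:

```python
# Why ~2B active parameters means cheaper inference: per generated token,
# a model does roughly 2 * N FLOPs, where N is the number of parameters
# that actually participate in the forward pass.
active_params = 2e9    # LFM2-24B activates ~2B parameters per token
dense_params = 24e9    # a hypothetical dense model of the same total size

flops_moe = 2 * active_params    # ~4 GFLOPs per token
flops_dense = 2 * dense_params   # ~48 GFLOPs per token
print(flops_dense / flops_moe)   # 12.0 -> ~12x less compute per token
```

The real gap in wall-clock speed depends on memory bandwidth and implementation details, but the order of magnitude is what makes high throughput on modest hardware plausible.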
Memory: The Key Advantage
One of the main challenges when working with language models on long texts is the so-called KV cache. To put it very simply: to "remember" the context of a conversation, the model needs to store intermediate data, and the longer the text, the more memory this requires. With standard Transformers, this volume grows linearly with the context length and quickly becomes a bottleneck.
The LFM2-24B architecture is designed differently. According to Liquid AI, the model consumes 28 times less cache memory compared to similarly sized Transformer-based models. This isn't a minor improvement – it's a fundamentally different scale of consumption.
In practice, this means the model can run on much more modest hardware than its counterparts would require. Alternatively – on the same hardware – it can process many more requests simultaneously. For companies building products on top of language models, this has a direct impact on operational costs.
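To make the bottleneck concrete, here is the standard formula for Transformer KV-cache size, applied to a hypothetical dense model at the 32k-token context. The layer and head counts are illustrative assumptions, not published figures for any specific model:

```python
# Rough sketch of KV-cache growth in a standard Transformer: every layer
# stores one key and one value vector per attention head per token, so
# memory grows linearly with context length. Dimensions are illustrative.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_val=2):
    # 2 = one key tensor + one value tensor; fp16 = 2 bytes per value
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_val

# Hypothetical dense Transformer at the full 32,000-token context:
dense = kv_cache_bytes(n_layers=48, n_kv_heads=32, head_dim=128,
                       context_len=32_000)
print(dense / 2**30)       # ~23 GiB of cache for a single sequence
print(dense / 28 / 2**30)  # under 1 GiB at the claimed 28x reduction
```

At that scale the cache alone can dwarf a consumer GPU's memory, which is why a 28x reduction translates directly into cheaper hardware or higher batch sizes.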
Who Needs This and Why?
If you're a developer or researcher looking for a model to embed in a product or run locally, LFM2-24B is an interesting option. This is especially true in cases where it's important to work with long documents or ensure high throughput without a huge budget for graphics processing units (GPUs).
The model is available for download on Hugging Face and is released under a license that permits commercial use with certain conditions, which should be reviewed before using it in a specific project.
Liquid AI also provides access through its own API for those who prefer not to deploy the model locally.
This Is Just the Beginning of Scaling
Notably, LFM2-24B is not just a new model, but a test of a hypothesis. Liquid AI wanted to confirm that their architecture retains its advantages as it scales up. Judging by the results, the scaling works: the model doesn't lose efficiency as the parameter count grows, and in some ways, it even gains an edge.
This is important in the context of the broader discussion about the efficiency of AI models. The industry has long been searching for ways to get "more from less", and approaches like the one used by Liquid AI are becoming increasingly relevant as the cost of computation continues to rise.
An open question remains as to how well the model will handle more complex, multi-step tasks – those that require not just answering a question, but building a chain of reasoning or acting as an agent. This is an area where architectural differences can become more pronounced, and LFM2-24B has less public data here so far.
But as a step toward more efficient yet powerful models, it's a strong argument that the Transformer architecture is not the only path forward. 🙂