Published February 24, 2026

Liquid AI Releases LFM2-24B: A Large Model with a Small Memory Footprint

Liquid AI has introduced LFM2-24B, a 24-billion-parameter language model capable of outperforming larger competitors with significantly lower memory requirements.

Event Source: Liquid

Liquid AI, a company known for its unconventional approach to creating language models, has released a new development: LFM2-24B. It's the largest model in the LFM2 series to date, and what makes it interesting isn't just its size, but how it manages to compete with larger models while remaining noticeably more resource-efficient.

What is LFM2?

Most modern language models are based on the Transformer architecture – it's been the industry standard for the past few years. Liquid AI is taking a different path. Their LFM2 series models are based on a hybrid architecture that combines several different approaches to information processing. Simply put, instead of a single mechanism, the model uses several, allowing it to better handle long texts and put less strain on memory.
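The article doesn't name LFM2's specific operators, so the sketch below is only a generic illustration of the hybrid idea, not Liquid AI's actual design: alternate a cheap short-range mixer with an occasional global, attention-like mixer, instead of using global mixing in every layer.

```python
def local_mix(seq, window=3):
    # Cheap short-range operator: each position sees only a few neighbors,
    # so no per-token cache of the whole history is needed.
    out = []
    for i in range(len(seq)):
        lo = max(0, i - window + 1)
        out.append(sum(seq[lo:i + 1]) / (i + 1 - lo))
    return out

def global_mix(seq):
    # Stand-in for an attention-style operator: every position can see
    # the entire sequence (this is the part whose cache grows with length).
    mean = sum(seq) / len(seq)
    return [0.5 * x + 0.5 * mean for x in seq]

def hybrid_stack(seq, n_blocks=4):
    # Use cheap local blocks most of the time, global mixing only
    # occasionally, rather than global mixing everywhere.
    for i in range(n_blocks):
        seq = global_mix(seq) if i % 3 == 2 else local_mix(seq)
    return seq
```

The memory savings come from the ratio: only the occasional global blocks need to keep the full history around, while the local blocks get by with a fixed-size window.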

Previously, the series included compact models with 1.3 and 3.4 billion parameters. LFM2-24B is a significant step forward: 24 billion parameters with an important caveat. The model belongs to the mixture-of-experts (MoE) class: it contains 24 billion parameters in total, but only activates about 2 billion of them during operation. This is where the A2B designation in the name comes from – "active 2 billion".

This isn't a marketing gimmick but a genuine working principle: different parts of the model specialize in different tasks, and only the relevant part is used at any given moment. The result is fewer computations at comparable or even better quality.
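As a toy illustration of that principle, top-k expert routing looks roughly like this. The expert count and gating function are made up for the example; in a real MoE model the router is a small learned layer, not a sine function:

```python
import math

NUM_EXPERTS = 8      # stands in for "24B total parameters"
ACTIVE_EXPERTS = 2   # stands in for "~2B active parameters" (the A2B part)

def expert(idx, x):
    # Stand-in for one expert feed-forward sub-network.
    return x * (idx + 1)

def router(x, k=ACTIVE_EXPERTS):
    # Toy gating: score every expert, keep the top-k. In a real MoE
    # model these scores come from a learned linear layer.
    scores = [abs(math.sin(x * (i + 1))) for i in range(NUM_EXPERTS)]
    return sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:k]

def moe_layer(x):
    chosen = router(x)
    # Only the chosen experts run; the other six stay idle,
    # which is where the compute savings come from.
    return sum(expert(i, x) for i in chosen) / len(chosen)
```

For every input, only 2 of the 8 experts execute; the parameters of the remaining 6 sit in storage but cost nothing at inference time.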

How does it perform in practice?

Liquid AI compared LFM2-24B with a number of other popular medium and large-sized models. And this is where things get interesting.

According to standard benchmark results, the model performs on par with or surpasses significantly larger models – notably, Gemma 3 27B and Mistral Small 3.1. All this while actively using only about 2 billion parameters, making it far less demanding on hardware.

To be more specific, LFM2-24B performs well in:

  • reasoning and logical tasks;
  • mathematics;
  • coding;
  • long texts – the model's context window is 32,000 tokens, which is roughly equivalent to a small book.

The generation speed is also worth noting. Because it has few active parameters, the model runs faster during inference – that is, when it's already trained and simply responding to requests. This is crucial for real-world applications where response speed matters.

Memory: The Key Advantage

One of the main challenges when working with language models on long texts is the so-called KV cache. To put it very simply: to "remember" the context of a conversation, the model needs to store intermediate data, and the longer the text, the more memory this requires. With standard Transformers, this volume grows linearly with the context length and quickly becomes a bottleneck.
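To make that growth concrete, here is a back-of-the-envelope KV-cache calculation for a plain Transformer. The layer count and dimensions are made-up round numbers, not the configuration of LFM2 or any specific competitor:

```python
def kv_cache_bytes(seq_len, n_layers=40, n_kv_heads=8, head_dim=128,
                   bytes_per_value=2, batch=1):
    # Two cached tensors per layer (keys and values), each of shape
    # [batch, n_kv_heads, seq_len, head_dim], stored in fp16 (2 bytes).
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value * batch

for ctx in (4_000, 32_000):
    print(f"{ctx:>6} tokens -> {kv_cache_bytes(ctx) / 2**30:.2f} GiB")
```

The growth is strictly linear: 8× the context means 8× the cache, on top of the model weights themselves – which is exactly why long contexts become a bottleneck for standard Transformers and why a 28× reduction matters.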

The LFM2-24B architecture is designed differently. According to Liquid AI, the model consumes 28 times less cache memory compared to similarly sized Transformer-based models. This isn't a minor improvement – it's a fundamentally different scale of consumption.

In practice, this means the model can run on much more modest hardware than its counterparts would require. Alternatively – on the same hardware – it can process many more requests simultaneously. For companies building products on top of language models, this has a direct impact on operational costs.

Who Needs This and Why?

If you're a developer or researcher looking for a model to embed in a product or run locally, LFM2-24B is an interesting option. This is especially true in cases where it's important to work with long documents or ensure high throughput without a huge budget for graphics processing units (GPUs).

The model is available for download on Hugging Face and is released under a license that permits commercial use with certain conditions, which should be reviewed before using it in a specific project.

Liquid AI also provides access through its own API for those who prefer not to deploy the model locally.

This Is Just the Beginning of Scaling

Notably, LFM2-24B is not just a new model, but a test of a hypothesis. Liquid AI wanted to confirm that their architecture retains its advantages as it scales up. Judging by the results, the scaling works: the model doesn't lose efficiency as the parameter count grows, and in some ways, it even gains an edge.

This is important in the context of the broader discussion about the efficiency of AI models. The industry has long been searching for ways to get "more from less", and approaches like the one used by Liquid AI are becoming increasingly relevant as the cost of computation continues to rise.

An open question remains as to how well the model will handle more complex, multi-step tasks – those that require not just answering a question, but building a chain of reasoning or acting as an agent. This is an area where architectural differences can become more pronounced, and LFM2-24B has less public data here so far.

But as a step toward more efficient yet powerful models, it's a strong argument that the Transformer architecture is not the only path forward. 🙂

Original Title: LFM2-24B-A2B: Scaling Up the LFM2 Architecture
Source: Liquid (www.liquid.ai), a U.S.-based AI company researching alternative neural architectures and adaptive models.

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Claude Sonnet 4.6 (Anthropic) – Analyzing the Original Publication and Writing the Text: the neural network studies the original material and generates a coherent text.

2. Gemini 2.5 Pro (Google DeepMind) – Translation into English.

3. Gemini 2.5 Flash (Google DeepMind) – Text Review and Editing: correction of errors, inaccuracies, and ambiguous phrasing.

4. DeepSeek-V3.2 (DeepSeek) – Preparing the Illustration Description: generating a textual prompt for the visual model.

5. FLUX.2 Pro (Black Forest Labs) – Creating the Illustration: generating an image based on the prepared prompt.
