Published on March 6, 2026

A Powerful AI Agent Without the Cloud: How LFM2-24B-A2B Runs Directly on Your Computer

Liquid AI has introduced the LFM2-24B-A2B model, capable of running AI agents with tool-calling capabilities directly on consumer hardware – without the cloud or latency.

Event Source: Liquid · 4–6 minute read

When it comes to AI agents – systems that don't just answer questions but also perform tasks like searching for information, calling external tools, and planning steps – we usually think of powerful cloud infrastructure. Servers somewhere far away, back-and-forth requests, latency, and dependency on a network connection. This has become so common that it seemed almost inevitable.

Liquid AI decided to challenge this notion. The company has released the LFM2-24B-A2B model, which, they claim, can fully function in agent mode – with tool-calling and multi-step task execution – directly on consumer hardware. No cloud, no waiting, no reliance on a third-party server.

Understanding Tool Calling in AI Agents and Its Importance

What Is "Tool-Calling" and Why Does It Matter

In short, a standard language model responds with text. An agent with tool-calling capabilities can do things: request the weather via an API, perform an internet search, run a script, or query a database. This represents a fundamentally different level of utility.

Simply put, the difference is similar to that between someone giving advice over the phone and someone who physically shows up and does the work with their own hands. The first is useful. The second is far more valuable for specific tasks.

This is precisely why agent mode is one of the most talked-about areas in AI right now. However, most powerful agent models require significant computational resources that are typically only available from cloud providers.
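The control loop behind such an agent can be sketched in a few lines of Python. Everything here is illustrative: the stub "model", the tool registry, and the message format are assumptions for the sake of the sketch, not Liquid AI's actual API.

```python
# Minimal sketch of a tool-calling agent loop. The "model" is a
# stub that either emits a tool call or answers directly; a real
# agent would send the conversation to an LLM instead.

# Registry of callables the agent is allowed to invoke. A real
# agent would also describe these to the model as JSON schemas.
TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},
}

def fake_model(messages):
    """Stand-in for the LLM: requests the weather tool for
    weather questions, otherwise replies with plain text."""
    last = messages[-1]["content"]
    if last.startswith("weather:"):
        city = last.split(":", 1)[1].strip()
        return {"tool_call": {"name": "get_weather",
                              "arguments": {"city": city}}}
    return {"text": f"Answer: {last}"}

def run_agent(user_input):
    messages = [{"role": "user", "content": user_input}]
    reply = fake_model(messages)
    if "tool_call" in reply:
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["arguments"])
        # A full agent would feed `result` back to the model for a
        # final answer; we format it directly to keep the sketch short.
        return f"It is {result['temp_c']}°C in {result['city']}."
    return reply["text"]

print(run_agent("weather: Berlin"))  # → It is 21°C in Berlin.
```

The essential difference from a plain chat model is that single `if "tool_call"` branch: the model's output is treated as an action to execute, not just text to display.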

Sparse Architecture of the LFM2-24B-A2B Model Explained

24 Billion Parameters, but Only 2 Billion "Active"

It's worth saying a few words about the architectural solution here, as it explains why the model can fit on a consumer device in the first place.

LFM2-24B-A2B is a so-called sparse model. It has 24 billion parameters in total, but only about 2 billion of them are activated when processing any given request. The rest remain "silent" at that moment.

It's like a large library with thousands of books on the shelves, but to answer a specific question, the librarian only takes the necessary ones – they don't haul everything at once. As a result, the computational load is significantly lower than one might expect from a model of this size.

This is what makes running it on a standard consumer GPU realistic – not just as a demonstration, but as a viable working option.
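A rough Python sketch of the idea: a router sends each token to one expert block out of many, so only a fraction of the weights do work per request. The expert count and routing function below are made up for illustration; only the 24B-total / ~2B-active parameter figures come from the announcement.

```python
# Toy illustration of sparse activation. The router and expert
# count are illustrative, not LFM2-24B-A2B's real configuration.

TOTAL_PARAMS = 24_000_000_000   # 24B parameters stored on disk
ACTIVE_PARAMS = 2_000_000_000   # ~2B activated per request

# Share of the network that actually computes for a given request.
active_share = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"{active_share:.1%} of parameters active")  # → 8.3%

def route_token(token: str, num_experts: int = 12) -> int:
    """Stand-in router: deterministically map a token to one
    expert. Real routers are small learned networks that score
    every expert and pick the top ones."""
    return sum(ord(c) for c in token) % num_experts

# Different tokens land on different experts; each token only
# "pays" for the expert it was routed to.
assignments = {t: route_token(t) for t in ["agents", "run", "locally"]}
print(assignments)
```

The memory cost still reflects all 24 billion parameters, but the per-token compute tracks only the active slice, which is what brings consumer GPUs into range.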

LFM2-24B-A2B Performance and Benchmarks in Agent Tasks

What the Model Can Do in Practice

Liquid AI tested LFM2-24B-A2B on several standard benchmarks for agent tasks – the kind of test sets where models need to not just answer a question, but execute a chain of actions using tools.

The results proved to be competitive with models that require significantly more resources or operate exclusively in the cloud. The model handles multi-step tasks, correctly calls tools, and maintains context throughout a dialogue.

The speed is worth a separate mention. Local execution without network latency isn't just a convenience; it's a qualitatively different user experience, especially when a task requires several sequential steps, and each one used to be "slowed down" by a cloud request.

Key Benefits of Running AI Agent Models Locally

Why This Matters for More Than Just Enthusiasts

Running powerful models locally has long been seen as a hobby for those who enjoy tinkering with hardware. But it's gradually turning into something more.

First, privacy. Data processed locally doesn't go anywhere. For corporate users, medical applications, and legal tools, this isn't just a convenience – it's often a requirement.

Second, infrastructure independence. No subscriptions, no request limits, and no risk of the service changing its terms or becoming temporarily unavailable.

Third, latency. Agent tasks often involve dozens of sequential calls to the model. Every millisecond of delay adds up, and with cloud-based solutions, this is noticeable. A local model eliminates this problem almost entirely.
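A back-of-the-envelope sketch of how that overhead accumulates. The 250 ms cloud round-trip and the 30-step task are assumed figures for illustration, not measurements of any particular service or of LFM2-24B-A2B.

```python
# How per-call network overhead compounds across a multi-step
# agent task. All figures are assumptions for illustration.

STEPS = 30                  # sequential model calls in one agent task
CLOUD_OVERHEAD_MS = 250     # assumed network round-trip + queueing per call
LOCAL_OVERHEAD_MS = 0       # no network hop for a local model

def extra_latency(steps: int, overhead_ms: int) -> int:
    """Total overhead added on top of pure compute time, in ms."""
    return steps * overhead_ms

cloud = extra_latency(STEPS, CLOUD_OVERHEAD_MS)   # 7500 ms
local = extra_latency(STEPS, LOCAL_OVERHEAD_MS)   # 0 ms
print(f"cloud adds {cloud / 1000:.1f}s, local adds {local / 1000:.1f}s")
```

Under these assumptions, the cloud version spends seven and a half extra seconds doing nothing but waiting on the network, before any model compute happens at all.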

When an agent model with real capabilities can fit on a device that a developer or researcher already has on their desk, the barrier to entry drops sharply. This means more people can build agent systems without needing to pay for cloud computing or gain access to corporate infrastructure.

How to Access and Run LFM2-24B-A2B via Hugging Face

Open Access and Where to Go from Here

The model is publicly available – it can be found and downloaded via Hugging Face. Liquid AI has also published materials on how to run LFM2-24B-A2B in agent mode, including configuration examples for working with tools.
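As a rough sketch, a tool-calling request for such a model is typically assembled from two pieces: a messages list and a set of tool declarations. The OpenAI-style tool schema below is a widely used convention that many Hugging Face chat templates accept; the exact format LFM2-24B-A2B expects should be checked against its model card and Liquid AI's published examples.

```python
# Sketch of assembling a tool-calling request. The schema format
# is the common OpenAI-style convention, assumed here for
# illustration; verify against the model card before use.

def build_request(user_message: str, tools: list) -> dict:
    """Assemble the messages list plus tool declarations, the two
    inputs a chat template typically consumes."""
    return {
        "messages": [
            {"role": "system", "content": "You are a helpful agent."},
            {"role": "user", "content": user_message},
        ],
        "tools": tools,
    }

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request = build_request("What's the weather in Oslo?", [weather_tool])
print(len(request["messages"]), len(request["tools"]))  # → 2 1
```

With the Hugging Face `transformers` library, such messages and tool declarations are usually passed through the tokenizer's `apply_chat_template` method before generation; consult Liquid AI's published configuration examples for the exact invocation.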

This isn't a closed product for corporate clients but an open release – which in itself suggests that the company is betting on the developer community and wants the model to be tested, used, and built upon.

Still, open questions remain. How stably will the model perform in complex agent scenarios with non-standard tools? How will it handle long chains of reasoning? These things are always better tested in real-world conditions, not just on benchmarks.

But the very fact that an agent model of this caliber is now available for local execution marks a shift in the baseline. It's not a revolution, but it is a significant change in what has become possible without the cloud.

Original Title: No Cloud, No Waiting: Tool-Calling Agents on Consumer Hardware with LFM2-24B-A2B
Publication Date: Mar 5, 2026
Liquid (www.liquid.ai): a U.S.-based AI company researching alternative neural architectures and adaptive models.


From Source to Analysis

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Claude Sonnet 4.6 (Anthropic) — Analyzing the Original Publication and Writing the Text: the neural network studies the original material and generates a coherent text.
2. Gemini 2.5 Pro (Google DeepMind) — Translation into English.
3. Gemini 2.5 Flash (Google DeepMind) — Text Review and Editing: correction of errors, inaccuracies, and ambiguous phrasing.
4. DeepSeek-V3.2 (DeepSeek) — Preparing the Illustration Description: generating a textual prompt for the visual model.
5. FLUX.2 Pro (Black Forest Labs) — Creating the Illustration: generating an image based on the prepared prompt.
