Published on March 6, 2026

A Powerful AI Agent Without the Cloud: How LFM2-24B-A2B Runs Directly on Your Computer

Liquid AI has introduced the LFM2-24B-A2B model, capable of running AI agents with tool-calling capabilities directly on consumer hardware – without the cloud or latency.

Event Source: Liquid · 4–6 minute read

When it comes to AI agents – systems that don't just answer questions but also perform tasks like searching for information, calling external tools, and planning steps – we usually think of powerful cloud infrastructure. Servers somewhere far away, back-and-forth requests, latency, and dependency on a network connection. This has become so common that it seemed almost inevitable.

Liquid AI decided to challenge this notion. The company has released the LFM2-24B-A2B model, which, they claim, can fully function in agent mode – with tool-calling and multi-step task execution – directly on consumer hardware. No cloud, no waiting, no reliance on a third-party server.

Understanding Tool Calling in AI Agents and Its Importance

What Is "Tool-Calling" and Why Does It Matter

In short, a standard language model responds with text. An agent with tool-calling capabilities can do things: request the weather via an API, perform an internet search, run a script, or query a database. This represents a fundamentally different level of utility.

Simply put, the difference is similar to that between someone giving advice over the phone and someone who physically shows up and does the work with their own hands. The first is useful. The second is far more valuable for specific tasks.

This is precisely why agent mode is one of the most talked-about areas in AI right now. However, most powerful agent models require significant computational resources that are typically only available from cloud providers.
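The control loop behind such an agent can be sketched in a few lines of Python. Everything here is illustrative: the stub "model", the tool registry, and the message format are assumptions for the sake of the sketch, not Liquid AI's actual API.

```python
# Minimal sketch of a tool-calling agent loop. The "model" is a
# stub that either emits a tool call or answers directly; a real
# agent would send the conversation to an LLM instead.

# Registry of callables the agent is allowed to invoke. A real
# agent would also describe these to the model as JSON schemas.
TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},
}

def fake_model(messages):
    """Stand-in for the LLM: requests the weather tool for
    weather questions, otherwise replies with plain text."""
    last = messages[-1]["content"]
    if last.startswith("weather:"):
        city = last.split(":", 1)[1].strip()
        return {"tool_call": {"name": "get_weather",
                              "arguments": {"city": city}}}
    return {"text": f"Answer: {last}"}

def run_agent(user_input):
    messages = [{"role": "user", "content": user_input}]
    reply = fake_model(messages)
    if "tool_call" in reply:
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["arguments"])
        # A full agent would feed `result` back to the model for a
        # final answer; we format it directly to keep the sketch short.
        return f"It is {result['temp_c']}°C in {result['city']}."
    return reply["text"]

print(run_agent("weather: Berlin"))  # → It is 21°C in Berlin.
```

The essential difference from a plain chat model is that single `if "tool_call"` branch: the model's output is treated as an action to execute, not just text to display.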

Sparse Architecture of the LFM2-24B-A2B Model Explained

24 Billion Parameters, but Only 2 Billion "Active"

It's worth saying a few words about the architectural solution here, as it explains why the model can fit on a consumer device in the first place.

LFM2-24B-A2B is a so-called sparse model. It has 24 billion parameters in total, but only about 2 billion of them are activated when processing any given request. The rest remain "silent" at that moment.

It's like a large library with thousands of books on the shelves, but to answer a specific question, the librarian only takes the necessary ones – they don't haul everything at once. As a result, the computational load is significantly lower than one might expect from a model of this size.

This is what makes running it on a standard consumer GPU realistic – not just as a demonstration, but as a viable working option.
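A rough Python sketch of the idea: a router sends each token to one expert block out of many, so only a fraction of the weights do work per request. The expert count and routing function below are made up for illustration; only the 24B-total / ~2B-active parameter figures come from the announcement.

```python
# Toy illustration of sparse activation. The router and expert
# count are illustrative, not LFM2-24B-A2B's real configuration.

TOTAL_PARAMS = 24_000_000_000   # 24B parameters stored on disk
ACTIVE_PARAMS = 2_000_000_000   # ~2B activated per request

# Share of the network that actually computes for a given request.
active_share = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"{active_share:.1%} of parameters active")  # → 8.3%

def route_token(token: str, num_experts: int = 12) -> int:
    """Stand-in router: deterministically map a token to one
    expert. Real routers are small learned networks that score
    every expert and pick the top ones."""
    return sum(ord(c) for c in token) % num_experts

# Different tokens land on different experts; each token only
# "pays" for the expert it was routed to.
assignments = {t: route_token(t) for t in ["agents", "run", "locally"]}
print(assignments)
```

The memory cost still reflects all 24 billion parameters, but the per-token compute tracks only the active slice, which is what brings consumer GPUs into range.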

LFM2-24B-A2B Performance and Benchmarks in Agent Tasks

What the Model Can Do in Practice

Liquid AI tested LFM2-24B-A2B on several standard benchmarks for agent tasks – the kind of test sets where models need to not just answer a question, but execute a chain of actions using tools.

The results proved to be competitive with models that require significantly more resources or operate exclusively in the cloud. The model handles multi-step tasks, correctly calls tools, and maintains context throughout a dialogue.

The speed is worth a separate mention. Local execution without network latency isn't just a convenience; it's a qualitatively different user experience, especially when a task requires several sequential steps, and each one used to be "slowed down" by a cloud request.

Key Benefits of Running AI Agent Models Locally

Why This Matters for More Than Just Enthusiasts

Running powerful models locally has long been seen as a hobby for those who enjoy tinkering with hardware. But it's gradually turning into something more.

First, privacy. Data processed locally doesn't go anywhere. For corporate users, medical applications, and legal tools, this isn't just a convenience – it's often a requirement.

Second, infrastructure independence. No subscriptions, no request limits, and no risk of the service changing its terms or becoming temporarily unavailable.

Third, latency. Agent tasks often involve dozens of sequential calls to the model. Every millisecond of delay adds up, and with cloud-based solutions, this is noticeable. A local model eliminates this problem almost entirely.
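A back-of-the-envelope sketch of how that overhead accumulates. The 250 ms cloud round-trip and the 30-step task are assumed figures for illustration, not measurements of any particular service or of LFM2-24B-A2B.

```python
# How per-call network overhead compounds across a multi-step
# agent task. All figures are assumptions for illustration.

STEPS = 30                  # sequential model calls in one agent task
CLOUD_OVERHEAD_MS = 250     # assumed network round-trip + queueing per call
LOCAL_OVERHEAD_MS = 0       # no network hop for a local model

def extra_latency(steps: int, overhead_ms: int) -> int:
    """Total overhead added on top of pure compute time, in ms."""
    return steps * overhead_ms

cloud = extra_latency(STEPS, CLOUD_OVERHEAD_MS)   # 7500 ms
local = extra_latency(STEPS, LOCAL_OVERHEAD_MS)   # 0 ms
print(f"cloud adds {cloud / 1000:.1f}s, local adds {local / 1000:.1f}s")
```

Under these assumptions, the cloud version spends seven and a half extra seconds doing nothing but waiting on the network, before any model compute happens at all.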

When an agent model with real capabilities can fit on a device that a developer or researcher already has on their desk, the barrier to entry drops sharply. This means more people can build agent systems without needing to pay for cloud computing or gain access to corporate infrastructure.

How to Access and Run LFM2-24B-A2B via Hugging Face

Open Access and Where to Go from Here

The model is publicly available – it can be found and downloaded via Hugging Face. Liquid AI has also published materials on how to run LFM2-24B-A2B in agent mode, including configuration examples for working with tools.
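As a rough sketch, a tool-calling request for such a model is typically assembled from two pieces: a messages list and a set of tool declarations. The OpenAI-style tool schema below is a widely used convention that many Hugging Face chat templates accept; the exact format LFM2-24B-A2B expects should be checked against its model card and Liquid AI's published examples.

```python
# Sketch of assembling a tool-calling request. The schema format
# is the common OpenAI-style convention, assumed here for
# illustration; verify against the model card before use.

def build_request(user_message: str, tools: list) -> dict:
    """Assemble the messages list plus tool declarations, the two
    inputs a chat template typically consumes."""
    return {
        "messages": [
            {"role": "system", "content": "You are a helpful agent."},
            {"role": "user", "content": user_message},
        ],
        "tools": tools,
    }

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request = build_request("What's the weather in Oslo?", [weather_tool])
print(len(request["messages"]), len(request["tools"]))  # → 2 1
```

With the Hugging Face `transformers` library, such messages and tool declarations are usually passed through the tokenizer's `apply_chat_template` method before generation; consult Liquid AI's published configuration examples for the exact invocation.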

This isn't a closed product for corporate clients but an open release – which in itself suggests that the company is betting on the developer community and wants the model to be tested, used, and built upon.

Still, open questions remain. How stably will the model perform in complex agent scenarios with non-standard tools? How will it handle long chains of reasoning? These things are always better tested in real-world conditions, not just on benchmarks.

But the very fact that an agent model of this caliber is now available for local execution marks a shift in the baseline. It's not a revolution, but it is a significant change in what has become possible without the cloud.

Original Title: No Cloud, No Waiting: Tool-Calling Agents on Consumer Hardware with LFM2-24B-A2B
Publication Date: Mar 5, 2026
Liquid (www.liquid.ai): a U.S.-based AI company researching alternative neural architectures and adaptive models.


From Source to Analysis

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Claude Sonnet 4.6 (Anthropic) — Analyzing the Original Publication and Writing the Text: the neural network studies the original material and generates a coherent text.
2. Gemini 2.5 Pro (Google DeepMind) — Translation into English.
3. Gemini 2.5 Flash (Google DeepMind) — Text Review and Editing: correction of errors, inaccuracies, and ambiguous phrasing.
4. DeepSeek-V3.2 (DeepSeek) — Preparing the Illustration Description: generating a textual prompt for the visual model.
5. FLUX.2 Pro (Black Forest Labs) — Creating the Illustration: generating an image based on the prepared prompt.
