When it comes to AI agents – systems that don't just answer questions but also perform tasks like searching for information, calling external tools, and planning steps – we usually think of powerful cloud infrastructure: servers somewhere far away, back-and-forth requests, latency, and dependence on a network connection. This setup has become so common that it seems almost inevitable.
Liquid AI decided to challenge this notion. The company has released the LFM2-24B-A2B model, which, they claim, can fully function in agent mode – with tool-calling and multi-step task execution – directly on consumer hardware. No cloud, no waiting, no reliance on a third-party server.
What Is "Tool-Calling" and Why Does It Matter
In short, a standard language model responds with text. An agent with tool-calling capabilities can do things: request the weather via an API, perform an internet search, run a script, or query a database. This represents a fundamentally different level of utility.
Simply put, the difference is similar to that between someone giving advice over the phone and someone who physically shows up and does the work with their own hands. The first is useful. The second is far more valuable for specific tasks.
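To make that concrete, here is a minimal conceptual sketch of the loop behind tool-calling. Everything in it is illustrative: model_step stands in for a call to a local model, and get_weather is a stub rather than a real API.

```python
import json

def get_weather(city: str) -> str:
    """Stub tool; a real agent would call an actual weather API here."""
    return json.dumps({"city": city, "forecast": "sunny", "temp_c": 21})

TOOLS = {"get_weather": get_weather}

def run_agent(user_message: str, model_step) -> str:
    """Drive the model until it produces a final text answer.

    `model_step` is a hypothetical function that takes the conversation
    history and returns either {"content": ...} for a plain reply or
    {"tool_call": {"name": ..., "arguments": {...}}} for a tool request.
    """
    history = [{"role": "user", "content": user_message}]
    while True:
        reply = model_step(history)
        if "tool_call" not in reply:
            return reply["content"]  # plain text answer: we're done
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["arguments"])  # run the tool
        # Feed the tool's output back so the model can take the next step.
        history.append({"role": "tool", "name": call["name"], "content": result})
```

The key point is the loop: the model doesn't just emit text once, it repeatedly decides whether to act or to answer, which is exactly what makes multi-step tasks possible.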
This is precisely why agent mode is one of the most talked-about areas in AI right now. However, most powerful agent models require significant computational resources that are typically only available from cloud providers.
24 Billion Parameters, but Only 2 Billion "Active"
It's worth saying a few words about the architectural solution here, as it explains why the model can fit on a consumer device in the first place.
LFM2-24B-A2B is what's known as a sparse model: of its 24 billion total parameters, only about 2 billion are activated when processing any given request. The rest remain "silent" at that moment.
It's like a large library with thousands of books on the shelves, but to answer a specific question, the librarian only takes the necessary ones – they don't haul everything at once. As a result, the computational load is significantly lower than one might expect from a model of this size.
This is what makes running it on a standard consumer GPU realistic – not just as a demonstration, but as a viable working option.
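For intuition, here is a toy sketch of the kind of sparse routing such models rely on. It assumes a top-k mixture-of-experts scheme, the most common way to get "24B total, 2B active"; the exact routing inside LFM2-24B-A2B may differ in detail.

```python
import torch

n_experts, top_k, d = 64, 2, 512
experts = torch.nn.ModuleList(torch.nn.Linear(d, d) for _ in range(n_experts))
router = torch.nn.Linear(d, n_experts)

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    """Route one token vector of shape (d,) through only top_k experts."""
    weights = router(x).softmax(dim=-1)    # one probability per expert
    top_w, top_i = weights.topk(top_k)     # keep the k best-scoring experts
    # Only top_k of the n_experts layers actually run; the rest stay idle,
    # so compute scales with the "active" parameters, not the total count.
    return sum(w * experts[i](x) for w, i in zip(top_w.tolist(), top_i.tolist()))

out = moe_forward(torch.randn(d))
print(out.shape)  # torch.Size([512])
```

All the parameters still have to fit in memory, but per-token compute is closer to that of a small model – which is the property that matters for consumer hardware.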
What the Model Can Do in Practice
Liquid AI tested LFM2-24B-A2B on several standard benchmarks for agent tasks – test sets where models need not just to answer a question but to execute a chain of actions using tools.
The results proved to be competitive with models that require significantly more resources or operate exclusively in the cloud. The model handles multi-step tasks, correctly calls tools, and maintains context throughout a dialogue.
Speed is worth a separate mention. Local execution without network latency isn't just a convenience; it's a qualitatively different user experience, especially when a task requires several sequential steps and each one would otherwise be slowed by a round trip to the cloud.
Why This Matters for More Than Just Enthusiasts
Running powerful models locally has long been seen as a hobby for those who enjoy tinkering with hardware. But it's gradually turning into something more.
First, privacy. Data processed locally never leaves the device. For corporate users, medical applications, and legal tools, this isn't just a convenience – it's often a requirement.
Second, infrastructure independence. No subscriptions, no request limits, and no risk of the service changing its terms or becoming temporarily unavailable.
Third, latency. Agent tasks often involve dozens of sequential calls to the model. Every millisecond of delay adds up, and with cloud-based solutions, this is noticeable. A local model eliminates this problem almost entirely.
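A rough back-of-the-envelope sketch of why this compounds (the round-trip figures below are illustrative assumptions, not measurements):

```python
# Assumed ~300 ms of network round trip per cloud call, versus
# effectively zero network overhead for a local model call.
calls = 30                       # a plausible multi-step agent task
cloud_overhead = calls * 0.300   # seconds spent purely on the network
print(f"network overhead for {calls} cloud calls: {cloud_overhead:.1f} s")
# -> network overhead for 30 cloud calls: 9.0 s
```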
When an agent model with real capabilities can fit on a device that a developer or researcher already has on their desk, the barrier to entry drops sharply. This means more people can build agent systems without needing to pay for cloud computing or gain access to corporate infrastructure.
Open Access and Where to Go from Here
The model is publicly available for download on Hugging Face. Liquid AI has also published materials on how to run LFM2-24B-A2B in agent mode, including configuration examples for working with tools.
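As a starting point, the snippet below shows how such a checkpoint would typically be loaded and prompted with a tool using the standard transformers chat-template convention. The repository name LiquidAI/LFM2-24B-A2B is an assumption based on the model's name; Liquid AI's own materials should be treated as authoritative for the exact setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-24B-A2B"  # assumed Hugging Face repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def get_weather(city: str) -> str:
    """Get the current weather in a given city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny, 21 C"  # stub for illustration

messages = [{"role": "user", "content": "What's the weather in Berlin?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],      # schema is derived from the signature/docstring
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```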
This isn't a closed product for corporate clients but an open release – which in itself suggests that the company is betting on the developer community and wants the model to be tested, used, and built upon.
Still, open questions remain. How stably will the model perform in complex agent scenarios with non-standard tools? How will it handle long chains of reasoning? These things are always better tested in real-world conditions, not just on benchmarks.
But the very fact that an agent model of this caliber is now available for local execution marks a shift in the baseline. It's not a revolution, but it is a significant change in what has become possible without the cloud.