While most advanced AI models require powerful servers and cloud connections, Google has moved in a different direction. The Gemma 4 family consists of four open-source models that can be run locally – from a smartphone to a standard personal computer with a single graphics card.
Where Did This All Come From?
Gemma is a line of open-source models from Google that has been around for several generations. Gemma 4 is built on the technologies underlying Gemini 3, the company's proprietary flagship model. Simply put, part of what was previously only available through Google's paid services can now be downloaded and run on your own hardware.
Since the first generation of Gemma, developers have downloaded the family's models more than 400 million times, and the community has created over 100,000 derivative versions. This is a significant signal that open-source models are genuinely being used – not just for experimentation, but in real-world projects.
Four Models, Four Use Cases
The Gemma 4 family includes models of different sizes, and it's not just a simple gradation of “weaker to stronger.” Each version is designed for a specific class of tasks and hardware.
E2B and E4B are the most compact. Developed in partnership with Qualcomm and MediaTek, they are optimized to run directly on mobile devices: Android smartphones, single-board computers like the Raspberry Pi, and similar hardware. They operate completely offline – without an internet connection and without sending data to servers. Both support not only images and text but also audio input, meaning they can perform speech recognition right on the device.
26B MoE is a model with a “Mixture of Experts” architecture. In short, although the model contains 26 billion parameters, it activates only about 4 billion of them for any given token. This allows it to run faster and more efficiently than one might expect for its size. For the user, this means less computation per response at comparable quality.
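The routing idea behind a Mixture of Experts layer can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Gemma 4's actual implementation: the expert count, the top-2 selection, and the random router scores are all made up for the example, and in a real model the router is a learned layer rather than a random number generator.

```python
import math
import random

def softmax(scores):
    """Convert raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_scores, top_k):
    """Pick the top_k experts for one token and renormalize their weights."""
    probs = softmax(router_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    kept = sum(probs[i] for i in chosen)
    return {i: probs[i] / kept for i in chosen}

def moe_layer(x, experts, router_scores, top_k):
    """Run only the selected experts and blend their outputs by weight."""
    weights = route_token(router_scores, top_k)
    output = sum(w * experts[i](x) for i, w in weights.items())
    return output, sorted(weights)

# Toy setup (illustrative numbers, not Gemma 4's real configuration):
# 8 experts, each of which just multiplies its input by a different constant.
NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, scale=i + 1: x * scale for i in range(NUM_EXPERTS)]

random.seed(0)
scores = [random.uniform(-1, 1) for _ in range(NUM_EXPERTS)]  # stand-in for a learned router
output, active = moe_layer(1.0, experts, scores, TOP_K)
print(f"experts used: {active} ({len(active)} of {NUM_EXPERTS})")
```

Only the selected experts are ever called, which is where the compute saving comes from: the remaining experts exist in the model but do no work for that token.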
31B Dense is the largest model in the family, and all of its parameters are active simultaneously. This is the model that ranked third among open-source models on the international Arena AI leaderboard. According to Google, it competes with systems that are 20 times its size.
What All Four Can Do
All models in the family are multimodal: they accept not only text but also images and videos with variable resolution as input. The compact versions also support audio. This means you can ask the model to describe an image, transcribe a video clip, or, with the compact versions, recognize speech – all locally, without the cloud.
The context window – the amount of information the model can hold in its “working memory” during a single session – is up to 128,000 tokens for the compact versions and up to 256,000 for the larger ones. For comparison, 128,000 tokens is equivalent to several hundred pages of text.
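The "several hundred pages" figure can be checked with a quick back-of-the-envelope calculation. Both conversion factors below are assumptions rather than numbers from Google: roughly 0.75 English words per token is a common rule of thumb, and 400 words per page describes a fairly dense page of prose. The real ratio depends on the tokenizer, the language, and the layout.

```python
# Rough conversion factors; both are assumptions, not figures from Google.
WORDS_PER_TOKEN = 0.75   # common rule of thumb for English text
WORDS_PER_PAGE = 400     # a fairly dense page of prose

def tokens_to_pages(tokens, words_per_token=WORDS_PER_TOKEN,
                    words_per_page=WORDS_PER_PAGE):
    """Estimate how many pages of text fit in a given number of tokens."""
    return tokens * words_per_token / words_per_page

for window in (128_000, 256_000):
    print(f"{window:,} tokens ~ {tokens_to_pages(window):.0f} pages")
```

Under these assumptions the compact models hold about 240 pages and the larger ones about 480.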
All four models support over 140 languages, taking cultural context into account. Code generation, complex logical tasks, and multi-step reasoning are all presented as core capabilities, not add-ons.
Agent Mode Is No Longer an Experiment
Gemma 4 was designed from the ground up for agentic scenarios. In these scenarios, the model doesn't just answer questions but independently plans a sequence of actions, interacts with external tools, and performs multi-step tasks.
Unlike previous generations, Gemma 4 has built-in support for function calling and structured data output. Simply put, the model can “communicate” with other programs and services according to clearly defined rules – a fundamental requirement for building autonomous AI agents.
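To make "clearly defined rules" concrete, here is a minimal sketch of the application side of function calling. It assumes the model emits a JSON object with `name` and `arguments` fields. That shape, the `get_weather` tool, and the `dispatch` helper are all hypothetical and chosen for illustration. Gemma 4's actual output format may differ, and no model is called here.

```python
import json

def get_weather(city: str) -> dict:
    """Hypothetical local tool; a real one would query a service or sensor."""
    return {"city": city, "forecast": "sunny"}

# Registry of the tools the model is allowed to call, keyed by name.
TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> dict:
    """Parse a structured tool call and run the matching function."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return {"error": "output was not valid JSON"}
    if not isinstance(call, dict) or not isinstance(call.get("name"), str):
        return {"error": "expected an object with a string 'name' field"}
    tool = TOOLS.get(call["name"])
    if tool is None:
        return {"error": f"unknown tool: {call['name']}"}
    try:
        return {"result": tool(**call.get("arguments", {}))}
    except TypeError as exc:
        # Raised when the arguments do not match the tool's signature.
        return {"error": f"bad arguments: {exc}"}

# Stand-in for text a model might produce (assumed format, see above).
model_output = '{"name": "get_weather", "arguments": {"city": "Lisbon"}}'
print(dispatch(model_output))
```

The point of structured output is that the program never has to guess what the model meant: a call either parses and matches a registered tool, or it is rejected with an error that can be fed back to the model.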
Open License – and That's Important
Previous Gemma generations were distributed under Google's own license, which came with several restrictions. Gemma 4 is being released under the Apache 2.0 license – one of the most permissive in the world of open-source software. This means the models can be used in commercial products, modified, and distributed with virtually no limitations.
For businesses, this is primarily a matter of control: data doesn't leave the company's infrastructure, there's no dependency on external APIs, and there are no subscription fees. For hobbyist developers, it's simply an opportunity to get their hands dirty and experiment without legal hurdles.
Why This Matters Beyond the Professional Sphere
Running a powerful model directly on a smartphone – without sending requests anywhere – isn't just about speed and privacy. It's about AI ceasing to be an exclusively cloud-based service. Scenarios that previously required a subscription to an expensive service can now run locally and for free.
How practical this is for everyday use is another question. The compact models run on modest devices, but the larger versions still require decent hardware. Nevertheless, the very fact that a model with tens of billions of parameters can fit on a single graphics card and work without an internet connection is a significant shift indicating where the industry is heading.
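What "decent hardware" means can be estimated from the parameter counts alone. The sketch below computes the memory needed just to hold the weights at a few numeric precisions. The 16-, 8-, and 4-bit options are assumptions for illustration, since the article does not say which precisions the models ship in, and the estimate ignores the extra memory used for the context and intermediate results.

```python
def weight_memory_gb(params_billions, bits_per_param):
    """Memory for the weights alone, in gigabytes (10^9 bytes)."""
    return params_billions * bits_per_param / 8

# Parameter counts come from the model names; precisions are assumed.
for name, size in (("26B MoE", 26), ("31B Dense", 31)):
    for bits in (16, 8, 4):
        print(f"{name} at {bits}-bit: ~{weight_memory_gb(size, bits):.1f} GB")
```

By this estimate the 31B model needs about 62 GB at 16-bit precision but about 15.5 GB at 4-bit, which is the range where a single consumer graphics card becomes plausible. A Mixture of Experts model typically still has to keep all of its parameters in memory, so its saving is in computation rather than in storage.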