Published on April 9, 2026

Google Gemma 4: Open-Source Models for Local Use on Smart Devices

Google Releases Gemma 4: Four Open-Source Models That Fit on a Smartphone

Google has released the Gemma 4 family of models, ranging from compact versions for mobile devices to powerful systems capable of competing with solutions many times their size.

Products | 4–6 min read
Event Source: Carnegie Mellon University

While most advanced AI models require powerful servers and cloud connections, Google has moved in a different direction. The Gemma 4 family consists of four open-source models that can be run locally – from a smartphone to a standard personal computer with a single graphics card.

Gemma 4 Development History and Background

Where Did This All Come From?

Gemma is a line of open-source models from Google that has been around for several generations. Gemma 4 is built on the technologies underlying Gemini 3, the company's proprietary flagship model. Simply put, part of what was previously available only through Google's paid services can now be downloaded and run on your own hardware.

Since the first generation of Gemma, developers have downloaded the family's models more than 400 million times, and the community has created over 100,000 derivative versions. This is a significant signal that open-source models are genuinely being used – not just for experimentation, but in real-world projects.

Gemma 4 Models and Their Use Cases

Four Models, Four Use Cases

The Gemma 4 family includes models of different sizes, and it's not just a simple gradation of “weaker to stronger.” Each version is designed for a specific class of tasks and hardware.

E2B and E4B are the most compact. Developed in partnership with Qualcomm and MediaTek, they are optimized to run directly on mobile devices: Android smartphones, single-board computers like the Raspberry Pi, and similar hardware. They operate completely offline – without an internet connection and without sending data to servers. Both support not only images and text but also audio input, meaning they can perform speech recognition right on the device.

26B MoE is a model with a “Mixture of Experts” architecture. In short, although the model contains 26 billion parameters, it only activates about 4 billion of them at any given time. This allows it to run faster and more efficiently than one might expect for its size. For the user, this means a lower hardware load for comparable quality.
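The routing idea behind a Mixture of Experts layer can be illustrated with a minimal sketch. This is a toy example under stated assumptions, not Gemma 4's actual architecture: the expert count, the router scores, and the helper names (`route_top_k`, `moe_forward`) are hypothetical. It only shows how a router picks a few experts per input so that most parameters stay idle, in the same spirit as activating roughly 4 billion of 26 billion parameters (about 15%).

```python
import math

def softmax(scores):
    """Convert raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_scores, k):
    """Run only the selected experts and combine their outputs by weight."""
    selected = route_top_k(router_scores, k)
    output = sum(weight * experts[i](x) for i, weight in selected)
    return output, [i for i, _ in selected]

# Hypothetical toy setup: 8 tiny "experts", 2 active per input.
NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, scale=s: scale * x for s in range(1, NUM_EXPERTS + 1)]
router_scores = [0.1, 2.0, -0.5, 0.3, 1.5, -1.0, 0.0, 0.7]  # made-up values

output, used = moe_forward(3.0, experts, router_scores, TOP_K)
active_fraction = TOP_K / NUM_EXPERTS
print(f"experts used: {used}, active fraction: {active_fraction:.0%}")
```

In a real model each expert is a neural network block and the router scores are computed from the input itself; the fixed list here simply stands in for that step.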

31B Dense is the largest model in the family, with all parameters active simultaneously. It ranked third among open-source models on the Arena AI leaderboard. According to Google, it competes with systems 20 times its size.

Key Capabilities of All Gemma 4 Models

What All Four Can Do

All models in the family are multimodal: they accept not only text but also images and videos with variable resolution as input. The compact versions also support audio. This means you can ask the model to describe an image, transcribe a video clip, or recognize speech – all locally, without the cloud.

The context window – the amount of information the model can hold in its “working memory” during a single session – is up to 128,000 tokens for the compact versions and up to 256,000 for the larger ones. For comparison, 128,000 tokens is equivalent to several hundred pages of text.
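To get a feel for these numbers, here is a rough back-of-the-envelope conversion from tokens to pages. The conversion factors are assumptions rather than figures from the announcement: about 0.75 English words per token and 500 words per page are common rules of thumb, and the real ratio depends on the tokenizer, the language, and the page layout.

```python
# Assumed rules of thumb for English text (not from the article).
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 500

def tokens_to_pages(tokens, words_per_token=WORDS_PER_TOKEN, words_per_page=WORDS_PER_PAGE):
    """Estimate how many pages of text fit into a given number of tokens."""
    return tokens * words_per_token / words_per_page

context_windows = {
    "compact (E2B, E4B)": 128_000,
    "larger (26B MoE, 31B Dense)": 256_000,
}

for name, tokens in context_windows.items():
    print(f"{name}: {tokens:,} tokens is roughly {tokens_to_pages(tokens):.0f} pages")
```

With a sparser layout of around 300 words per page, the same 128,000 tokens would come out at over 300 pages, so the estimate should be read as an order of magnitude only.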

All four models support over 140 languages, taking cultural context into account. Code generation, complex logical tasks, and multi-step reasoning are all presented as core capabilities, not add-ons.

Gemma 4 and Advanced Agent Mode Functionality

Agent Mode Is No Longer an Experiment

Gemma 4 was designed from the ground up for so-called agentic scenarios, in which the model doesn't just answer questions but independently plans a sequence of actions, interacts with external tools, and performs multi-step tasks.

Unlike previous generations, Gemma 4 has built-in support for function calling and structured data output. Simply put, the model can “communicate” with other programs and services according to clearly defined rules – a fundamental requirement for building autonomous AI agents.
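What "communicating according to clearly defined rules" looks like in practice can be sketched on the application side. The example below is a hypothetical illustration, not Gemma 4's actual function-calling format: the JSON shape (`name` plus `arguments`), the `get_weather` tool, and the `dispatch` helper are all assumptions. It shows the general pattern of receiving structured output from a model, validating it, and only then calling a real function.

```python
import json

def get_weather(city: str) -> dict:
    """Hypothetical tool; returns stub data instead of calling a real service."""
    return {"city": city, "forecast": "sunny"}

# Hypothetical registry: tool name -> callable plus required argument types.
TOOLS = {
    "get_weather": {"function": get_weather, "required": {"city": str}},
}

def dispatch(model_output: str) -> dict:
    """Validate a JSON tool call and run the matching function."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError as err:
        return {"error": f"invalid JSON: {err.msg}"}
    if not isinstance(call, dict):
        return {"error": "tool call must be a JSON object"}

    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        return {"error": f"unknown tool: {name!r}"}
    tool = TOOLS[name]

    args = call.get("arguments")
    if not isinstance(args, dict):
        return {"error": "arguments must be a JSON object"}
    for key, expected_type in tool["required"].items():
        if not isinstance(args.get(key), expected_type):
            return {"error": f"argument {key!r} must be {expected_type.__name__}"}

    # Pass only the declared arguments so unexpected keys never reach the tool.
    return {"result": tool["function"](**{k: args[k] for k in tool["required"]})}

# Simulated model output; a real agent would receive this string from the model.
print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```

Returning an error dictionary instead of raising lets an agent loop feed the problem back to the model and ask it to try again, which is one common design choice among several.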

Gemma 4 Open-Source Apache 2.0 License Benefits

Open License – and That's Important

Previous Gemma generations were distributed under Google's own license, which came with several restrictions. Gemma 4 is being released under the Apache 2.0 license – one of the most permissive in the world of open-source software. This means the models can be used in commercial products, modified, and distributed with virtually no limitations.

For businesses, this is primarily a matter of control: data doesn't leave the company's infrastructure, there's no dependency on external APIs, and there are no subscription fees. For hobbyist developers, it's simply an opportunity to get their hands dirty and experiment without legal hurdles.

Impact of Local AI Models Beyond Professional Use

Why This Matters Beyond the Professional Sphere

Running a powerful model directly on a smartphone – without sending requests anywhere – isn't just about speed and privacy. It's about AI ceasing to be an exclusively cloud-based service. Scenarios that previously required a subscription to an expensive service can now run locally and for free.

How practical this is for everyday use is another question. The compact models run well on modest devices, but the larger versions still require decent hardware. Nevertheless, the very fact that a model with tens of billions of parameters can fit on a single graphics card and work without an internet connection is a significant shift indicating where the industry is heading.
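Whether a model fits on a given graphics card comes down largely to arithmetic: the number of parameters multiplied by the bytes used to store each one. The sketch below is a rough estimate under stated assumptions. The parameter counts are taken from the model names, the precision levels (16, 8, and 4 bits) are common choices rather than details from the announcement, and the calculation covers weights only, ignoring the extra memory needed for the context and intermediate computations.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Gigabytes (10**9 bytes) needed to hold the weights alone."""
    return params_billions * bits_per_param / 8

# Parameter counts in billions, read off the model names.
MODELS = {"26B MoE": 26, "31B Dense": 31}

for name, params in MODELS.items():
    for bits in (16, 8, 4):
        print(f"{name} at {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
```

Note that a Mixture of Experts model typically still has to keep all of its weights in memory, even though only part of them is used at each step, so the 26B figure rather than the 4B figure is the relevant one here.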

Original Title: New NSF Institute at CMU Will Help Mathematicians Harness AI and Advance Discoveries
Publication Date: Apr 8, 2026
Carnegie Mellon University (ai.cmu.edu): An American research university and one of the world’s leading centers for artificial intelligence, conducting both fundamental and applied research in machine learning, robotics, and computer science.

From Source to Analysis

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Claude Sonnet 4.6 (Anthropic): Analyzing the Original Publication and Writing the Text. The neural network studies the original material and generates a coherent text.
2. Gemini 2.5 Pro (Google DeepMind): Translation into English.
3. Gemini 2.5 Flash (Google DeepMind): Text Review and Editing. Correction of errors, inaccuracies, and ambiguous phrasing.
