Published on April 7, 2026

Google's Gemma 4: What Will It Change for On-Device AI?

Google has released Gemma 4 – a family of four open models that run directly on-device, from smartphones to laptops, without relying on the cloud.

Event Source: PyTorch. Reading time: 4–6 minutes.

Most discussions about AI focus on cloud services: a model resides in a data center, you send a request, and you get a response. However, another process has been developing in parallel for several years: the effort to run neural networks directly on a phone, laptop, or small computer, without an internet connection or third-party servers. Google has taken a significant step in this direction by releasing the Gemma 4 family of models.

What's in the Release

Gemma 4 is not a single model but a family of four variants aimed at different tasks and devices. Two of them, E2B and E4B, are designed specifically for smartphones: they are compact enough to run autonomously, without a network connection. The other two, larger models with 26 and 31 billion parameters, target PCs and laptops, and they too run locally, without the cloud.

In short: for the first time, the Gemma lineup includes models that can actually fit on a standard phone and can do more than just answer text-based questions.

What These Models Can Do

All four Gemma 4 variants are multimodal – they understand not only text but also images and video. The compact versions (E2B and E4B) go even further: they also process audio. Simply put, such a model can listen, watch, and read – all directly on the device, without sending data anywhere.

This opens up some very specific use cases: offline speech recognition, photo analysis without uploading to the cloud, and an assistant that works even without an internet connection. For those who value data privacy or simply lack a stable connection, this is a significant advantage.

It's also worth noting that Gemma 4 was designed from the ground up for so-called “agentic scenarios.” This is when the model doesn't just answer a question but performs a sequence of actions – for example, finding information, processing it, and creating a structured result. To achieve this, the model has native support for calling external functions and outputting data in a structured format.
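As a rough illustration of what function calling with structured output involves on the application side, the Python sketch below parses a JSON tool call and dispatches it to a local function. The JSON shape, the `get_weather` tool, and the hard-coded `model_output` string are all assumptions made for illustration; the article does not describe Gemma 4's actual call format or API.

```python
import json

# Hypothetical local tool; it stands in for any function the app exposes.
def get_weather(city: str) -> dict:
    # A real app would read cached or sensor data; this returns a fixed stub.
    return {"city": city, "forecast": "sunny", "temp_c": 21}

# Registry mapping tool names to callables.
TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> dict:
    """Parse a structured tool call and run the matching local function."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError as exc:
        return {"error": f"invalid JSON: {exc.msg}"}
    if not isinstance(call, dict):
        return {"error": "tool call must be a JSON object"}
    name = call.get("name")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    if not isinstance(args, dict):
        return {"error": "arguments must be a JSON object"}
    try:
        return {"result": TOOLS[name](**args)}
    except TypeError as exc:
        # Raised when the argument names do not match the function signature.
        return {"error": f"bad arguments: {exc}"}

# Assumed model output (written by hand here, not produced by a real model).
model_output = '{"name": "get_weather", "arguments": {"city": "Lisbon"}}'
print(json.dumps(dispatch(model_output)))
```

Validating the call before executing it matters more on a device than in a chat window: the structured result is fed back to the model or to other code, so malformed output should become an explicit error rather than an exception.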

Size Matters – But Not Always the Number in the Name

One of the interesting aspects of Gemma 4 is the structure of its 26-billion-parameter model. It uses what's known as a “Mixture of Experts” (MoE) architecture. It sounds complicated, but the idea is simple: the model is large, but only a small portion of it – about 4 billion of the 26 billion parameters – is activated for each request. It's like having a team of 26 specialists, but for any given task, only the four who are needed at that moment step up.

Thanks to this, the model does less computation per request and runs faster than its full size would suggest, although all 26 billion parameters still have to be stored in memory.
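The selection step can be sketched with top-k softmax gating, a common way to route inputs in Mixture of Experts models. This is a toy illustration, not Gemma 4's implementation: the article does not state how many experts the 26B model has or how its router works, so the eight scalar "experts", the value `k=2`, and the random router scores are all assumptions.

```python
import math
import random

def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_scores, k):
    """Combine outputs of the selected experts; the others are never called."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))

# Toy setup: 8 hypothetical experts, each a simple scalar function.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
rng = random.Random(0)
router_scores = [rng.uniform(-1, 1) for _ in experts]  # a real router is learned

print(route_top_k(router_scores, k=2))
print(moe_forward(3.0, experts, router_scores, k=2))
```

Only the experts returned by `route_top_k` are evaluated, which is where the compute saving comes from; the unused experts still exist and still take up memory.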

The larger 31B model is structured differently – all its parameters are active simultaneously – but it achieves higher scores on independent benchmarks. According to the Arena AI Text leaderboard, it ranked third among open models, trailing only larger competitors.
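Size also determines how much memory the weights occupy, which is simple arithmetic: parameter count times bits per weight. The sketch below is a minimal estimate using the 26- and 31-billion figures quoted in this article; the bit widths (16, 8, and 4) are assumed example precisions, not values stated for Gemma 4, and the estimate ignores activations and any runtime caches.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

MODELS = {"26B": 26e9, "31B": 31e9}  # parameter counts from the article
PRECISIONS = (16, 8, 4)              # assumed bit widths, not from the article

for name, params in MODELS.items():
    row = ", ".join(
        f"{bits}-bit: {weight_memory_gb(params, bits):.1f} GB"
        for bits in PRECISIONS
    )
    print(f"{name}: {row}")
# prints:
# 26B: 16-bit: 52.0 GB, 8-bit: 26.0 GB, 4-bit: 13.0 GB
# 31B: 16-bit: 62.0 GB, 8-bit: 31.0 GB, 4-bit: 15.5 GB
```

The same formula applies to the 26B Mixture of Experts model as to the dense 31B one: activating fewer parameters per request reduces computation, not the amount of weight data that has to be held.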

Why This Is More Than Just Another Release

Gemma is an open family of models: the weights are published under the Apache 2.0 license, which allows for virtually unrestricted commercial use. This is important because most powerful models at this level are either closed-source or have restrictions on their use in products.

The compact E2B and E4B versions were developed in collaboration with Qualcomm and MediaTek – the manufacturers of processors found in most modern Android smartphones. This means the models are optimized for real-world hardware, not just theoretically capable of fitting into the required memory space.

Since the release of the first Gemma generation, models in this family have been downloaded over 400 million times, and the community has created over 100,000 modifications based on them. Gemma 4 is a response to this accumulated experience: addressing what worked, what was missing, and which use cases proved to be in demand.

What Remains Behind the Scenes

Despite the appeal of the “AI right on your phone” idea, it's worth keeping a few things in mind.

First, compact models are always a compromise. E2B and E4B are great for basic tasks, but you shouldn't expect the same level of reasoning from them as from the 31B version. Google itself admits that on certain benchmarks, the smallest model underperforms the previous 27-billion-parameter Gemma 3.

Second, the technical documentation was not fully published at the time of release. This means that independent verification of the models' capabilities is a matter for the near future, not an established fact.

Third, the on-device AI market itself is still taking shape. There are competing solutions – such as Qwen 3, which the larger Gemma 4 models are compared against – and it's too early to say that one approach has definitively won out over another.

Nevertheless, the direction is clear: powerful language models are becoming smaller, cheaper to run, and closer to the end device. Gemma 4 is one of the most compelling arguments that this path is now very much a reality.

Original Title: ExecuTorch Becomes a Part of PyTorch Core to Expand On-Device Inference Capabilities
Publication Date: Apr 7, 2026
Source: PyTorch (pytorch.org), an international open-source deep learning framework and community widely used for research and development in artificial intelligence and machine learning.

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Analyzing the Original Publication and Writing the Text: Claude Sonnet 4.6 (Anthropic). The neural network studies the original material and generates a coherent text.
2. Translation into English: Gemini 2.5 Pro (Google DeepMind).
3. Text Review and Editing: Gemini 2.5 Flash (Google DeepMind). Correction of errors, inaccuracies, and ambiguous phrasing.
4. Preparing the Illustration Description: DeepSeek-V3.2 (DeepSeek). Generating a textual prompt for the visual model.
5. Creating the Illustration: FLUX.2 Pro (Black Forest Labs). Generating an image based on the prepared prompt.
