Published on April 7, 2026


Google's Gemma 4: What Will It Change for On-Device AI?

Google has released Gemma 4 – a family of four open models that run directly on-device, from smartphones to laptops, without relying on the cloud.

Products. Event source: PyTorch. 4-6 min read

Most discussions about AI focus on cloud services: a model resides in a data center, you send a request, and you get a response. However, a parallel effort has been under way for several years: running neural networks directly on a phone, laptop, or small computer, without an internet connection or third-party servers. Google has taken a significant step in this direction by releasing the Gemma 4 family of models.


What's in the Release

Gemma 4 isn't just one model, but a family of four different variants for various tasks and devices. Two of them, E2B and E4B, are specifically designed for smartphones: they are compact enough to run autonomously, without a network connection. The other two – larger models with 26 and 31 billion parameters – are geared toward PCs and laptops but can also operate locally, without the cloud.

In short: for the first time, the Gemma lineup includes models that can actually fit on a standard phone and can do more than just answer text-based questions.


What These Models Can Do

All four Gemma 4 variants are multimodal – they understand not only text but also images and video. The compact versions (E2B and E4B) go even further: they also process audio. Simply put, such a model can listen, watch, and read – all directly on the device, without sending data anywhere.

This opens up some very specific use cases: offline speech recognition, photo analysis without uploading to the cloud, and an assistant that works even without an internet connection. For those who value data privacy or simply lack a stable connection, this is a significant advantage.

It's also worth noting that Gemma 4 was designed from the ground up for so-called “agentic scenarios.” This is when the model doesn't just answer a question but performs a sequence of actions – for example, finding information, processing it, and creating a structured result. To achieve this, the model has native support for calling external functions and outputting data in a structured format.
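To make the function-calling idea concrete, here is a minimal sketch of the application side of such a loop: the model emits a structured (JSON) tool call, and the host program parses it and runs the matching function. The tool name `get_weather`, the JSON shape with `name` and `arguments`, and the simulated model output are all assumptions made for illustration; the article does not describe Gemma 4's actual call format, which may differ.

```python
import json

# Hypothetical tool: a stand-in for a real local or remote lookup.
def get_weather(city: str) -> dict:
    return {"city": city, "temperature_c": 21}

# Registry of functions the model is allowed to call (illustrative only).
TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> dict:
    """Parse a JSON tool call emitted by a model and run the matching function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    result = TOOLS[name](**call["arguments"])
    # Structured result that could be passed back to the model as the next turn.
    return {"tool": name, "result": result}

# Assumed model output; a real model's call format may look different.
simulated_output = '{"name": "get_weather", "arguments": {"city": "Lisbon"}}'
response = dispatch(simulated_output)
print(json.dumps(response))
```

Keeping the allowed functions in an explicit registry means the program, not the model, decides what can actually be executed, which matters when the whole loop runs on a user's own device.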


Size Matters – But Not Always the One in the Name

One of the interesting aspects of Gemma 4 is the structure of its 26-billion-parameter model. It uses what's known as a “Mixture of Experts” (MoE) architecture. It sounds complicated, but the idea is simple: the model is large, but only a small portion of it, about 4 billion of the 26 billion parameters, is activated for each token the model processes. It's like having a team of 26 specialists where, for any given task, only the four who are needed at that moment step up.

Thanks to this, the model runs faster and needs less compute per token than its full size suggests, although all 26 billion parameters generally still have to be kept in memory.
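The routing idea behind a Mixture of Experts layer can be sketched in a few lines. This is a generic top-k gating example, not Gemma 4's implementation: the four toy experts, the router scores, and `k=2` are invented for illustration, and in a real model the scores come from a learned router and the experts are neural network blocks.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_scores)),
                   key=lambda i: router_scores[i], reverse=True)
    top = order[:k]
    # Normalize weights over the selected experts only.
    weights = softmax([router_scores[i] for i in top])
    return list(zip(top, weights))

def moe_layer(x, experts, router_scores, k):
    """Weighted sum of the selected experts' outputs; the others never run."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))

# Toy experts: each one simply scales its input by a different factor.
experts = [lambda x, f=f: f * x for f in (1.0, 2.0, 3.0, 4.0)]
router_scores = [0.1, 2.0, -1.0, 2.0]  # assumed values, normally learned

selected = route_top_k(router_scores, k=2)
output = moe_layer(10.0, experts, router_scores, k=2)
print(selected, output)
```

Only the two selected experts are evaluated for this input, which is where the compute saving comes from; the unselected experts still exist and take up memory.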

The larger 31B model is structured differently – all its parameters are active simultaneously – but it achieves higher scores on independent benchmarks. According to the Arena AI Text leaderboard, it ranked third among open models, trailing only larger competitors.
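A rough back-of-the-envelope calculation shows how the two larger models differ. The parameter counts (26 billion total with about 4 billion active, and 31 billion dense) come from the text above; the 16-, 8-, and 4-bit precisions are common choices assumed here for illustration, not figures from the release. The estimate covers weights only and ignores activations, caches, and runtime overhead.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8

MOE_TOTAL_B = 26.0   # total parameters of the MoE model, in billions
MOE_ACTIVE_B = 4.0   # approximate parameters active per token, in billions
DENSE_B = 31.0       # the dense model: all parameters active

# Share of the MoE model that does work for any one token.
active_fraction = MOE_ACTIVE_B / MOE_TOTAL_B

for bits in (16, 8, 4):  # assumed precisions, not taken from the release
    moe_gb = weight_memory_gb(MOE_TOTAL_B, bits)
    dense_gb = weight_memory_gb(DENSE_B, bits)
    print(f"{bits:>2}-bit weights: MoE 26B ~ {moe_gb:.1f} GB, dense 31B ~ {dense_gb:.1f} GB")

print(f"MoE active share per token: {active_fraction:.0%}")
```

Under these assumptions the two models need a similar amount of memory for their weights, while the MoE model touches only about 15% of its parameters per token.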


Why This Is More Than Just Another Release

Gemma is an open family of models: the weights are published under the Apache 2.0 license, which allows for virtually unrestricted commercial use. This is important because most powerful models at this level are either closed-source or have restrictions on their use in products.

The compact E2B and E4B versions were developed in collaboration with Qualcomm and MediaTek – the manufacturers of processors found in most modern Android smartphones. This means the models are optimized for real-world hardware, not just theoretically capable of fitting into the required memory space.

Since the release of the first Gemma generation, models in this family have been downloaded over 400 million times, and the community has created over 100,000 modifications based on them. Gemma 4 is a response to this accumulated experience: addressing what worked, what was missing, and which use cases proved to be in demand.


What Remains Behind the Scenes

Despite the appeal of the “AI right on your phone” idea, it's worth keeping a few things in mind.

First, compact models are always a compromise. E2B and E4B are great for basic tasks, but you shouldn't expect the same level of reasoning from them as from the 31B version. Google itself admits that on certain benchmarks, the smallest model underperforms the previous 27-billion-parameter Gemma 3.

Second, the technical documentation was not fully published at the time of release. This means that independent verification of the models' capabilities is a matter for the near future, not an established fact.

Third, the on-device AI market itself is still taking shape. There are competing solutions – such as Qwen 3, which the larger Gemma 4 models are compared against – and it's too early to say that one approach has definitively won out over another.

Nevertheless, the direction is clear: powerful language models are becoming smaller, cheaper to run, and closer to the end device. Gemma 4 is one of the most compelling arguments that this path is now very much a reality.


Link to Original: https://pytorch.org/blog/executorch-becomes-part-of-pytorch-core/

Original Title: ExecuTorch Becomes a Part of PyTorch Core to Expand On-Device Inference Capabilities

Publication Date: Apr 7, 2026

