Published on April 8, 2026

Google Gemma 4: Open AI Models for Smartphones and Devices

Google Releases Gemma 4: Open AI Models That Run Directly on Your Smartphone

Google has introduced the fourth generation of its open-source Gemma models, ranging from compact versions for smartphones to powerful solutions for demanding computational tasks.

Products 5 – 7 minutes min read
Event Source: PyTorch 5 – 7 minutes min read

If you've been following the open-source AI model market, the landscape over the past few years has been pretty consistent: major companies release powerful models, but these models require expensive hardware and only run in the cloud. With its new Gemma 4 lineup, Google is attempting to shift this balance – and judging by the initial results, they're succeeding.

Gemma 4 Open-Source Language Model Release

What Exactly Happened?

On April 2, Google DeepMind unveiled Gemma 4, the fourth generation of its open-source language model series. This isn't just one model, but a family of four variants designed for different tasks and devices. All are released under the Apache 2.0 license, meaning they can be freely used in commercial projects with minimal restrictions.

Gemma 4 is built on the same research and technology as Gemini 3, Google's flagship closed-source model. Simply put, the open-source version has incorporated the advancements of its proprietary counterpart.

Gemma 4 Models and Their Applications

Four Sizes for Different Tasks

The family is divided into four models:

  • E2B – The most compact, with about 2.3 billion active parameters. It runs on a smartphone or a single-board computer and supports audio input.
  • E4B – Slightly larger, with about 4.5 billion active parameters. Also designed to run on-device, including on Android phones.
  • 26B MoE – A model with a “Mixture of Experts” architecture: with 26 billion total parameters, it only uses about 4 billion during operation. This helps conserve computational resources without a significant loss in quality.
  • 31B Dense – The family's flagship, with 31 billion parameters, all active simultaneously. It ranks third among open models on the international Arena AI Text leaderboard.

Running the two larger models requires a powerful GPU, such as an Nvidia H100. The compact E2B and E4B models were developed in partnership with Qualcomm and MediaTek and are specifically optimized for mobile processors, allowing them to use memory and power efficiently.

Gemma 4 Capabilities: Text, Audio, Images, and Video

Not Just Text: Audio, Images, and Video

All four models can work not only with text but also with images and video. The compact E2B and E4B models also support audio input, which opens up the possibility of on-device speech recognition without sending data to a server.

An important technical detail here is that the models can process images with variable aspect ratios and flexibly adjust how much “attention” to devote to an image. This allows them to strike a balance between speed and quality depending on the task – for instance, quickly processing a low-resolution image or meticulously analyzing a detailed one.

Practical Uses of Gemma 4 AI Models

What Is This Actually Useful For?

Gemma 4 was designed from the ground up for agentic scenarios – situations where the AI doesn't just answer a question but independently executes a sequence of actions, such as calling tools, retrieving data, and making decisions. To facilitate this, the models natively support structured output and external function calls.

In short, this is more than just a chatbot. It's a foundation for building autonomous assistants that can, for example, independently gather information from various sources and present a formatted result – all without requiring constant human intervention at every step.

Additionally, the models show significant progress in mathematical reasoning and precise instruction following. They support over 140 languages, and the context window is up to 128,000 tokens for the compact versions and up to 256,000 for the larger ones. For context, 128,000 tokens is equivalent to the text of several average-length novels.

Benefits of On-Device AI Model Processing

Why “On-Device” Matters

Most powerful AI models operate in the cloud: a request is sent to a server, processed, and a response is returned. While convenient, this creates a dependency on an internet connection, adds latency, and raises privacy concerns, as data leaves the user's device.

Models that run locally – directly on a smartphone or laptop – are free from these issues. They work offline, respond quickly, and don't transmit any data externally. This is precisely why the compact Gemma 4 variants appeal not only to enthusiasts but also to corporate developers who require control over their data.

Even the larger models in the family, for all their power, can fit on a single GPU. This also favorably distinguishes them from some competitors that require entire processing clusters.

Gemma AI Ecosystem and Community Impact

Context: The Ecosystem Is Already Huge

Since the release of the first-generation Gemma, developers have downloaded the models in the family over 400 million times and created more than 100,000 custom modifications based on them. This indicates that Gemma is not just a technological showcase but a tool actively used by a large community.

According to researchers at Google DeepMind, the team deliberately focused on maximizing “intelligence per parameter” – in other words, achieving the smartest possible model at the minimum size. Judging by its position on independent leaderboards, they succeeded: the flagship 31B model competes with models up to 20 times its size.

Architecturally, Gemma 4 is intentionally designed to be compatible with the broadest possible range of platforms and tools, which simplifies integration and lowers the barrier to entry for developers. The models also quantize well – a “compression” process that allows them to run on even more modest hardware with minimal loss of quality.

Overall, Gemma 4 is Google's attempt to provide developers with a serious tool that requires neither expensive infrastructure nor gated access. Whether they have succeeded will become clear over time, but the initial signs are compelling.

Original Title: Generating State-of-the-Art GEMMs with TorchInductor's CuteDSL backend
Publication Date: Apr 7, 2026
PyTorch pytorch.org An international open-source deep learning framework and community widely used for research and development in artificial intelligence and machine learning.
Previous Article Show It Once and You're Done: How a New Approach to Automation Learns from Humans the First Time Next Article Illustrious XL 3.5: When an Image Generator Starts Understanding Language Like a Language Model

Related Publications

You May Also Like

Explore Other Events

Events are only part of the bigger picture. These materials help you see more broadly: the context, the consequences, and the ideas behind the news.

AI: Events

Gemma 4 on AMD: Day-and-Date Support on Release

Technical context Infrastructure

Google has released the Gemma 4 family of open models, and AMD has provided immediate support on release day across its entire hardware spectrum, from data centers to laptops.

AMDwww.amd.com Apr 3, 2026

From Source to Analysis

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1.
Claude Sonnet 4.6 Anthropic Analyzing the Original Publication and Writing the Text The neural network studies the original material and generates a coherent text

1. Analyzing the Original Publication and Writing the Text

The neural network studies the original material and generates a coherent text

Claude Sonnet 4.6 Anthropic
2.
Gemini 2.5 Pro Google DeepMind step.translate-en.title

2. step.translate-en.title

Gemini 2.5 Pro Google DeepMind
3.
Gemini 2.5 Flash Google DeepMind Text Review and Editing Correction of errors, inaccuracies, and ambiguous phrasing

3. Text Review and Editing

Correction of errors, inaccuracies, and ambiguous phrasing

Gemini 2.5 Flash Google DeepMind
4.
DeepSeek-V3.2 DeepSeek Preparing the Illustration Description Generating a textual prompt for the visual model

4. Preparing the Illustration Description

Generating a textual prompt for the visual model

DeepSeek-V3.2 DeepSeek
5.
FLUX.2 Pro Black Forest Labs Creating the Illustration Generating an image based on the prepared prompt

5. Creating the Illustration

Generating an image based on the prepared prompt

FLUX.2 Pro Black Forest Labs

Want to dive deeper into the world
of neuro-creativity?

Be the first to learn about new books, articles, and AI experiments
on our Telegram channel!

Subscribe