Published on April 8, 2026

Google Gemma 4: Open AI Models for Smartphones and Devices

Google Releases Gemma 4: Open AI Models That Run Directly on Your Smartphone

Google has introduced the fourth generation of its open-source Gemma models, ranging from compact versions for smartphones to powerful solutions for demanding computational tasks.

Event Source: PyTorch · 5–7 min read

If you've been following the open-source AI model market, you'll know the landscape has been pretty consistent over the past few years: major companies release powerful models, but these models require expensive hardware and run only in the cloud. With its new Gemma 4 lineup, Google is attempting to shift this balance – and judging by the initial results, it is succeeding.

Gemma 4 Open-Source Language Model Release

What Exactly Happened?

On April 2, 2026, Google DeepMind unveiled Gemma 4, the fourth generation of its open-source language model series. This is not a single model but a family of four variants designed for different tasks and devices. All are released under the Apache 2.0 license, meaning they can be freely used in commercial projects with minimal restrictions.

Gemma 4 is built on the same research and technology as Gemini 3, Google's flagship closed-source model. Simply put, the open-source version has incorporated the advancements of its proprietary counterpart.

Gemma 4 Models and Their Applications

Four Sizes for Different Tasks

The family is divided into four models:

E2B – The most compact, with about 2.3 billion active parameters. It runs on a smartphone or a single-board computer and supports audio input.
E4B – Slightly larger, with about 4.5 billion active parameters. Also designed to run on-device, including on Android phones.
26B MoE – A model with a “Mixture of Experts” architecture: of its 26 billion total parameters, only about 4 billion are active for any given input. This conserves computational resources without a significant loss in quality.
31B Dense – The family's flagship, with 31 billion parameters, all active simultaneously. It ranks third among open models on the international Arena AI Text leaderboard.
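
To make the difference between total and active parameters concrete, here is a minimal sketch of top-k expert routing, the general idea behind Mixture of Experts layers. The number of experts, the router scores, and the function names are illustrative assumptions, not Gemma 4's actual configuration.

```python
import math

def softmax(scores):
    """Turn raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}

def moe_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and mix their outputs."""
    weights = route_top_k(router_scores, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Hypothetical toy setup: 8 tiny "experts", each of which just scales its input.
experts = [lambda x, m=m: m * x for m in range(1, 9)]
router_scores = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]

selected = route_top_k(router_scores, k=2)
output = moe_forward(3.0, experts, router_scores, k=2)
active_fraction = len(selected) / len(experts)  # share of experts actually run
```

In this toy case only 2 of 8 experts run for the input. By the figures quoted above, the 26B MoE model behaves similarly at scale, with about 4 of 26 billion parameters (roughly 15 percent) doing work at any one time.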

Running the two larger models requires a powerful GPU, such as an Nvidia H100. The compact E2B and E4B models were developed in partnership with Qualcomm and MediaTek and are specifically optimized for mobile processors, allowing them to use memory and power efficiently.

Gemma 4 Capabilities: Text, Audio, Images, and Video

Not Just Text: Audio, Images, and Video

All four models can work not only with text but also with images and video. The compact E2B and E4B models also support audio input, which opens up the possibility of on-device speech recognition without sending data to a server.

An important technical detail here is that the models can process images with variable aspect ratios and flexibly adjust how much “attention” to devote to an image. This allows them to strike a balance between speed and quality depending on the task – for instance, quickly processing a low-resolution image or meticulously analyzing a detailed one.
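
One way to picture this trade-off is as a budget of image patches that the model is allowed to look at. The sketch below is an assumption-laden illustration, not Gemma 4's documented mechanism: the function name `patch_grid`, the 16-pixel patch size, and the budget values are all hypothetical.

```python
import math

def patch_grid(width, height, token_budget, patch_size=16):
    """Return (cols, rows) of a patch grid that roughly keeps the aspect
    ratio and uses at most token_budget patches. Defaults are illustrative."""
    cols = math.ceil(width / patch_size)
    rows = math.ceil(height / patch_size)
    if cols * rows > token_budget:
        # Shrink both sides by the same factor so the shape is preserved.
        scale = math.sqrt(token_budget / (cols * rows))
        cols = max(1, math.floor(cols * scale))
        rows = max(1, math.floor(rows * scale))
        # Very elongated images can still overshoot after the max(1, ...) clamp.
        cols = min(cols, max(1, token_budget // rows))
        rows = min(rows, max(1, token_budget // cols))
    return cols, rows

# Same 1920x1080 image, two different budgets.
fast = patch_grid(1920, 1080, token_budget=64)        # coarse, quick pass
detailed = patch_grid(1920, 1080, token_budget=1024)  # finer, slower pass
```

A small budget gives a coarse grid that is cheap to process, while a larger one keeps more detail at a higher compute cost. In both cases the grid stays close to the original 16:9 shape.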

Practical Uses of Gemma 4 AI Models

What Is This Actually Useful For?

Gemma 4 was designed from the ground up for agentic scenarios – situations where the AI doesn't just answer a question but independently executes a sequence of actions, such as calling tools, retrieving data, and making decisions. To facilitate this, the models natively support structured output and external function calls.
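
In practice, a function call means the model emits a structured message that the host application parses and executes. The following is a minimal sketch of that host side, assuming a simple JSON format with `tool` and `arguments` fields. The format, the `get_weather` tool, and the stubbed model output are all hypothetical and are not taken from Gemma 4's documentation.

```python
import json

# Hypothetical tool: a stand-in for a real data source.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny", "high_c": 21}

# Registry of tools the application is willing to expose to the model.
TOOLS = {"get_weather": get_weather}

def run_tool_call(model_output: str) -> dict:
    """Parse a structured tool call emitted by a model and execute it."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError as exc:
        return {"error": f"invalid JSON: {exc.msg}"}
    if not isinstance(call, dict):
        return {"error": "tool call must be a JSON object"}
    name = call.get("tool")
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    try:
        result = TOOLS[name](**call.get("arguments", {}))
    except TypeError as exc:
        # Raised when the model supplies missing or unexpected arguments.
        return {"error": f"bad arguments: {exc}"}
    return {"tool": name, "result": result}

# Stubbed model output; a real model would generate this string.
model_output = '{"tool": "get_weather", "arguments": {"city": "Lisbon"}}'
reply = run_tool_call(model_output)
```

The application, not the model, decides which tools exist and validates every call. In an agentic loop the `reply` would be fed back to the model as context for its next step.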

In short, this is more than just a chatbot. It's a foundation for building autonomous assistants that can, for example, independently gather information from various sources and present a formatted result – all without requiring constant human intervention at every step.

Additionally, the models show significant progress in mathematical reasoning and precise instruction following. They support over 140 languages, and the context window is up to 128,000 tokens for the compact versions and up to 256,000 for the larger ones. For context, 128,000 tokens is roughly 100,000 English words, about the length of a full novel.
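
A rough back-of-the-envelope conversion helps put these context sizes in perspective. Both constants below are assumptions: about 0.75 English words per token is a common rule of thumb, and 90,000 words is an illustrative novel length. Actual ratios vary by tokenizer and language.

```python
WORDS_PER_TOKEN = 0.75    # rough rule of thumb for English text (assumption)
WORDS_PER_NOVEL = 90_000  # illustrative "average novel" length (assumption)

def tokens_to_words(tokens):
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)

def tokens_to_novels(tokens):
    """Express a token count as a number of novel-length texts."""
    return tokens_to_words(tokens) / WORDS_PER_NOVEL

compact_words = tokens_to_words(128_000)   # compact models' context window
large_words = tokens_to_words(256_000)     # larger models' context window
compact_novels = tokens_to_novels(128_000)
large_novels = tokens_to_novels(256_000)
```

Under these assumptions the compact models can hold about one novel's worth of text in context, and the larger models about two.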

Benefits of On-Device AI Model Processing

Why “On-Device” Matters

Most powerful AI models operate in the cloud: a request is sent to a server, processed, and a response is returned. While convenient, this creates a dependency on an internet connection, adds latency, and raises privacy concerns, as data leaves the user's device.

Models that run locally – directly on a smartphone or laptop – are free from these issues. They work offline, respond quickly, and don't transmit any data externally. This is precisely why the compact Gemma 4 variants appeal not only to enthusiasts but also to corporate developers who require control over their data.

Even the larger models in the family, for all their power, can fit on a single GPU. This also favorably distinguishes them from some competitors that require entire processing clusters.
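
The single-GPU claim can be sanity-checked with simple arithmetic: memory for weights is roughly the parameter count times the bytes per parameter. The sketch below covers weights only and ignores activations and the attention cache, so real requirements are higher. The 80 GB capacity is an assumed figure for a high-end data-center GPU.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(params_billion, precision):
    """Approximate memory for weights only, in GB (10^9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]

GPU_MEMORY_GB = 80  # assumed capacity of a single high-end GPU

# The 31B Dense model at three precisions.
footprints = {p: weight_memory_gb(31, p) for p in BYTES_PER_PARAM}
fits = {p: gb < GPU_MEMORY_GB for p, gb in footprints.items()}
```

By this estimate the 31B model needs about 62 GB at 16-bit precision and about 15.5 GB at 4-bit. For a Mixture of Experts model, all experts typically have to stay in memory even though only some are active, so the 26B model's footprint follows its total parameter count rather than its active one.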

Gemma AI Ecosystem and Community Impact

Context: The Ecosystem Is Already Huge

Since the release of the first-generation Gemma, developers have downloaded the models in the family over 400 million times and created more than 100,000 custom modifications based on them. This indicates that Gemma is not just a technological showcase but a tool actively used by a large community.

According to researchers at Google DeepMind, the team deliberately focused on maximizing “intelligence per parameter” – in other words, achieving the smartest possible model at the minimum size. Judging by its position on independent leaderboards, they succeeded: the flagship 31B model competes with models up to 20 times its size.

Architecturally, Gemma 4 is intentionally designed to be compatible with the broadest possible range of platforms and tools, which simplifies integration and lowers the barrier to entry for developers. The models also respond well to quantization – a “compression” technique that stores weights at lower numerical precision – which allows them to run on even more modest hardware with minimal loss of quality.
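
As an illustration of the idea, here is a minimal sketch of symmetric 8-bit quantization, one of the simplest schemes. It is a generic textbook technique, not the specific method used for Gemma 4, and the sample weights are made up.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]

# Made-up sample weights for illustration.
weights = [0.82, -1.27, 0.05, 0.4, -0.33]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is at most half of one quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight now takes one byte instead of two or four, at the cost of a small rounding error bounded by half the scale. Production schemes refine this with per-channel or per-block scales and lower bit widths.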

Overall, Gemma 4 is Google's attempt to provide developers with a serious tool that requires neither expensive infrastructure nor gated access. Whether it has succeeded will become clear over time, but the initial signs are compelling.

Link to Original: https://pytorch.org/blog/__trashed/

Original Title: Generating State-of-the-Art GEMMs with TorchInductor's CuteDSL backend

Publication Date: Apr 7, 2026

PyTorch (pytorch.org): An international open-source deep learning framework and community widely used for research and development in artificial intelligence and machine learning.
