Published February 10, 2026

Transformers.js v4: A Major Update for Running AI Models in the Browser

A preview of Transformers.js version 4 has been released. Large language models can now run directly in the browser, with no server round-trips and with full WebGPU support.

Source: Hugging Face

A preview version of Transformers.js v4 has landed on npm. The library lets you run machine learning models directly in the browser or in Node.js without sending any data to a server.

In short: working with neural networks used to require a backend – a server that receives a request, processes it via a model, and returns a response. Transformers.js cuts out the middleman. The model is loaded into the user's browser once and works locally. There is no data transfer lag, no costs for server power, and total privacy.

What's New in Version 4

The main highlight is support for Large Language Models (LLMs). In previous versions, the library handled tasks like text classification, sentiment analysis, or working with embeddings. However, running something on a larger scale – such as Llama or Qwen – was problematic.

Now, it's a reality. V4 adds support for generative models, including Llama 3.2, Qwen 2.5, Phi-4, SmolLM2, and others. This means you can take a relatively compact version of a language model and run it right in the user's browser for chatbots, text completion, document analysis, or any other tasks.
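As a sketch of what that looks like in code (based on the library's v3 pipeline API; the v4 preview may differ, and the model id, `device`, and `dtype` options here are illustrative assumptions, not confirmed by the announcement):

```javascript
import { pipeline } from '@huggingface/transformers';

// Load a compact instruction-tuned LLM once; the weights are fetched
// from the Hugging Face Hub and cached by the browser.
const generator = await pipeline(
  'text-generation',
  'onnx-community/Qwen2.5-0.5B-Instruct', // assumed model id
  { device: 'webgpu', dtype: 'q4' }       // GPU backend, 4-bit weights
);

// A chat-style prompt, processed entirely on the user's device.
const messages = [{ role: 'user', content: 'Summarize WebGPU in one sentence.' }];
const output = await generator(messages, { max_new_tokens: 64 });
console.log(output[0].generated_text.at(-1).content);
```

The same pattern covers the other tasks mentioned above: swap the task string and model id, and the rest of the code is unchanged.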

The second key feature is WebGPU support. This is a new standard for graphics and computing in the browser that allows direct access to the graphics card. While things used to run on the CPU (which is slow) or via WebGL (which is faster but limited), you can now utilize the GPU as efficiently as desktop applications do.

The result: models run significantly faster. For instance, Qwen 2.5 0.5B on a MacBook Air M2 delivers about 50 tokens per second – a speed that's perfectly viable for real-world use.
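To put that figure in context, here is what 50 tokens per second means for response latency (pure arithmetic, no library calls):

```javascript
// Rough decode-time estimate: tokens to generate divided by decode speed.
function secondsToGenerate(tokens, tokensPerSecond) {
  return tokens / tokensPerSecond;
}

// At the ~50 tok/s quoted for Qwen 2.5 0.5B on an M2 MacBook Air,
// a ~200-token chat reply streams out in about 4 seconds.
console.log(secondsToGenerate(200, 50)); // → 4
```

Since tokens stream as they are generated, the perceived latency is even lower: the first words appear well before the full 4 seconds are up.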

Why Is This Needed?

The core idea is to reduce server dependency. Currently, most AI applications follow a "client-server" model: the user enters data, it is sent to the cloud, processed there, and returned. This requires a constant internet connection, adds latency, and, importantly, hands data to a third party.

Transformers.js flips this logic. The model is downloaded to the browser once (yes, this can take a while for a large model) and then operates entirely offline. User data stays on the device. No API keys, no pay-per-request, and no need for a stable internet connection after that first download.

This is especially relevant for privacy-sensitive applications: medical services, fintech projects, or internal corporate tools. It is also ideal for tasks requiring instant feedback without network lag – like code completion right in a browser-based editor.

Under the Hood

Technically, v4 is built on ONNX Runtime Web – a runtime environment for ONNX models optimized for WebAssembly and WebGPU. The library pulls models from the Hugging Face Hub (where there are thousands), converts them into the required format, and runs them locally.

Quantization is supported – compressing models to reduce their size and speed up performance. For example, a 7-billion parameter model in its original precision weighs dozens of gigabytes, but in quantized form, it's just a few. This is critical for the browser, where every megabyte counts.
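The paragraph's numbers can be sanity-checked with simple arithmetic (a back-of-the-envelope sketch; real quantized checkpoints also carry some overhead for embeddings and metadata):

```javascript
// Approximate size of a model's weights at a given numeric precision.
// bytesPerParam: 4 for fp32, 2 for fp16, 1 for int8, 0.5 for 4-bit.
function modelSizeGiB(numParams, bytesPerParam) {
  return (numParams * bytesPerParam) / 2 ** 30;
}

// A 7B-parameter model: ~26 GiB in fp32, ~13 GiB in fp16,
// but only ~3.3 GiB with 4-bit quantization.
console.log(modelSizeGiB(7e9, 2).toFixed(1));   // → "13.0"
console.log(modelSizeGiB(7e9, 0.5).toFixed(1)); // → "3.3"
```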

Limitations and Reality

It's important to understand: this isn't a silver bullet for every use case. Large models still require significant resources. You won't be able to run something like GPT-4 in the browser – the model is just too heavy. We're talking about compact models optimized for running on the user's end device.

Furthermore, WebGPU isn't supported everywhere yet. The technology works in Chrome and Edge and recently arrived in Safari, but Firefox support remains experimental. This means that for part of your audience, you'll either have to fall back to the CPU (which is slower) or restrict access to features.
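The fallback logic described above boils down to a small feature-detection check. A minimal sketch (the function name is hypothetical; the `'webgpu'` and `'wasm'` strings follow Transformers.js device-naming conventions, and the navigator-like object is passed in as a parameter so the helper can also run outside a browser):

```javascript
// Prefer the WebGPU backend when the browser exposes navigator.gpu;
// otherwise fall back to the slower WASM (CPU) backend.
function pickDevice(nav) {
  return nav && 'gpu' in nav ? 'webgpu' : 'wasm';
}

// In a real page: const device = pickDevice(navigator);
console.log(pickDevice({ gpu: {} })); // → "webgpu"
console.log(pickDevice({}));          // → "wasm"
```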

Finally, loading the model takes time. Even a compact model of several hundred megabytes will take a noticeable amount of time to download. For users with slow internet, this could be a dealbreaker. Although the model is cached after the first download, the initial launch can be lengthy.
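A quick way to reason about that first-load cost (simple arithmetic; real downloads also depend on CDN latency and connection stability):

```javascript
// Estimated download time: megabytes × 8 bits/byte ÷ link speed in Mbit/s.
function downloadSeconds(sizeMB, mbitPerSecond) {
  return (sizeMB * 8) / mbitPerSecond;
}

// A 400 MB model on a 20 Mbit/s connection takes roughly 160 s (~2.7 min),
// which is why caching after the first visit matters so much.
console.log(downloadSeconds(400, 20)); // → 160
```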

What's Next

Version 4 is currently in "preview" status, meaning the API might change, bugs are possible, and the documentation is still being polished. However, the core functionality is already live and can be tested right now via npm.

If this concept gains traction, it could change the way AI applications are developed. Instead of paying for every API request, a developer integrates the library once, and the model runs on the user's side. Instead of sending data to the cloud, it's processed locally. This opens up new possibilities: from offline assistants to tools that don't require registration or subscriptions.

Of course, a lot depends on how stable the models run in real-world conditions and how developer-friendly the tool proves to be. But the mere fact that running a language model in a browser is shifting from a curiosity to a standard practice already says a lot.

Original Title: Transformers.js v4 Preview: Now Available on NPM!
Publication Date: Feb 10, 2026
Source: Hugging Face (huggingface.co), a U.S.-based open platform and company for hosting, training, and sharing AI models.

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Claude Sonnet 4.5 (Anthropic): Analyzing the Original Publication and Writing the Text. The model studies the original material and generates a coherent text.

2. Gemini 3 Pro (Google DeepMind): Translation into English.

3. Gemini 3 Flash Preview (Google DeepMind): Text Review and Editing. Correction of errors, inaccuracies, and ambiguous phrasing.

4. DeepSeek-V3.2 (DeepSeek): Preparing the Illustration Description. Generating a textual prompt for the visual model.

5. FLUX.2 Pro (Black Forest Labs): Creating the Illustration. Generating an image based on the prepared prompt.
