Published on March 31, 2026

How AI is Learning to Use a Computer: OpenAI's GPT-5.4 Release, Agents, and a New Era of Automation

OpenAI has introduced GPT-5.4, a model capable of controlling a computer, writing code, and performing tasks in applications without human intervention.

Products, 4–6 min read
Event Source: Prime Intellect

In short: OpenAI has released a new version of its model – GPT-5.4. This is arguably the first time that the company's flagship language model can natively control a computer. Not as an experiment, not through an additional plugin, but as a built-in, native capability.

What does it mean to «control a computer»?

It sounds a bit intimidating, but in practice, it's quite straightforward. The model can analyze a screenshot, understand what's on it, and send commands – as if someone were moving the mouse and pressing keys. Simply put, the AI sees your screen and can interact with it.

This opens up possibilities for automating tasks that previously required human presence. For example, the model can go to a store's website, find the right products for a recipe, and place an order. Or it can open a spreadsheet, process the data, and save the result. All of this happens without the user performing each step by hand.
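The loop described above (look at the screen, decide, send an input command, repeat) can be sketched in a few lines. This is only an illustration under stated assumptions: `FakeScreen`, `Action`, `choose_action`, and `run_agent` are hypothetical names invented here, the "screenshot" is a plain dictionary rather than pixels, and the decision step is a hard-coded stub standing in for the model. It is not OpenAI's API.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One low-level input event the agent wants to perform (hypothetical schema)."""
    kind: str            # "click", "type", or "done"
    target: str = ""     # name of the on-screen element to act on
    text: str = ""       # text to type, used only when kind == "type"


@dataclass
class FakeScreen:
    """Stand-in for a real desktop: a search box, a search button, and a results area."""
    search_box: str = ""
    results: list[str] = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real agent would receive pixels; here the "screenshot" is a plain dict.
        return {"search_box": self.search_box, "results": list(self.results)}

    def apply(self, action: Action) -> None:
        # Translate the agent's command into a change of screen state.
        if action.kind == "type" and action.target == "search_box":
            self.search_box = action.text
        elif action.kind == "click" and action.target == "search_button":
            self.results = [f"result for {self.search_box}"]


def choose_action(goal: str, observation: dict) -> Action:
    """Stub policy standing in for the model: pick the next step from what is on screen."""
    if observation["results"]:
        return Action("done")
    if observation["search_box"] != goal:
        return Action("type", target="search_box", text=goal)
    return Action("click", target="search_button")


def run_agent(goal: str, screen: FakeScreen, max_steps: int = 10) -> list[Action]:
    """Observe, decide, act; repeat until the policy reports that it is done."""
    history = []
    for _ in range(max_steps):
        action = choose_action(goal, screen.screenshot())
        history.append(action)
        if action.kind == "done":
            break
        screen.apply(action)
    return history
```

Calling `run_agent("flour", FakeScreen())` produces three steps: typing the query, clicking the button, and reporting completion. The `max_steps` cap is a deliberate choice in this sketch: an agent acting on a real machine needs a hard limit so that a confused policy cannot loop forever.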

Where did this idea come from?

The concept isn't new. For some time now, the industry has been discussing the so-called «agentic» future of AI. The idea is that instead of one large assistant that answers questions, work is split across a network of small AI agents. Each one performs its part of the task: one plans, another searches for information, and a third takes action.

GPT-5.4 is a step in this direction, and a very concrete one at that: the company has integrated computer control not as a separate add-on, but as part of the model's core architecture.

The groundwork for this was laid in advance. Previously, OpenAI had already introduced the ChatGPT Agent feature, which allowed the AI to take partial control of a computer to perform specific tasks. GPT-5.4 makes this a systemic feature, not an experimental one.

What else has changed besides computer control?

Several important changes:

  • Fewer «fabrications.» The likelihood of false statements has been reduced by 33% compared to the previous version. This matters because language models have historically been prone to so-called «hallucinations»: cases where the AI confidently reports information that is not factually correct. A reduction of roughly one third is substantial, although it does not eliminate the problem.
  • Better performance with multiple sources. The model can conduct several rounds of searching, gather information from different places, and provide a coherent, well-argued answer. This was previously a weak point.
  • Improvements in programming and document handling. GPT-5.4 is better at handling code, spreadsheets, presentations, and text documents. For those who use AI in their work, this is a noticeable improvement.

Versions for Every Taste and Budget

Along with the flagship GPT-5.4, the company has released several versions for different tasks.

GPT-5.4 Thinking – a version with enhanced reasoning capabilities. It can show a brief outline of its «thoughts» when working on complex tasks. Additionally, a user can adjust the prompt mid-response – without waiting for it to finish.

GPT-5.4 Pro – for the most complex tasks. It is available through corporate and educational subscriptions.

Separately, OpenAI released GPT-5.4 mini and GPT-5.4 nano – compact versions focused on speed and low cost. They are primarily intended for developers: for automating repetitive tasks, acting as «sub-agents» within more complex systems, fixing code, and processing data.

According to the company, GPT-5.4 mini runs more than twice as fast as the previous compact version while being nearly on par with the flagship model for programming tasks. This means developers can use a cheaper option without a significant loss in quality – which is quite important from a practical standpoint.

GPT-5.4 nano is even simpler and faster. Its purpose is to perform auxiliary operations: sorting, extracting data from text, and simple calculations.
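The division of labor between the three tiers can be illustrated with a small routing sketch. The task categories below are assumptions drawn from the roles this article lists, and the tier labels "nano", "mini", and "flagship" are plain strings chosen for illustration, not real API model identifiers.

```python
# Hypothetical task categories, based on the roles the article describes for each tier.
NANO_TASKS = {"sort", "extract", "calculate"}
MINI_TASKS = {"fix_code", "process_data", "sub_agent"}


def pick_tier(task_kind: str) -> str:
    """Return the cheapest tier associated with this kind of task."""
    if task_kind in NANO_TASKS:
        return "nano"
    if task_kind in MINI_TASKS:
        return "mini"
    return "flagship"   # anything unrecognised falls back to the most capable tier


def plan_batch(task_kinds: list[str]) -> dict[str, list[str]]:
    """Group a batch of tasks by the tier that should handle them."""
    plan: dict[str, list[str]] = {"nano": [], "mini": [], "flagship": []}
    for kind in task_kinds:
        plan[pick_tier(kind)].append(kind)
    return plan
```

For a batch such as `["sort", "fix_code", "write_report", "extract"]`, the two auxiliary tasks go to the nano tier, the code fix goes to mini, and the unrecognised report task falls back to the flagship. Defaulting unknown work to the most capable tier trades cost for safety; a real system might prefer to reject or escalate such tasks instead.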

And what about GPT-5?

GPT-5.4 builds on GPT-5, which OpenAI introduced in August 2025 as a new generation of the model with expanded long-term memory and improved accuracy. Sam Altman, the company's CEO, described the transition to GPT-5 as a quantum leap:

"If GPT-3 was like a high schooler and GPT-4 was like a college student, then GPT-5 is like an expert with a Ph.D."

At launch, GPT-5 was made available to all ChatGPT users, including those with free accounts, subject to a usage limit. Three versions were offered to developers via the API: GPT-5, GPT-5 mini, and GPT-5 nano.

What does this mean for the future?

Looking at the big picture, what's happening is this: AI is gradually ceasing to be just a «chatbot you ask questions» and is becoming a tool that can do things. Not just answering, but acting.

For the average user, this means that tasks that previously required manually opening dozens of tabs, copying data, and filling out forms could potentially be accomplished simply by describing them in words. How reliably and securely this will work in real-world scenarios is a question that remains open. Controlling a computer on a user's behalf is an area where the cost of an error is much higher than just an inaccurate chat response.

For developers, the emergence of compact models with agentic capabilities means that building such systems is becoming cheaper and faster. A sub-agent architecture – where one AI manages several simpler ones – is becoming increasingly practical, not just theoretical.
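A sub-agent architecture of this kind can be sketched as a manager that delegates steps to simpler workers. Everything here is hypothetical: `Orchestrator`, `search_agent`, and `summary_agent` are names made up for the example, the workers are ordinary functions rather than model calls, and the plan is fixed instead of being produced by a model.

```python
from typing import Callable

# A sub-agent is modelled as a plain function from an input string to an output string.
SubAgent = Callable[[str], str]


def search_agent(topic: str) -> str:
    """Stub for a small model that gathers information."""
    return f"facts about {topic}"


def summary_agent(material: str) -> str:
    """Stub for a small model that condenses material."""
    return f"summary of {material}"


class Orchestrator:
    """Stands in for the managing model: it decides the order of steps and delegates."""

    def __init__(self, workers: dict[str, SubAgent]) -> None:
        self.workers = workers
        self.log: list[str] = []

    def plan(self, goal: str) -> list[str]:
        # A real manager would let a model produce the plan; this one is fixed.
        return ["search", "summarize"]

    def run(self, goal: str) -> str:
        result = goal
        for step in self.plan(goal):
            result = self.workers[step](result)   # output of one step feeds the next
            self.log.append(step)
        return result
```

With `Orchestrator({"search": search_agent, "summarize": summary_agent})`, a call to `run("GPU prices")` passes the goal through both workers in order and records each step in `log`. Keeping workers behind a simple name-to-function table means a cheaper or faster worker can be swapped in without touching the manager.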

The industry is clearly moving in one direction: automation through AI agents. GPT-5.4 is one of the most concrete steps in this direction to date. 🤖

Original Title: Partnering with Browserbase to Train Browser and Computer Use Agents
Publication Date: Mar 30, 2026
Prime Intellect (www.primeintellect.ai): an international research initiative working on decentralized infrastructure and artificial intelligence training using distributed computing.

From Source to Analysis

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Claude Sonnet 4.6 (Anthropic): Analyzing the Original Publication and Writing the Text. The neural network studies the original material and generates a coherent text.
2. Gemini 2.5 Pro (Google DeepMind): Translation into English.
3. Gemini 2.5 Flash (Google DeepMind): Text Review and Editing. Correction of errors, inaccuracies, and ambiguous phrasing.
4. DeepSeek-V3.2 (DeepSeek): Preparing the Illustration Description. Generating a textual prompt for the visual model.
5. FLUX.2 Pro (Black Forest Labs): Creating the Illustration. Generating an image based on the prepared prompt.
