In short: OpenAI has released a new version of its model, GPT-5.4. This is arguably the first time the company's flagship language model can natively control a computer. Not as an experiment or through an add-on plugin, but as a built-in capability.
What does it mean to "control a computer"?
It sounds a bit intimidating, but the idea is simple. The model analyzes a screenshot, works out what is on it, and sends commands, much as a person would move the mouse and press keys. Simply put, the AI sees your screen and can interact with it.
This opens up the possibility of automating tasks that previously required a person at the keyboard. For example, the model can go to a store's website, find the products needed for a recipe, and place an order. Or it can open a spreadsheet, process the data, and save the result. All of this can happen without the user performing each step by hand.
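Here is a minimal Python sketch of the look-decide-act cycle described above. It is not OpenAI's API or implementation. The "screen" is a toy dictionary, and `toy_policy` is a hypothetical rule-based stand-in for the model. In a real system, the model would read a screenshot and choose the next mouse or keyboard action. All class and function names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str         # "type", "click", or "done"
    target: str = ""  # hypothetical element name on the simulated screen
    text: str = ""    # text to type, if any


@dataclass
class FakeScreen:
    """Toy stand-in for a real desktop: a dict of named on-screen elements."""
    fields: dict = field(default_factory=lambda: {"search_box": "", "submitted": False})

    def screenshot(self) -> dict:
        # A real agent would receive pixels; here we return a copy of the state.
        return dict(self.fields)

    def execute(self, action: Action) -> None:
        # Simulates the effect of keyboard input and mouse clicks.
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "search_button":
            self.fields["submitted"] = True


def toy_policy(observation: dict, goal: str) -> Action:
    """Hypothetical stand-in for the model: picks the next action from what it 'sees'."""
    if observation["search_box"] != goal:
        return Action("type", "search_box", goal)
    if not observation["submitted"]:
        return Action("click", "search_button")
    return Action("done")


def run_agent(screen: FakeScreen, goal: str, max_steps: int = 10) -> list[Action]:
    """Loop: look at the screen, decide, act, until done or out of steps."""
    history: list[Action] = []
    for _ in range(max_steps):
        action = toy_policy(screen.screenshot(), goal)  # look and decide
        history.append(action)
        if action.kind == "done":
            break
        screen.execute(action)  # act on the (simulated) computer
    return history
```

Calling `run_agent(FakeScreen(), "pasta recipe")` types the query, clicks the button, and stops. It returns actions of kind `type`, `click`, and `done`. The `max_steps` cap is a deliberate design choice. An agent acting on a real computer needs a hard limit so that a confused model cannot loop forever.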
The concept isn't new. For some time now, the industry has been discussing the so-called "agentic" future of AI. The idea is that instead of one large assistant that you ask questions, a network of smaller AI agents emerges. Each one performs its part of the task: one plans, another searches for information, and a third takes action.
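The division of labor described above can be sketched as a few role functions that share a context. This is a toy illustration under loose assumptions and does not describe how any particular product works. `planner`, `searcher`, and `actor` are hypothetical stand-ins for separate AI agents, and the catalog contains made-up data.

```python
from typing import Callable

Context = dict  # shared scratchpad the agents read from and write to


def planner(task: str) -> list[str]:
    """Hypothetical planning agent: decides which roles the task needs, in order."""
    steps = ["search"]
    if task.startswith("buy"):
        steps.append("act")  # only tasks that change something need the action agent
    return steps


def searcher(ctx: Context) -> None:
    """Hypothetical search agent: looks ingredients up in a toy catalog."""
    catalog = {"pancakes": ["flour", "milk", "eggs"]}  # made-up example data
    ctx["items"] = catalog.get(ctx["dish"], [])


def actor(ctx: Context) -> None:
    """Hypothetical action agent: a real one would fill in a store's order form."""
    ctx["order"] = ", ".join(ctx["items"]) or "nothing"


AGENTS: dict[str, Callable[[Context], None]] = {"search": searcher, "act": actor}


def run(dish: str) -> Context:
    """Ask the planner for a sequence of roles, then let each agent do its part."""
    ctx: Context = {"dish": dish}
    for role in planner(f"buy ingredients for {dish}"):
        AGENTS[role](ctx)
    return ctx
```

With this sketch, `run("pancakes")["order"]` yields `"flour, milk, eggs"`. Because every role sits behind the same small interface, one agent can be swapped for another, for example a cheaper model for the search step.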
GPT-5.4 is a step in this direction, and a very concrete one at that: the company has integrated computer control not as a separate add-on, but as part of the model's core architecture.
The groundwork for this was laid in advance. OpenAI had already introduced the ChatGPT Agent feature, which let the AI take partial control of a computer to perform specific tasks. GPT-5.4 turns this into a built-in capability of the model rather than an experimental one.
The release also brings several other important changes:
- Fewer "fabrications." The likelihood of false statements has been reduced by 33% compared to the previous version. This matters because language models have long been prone to so-called "hallucinations", where the AI confidently reports information that is factually incorrect. The progress here is significant.
- Better handling of multiple sources. The model can run several rounds of searching, gather information from different places, and produce a coherent, well-argued answer. This was previously a weak point.
- Improvements in programming and document handling. GPT-5.4 is better at handling code, spreadsheets, presentations, and text documents. For those who use AI in their work, this is a noticeable improvement.
Along with the flagship GPT-5.4, the company has released several versions for different tasks.
GPT-5.4 Thinking – a version with enhanced reasoning capabilities. When working on complex tasks, it can show a brief outline of its "thoughts". Users can also adjust their request mid-response, without waiting for the answer to finish.
GPT-5.4 Pro – for the most complex tasks. It is available through corporate and educational subscriptions.
Separately, OpenAI released GPT-5.4 mini and GPT-5.4 nano – compact versions focused on speed and low cost. They are primarily intended for developers: automating repetitive tasks, acting as "sub-agents" within more complex systems, fixing code, and processing data.
According to the company, GPT-5.4 mini runs more than twice as fast as the previous compact version while coming close to the flagship model on programming tasks. This means developers can use a cheaper option without a significant loss in quality, which matters in practice.
GPT-5.4 nano is even simpler and faster. Its purpose is to perform auxiliary operations: sorting, extracting data from text, and simple calculations.
Earlier, OpenAI introduced GPT-5, a new generation of the model with expanded long-term memory and improved accuracy. Sam Altman, the company's CEO, described the move to GPT-5 as a quantum leap:
"If GPT-3 was like a high schooler and GPT-4 was like a college student, then GPT-5 is like an expert with a Ph.D."
GPT-5 is available to all ChatGPT users, including those with free accounts, although free accounts are subject to usage limits. Developers can access three versions via the API: GPT-5, GPT-5 mini, and GPT-5 nano.
Looking at the big picture, here is what's happening: AI is gradually ceasing to be just a "chatbot you ask questions" and is becoming a tool that can do things. Not just answering, but acting.
For the average user, this means that tasks that previously required manually opening dozens of tabs, copying data, and filling out forms could potentially be accomplished simply by describing them in words. How reliably and securely this will work in real-world scenarios is a question that remains open. Controlling a computer on a user's behalf is an area where the cost of an error is much higher than just an inaccurate chat response.
For developers, the emergence of compact models with agentic capabilities means that building such systems is becoming cheaper and faster. A sub-agent architecture – where one AI manages several simpler ones – is becoming increasingly practical, not just theoretical.
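A manager that hands simple subtasks to cheaper models and keeps the rest for the flagship might route work roughly as follows. This is a hypothetical sketch. The tier labels only echo the article's flagship/mini/nano line-up and are not real model identifiers. The routing table is an assumption based on the roles the article describes for mini and nano, and the workers are stubs that stand in for real model calls.

```python
from typing import Callable

# Hypothetical routing table: which model tier handles which kind of subtask.
# Tier names echo the article's product line; they are NOT real model IDs.
ROUTES: dict[str, str] = {
    "extract": "nano",    # pulling data out of text
    "sort": "nano",
    "calculate": "nano",  # simple calculations
    "fix_code": "mini",
    "process_data": "mini",
}


def route(kind: str) -> str:
    """Pick a tier for a subtask; anything unrecognised goes to the flagship."""
    return ROUTES.get(kind, "flagship")


def make_worker(tier: str) -> Callable[[str], str]:
    """Stub sub-agent: a real system would call the corresponding model here."""
    def worker(payload: str) -> str:
        return f"[{tier}] {payload}"
    return worker


WORKERS = {tier: make_worker(tier) for tier in ("nano", "mini", "flagship")}


def manage(subtasks: list[tuple[str, str]]) -> list[str]:
    """The manager: send each (kind, payload) subtask to a worker and collect results."""
    return [WORKERS[route(kind)](payload) for kind, payload in subtasks]
```

For example, `manage([("extract", "emails"), ("plan", "report")])` sends the extraction to the nano tier and the planning to the flagship tier. Defaulting unknown subtasks to the most capable tier gives up some savings in exchange for fewer mistakes.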
The industry is clearly moving in one direction: automation through AI agents. GPT-5.4 is one of the most concrete steps in this direction to date. 🤖