Published February 7, 2026

AMD Micro-World Open Source Models for Interactive Video Generation


AMD has introduced Micro-World, the first open-source "world models" optimized to run on the company's graphics processors. They are capable of generating video in real time in response to user actions.


AMD has published Micro-World – a set of models capable of creating video not just from text prompts, but also by accounting for user actions. To put it simply, this is an attempt to teach a neural network to predict how the visual environment will change if you interfere with it.

The main feature of the project is that these are the first open-source models of this type optimized for AMD graphics cards. The code and model weights are publicly available – they can be downloaded and tested on your own hardware.

What are world models and why do they matter

A world model is an algorithm that attempts to recreate the logic of how a particular environment works. That environment does not have to be the real world: it could be a game, a simulation, or a video sequence. The idea is for the neural network to learn to predict the consequences of various changes.

For example, you show the model a video of a car on the road and set a condition: "Imagine the driver turned the wheel left." The model should generate a continuation of the clip where the car actually performs the maneuver. This is not just image generation, but an understanding of cause-and-effect relationships.
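The cause-and-effect idea can be sketched as a tiny toy predictor. Everything here is illustrative and unrelated to the actual Micro-World API: a learned model would replace the hand-written rule with a neural network.

```python
from dataclasses import dataclass

@dataclass
class CarState:
    x: float        # position along the road
    heading: float  # radians; 0.0 means driving straight

def toy_world_model(state: CarState, action: str, dt: float = 0.1) -> CarState:
    """Hand-written stand-in for a learned predictor: applies a simple
    cause-and-effect rule instead of running a neural network."""
    turn = {"left": 0.2, "right": -0.2, "straight": 0.0}[action]
    return CarState(x=state.x + dt, heading=state.heading + turn)

state = CarState(x=0.0, heading=0.0)
state = toy_world_model(state, "left")
print(round(state.heading, 2))  # 0.2 -- the car responds to "turn left"
```

A world model learns exactly this mapping from (current state, action) to next state, except from pixels and data rather than explicit rules.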

Such solutions are critically important for training AI agents. Instead of running them in a real environment (which is expensive, time-consuming, and often dangerous), the AI can be trained inside a simulation created by a world model. The agent performs actions, the model demonstrates the result – this is how the system's gradual learning occurs.
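The train-in-simulation loop looks roughly like this. `world_model_step` below is a hand-written stand-in for the learned dynamics model; all names are illustrative, not part of any real training framework.

```python
# Evaluate an agent purely inside a simulated environment: the agent acts,
# the model predicts the result, and no real-world trials are needed.

def world_model_step(state: int, action: int) -> tuple[int, float]:
    """Predict (next_state, reward) for a 1-D corridor with a goal at 10."""
    next_state = max(0, min(10, state + (1 if action == 1 else -1)))
    return next_state, (1.0 if next_state == 10 else 0.0)

def rollout(policy, start: int = 0, horizon: int = 20) -> float:
    """Run the policy for `horizon` steps entirely inside the simulator."""
    state, total = start, 0.0
    for _ in range(horizon):
        state, reward = world_model_step(state, policy(state))
        total += reward
    return total

print(rollout(lambda s: 1))  # 11.0 -- "always move right" reaches the goal
```

The same loop scales up: replace the toy step function with a learned video-generating world model and the integer state with frames.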


What Micro-World can do

Micro-World consists of several models of varying sizes – from compact to large-scale. They all operate based on diffusion principles: the process begins with visual noise, which is gradually transformed into meaningful video by relying on context – previous frames and user commands.
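A toy illustration of that diffusion principle, with a scalar standing in for a frame: start from noise and iteratively denoise toward a target, conditioned on context. In a real model, a trained network replaces the hand-written `denoise_step`.

```python
import random

random.seed(0)

def denoise_step(x: float, context: float, t: int, steps: int) -> float:
    """Stand-in denoiser: blend the noisy sample toward the conditioning
    signal, taking larger corrective steps as t approaches the end."""
    alpha = 1.0 / (steps - t)
    return x + alpha * (context - x)

prev_frame = 1.0                  # conditioning: the "previous frame"
x = random.gauss(0.0, 1.0)        # start from pure visual noise
steps = 10
for t in range(steps):
    x = denoise_step(x, prev_frame, t, steps)

print(abs(x - prev_frame) < 1e-9)  # True -- noise converged to the target
```

Real diffusion models condition each denoising step on previous frames and user commands in exactly this spirit, only over millions of pixels instead of one number.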

The models were trained on datasets from video games and simulations. Because of this, they understand character movement mechanics, environmental changes during interaction, and the physics of virtual worlds.

Unlike traditional video generators that create an entire clip upon request, Micro-World reacts to actions in real-time. You press a button – the model instantly generates the next frame considering this input. This turns the process into more of an interactive simulation than a standard video viewing experience.
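The interactive loop described above can be sketched as follows; the generator is a trivial stand-in, not the Micro-World model.

```python
# Each user input produces the next frame, conditioned on everything
# generated so far -- an autoregressive, action-driven loop.

def generate_next_frame(history: list[str], action: str) -> str:
    """A real model would render pixels; this stand-in just records
    how each action transforms the latest frame."""
    return f"{history[-1]} -> {action}"

frames = ["initial scene"]
for action in ["move forward", "turn left", "jump"]:
    frames.append(generate_next_frame(frames, action))

print(frames[-1])  # initial scene -> move forward -> turn left -> jump
```

The key difference from a conventional video generator is visible in the loop structure: frames are produced one at a time as inputs arrive, not as a whole clip per request.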

Why AMD emphasizes openness

Most major developments in the field of world models are either closed or strictly tied to the hardware of a specific vendor. AMD is choosing a different path: it is publishing the code and weights in the public domain, optimizing them for its GPUs based on ROCm – the software platform for working with the company's graphics cards.

This opens up the opportunity for researchers and developers to experiment with models without being restricted to a single ecosystem. A model can be taken as a base, refined for a specific task, and fine-tuned on custom data using a completely open technology stack.

For AMD, this is also an effective way to demonstrate that its hardware is well suited not only for gaming but also for serious machine learning tasks. The ROCm platform is gradually becoming a full-fledged alternative to CUDA, and projects like this accelerate that process.


Where this might be useful

The most obvious scenario is training AI agents for the gaming industry and robotics. Instead of modeling physics from scratch using complex engines, one can use a neural network trained to predict results based on real data.

Another option is the creation of interactive training environments. For example, a model can visually demonstrate how a car will behave in various weather conditions or how a scene's composition will change when manipulating objects. This is extremely useful for debugging control algorithms.

There are also more futuristic ideas: generating game worlds on the fly, where the environment is created not by pre-written rules, but by a neural network that understands the logic of space. For now, these are just experiments, but the direction looks promising.


What remains unclear

Micro-World is primarily a research project rather than a commercial product. At the moment, the models work with relatively simple data: games and simulations with predictable physics. How effectively they will handle complex real-world scenarios remains an open question.

The issue of resources should also be considered: interactive video generation is extremely demanding on computing power. Even with open code, not all users will be able to run the model at an acceptable speed. AMD's optimization for its GPUs helps, but it does not guarantee high performance on all hardware.

And the key nuance is prediction accuracy. If the simulation the AI agent learns on contains errors or inaccuracies, the algorithm may adopt incorrect behavioral strategies. This is the classic "sim-to-real gap" problem, typical of any virtual training.

Why it is important

World models are a major step toward AI understanding not just individual objects, but their interconnections and the dynamics of change. This brings AI closer to human perception: we see reality not as a set of static frames, but as a continuous cause-and-effect process.

Releasing Micro-World into the public domain significantly lowers the barrier to entry for the scientific community. Developers no longer need to wait for access to closed APIs or adapt to someone else's infrastructure.

Certainly, full-scale reality modeling is still a long way off. However, every project like this is an essential element in understanding how to teach machines to perceive the world not through dry rules, but through accumulated experience.

Original Title: Micro-World: First AMD Open-Source World Models for Interactive Video Generation – ROCm Blogs
Publication Date: Feb 6, 2026
Source: AMD (www.amd.com), an international company manufacturing processors and computing accelerators for AI workloads.

From Source to Analysis

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Claude Sonnet 4.5 (Anthropic) – Analyzing the Original Publication and Writing the Text: the neural network studies the original material and generates a coherent text.

2. Gemini 3 Pro (Google DeepMind) – Translation into English.

3. Gemini 3 Flash Preview (Google DeepMind) – Text Review and Editing: correction of errors, inaccuracies, and ambiguous phrasing.

4. DeepSeek-V3.2 (DeepSeek) – Preparing the Illustration Description: generating a textual prompt for the visual model.

5. FLUX.2 Pro (Black Forest Labs) – Creating the Illustration: generating an image based on the prepared prompt.
