Imagine this: you enter an AI-generated virtual world, press "forward", and the character walks sideways. Or you turn the camera, and the scene begins to "drift". This isn't a bug in the code; it's a fundamental weakness of today's world models. They can generate beautiful video, but in interactive mode they are far less reliable at doing exactly what the user asks. Tencent's Hunyuan team decided to tackle this problem head-on and released WorldCompass, an open-source tool designed specifically to address it.
In short, a world model is an AI system that doesn't just draw pictures but generates an interactive space. You give it a text description or a single image, and it starts producing a video stream: a virtual world you can navigate in real time with a keyboard or mouse. The camera moves, the view changes, and objects stay where they are, at least in the ideal case.
The task is harder than it looks. Standard video generation involves creating a single beautiful clip. A world model must generate an infinite sequence of frames in response to user actions while maintaining geometric consistency: if you walk away from a table and come back, it should still be there, not shifted or morphed into something else.
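To make the "walk away and come back" requirement concrete, here is a toy Python sketch that tracks the camera pose implied by a sequence of keyboard-style actions. The action names (forward, turn_left, turn_right), the unit step, and the 90-degree turn are hypothetical choices for illustration, not how HY-World or WorldCompass represent controls. The point is only that a closed loop of actions returns the camera to its starting pose, and a geometrically consistent world model should then show the same scene it showed at the start.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Pose:
    x: float
    y: float
    heading: float  # radians; 0 means facing the +x direction


# Hypothetical discrete action set, loosely mirroring keyboard controls.
STEP = 1.0          # distance covered by one "forward" action
TURN = math.pi / 2  # rotation applied by one turn action (90 degrees)


def apply_action(pose: Pose, action: str) -> Pose:
    """Return the camera pose after a single action."""
    if action == "forward":
        return Pose(pose.x + STEP * math.cos(pose.heading),
                    pose.y + STEP * math.sin(pose.heading),
                    pose.heading)
    if action == "turn_left":
        return Pose(pose.x, pose.y, pose.heading + TURN)
    if action == "turn_right":
        return Pose(pose.x, pose.y, pose.heading - TURN)
    raise ValueError(f"unknown action: {action!r}")


def run(start: Pose, actions: Iterable[str]) -> Pose:
    """Apply a sequence of actions to a starting pose."""
    pose = start
    for action in actions:
        pose = apply_action(pose, action)
    return pose


def same_pose(a: Pose, b: Pose, tol: float = 1e-9) -> bool:
    """Compare positions and headings, treating headings modulo a full turn."""
    d_heading = (a.heading - b.heading + math.pi) % (2 * math.pi) - math.pi
    return abs(a.x - b.x) < tol and abs(a.y - b.y) < tol and abs(d_heading) < tol


if __name__ == "__main__":
    start = Pose(0.0, 0.0, 0.0)
    # Walk away from the "table", turn around, walk back, turn around again.
    loop = ["forward"] * 3 + ["turn_left"] * 2 + ["forward"] * 3 + ["turn_left"] * 2
    end = run(start, loop)
    print(same_pose(start, end))  # True: the camera is back where it started
```

A consistency check in this spirit would compare what the model renders at the start and end poses; the sketch covers only the bookkeeping side, not the rendering.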
A Problem No One Had Truly Solved
Until recently, world models faced a stark trade-off: speed or memory. Fast systems generated video in real time, but their scenes were unstable, with the world "rewriting" itself every time the camera turned to look somewhere new. Systems with good memory preserved geometry but were too slow for live interaction.
The HY-World 1.5 (WorldPlay) project, previously introduced by the Hunyuan team, set out to resolve this trade-off and largely succeeded: the model generates video at 24 frames per second while keeping the space consistent over long sequences. One problem remained, however: even a well-trained model occasionally ignores commands in interactive mode, or its image quality drops during complex maneuvers. It can generate a world, but it doesn't always do exactly what the user tells it to.
WorldCompass: Learning Through Consequences
WorldCompass is a fine-tuning framework based on reinforcement learning (RL). Simply put, it teaches a model not just to "draw beautifully" but to behave correctly, in line with what the user expects.
Reinforcement learning works by trial and error: the model performs an action, receives a score (a reward reflecting how well it did), and adjusts its behavior accordingly. For world models this is non-trivial, because the video isn't generated all at once; it is produced sequentially, frame by frame, with each new frame depending on the previous ones. An early error can therefore compound and degrade quality a few seconds later.
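The compounding effect can be illustrated with a deliberately simple Python model. It is an assumption for illustration, not a description of how WorldPlay works internally: each new frame inherits the previous frame's deviation, scaled by a hypothetical feedback_gain, and adds a small fresh error of its own. At 24 frames per second, two seconds of video is 48 such steps.

```python
from typing import List


def rollout_errors(num_frames: int, per_step_error: float, feedback_gain: float) -> List[float]:
    """Toy model of drift in frame-by-frame generation.

    Each frame is conditioned on the previously *generated* frame, so its
    deviation from the ideal frame is the inherited deviation (scaled by
    feedback_gain) plus a fresh per-step error.
    """
    errors: List[float] = []
    err = 0.0
    for _ in range(num_frames):
        err = feedback_gain * err + per_step_error
        errors.append(err)
    return errors


if __name__ == "__main__":
    frames = 48  # two seconds at 24 frames per second
    drift = rollout_errors(frames, per_step_error=0.01, feedback_gain=1.05)
    # Compare with what the same per-frame errors would add up to without compounding.
    print(f"after 1 frame: {drift[0]:.3f}, after {frames} frames: {drift[-1]:.3f}")
    print(f"sum of per-frame errors alone: {frames * 0.01:.3f}")
```

With feedback_gain above 1, the accumulated error grows faster than the plain sum of per-frame mistakes, which is why a slip early in a rollout can surface as visible degradation seconds later.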
The team addressed this in several ways. Evaluating long sequences is computationally expensive, so instead the developers introduced segment-level rewards: the model generates short clips, and each clip is scored immediately. This speeds up training and gives a more precise signal about exactly where a failure occurred.
In addition, the evaluation was split into two independent parts: one checks how accurately movement commands are followed, and the other tracks visual quality. This separation matters: with only a single metric, the model might learn to "cut corners" (a failure mode known as reward hacking), for example by sacrificing image quality just to satisfy a movement command, or vice versa.
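The difference between one score for a whole rollout and one score per short segment can be sketched in a few lines of Python. The per-frame scores, the fixed segment length, and the simple averaging are all hypothetical; the article does not say how WorldCompass scores a clip. The sketch only shows why a per-segment signal is more informative: a single averaged number hides where the rollout went wrong, while segment rewards point to the failing stretch.

```python
from typing import List, Sequence


def sequence_reward(frame_scores: Sequence[float]) -> float:
    """One reward for the whole rollout: the mean of all frame scores."""
    return sum(frame_scores) / len(frame_scores)


def segment_rewards(frame_scores: Sequence[float], segment_len: int) -> List[float]:
    """One reward per consecutive segment of segment_len frames (the last may be shorter)."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    rewards = []
    for start in range(0, len(frame_scores), segment_len):
        segment = frame_scores[start:start + segment_len]
        rewards.append(sum(segment) / len(segment))
    return rewards


def worst_segment(rewards: Sequence[float]) -> int:
    """Index of the lowest-scoring segment, i.e. where the failure happened."""
    return min(range(len(rewards)), key=lambda i: rewards[i])


if __name__ == "__main__":
    # Hypothetical per-frame scores: the rollout fails in frames 8-11.
    scores = [1.0] * 8 + [0.25] * 4 + [1.0] * 4
    print(sequence_reward(scores))     # 0.8125: something went wrong, but where?
    per_segment = segment_rewards(scores, segment_len=4)
    print(per_segment)                 # [1.0, 1.0, 0.25, 1.0]
    print(worst_segment(per_segment))  # 2
```

Short segments also mean each reward can be computed as soon as its clip is generated, rather than after a long rollout finishes.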
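Why a single combined metric invites corner-cutting can be shown with a toy Python example. The two scores below (action for command following, quality for visual quality), their 0 to 1 scale, and the per-criterion regression check are hypothetical stand-ins, not WorldCompass's actual reward models. A summed score can rise even when one criterion falls, whereas checking each criterion separately exposes the trade-off.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClipScores:
    action: float   # hypothetical: how well the clip follows the movement command (0..1)
    quality: float  # hypothetical: visual quality of the clip (0..1)


def combined(scores: ClipScores) -> float:
    """A single metric that simply adds the two criteria together."""
    return scores.action + scores.quality


def regressions(before: ClipScores, after: ClipScores, tolerance: float = 0.0) -> List[str]:
    """Names of criteria that got worse by more than tolerance, checked independently."""
    worse = []
    if after.action < before.action - tolerance:
        worse.append("action")
    if after.quality < before.quality - tolerance:
        worse.append("quality")
    return worse


if __name__ == "__main__":
    baseline = ClipScores(action=0.5, quality=0.75)
    # A "corner-cutting" update: the command is followed perfectly, but the image degrades.
    hacked = ClipScores(action=1.0, quality=0.5)
    print(combined(baseline), combined(hacked))  # 1.25 1.5: the single metric says "better"
    print(regressions(baseline, hacked))         # ['quality']: the separate check disagrees
```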
According to the team, applying WorldCompass to WorldPlay markedly improved both command-following accuracy and image stability. The gains held for short and long sequences alike, and for simple actions (moving forward) as well as complex combinations (moving while rotating the camera).
Importantly, WorldCompass was designed as a general-purpose tool rather than one tied to a specific architecture. The authors tested it on two different types of models, and results improved in both cases. This suggests that other researchers and developers should be able to apply a similar approach to their own projects.
Open Source Is More Than Just Generosity
The team has made WorldCompass open-source. This not only lets outside specialists reproduce the results and adapt the framework to their needs but also sends a signal to the whole industry: reinforcement learning for world models is no longer a closed topic confined to a few major labs.
Until now, most work on applying RL to generative models has focused on static images or short videos. World models pose a different class of problem: the goal isn't one successful generation but consistent behavior over long interactive sessions. WorldCompass is presented as the first public framework adapted specifically to these dynamics.
What Still Remains Behind the Scenes 🎬
It's worth remembering that this is fine-tuning, not building a system from scratch: WorldCompass improves an existing model but does not replace the earlier stages of its training. World models themselves still require substantial computing resources, and for now their use is limited to research and professional settings; you won't be running such an "infinite world" on an ordinary laptop just yet.
The question of how these systems handle real-world physics and logic also remains open. Creating visually stable spaces is one thing; reproducing cause and effect (for instance, that water spills when a glass falls) is a different story entirely. Nevertheless, WorldCompass is a major step toward world models that behave as convincingly as they look.