Two approaches are currently competing in image generation: diffusion models and autoregressive models. The former gradually remove noise from a picture, while the latter assemble it piece by piece, much like a puzzle – token by token. AMD decided to double down on the second option and released Nitro-AR, a compact transformer that is faster and lighter than many competitors.
What Is Autoregressive Generation?
Autoregressive models function by predicting the next element of an image based on everything they have already generated. This is similar to how language models write text – word by word. However, instead of words, these models use visual tokens that encode parts of the picture.
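The loop described above can be sketched in a few lines. This is a toy illustration of the autoregressive idea, not Nitro-AR's actual architecture: the stand-in `next_token_probs` function plays the role of the transformer, and the codebook size is made up.

```python
import random

random.seed(0)
VOCAB_SIZE = 16  # hypothetical codebook of visual tokens


def next_token_probs(prefix):
    """Stand-in for a transformer: returns a probability distribution
    over the next visual token, conditioned on everything generated
    so far. Here it is just a deterministic toy function of the prefix."""
    weights = [(hash((tuple(prefix), t)) % 100) + 1 for t in range(VOCAB_SIZE)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(seq_len):
    """Autoregressive loop: sample one token at a time, each step
    conditioned on all previously generated tokens."""
    tokens = []
    for _ in range(seq_len):
        probs = next_token_probs(tokens)
        tokens.append(random.choices(range(VOCAB_SIZE), weights=probs)[0])
    return tokens


print(generate(8))  # a short "image" as a sequence of 8 token ids
```

In a real model, the resulting token sequence would then be passed through the tokenizer's decoder to reconstruct pixels.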
This approach isn't new, but for a long time, it lagged behind diffusion models in quality. The situation began to change when researchers learned how to more effectively convert images into tokens and train transformers on visual data.
What Did AMD Do Differently?
Nitro-AR is built upon the team's previous development – the Nitro model. The new version is more compact and quicker. The primary difference lies in its architecture and training method.
The model uses an improved tokenizer that more effectively compresses an image into a sequence of tokens. This allows the transformer to work with fewer elements and spend less time on generation.
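The payoff of stronger compression is easy to quantify. Assuming a patch-style tokenizer that emits one token per downsample×downsample block of pixels (the downsample factors below are illustrative, not Nitro-AR's actual values), the sequence length shrinks quadratically with the factor, and an autoregressive decoder runs one step per token:

```python
def token_count(height, width, downsample):
    """Number of visual tokens a patch/VQ-style tokenizer produces,
    assuming one token per downsample x downsample patch."""
    return (height // downsample) * (width // downsample)


# Doubling the compression factor quarters the sequence length,
# and therefore the number of generation steps.
for factor in (8, 16, 32):
    print(factor, token_count(1024, 1024, factor))
# 8  -> 16384 tokens
# 16 ->  4096 tokens
# 32 ->  1024 tokens
```

This is why tokenizer quality matters so much for this family of models: every token saved is a decoding step saved.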
Another key point: although Nitro-AR was trained on resolutions up to 1024×1024 pixels, it can generate images at even higher resolutions. This makes it flexible for various tasks.
Speed and Size Matter
One of Nitro-AR's strong suits is its compactness. The model occupies less memory than many diffusion counterparts and operates faster during the generation stage. This is crucial for practical applications, especially when there's a need to deploy the model on limited hardware or generate many images in a short amount of time.
AMD notes that Nitro-AR shows competitive quality at lower computational cost. Simply put, you get a similar result faster and with lower resource demands.
Where This Can Be Useful
Compact autoregressive models are suitable for scenarios where speed is paramount: real-time content generation, embedding into applications, and running on devices with limited memory. Another advantage of the autoregressive approach is that it is easier to scale and combine with other tasks, such as text generation.
However, there are limitations. Autoregressive models are harder to train, they are sensitive to errors at early stages of generation, and it is more challenging to control the creation process on the fly – unlike diffusion models, where you can intervene at different steps.
What's Next?
Nitro-AR represents another step in the development of autoregressive generation. While this approach hasn't yet supplanted diffusion models, it is becoming increasingly competitive. Perhaps in the future, we will see hybrid architectures that combine the strengths of both methods.
For now, AMD is demonstrating that autoregressive generation can be not only high-quality but also practical – fast and lightweight.