Reinforcement Learning (RL) is one of the key methods that make modern language models smarter and more useful after their initial training. This 'fine-tuning' stage is what ensures a model doesn't just generate text, but does so intelligently: following instructions, avoiding incorrect answers, and solving problems step-by-step. Simply put, RL is what turns a 'knowledgeable' model into a 'useful' one.
Until recently, the infrastructure for this type of training was almost entirely tailored for NVIDIA GPUs. The Miles framework – one of the most advanced tools for large-scale RL training – was no exception. The LMSYS team, in collaboration with AMD, has changed this: Miles now officially supports AMD Instinct series GPUs running on the ROCm platform.
What is Miles and Why is it Important
Miles is a system for post-training already pretrained language models with reinforcement learning. This is the approach used to create 'reasoning' models – those that work through a task step-by-step before providing an answer.
The main feature of Miles is its ability to work in a distributed mode: training can run simultaneously on multiple GPUs spread across several servers. This is critically important when working with large models that simply do not fit on a single accelerator.
Until now, this level of scaling was available primarily on NVIDIA hardware. Support for AMD GPUs changes that.
Technically: Almost No Performance Loss
Adapting Miles to ROCm required significant engineering work. AMD's platform is structured differently from NVIDIA's CUDA, and not all code can be ported automatically. The team had to resolve compatibility issues in low-level operations, debug communication between GPUs on different nodes, and ensure that performance did not degrade.
The result was encouraging: for large-scale RL training, Miles on AMD Instinct demonstrates performance comparable to what it achieves on NVIDIA hardware. This isn't a case of 'it works, but it's slower' – this is full-fledged support.
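From the user's side, one small part of this portability can be illustrated with a short sketch. It assumes a PyTorch-based stack, which is an assumption for illustration and not a description of Miles' internals. PyTorch builds for ROCm report a HIP version in `torch.version.hip` and reuse the `torch.cuda` namespace, which is why much high-level code runs unchanged while low-level kernels still need dedicated porting work. The helper name `describe_gpu_backend` is hypothetical.

```python
import importlib.util


def describe_gpu_backend():
    """Report which GPU backend the installed PyTorch build targets.

    Hypothetical helper for illustration; assumes a PyTorch-based stack.
    """
    # Avoid a hard dependency: report cleanly if PyTorch is missing.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"

    import torch

    # ROCm builds of PyTorch set torch.version.hip; other builds leave it None.
    hip_version = getattr(torch.version, "hip", None)
    if hip_version:
        return f"ROCm/HIP {hip_version}"

    # CUDA builds set torch.version.cuda; CPU-only builds leave it None.
    cuda_version = getattr(torch.version, "cuda", None)
    if cuda_version:
        return f"CUDA {cuda_version}"

    return "CPU-only build"


print(describe_gpu_backend())
```

On a ROCm build, calls such as `torch.cuda.is_available()` are served by the AMD backend, so a check like this is mostly useful for logging and for selecting backend-specific kernels.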
To understand the scale: tests were conducted on models like DeepSeek-R1 – one of the most resource-intensive open models available today. These are the very models that actively use RL in training and require the coordinated work of dozens of GPUs simultaneously.
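A rough back-of-the-envelope calculation shows why a single accelerator is not enough. The figures below are assumptions that do not come from this article: 671 billion parameters for DeepSeek-R1 (the publicly reported size), 2 bytes per parameter (BF16 weights), and 288 GB of HBM per accelerator (AMD's published figure for the MI355X). The function name `min_gpus_for_weights` is hypothetical.

```python
import math


def min_gpus_for_weights(num_params, bytes_per_param, gpu_memory_gb):
    """Return (weight size in GB, minimum GPU count to hold the weights alone)."""
    weight_gb = num_params * bytes_per_param / 1e9
    return weight_gb, math.ceil(weight_gb / gpu_memory_gb)


# Assumed figures (not from the article): 671B parameters, BF16, 288 GB per GPU.
weight_gb, gpus = min_gpus_for_weights(671_000_000_000, 2, 288)
print(f"Weights: {weight_gb:.0f} GB, at least {gpus} GPUs just to store them")
```

Under these assumptions the weights alone occupy about 1,342 GB, so at least five such accelerators are needed before any computation starts. RL training also has to hold gradients, optimizer state, activations, and the memory used for generating rollouts, which is how the real requirement grows to dozens of GPUs.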
Why AMD Needs This – and Why Everyone Else Does Too
AMD is consistently investing in the development of its ecosystem for AI computing. The release of ROCm 7.1 brought official support for the MI350X and MI355X, and ROCm 7.2.0 significantly improved performance on inference tasks for large models. In parallel, AMD open-sourced the ROCprof Trace Decoder – a tool for in-depth analysis of GPU performance that was previously closed-source.
Support for Miles follows the same logic. Previously, a developer who wanted to train a model with RL at this scale was effectively limited to NVIDIA hardware; now there is a real alternative.
This is important not just for large companies. Research groups, universities, and small teams often use the hardware that is available, not necessarily the hardware they want. Expanding compatibility means that the barrier to entry for serious RL training is lowered.
Openness as a Strategy
It's also significant that all of this is happening within an open ecosystem. ROCm is an open platform, Miles is being developed by the LMSYS team as a research project, and AMD itself is actively publishing test results and sharing code. For example, the ATOM engine, optimized for inference on the MI355X, was made publicly available on GitHub.
This approach – open source code, open benchmarks, open tools – is gradually changing the perception of AMD in the community. For a long time, NVIDIA was seen as the only viable choice for serious AI tasks, largely due to the maturity of its ecosystem. Now, that gap is closing.
What This Changes in Practice
In short: language model developers now have another genuinely working option for large-scale reinforcement learning – and this option is not dependent on NVIDIA.
This doesn't mean that everyone will immediately switch to AMD. NVIDIA's ecosystem is still deeper, with more tools and significantly more community experience. But the existence of a working alternative is valuable in itself: it creates competition, stimulates development, and gives freedom of choice to those who need it.
Miles on ROCm is not an announcement of a future possibility; it is a working tool available today. And that is, perhaps, the most important thing.