Published on March 24, 2026

AMD Opens Access to Powerful RL Training on Its GPUs: What This Means for Developers

AMD has adapted the Miles framework for large-scale reinforcement learning on Instinct GPUs – now it works without NVIDIA hardware.

Infrastructure / Technical context
Event source: LMSYS ORG, 4-5 min read

Reinforcement learning (RL) is one of the key methods for making modern language models smarter and more useful after their initial training. This 'fine-tuning' stage is what teaches a model not just to generate text, but to do so intelligently: following instructions, avoiding incorrect answers, and solving problems step by step. Simply put, RL is what turns a 'knowledgeable' model into a 'useful' one.

Until recently, the infrastructure for this type of training was almost entirely tailored for NVIDIA GPUs. The Miles framework – one of the most advanced tools for large-scale RL training – was no exception. The LMSYS team, in collaboration with AMD, has changed this: Miles now officially supports AMD Instinct series GPUs running on the ROCm platform.

What is Miles and Why is it Important

Miles is a system for post-training: taking an already pretrained language model and improving it with reinforcement learning. This is the approach used to create 'reasoning' models – those that analyze a task step by step before providing an answer.

The main feature of Miles is its ability to work in a distributed mode: training can run simultaneously on multiple GPUs spread across several servers. This is critically important when working with large models that simply do not fit on a single accelerator.

Until now, this level of scaling was available primarily on NVIDIA hardware. ROCm support in Miles changes that.

Technically – Almost No Losses

Adapting Miles to ROCm required significant engineering work. AMD's software stack is structured differently from NVIDIA's CUDA, and not all code can be ported automatically. The team had to resolve compatibility issues in low-level operations, debug communication between GPUs on different nodes, and make sure performance did not suffer.
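
As a small illustration of why code written for one vendor's stack is not automatically portable, the sketch below shows one way Python code might tell a ROCm build of PyTorch from a CUDA build. It assumes a PyTorch-based stack, which the article does not state, and it is not taken from Miles; `classify_backend` and `detect_backend` are hypothetical helper names. It relies on `torch.version.hip` being set on ROCm builds and `torch.version.cuda` being set on CUDA builds.

```python
from typing import Optional


def classify_backend(hip_version: Optional[str], cuda_version: Optional[str]) -> str:
    """Map PyTorch build metadata to a backend label (hypothetical helper)."""
    if hip_version:       # a ROCm build reports a HIP version string
        return "rocm"
    if cuda_version:      # a CUDA build reports a CUDA version string
        return "cuda"
    return "cpu"          # neither is set on a CPU-only build


def detect_backend() -> str:
    """Inspect the installed PyTorch build, if there is one."""
    try:
        import torch      # imported lazily so the sketch runs without PyTorch
    except ImportError:
        return "unknown"
    return classify_backend(
        getattr(torch.version, "hip", None),
        getattr(torch.version, "cuda", None),
    )


if __name__ == "__main__":
    print(detect_backend())
```

On ROCm builds, PyTorch exposes AMD GPUs through the same `torch.cuda` namespace, so checking `torch.cuda.is_available()` alone does not reveal which vendor's hardware is in use.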

The result was encouraging: Miles on AMD Instinct demonstrates performance comparable to NVIDIA for large-scale RL training. This isn't a case of 'it works, but it's slower' – this is full-fledged support.

To understand the scale: tests were conducted on models like DeepSeek-R1 – one of the most resource-intensive open models available today. These are the very models that actively use RL in training and require the coordinated work of dozens of GPUs simultaneously.
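
A rough back-of-the-envelope estimate shows why such models need many accelerators. The sketch below is only an illustration under stated assumptions: the 671-billion parameter count for DeepSeek-R1, the bytes-per-parameter figures (2 for bf16 weights alone, 16 as a common rule of thumb for mixed-precision training with Adam), and the per-GPU memory sizes are inputs chosen for the example, not numbers from the article or measurements from Miles. It ignores activations, KV caches, and any extra model copies an RL pipeline may keep for generation, so real requirements are higher.

```python
import math


def min_gpus(num_params: float, bytes_per_param: float, gpu_mem_gb: float) -> int:
    """Lower bound on the number of GPUs needed just to hold the model state."""
    total_bytes = num_params * bytes_per_param
    bytes_per_gpu = gpu_mem_gb * 1e9   # treating 1 GB as 10**9 bytes
    return math.ceil(total_bytes / bytes_per_gpu)


PARAMS = 671e9        # assumed parameter count for DeepSeek-R1
WEIGHTS_ONLY = 2      # bytes per parameter: bf16 weights, nothing else
TRAINING_STATE = 16   # bytes per parameter: weights, gradients, Adam state (rule of thumb)

if __name__ == "__main__":
    for gpu_mem_gb in (192, 288):   # assumed per-GPU memory sizes, in GB
        print(
            gpu_mem_gb,
            min_gpus(PARAMS, WEIGHTS_ONLY, gpu_mem_gb),
            min_gpus(PARAMS, TRAINING_STATE, gpu_mem_gb),
        )
    # prints:
    # 192 7 56
    # 288 5 38
```

Even under these simplified assumptions, the training state alone spans dozens of GPUs, which is why distributed, multi-node operation matters for models of this size.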

Why AMD Needs This – and Why Everyone Else Does Too

AMD has been investing steadily in its AI computing ecosystem. ROCm 7.1 brought official support for the MI350X and MI355X, and ROCm 7.2.0 significantly improved inference performance for large models. In parallel, AMD open-sourced the ROCprof Trace Decoder, a tool for in-depth GPU performance analysis that was previously closed-source.

Support for Miles follows the same logic. Previously, a developer who wanted to train a model with RL at scale was, in practice, tied to NVIDIA; now there is a real alternative.

This is important not just for large companies. Research groups, universities, and small teams often use the hardware that is available, not necessarily the hardware they want. Expanding compatibility means that the barrier to entry for serious RL training is lowered.

Openness as a Strategy

It's also significant that all of this is happening within an open ecosystem. ROCm is an open platform, Miles is being developed by the LMSYS team as a research project, and AMD itself is actively publishing test results and sharing code. For example, the ATOM engine, optimized for inference on the MI355X, was made publicly available on GitHub.

This approach – open source code, open benchmarks, open tools – is gradually changing the perception of AMD in the community. For a long time, NVIDIA was seen as the only viable choice for serious AI tasks, largely due to the maturity of its ecosystem. Now, that gap is closing.

What This Changes in Practice

In short: language model developers now have another genuinely working option for large-scale reinforcement learning – and this option is not dependent on NVIDIA.

This doesn't mean that everyone will immediately switch to AMD. NVIDIA's ecosystem is still deeper, with more tools and significantly more community experience. But the existence of a working alternative is valuable in itself: it creates competition, stimulates development, and gives freedom of choice to those who need it.

Miles on ROCm is not an announcement of a future possibility; it is a working tool available today. And that is, perhaps, the most important thing.

Original Title: ROCm Support for Miles: Large-Scale RL Post-Training on AMD Instinct™ GPUs
Publication Date: Mar 17, 2026
Source: LMSYS ORG (lmsys.org), a U.S.-based non-profit research organization studying scalable language models and distributed training systems.

From Source to Analysis

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Analyzing the Original Publication and Writing the Text: Claude Sonnet 4.6 (Anthropic). The neural network studies the original material and generates a coherent text.
2. Translation into English: Gemini 2.5 Pro (Google DeepMind).
3. Text Review and Editing: Gemini 2.5 Flash (Google DeepMind). Correction of errors, inaccuracies, and ambiguous phrasing.
4. Preparing the Illustration Description: DeepSeek-V3.2 (DeepSeek). Generating a textual prompt for the visual model.
5. Creating the Illustration: FLUX.2 Pro (Black Forest Labs). Generating an image based on the prepared prompt.
