Large language models are quite good at writing code. This is hardly news – millions of developers use AI assistants in their daily work. But it's one thing to write a functional snippet and quite another to write the fastest possible code. This is especially true for computations where every millisecond counts: scientific simulations, data processing, and machine learning tasks.
This is precisely what the new experiment from the AMD team addresses. In the second part of their series on AI agents for high-performance computing (HPC), the researchers showed how a language model can be made not just to generate code but to improve it iteratively, until performance reaches the desired level.
The Problem: Writing Isn't Optimizing
When we ask an AI to write code, it usually produces something that works. But 'working' and 'fast' are not the same thing. In high-performance computing, code must make the most of the hardware's capabilities: parallel execution, the memory hierarchy, and the specifics of a particular processor or graphics card.
A human expert handles this through iterations: writing, running, measuring, and rewriting. And the rewriting isn't random; it's based on an understanding of what is slowing the program down. Can an AI be taught to do the same?
The answer proposed by the authors of the publication is yes – if you give the model the right tools and the right feedback.
OpenEvolve: Code Evolution as a Strategy
At the core of the experiment is a tool called OpenEvolve. Simply put, it's a system that approaches code optimization in the same way biological evolution approaches survival: through numerous attempts, selection of the best variants, and gradual improvement.
Here's how it works in practice:
- The AI agent receives the source code and the task – to make it faster.
- The model suggests changes: rewriting snippets, trying different approaches.
- Each variant is run and measured to see how fast it is compared to the previous one.
- The best variants 'survive' and become the basis for the next round of changes.
- The cycle repeats.
This isn't just 'asking the AI to rewrite the code.' It's a guided iterative process, where each iteration is based on actual performance measurements, not on the model's intuition.
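The cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpenEvolve's actual implementation or API: the names `evolve`, `toy_propose`, and `toy_measure` are hypothetical, a 'candidate' is reduced to a single integer (think of a block size) instead of real source code, and the measurement is a synthetic cost function rather than a real benchmark. In a real system, `propose` would be a call to the language model and `measure` would compile and time the code on the target hardware.

```python
import random
from typing import Callable, List, Tuple

# Assumption: a "candidate" is just an integer tuning knob, standing in for
# a full version of the program.
Candidate = int


def evolve(
    seed: Candidate,
    propose: Callable[[Candidate, random.Random], Candidate],
    measure: Callable[[Candidate], float],
    generations: int = 20,
    children_per_round: int = 4,
    survivors: int = 2,
    rng_seed: int = 0,
) -> Tuple[Candidate, float]:
    """Return the best candidate found and its measured cost (lower is better)."""
    rng = random.Random(rng_seed)
    pool: List[Tuple[float, Candidate]] = [(measure(seed), seed)]
    for _ in range(generations):
        parents = [cand for _, cand in pool]
        # Suggest changes (an LLM call in a real system).
        children = [propose(rng.choice(parents), rng) for _ in range(children_per_round)]
        # Run and measure every new variant.
        pool.extend((measure(child), child) for child in children)
        # Only the fastest variants survive into the next round.
        pool = sorted(pool, key=lambda item: item[0])[:survivors]
    best_cost, best = pool[0]
    return best, best_cost


def toy_propose(parent: Candidate, rng: random.Random) -> Candidate:
    """Hypothetical stand-in for the model: nudge the parent up or down."""
    return max(1, parent + rng.choice([-32, -16, 16, 32]))


def toy_measure(block_size: Candidate) -> float:
    """Synthetic cost with its minimum at 64; a real evaluator would time the code."""
    return abs(block_size - 64) + 1.0


if __name__ == "__main__":
    best, cost = evolve(256, toy_propose, toy_measure)
    print("best candidate:", best, "cost:", cost)
```

Because the pool always keeps its current best entry, the measured cost can only stay the same or improve from one round to the next. That property is what makes the process guided rather than random.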
MCP: How the Agent 'Talks' to Tools
A separate part of the experiment deals with how exactly the AI agent interacts with optimization tools. It uses MCP (Model Context Protocol), which is, in short, a standardized way for a language model to call external tools and receive results from them.
Imagine the agent has 'hands': it can run the code, get the measurement result, pass it back to the model, and the model then decides on the next step. MCP describes precisely how these 'hands' are structured and how to use them.
This is a crucial point because, without this connection, the agent would be limited to its own internal assumptions about what works well and what doesn't. With MCP, it receives real data from a real environment, and this fundamentally changes the quality of the optimization.
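To make the 'hands' more concrete, here is a rough sketch of what a single tool call could look like on the wire. It is modeled on the JSON-RPC 2.0 message shape that the public MCP specification describes for tool calls (a `tools/call` method with a tool `name` and `arguments`); the exact field names should be checked against the specification. The tool `run_benchmark`, its parameters, and the helper functions are hypothetical and are not taken from the AMD experiment, and no real MCP SDK is used here.

```python
import json
from typing import Callable, Dict


def run_benchmark(kernel: str, size: int) -> str:
    """Hypothetical tool. A real one would build and time the kernel on a GPU."""
    return f"kernel={kernel} size={size} status=placeholder"


# Assumption: the server exposes exactly one tool under this name.
TOOLS: Dict[str, Callable[..., str]] = {"run_benchmark": run_benchmark}


def make_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Build the request the agent side would send."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def handle_request(raw: str) -> str:
    """Toy server side: dispatch the call and wrap the result as text content."""
    req = json.loads(raw)
    if req.get("method") != "tools/call":
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "error": error})
    params = req.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        error = {"code": -32602, "message": "Unknown tool"}
        return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "error": error})
    text = tool(**params.get("arguments", {}))
    result = {"content": [{"type": "text", "text": text}], "isError": False}
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


if __name__ == "__main__":
    request = make_tool_call(1, "run_benchmark", {"kernel": "saxpy", "size": 1024})
    print(handle_request(request))
```

The point of the sketch is the separation of roles: the model only produces a structured request, and the measurement itself happens in the tool, outside the model.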
The End Result
The experiment was conducted on tasks from the field of high-performance computing, where GPU speed is critical. The results showed that the iterative approach with OpenEvolve lets the agent find faster solutions than the one it proposes on the first try.
This doesn't mean the AI has surpassed an experienced optimization engineer. But it does demonstrate a working concept: automated optimization through iterations and measurements is a feasible idea, not just a theory.
For developers working on computationally intensive tasks, this could mean a new tool in their arsenal: not 'ask an AI to write fast code,' but 'launch an agent that will improve the code until it reaches the goal.'
Why This Is Interesting Beyond HPC
High-performance computing is a rather specific field. But the logic demonstrated by this experiment has broader applications.
Iterative improvement with feedback is, in essence, how any good engineering process works. And if an AI agent can integrate into this process not as a generator of ready-made solutions, but as a participant in the 'try-measure-improve' cycle, it changes how we think about development automation as a whole.
For now, such systems require configuration, contextual understanding, and a clear definition of what exactly we want to optimize. The agent has no notion of 'faster' on its own until we tell it how speed is measured. But this is a reasonable limitation, not a fundamental barrier.
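Telling the agent how to measure 'faster' usually comes down to writing an evaluation function. The sketch below is one possible shape for it, assuming plain Python callables as candidates and wall-clock time as the metric; the names `evaluate`, `slow_sum`, `fast_sum`, and `wrong_sum` are hypothetical, and a GPU workload would need device-side timing instead of `time.perf_counter`. It rejects candidates that give wrong answers before timing anything, then scores the rest by median speedup over a reference.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def evaluate(
    candidate: Callable[[int], int],
    reference: Callable[[int], int],
    inputs: Sequence[int],
    repeats: int = 5,
) -> Dict[str, object]:
    """Score a candidate: 0.0 if it is wrong, otherwise its speedup over the reference."""
    # Correctness first: a fast but wrong variant must never win.
    for x in inputs:
        if candidate(x) != reference(x):
            return {"correct": False, "score": 0.0}

    def median_seconds(fn: Callable[[int], int]) -> float:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            for x in inputs:
                fn(x)
            samples.append(time.perf_counter() - start)
        # The median is less sensitive to one-off stalls than the mean.
        return statistics.median(samples)

    ref_t = median_seconds(reference)
    cand_t = max(median_seconds(candidate), 1e-12)  # guard against division by zero
    return {"correct": True, "score": ref_t / cand_t}


def slow_sum(n: int) -> int:
    """Reference: sum of 0..n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def fast_sum(n: int) -> int:
    """Candidate: closed form for the same sum."""
    return n * (n - 1) // 2


def wrong_sum(n: int) -> int:
    """Candidate that is quick but incorrect."""
    return n


if __name__ == "__main__":
    sizes = [1000, 5000, 20000]
    print(evaluate(fast_sum, slow_sum, sizes))
    print(evaluate(wrong_sum, slow_sum, sizes))
```

The score returned here is exactly the kind of signal an evolutionary loop can rank candidates by, which is why defining it carefully is part of the configuration work mentioned above.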
Open Questions
Like any experiment, this work leaves some questions open.
How well does the approach scale to more complex codebases? What happens when optimizing one snippet degrades another? How does the agent handle situations where 'faster' is difficult to measure directly?
These questions don't devalue the result – they simply point the way for the next steps. And the very fact that AMD is publicly sharing such experiments suggests that the topic of AI agents for development is moving beyond research labs and becoming part of the practical engineering agenda.