Published January 28, 2026

How to Run an AI Coding Agent on AMD Instinct GPUs

AMD has demonstrated how to deploy OpenHands – an agent for automating code writing – on its server GPUs using the vLLM engine.

Technical context Infrastructure
Event Source: AMD Reading Time: 3 – 4 minutes

AMD has published a technical guide on deploying OpenHands on its Instinct server GPUs. OpenHands is an agent based on large language models that helps automate developer tasks: it writes code, fixes bugs, and works with repositories.

What OpenHands Is and Why You Need It

Simply put, OpenHands is not just a chatbot for answering questions. It is an agent that can execute a sequence of actions: open a file, edit it, run tests, and commit changes. It works like a virtual programmer's assistant, capable of taking on routine tasks.
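The action cycle described above can be sketched as a simple loop. This is an illustrative sketch only, not OpenHands' actual implementation; the `propose_action` callback and the tool names are hypothetical:

```python
# Illustrative agent loop: the model proposes an action, the runtime
# executes it and records the result, until the model says "finish".
# All names here (propose_action, read_file, run_tests) are hypothetical.

def run_agent(propose_action, tools, max_steps=20):
    """propose_action(history) returns a dict like
    {"tool": "read_file", "args": {"path": "app.py"}};
    tools maps tool names to Python callables."""
    history = []
    for _ in range(max_steps):
        action = propose_action(history)
        if action["tool"] == "finish":
            break
        result = tools[action["tool"]](**action["args"])
        history.append((action, result))
    return history

# Toy run with a scripted "model" and in-memory tools:
files = {"app.py": "print('hello')"}
script = iter([
    {"tool": "read_file", "args": {"path": "app.py"}},
    {"tool": "run_tests", "args": {}},
    {"tool": "finish", "args": {}},
])
tools = {
    "read_file": lambda path: files[path],
    "run_tests": lambda: "2 passed",
}
history = run_agent(lambda h: next(script), tools)
```

In a real agent, the scripted iterator is replaced by a call to the language model, which decides the next action from the accumulated history.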

To operate, such an agent needs a language model – in this case, AMD uses Qwen3-Coder-30B-A3B-Instruct, a specialized model for code generation. And to ensure the model processes requests quickly, it is run via vLLM, an engine for accelerated GPU inference (computing on the graphics processor).

How It Works in Practice

AMD showed the basic command for launching the model on its Instinct GPUs:

vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct --max-model-len 32000 --enable-auto-tool-choice --tool-call-parser qwen3_coder

There are a few important details here. The max-model-len parameter limits the context length – to 32,000 tokens in this case. This means the agent can work with fairly large fragments of code or documentation in a single request.

The --enable-auto-tool-choice flag activates tool support – a mechanism allowing the model not just to generate text, but to call functions: read files, execute terminal commands, and access APIs. It works together with --tool-call-parser, which tells vLLM how to extract tool calls from this particular model's output.

Once vLLM is running on the GPU, OpenHands connects to it like a standard inference server and starts sending requests.
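Concretely, vLLM exposes an OpenAI-compatible HTTP API (on port 8000 by default), so the agent talks to it like any other inference server. Below is a minimal sketch of the kind of request an agent might send; the endpoint is vLLM's default, but the specific tool definition ("read_file") is a hypothetical example – OpenHands defines its own set of tools:

```python
import json

# Sketch of a chat-completions request an agent might send to the
# vLLM server started above. The "read_file" tool schema is a
# hypothetical illustration, not OpenHands' actual tool set.
BASE_URL = "http://localhost:8000/v1"  # vLLM's OpenAI-compatible endpoint

payload = {
    "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
    "messages": [
        {"role": "user", "content": "Fix the failing test in app.py"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from the repository",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }],
    "tool_choice": "auto",  # let the model decide when to call the tool
}

body = json.dumps(payload)  # POST this to f"{BASE_URL}/chat/completions"
```

When the model decides to use a tool, the response contains a structured tool call instead of plain text; the agent executes it and sends the result back in the next message.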

Why AMD Is Highlighting This

For AMD, this is part of a broader strategy: to show that its Instinct server GPUs can handle not only model training but also inference in production – especially in demanding scenarios like developer agents, where stability matters as much as raw speed.

vLLM was originally developed with NVIDIA in mind, but it is being actively ported to other platforms, including AMD ROCm. AMD's publication is a signal to developers: yes, you can use the same tools as on NVIDIA, just on different hardware.

What Remains Behind the Scenes

The guide is technical in nature, so AMD does not disclose certain practical details. For example, how stable vLLM is on ROCm compared to CUDA, whether there are compatibility issues, and which models are supported better or worse.

It is also unclear how fast Qwen3-Coder-30B works on Instinct in real-world tasks – AMD provides no benchmarks (performance tests). For developers choosing between platforms, this is important information.

Nevertheless, the very fact that such a guide was published suggests that the ecosystem of AI-agent tooling on AMD is gradually maturing. Where the choice of GPU for large-model inference once offered almost no alternatives, more options are now appearing – and that is good for the market overall.

#applied analysis #technical context #ai development #engineering #computer systems #business #development_tools #generative agents
Original Title: Deploying OpenHands Coding Agents on AMD Instinct GPUs
Publication Date: Jan 28, 2026
AMD (www.amd.com) – an international company manufacturing processors and computing accelerators for AI workloads.

From Source to Analysis

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Analyzing the Original Publication and Writing the Text – Claude Sonnet 4.5 (Anthropic). The neural network studies the original material and generates a coherent text.

2. Translation into English – Gemini 3 Pro Preview (Google DeepMind).

3. Text Review and Editing – Gemini 2.5 Flash (Google DeepMind). Correction of errors, inaccuracies, and ambiguous phrasing.

4. Preparing the Illustration Description – DeepSeek-V3.2 (DeepSeek). Generating a textual prompt for the visual model.

5. Creating the Illustration – FLUX.2 Pro (Black Forest Labs). Generating an image based on the prepared prompt.
