AMD has published a technical guide on deploying OpenHands on its Instinct server GPUs. OpenHands is an agent based on large language models that helps automate developer tasks: it writes code, fixes bugs, and works with repositories.
What OpenHands Is and Why You Need It
Simply put, OpenHands is not just a chatbot for answering questions. It is an agent that can execute a sequence of actions: open a file, edit it, run tests, and commit changes. It works like a virtual programmer's assistant, capable of taking on routine tasks.
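The "sequence of actions" idea can be sketched as a tiny dispatch loop. This is a minimal illustration, not OpenHands' actual implementation, and the action names (`read_file`, `write_file`, `run`) are hypothetical:

```python
# Minimal sketch of an agent's action executor (illustrative only;
# OpenHands' real action system is far more elaborate and sandboxed).
import subprocess
from pathlib import Path

def execute_action(action: dict) -> str:
    """Dispatch one model-proposed action and return an observation string."""
    kind = action["kind"]
    if kind == "read_file":
        return Path(action["path"]).read_text()
    if kind == "write_file":
        Path(action["path"]).write_text(action["content"])
        return "ok"
    if kind == "run":
        # Run a shell command (e.g. the test suite) and capture its output,
        # which is fed back to the model as the next observation.
        result = subprocess.run(action["cmd"], shell=True,
                                capture_output=True, text=True)
        return result.stdout + result.stderr
    return f"unknown action: {kind}"
```

In a real agent, the model's response is parsed into such an action, executed, and the observation is appended to the conversation before the next model call.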
To operate, such an agent needs a language model – in this case, AMD uses Qwen3-Coder-30B-A3B-Instruct, a specialized model for code generation. And to ensure the model processes requests quickly, it is run via vLLM, an engine for accelerated GPU inference (computing on the graphics processor).
How It Works in Practice
AMD showed the basic command for launching the model on its Instinct GPUs:
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct --max-model-len 32000 --enable-auto-tool-choice
There are a few important details here. The --max-model-len parameter caps the context length, at 32,000 tokens in this case. This means the agent can work with fairly large fragments of code or documentation in a single request.
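A quick way to reason about that limit is a token-budget check before sending a request. The 4-characters-per-token ratio below is only a crude heuristic (real counts come from the model's tokenizer), and the function name is illustrative:

```python
# Rough sketch: does a prompt fit the window set via --max-model-len?
# The 4-chars-per-token ratio is a crude approximation for English text
# and code; the model's actual tokenizer gives the real count.
MAX_MODEL_LEN = 32_000

def fits_in_context(prompt: str, reserved_for_output: int = 2_000) -> bool:
    """Leave headroom for the model's reply within the context window."""
    estimated_tokens = len(prompt) // 4  # crude estimate, not exact
    return estimated_tokens + reserved_for_output <= MAX_MODEL_LEN
```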
The --enable-auto-tool-choice flag activates tool support: a mechanism that lets the model not just generate text but call functions to read files, execute terminal commands, and access APIs.
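Tools are described to the model in the OpenAI-compatible JSON schema that vLLM's server accepts. The `read_file` tool below is hypothetical (OpenHands registers its own tools), but the shape of the payload is the standard one:

```python
import json

# Sketch of a tool definition in the OpenAI-compatible "tools" format.
# The read_file tool itself is hypothetical, for illustration.
read_file_tool = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Return the contents of a file in the repository.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Path to the file"},
            },
            "required": ["path"],
        },
    },
}

# A chat request carrying the tool; the model may respond with a
# tool call instead of (or before) a plain-text answer.
request_body = {
    "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
    "messages": [{"role": "user", "content": "What is in README.md?"}],
    "tools": [read_file_tool],
}
payload = json.dumps(request_body)  # ready to POST to the server
```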
Once vLLM is running on the GPU, OpenHands connects to it like a standard inference server and starts sending requests.
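"Connects like a standard inference server" means talking to vLLM's OpenAI-compatible /v1/chat/completions endpoint. A minimal sketch using only the standard library, assuming vLLM's default host and port (adjust to wherever `vllm serve` is listening):

```python
import json
import urllib.request

# vLLM's default listen address; change if the server runs elsewhere.
BASE_URL = "http://localhost:8000/v1"

def build_request(prompt: str) -> urllib.request.Request:
    """Build a chat-completions request for a running vLLM server."""
    body = json.dumps({
        "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# To actually send it, the vLLM server must be running:
# with urllib.request.urlopen(build_request("Fix the failing test")) as r:
#     print(json.load(r)["choices"][0]["message"]["content"])
```

OpenHands is simply configured with this base URL and model name, so the same client code works against any OpenAI-compatible backend.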
Why AMD Is Highlighting This
For AMD, this is part of a broader strategy: showing that its Instinct server GPUs can handle not only model training but also production inference, especially in demanding scenarios like developer agents, where stability matters as much as speed.
vLLM was originally developed with NVIDIA in mind, but it is being actively ported to other platforms, including AMD ROCm. AMD's publication is a signal to developers: yes, you can use the same tools as on NVIDIA, just on different hardware.
What Remains Behind the Scenes
The guide is technical in nature, so AMD does not disclose certain practical details. For example, how stable vLLM is on ROCm compared to CUDA, whether there are compatibility issues, and which models are supported better or worse.
It is also unclear how fast Qwen3-Coder-30B works on Instinct in real-world tasks – AMD provides no benchmarks (performance tests). For developers choosing between platforms, this is important information.
Nevertheless, the very fact that such a guide was published suggests that the ecosystem of tools for AI agents on AMD is gradually maturing. While previously the choice of GPU for large model inference came with almost no alternatives, now more options are appearing – and that is generally good for the market.