One of the common challenges of working with large language models is the gap between “download and try” and “actually run it on proper hardware.” Cloud platforms are gradually bridging this gap, and the AMD Developer Cloud is one such example. They recently released a detailed guide on deploying the Qwen3-5 model using the SGLang framework, which is a good reason to take a closer look at what is happening and why it might be interesting.
Qwen3-5 – What Is This Model?
Qwen3-5 is a language model from the Chinese company Alibaba, part of the Qwen3 series. Simply put, it's a large, well-trained, general-purpose model that can answer questions, write code, reason, and perform a wide variety of text-based tasks. The version with 122 billion parameters is one of the most powerful in its family.
The model is open-source, which makes it attractive to developers and researchers who want to deploy something substantial on their own rather than relying entirely on external APIs.
AMD Developer Cloud and the OpenCLaw Project
AMD Developer Cloud is a cloud platform that provides access to GPU servers powered by AMD graphics cards. This might sound simple, but it is important in the context of AI: most cloud solutions for running models have historically been geared toward NVIDIA hardware. AMD is actively playing catch-up, and the Developer Cloud is part of that effort.
OpenCLaw is a project within the AMD ecosystem aimed at simplifying the deployment of open-source language models on AMD hardware. In short, it is a set of tools and best practices that lowers the barrier to entry for those who want to run a modern model but don't want to deal with the intricacies of hardware and software stack compatibility from scratch.
SGLang – What Is It For?
SGLang is a framework for running language models in server mode. To put it simply, it turns a model into a service: requests come in, responses go out, and everything works like an API. This is especially important in scenarios where the model needs to handle multiple requests at once – for example, in applications or automation pipelines.
SGLang is performance-oriented and supports AMD hardware, making it a logical choice for this setup. In the described configuration, the model runs as a server that is accessible over the network, with authorization support and a configurable backend for computations.
How It Works in Practice
The core idea is to run a container (an isolated environment) with the model and SGLang inside, which starts accepting requests at a specified address and port. The model is loaded into GPU memory, and after that, you can work with it just like any other language service: send requests and receive responses.
All of this is deployed on AMD's cloud servers, which means you don't need to own expensive hardware. You just need to get access to the AMD Developer Cloud, follow the instructions, and a 122-billion-parameter model becomes available as a local service.
Why This Is Interesting
A few years ago, running a model of this scale on your own was practically impossible without specialized infrastructure. Now, it is becoming an increasingly routine task, provided you have the right tools and access to the cloud.
For developers, this means greater independence: you can take an open-source model, deploy it on a controlled infrastructure, customize it to your needs, and avoid paying for every request to a third-party API. For AMD, it is a demonstration that its platform is perfectly capable in scenarios that were once exclusively associated with competing hardware.
A number of practical questions remain: performance on AMD hardware versus alternatives, the ease of setup for those unfamiliar with containers and configuration files, and the cost of cloud usage under sustained loads. But the very fact that such detailed guides are being published is a good sign for those who are betting on open-source models and alternative hardware.