Published on April 1, 2026

Red Hat AI Achieves Top Results in MLPerf Inference v6.0 – Key Insights

Red Hat AI has secured top spots in MLPerf Inference v6.0, the latest round of the benchmark, testing three models on both NVIDIA and AMD GPUs.

Source: Red Hat

There's an industry benchmark for AI systems called MLPerf Inference. In short, it's a kind of official test: companies take real models, run them on their hardware, and publish the results. No closed-door demos – just numbers that can be compared. Rounds are typically held twice a year, and each new release showcases the industry's progress.

In the latest round – MLPerf Inference v6.0 – Red Hat AI secured top spots in several categories. This is remarkable in itself because hardware manufacturers are usually the ones leading the pack. Here, a company that focuses on its software stack and open-source tools came to the forefront.

Three AI Models, Three Performance Stories

Red Hat AI tested three models, each with a different profile.

The first is Whisper. It's a speech recognition model that transcribes audio into text. The task might seem simple, but in practice, it requires fast processing of a data stream, especially when requests are coming in continuously. It was in this category that Red Hat achieved one of its best results.

The second is Qwen3-VL. This is a multimodal model: it can process text and images together. Simply put, you can show it a picture and ask a question about it, and it will take both into account. Such models are more complex to serve because they need to process different data types coherently.

The third is GPT-OSS-120B. This is an open-weight large language model with roughly 120 billion parameters. The more parameters, the higher the memory and compute requirements. Keeping such a model within acceptable limits for latency and throughput is a non-trivial engineering challenge.
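To make the memory pressure concrete, here is a back-of-envelope sketch of how much memory the weights alone would occupy at different numeric precisions. The round figure of 120 billion parameters is taken from the model name, and the precisions listed are illustrative assumptions, not the configuration Red Hat submitted. The KV cache and activations come on top of this.

```python
# Rough estimate of weight memory: parameters x bytes per parameter.
# Assumptions (not from the article): a round 120e9 parameter count and
# the precisions below; KV cache and activations are not included.

PARAMS = 120_000_000_000

BYTES_PER_PARAM = {
    "fp16/bf16": 2.0,
    "8-bit": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(num_params: int, bytes_per_param: float) -> float:
    """Return the weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, width in BYTES_PER_PARAM.items():
        print(f"{name:>9}: {weight_memory_gb(PARAMS, width):.0f} GB")
```

The footprint scales linearly with the bytes per parameter, which is why numeric precision and sharding across devices matter so much when serving a model of this size.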

Why These MLPerf Results Go Beyond Simple Benchmarks

MLPerf participants often optimize for a specific test: they take one model, one piece of hardware, one scenario – and squeeze out the maximum performance there. Red Hat took a slightly different path: three different models, two different GPU manufacturers – NVIDIA and AMD – and a unified software approach.

This is important because, in real-world deployments, companies rarely operate with a perfectly homogeneous infrastructure. Some use NVIDIA, while others are starting to consider AMD as an alternative. If your toolkit works well on both, that's a practical advantage, not just a line in a press release.

Technical Details of Red Hat's MLPerf Inference Setup

Red Hat AI used vLLM, an inference engine for running large language models that is optimized for high throughput. It can efficiently manage memory and process many requests in parallel without sacrificing speed.

Additionally, they used llm-d, a distributed request scheduler that allows for scaling inference horizontally – in other words, distributing the load across multiple nodes without manually configuring each one.
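The idea of spreading load across nodes can be illustrated with a toy router that always sends the next request to the node with the fewest requests in flight. This is only a sketch of the general principle: the `Router` class, the node names, and the least-loaded policy are hypothetical and do not represent llm-d's actual scheduling logic.

```python
# Toy least-loaded router: each request goes to the node with the fewest
# in-flight requests. Node names and the policy itself are hypothetical;
# this is not llm-d's actual scheduling algorithm.
from dataclasses import dataclass, field


@dataclass
class Router:
    in_flight: dict[str, int] = field(default_factory=dict)

    def add_node(self, name: str) -> None:
        """Register a node; re-adding an existing node keeps its count."""
        self.in_flight.setdefault(name, 0)

    def route(self) -> str:
        """Pick the least-loaded node and count the request against it."""
        if not self.in_flight:
            raise RuntimeError("no nodes registered")
        # min() returns the first minimum, so ties go to the node added first.
        node = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[node] += 1
        return node

    def complete(self, node: str) -> None:
        """Mark one request on `node` as finished."""
        if self.in_flight[node] == 0:
            raise ValueError(f"no in-flight requests on {node}")
        self.in_flight[node] -= 1


if __name__ == "__main__":
    router = Router()
    for name in ("node-a", "node-b", "node-c"):
        router.add_node(name)
    print([router.route() for _ in range(4)])
```

A production scheduler would weigh far more than a request count, but the sketch shows why adding a node increases capacity without per-node manual configuration: new nodes simply start receiving traffic as soon as they are registered.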

All of this ran on top of OpenShift AI, a platform for running AI workloads in enterprise environments. Its role here was less about raw acceleration and more about deploying such systems reproducibly and manageably in real-world conditions, not just in a lab.

Simply put, the team didn't invent specialized solutions just for impressive numbers in a benchmark; they used the same stack that is applied in real products. This changes the meaning of the result slightly: it's not a “synthetic record,” but a demonstration that existing tools are genuinely competitive.

Open Source as a Core Strategy in AI Infrastructure

Another point worth noting is that all the components used by Red Hat are open source. vLLM, llm-d, and the models are not proprietary developments kept closed within the company. Participating in MLPerf with an open stack is both a demonstration of capabilities and an argument that open source in AI infrastructure is no longer just a “budget option.”

For the industry, this is no small matter. For a long time, the unwritten rule was: if you want the best performance, use closed, proprietary solutions optimized for specific hardware. Results like these are gradually blurring that line.

What Remains Behind the Scenes

MLPerf is a good guidepost, but it's not the absolute truth. The test measures performance under strictly defined conditions: specific models, specific load scenarios, and specific metrics. In real-world systems, the conditions are always different – different requests, different usage profiles, and different constraints.
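The kind of metrics involved usually come down to how many requests complete per second and how slow the slowest requests are. The sketch below computes both from a list of per-request latencies. The sample data, the nearest-rank percentile method, and the choice of the 99th percentile are illustrative assumptions, not MLPerf's official measurement rules or Red Hat's results.

```python
# Throughput and tail latency from per-request timings.
# The sample data and the 99th-percentile choice are illustrative
# assumptions, not MLPerf's official rules or Red Hat's results.
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def throughput(num_requests: int, wall_seconds: float) -> float:
    """Completed requests per second over the whole run."""
    return num_requests / wall_seconds


if __name__ == "__main__":
    latencies_s = [0.21, 0.19, 0.25, 0.22, 1.40, 0.20, 0.23, 0.24, 0.18, 0.26]
    print("p50 latency:", percentile(latencies_s, 50))
    print("p99 latency:", percentile(latencies_s, 99))
    print("throughput :", throughput(len(latencies_s), wall_seconds=2.0), "req/s")
```

A single slow request barely moves the median but dominates the tail percentile, which is one reason results measured under one load profile do not transfer automatically to another.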

Furthermore, optimizing for a benchmark and optimizing for production are not the same thing. The teams participating in MLPerf know the rules of the game and prepare for them. How well these same results can be reproduced “in the wild” is a separate question that no test can definitively answer.

Nevertheless, MLPerf remains one of the few places where different approaches can be fairly compared under more or less controlled conditions. And Red Hat AI's appearance there with an open stack, multiple models, and two GPU platforms is, at the very least, a signal that this direction was chosen deliberately.

Link to Original: https://www.redhat.com/en/blog/red-hat-ai-tops-mlperf-inference-v60-vllm-qwen3-vl-whisper-and-gpt-oss-120b

Original Title: Red Hat AI tops MLPerf Inference v6.0 with vLLM on Qwen3-VL, Whisper, and GPT-OSS-120B

Publication Date: Apr 1, 2026
