Published on April 3, 2026

Red Hat and NVIDIA Achieve Record AI Performance in MLPerf Inference Benchmarks

Red Hat and NVIDIA have jointly achieved leading results in the independent MLPerf Inference v6.0 test, which covers image recognition, speech, and reasoning tasks.

Event Source: Red Hat, 4–6 min read

MLPerf Inference is an industry-standard benchmark for AI systems: an independent set of tasks used to measure how quickly and efficiently a given hardware platform handles real-world usage scenarios. Rather than synthetic workloads, it evaluates tasks that real systems encounter: image recognition, speech processing, and working with large language models. The results are published openly, and companies rely on them when choosing infrastructure for deploying AI.

In the latest round – MLPerf Inference v6.0 – Red Hat and NVIDIA collaborated and demonstrated some of the best results across several categories.

Why Is AI Performance Testing Essential?

As long as AI remains an abstraction, few people ask how it works "under the hood." Once it is deployed for real, whether in the cloud, in a corporate environment, or in production, specific requirements arise immediately: how many requests the system processes per second, how quickly it delivers the first response, and how stably it performs under load.

MLPerf is designed precisely to provide comparable and verifiable answers to these questions. The benchmark covers several scenarios: a model can be run in maximum throughput mode (how many requests it can process per unit of time) or in a mode with strict latency constraints (as in real-world applications where a user expects an immediate response).
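The two modes described above boil down to two numbers: requests completed per second, and a tail latency that must stay under a bound. The following is a minimal sketch of how such numbers could be derived from raw request timings. The function names, the nearest-rank percentile method, and the sample values are assumptions made for illustration; the official MLPerf rules and tooling define how submissions are actually measured.

```python
import math


def summarize_run(latencies_s, wall_clock_s, percentile=99.0):
    """Return (throughput in requests/s, latency in seconds at the given percentile).

    latencies_s: per-request latencies in seconds (hypothetical measurements).
    wall_clock_s: total duration of the run in seconds.
    """
    if not latencies_s or wall_clock_s <= 0:
        raise ValueError("need at least one request and a positive duration")
    throughput = len(latencies_s) / wall_clock_s
    ordered = sorted(latencies_s)
    # Nearest-rank percentile: the smallest sample such that at least
    # `percentile` percent of all samples are at or below it.
    rank = max(1, math.ceil(percentile / 100.0 * len(ordered)))
    return throughput, ordered[rank - 1]


def meets_latency_bound(latencies_s, bound_s, percentile=99.0):
    """True if the tail latency at `percentile` stays within `bound_s` seconds."""
    # The wall-clock value does not affect the percentile, so any positive
    # placeholder works here.
    _, tail = summarize_run(latencies_s, wall_clock_s=1.0, percentile=percentile)
    return tail <= bound_s


if __name__ == "__main__":
    sample = [0.10, 0.12, 0.11, 0.50, 0.13]  # made-up latencies, in seconds
    print(summarize_run(sample, wall_clock_s=2.0))
    print(meets_latency_bound(sample, bound_s=0.2))
```

In this toy sample a single slow request dominates the 99th percentile, which illustrates why a system can post high throughput and still fail a strict latency constraint.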

What AI Models and Tasks Were Evaluated?

In this round, the task set covered three areas. The first is vision: image classification and object detection. The second is speech: automatic recognition and audio transcription. The third, and perhaps the most closely watched today, is reasoning, which includes large language models such as Llama 3.1 405B, one of the most demanding open models available.

Llama 3.1 405B became one of the main challenges of the round: the MLPerf organizers added it specifically to assess how platforms handle models that require a colossal amount of computation for each generated token.
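To give a sense of scale for "a colossal amount of computation," here is a back-of-the-envelope sketch. It assumes the common rule of thumb of roughly 2 floating-point operations per parameter per generated token for a dense transformer, and it ignores attention over long contexts, the KV cache, and activations. The byte widths are illustrative precisions, not a statement about what any submission used.

```python
PARAMS_405B = 405e9  # parameter count implied by the model name "405B"


def flops_per_token(params):
    """Approximate FLOPs to generate one token (assumed ~2 FLOPs per parameter)."""
    return 2.0 * params


def weight_memory_gb(params, bytes_per_param):
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"FLOPs per token: {flops_per_token(PARAMS_405B):.2e}")
    print(f"Weights at 2 bytes/param: {weight_memory_gb(PARAMS_405B, 2):.0f} GB")
    print(f"Weights at 1 byte/param:  {weight_memory_gb(PARAMS_405B, 1):.0f} GB")
```

Under these assumptions, each token costs on the order of 8 × 10^11 operations, and the weights alone occupy several hundred gigabytes, more than a single accelerator typically holds. This is why such a model stresses the whole platform rather than one chip.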

How Collaboration Drives Superior AI Performance

What set this submission apart was not simply running a pre-built stack on powerful hardware, but a close joint engineering effort between Red Hat and NVIDIA. Simply put, the teams tuned the software and hardware components to work together as efficiently as possible.

Red Hat is responsible for the enterprise Linux platform and the software stack on which AI services are deployed. NVIDIA is responsible for the hardware infrastructure and optimized computing libraries. When these two layers are designed in tandem, rather than separately, the benchmark results are fundamentally different – and this is precisely what the v6.0 figures confirmed.

This approach isn't just about getting a nice entry in the results table. For companies deploying AI in a production environment, it sends a signal: the Red Hat + NVIDIA combination was tested and optimized not in isolation, but in the exact configuration that can be replicated in a real-world infrastructure.

Key Performance Results from MLPerf Inference v6.0

The results were recorded across several categories, covering both throughput and latency on various models. In language-model and reasoning tasks, as well as in speech and image recognition, the partners demonstrated leading performance among the participants that published results.

The performance on Llama 3.1 405B deserves special attention. The model has hundreds of billions of parameters, and even on flagship hardware, achieving a fast first response and high throughput at the same time is a non-trivial task. Nevertheless, the results on this model were among the best of all officially published submissions for this benchmark.

The Broader Impact of MLPerf Benchmarks on AI Adoption

MLPerf is more than just a competition. It's a way for the industry to agree on a common language for evaluation. When different teams publish results according to the same rules, customers and developers can compare platforms without marketing distortions.

Red Hat's participation in this round is also notable because the enterprise Linux environment has historically been seen as a neutral foundation rather than an active participant in the race for AI performance. The joint results with NVIDIA are changing this picture: the software stack is becoming as significant a factor as the hardware.

This is especially relevant in the context of growing interest in open models like Llama. Companies are increasingly deploying them on their own, rather than through cloud APIs. And in this case, the question of how efficiently a specific software-hardware stack handles the load becomes very practical – it directly impacts the cost of operation.
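The link between stack efficiency and operating cost can be made concrete with simple arithmetic. The sketch below assumes a fixed hourly cost for a server and a sustained generation rate. Both figures in the example are hypothetical placeholders, not published benchmark results or real prices.

```python
def cost_per_million_tokens(hourly_cost, tokens_per_second):
    """Cost of generating one million tokens at a sustained rate.

    hourly_cost: what one hour of the system costs (any currency).
    tokens_per_second: sustained output rate of the whole system.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the relationship.
    baseline = cost_per_million_tokens(hourly_cost=100.0, tokens_per_second=10_000)
    tuned = cost_per_million_tokens(hourly_cost=100.0, tokens_per_second=20_000)
    print(f"baseline: {baseline:.2f} per 1M tokens")
    print(f"tuned:    {tuned:.2f} per 1M tokens")
```

On the same hardware at the same hourly cost, doubling sustained throughput halves the cost per token. This is the practical reason throughput figures matter to teams hosting open models themselves.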

Limitations and Scope of MLPerf Benchmarking

It's worth noting: MLPerf measures performance under strictly defined conditions and on specific models. Real-world usage scenarios are more diverse: they can involve mixed workloads, non-standard configurations, and additional security and reliability requirements. A benchmark is a good guide, but not a universal guarantee.

Nevertheless, publishing official results in MLPerf is a deliberate step towards transparency. And the fact that Red Hat and NVIDIA did it jointly speaks to the serious level of engineering integration between the two platforms.

Original Title: Red Hat and NVIDIA: Setting standards for high-performance AI inference
Publication Date: Apr 2, 2026
Source: Red Hat (www.redhat.com), a global company developing open software platforms and infrastructure solutions with AI support.

From Source to Analysis

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role: analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Claude Sonnet 4.6 (Anthropic): Analyzing the Original Publication and Writing the Text. The neural network studies the original material and generates a coherent text.
2. Gemini 2.5 Pro (Google DeepMind): Translation into English.
3. Gemini 2.5 Flash (Google DeepMind): Text Review and Editing. Correction of errors, inaccuracies, and ambiguous phrasing.
4. DeepSeek-V3.2 (DeepSeek): Preparing the Illustration Description. Generating a textual prompt for the visual model.
5. FLUX.2 Pro (Black Forest Labs): Creating the Illustration. Generating an image based on the prepared prompt.
