Published on April 3, 2026

Red Hat and NVIDIA Achieve Record AI Performance in MLPerf Inference Benchmarks

Red Hat and NVIDIA have jointly achieved leading results in the independent MLPerf Inference v6.0 benchmark, which covers image recognition, speech, and reasoning tasks.

Source: Red Hat

MLPerf Inference is an industry-standard benchmark for AI systems: an independent set of tasks that measures how quickly and efficiently a given platform handles realistic workloads. Rather than synthetic tests, it uses tasks that production systems actually encounter, such as image recognition, speech processing, and large language models. The results are published openly, and companies rely on them when choosing infrastructure for AI deployment.

In the latest round – MLPerf Inference v6.0 – Red Hat and NVIDIA collaborated and demonstrated some of the best results across several categories.

Why Is AI Performance Testing Essential?

As long as AI remains an abstraction, few people ask how it works "under the hood." But once it reaches real-world deployment, whether in the cloud, in a corporate environment, or in production, very specific questions arise: how many requests the system processes per second, how quickly it delivers the first response, and how stably it performs under load.

MLPerf is designed to give comparable, verifiable answers to these questions. The benchmark covers several scenarios: a model can be run for maximum throughput (how many requests it can process per unit of time) or under strict latency constraints (as in real applications where a user expects an immediate response).
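The three questions above (requests per second, time to first response, stability under load) can be made concrete with a small sketch. This is not the MLPerf harness or its methodology. The `RequestTiming` record, its field names, and the nearest-rank percentile are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Timestamps in seconds for one request (hypothetical record layout)."""
    sent: float         # when the request was issued
    first_token: float  # when the first piece of the response arrived
    done: float         # when the full response finished


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value covering pct percent of samples."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(timings: list[RequestTiming]) -> dict[str, float]:
    """Throughput over the whole run plus tail latency figures."""
    wall_clock = max(t.done for t in timings) - min(t.sent for t in timings)
    ttft = [t.first_token - t.sent for t in timings]
    total = [t.done - t.sent for t in timings]
    return {
        "requests_per_second": len(timings) / wall_clock,
        "p99_ttft_s": percentile(ttft, 99),
        "p99_latency_s": percentile(total, 99),
    }


def meets_latency_bound(timings: list[RequestTiming], max_ttft_s: float) -> bool:
    """True if the 99th-percentile time to first response is within the bound."""
    ttft = [t.first_token - t.sent for t in timings]
    return percentile(ttft, 99) <= max_ttft_s
```

In a throughput-oriented run, only `requests_per_second` would matter. In a latency-constrained run, a result would count only if a check like `meets_latency_bound` passes. Looking at the tail (here the 99th percentile) rather than the mean is one simple way to express "stable under load."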

What AI Models and Tasks Were Evaluated?

In this round, the task set covered three areas. The first is vision: image classification and object detection. The second is speech: automatic recognition and audio transcription. The third, and perhaps the most interesting today, is reasoning, which includes large language models, notably Llama 3.1 405B, one of the most demanding open models available.

Llama 3.1 405B became one of the main challenges of the round: the MLPerf organizers added it specifically to assess how platforms handle models that require a colossal amount of computation for each generated token.

How Collaboration Drives Superior AI Performance

What set this submission apart was not running a pre-built stack on powerful hardware, but a deep collaborative engineering effort between Red Hat and NVIDIA. Simply put, the teams worked together to tune the software and hardware components to each other.

Red Hat is responsible for the enterprise Linux platform and the software stack on which AI services are deployed. NVIDIA is responsible for the hardware infrastructure and optimized computing libraries. When these two layers are designed in tandem, rather than separately, the benchmark results are fundamentally different – and this is precisely what the v6.0 figures confirmed.

This approach isn't just about getting a nice entry in the results table. For companies deploying AI in a production environment, it sends a signal: the Red Hat + NVIDIA combination was tested and optimized not in isolation, but in the exact configuration that can be replicated in a real-world infrastructure.

Key Performance Results from MLPerf Inference v6.0

The results were recorded across several categories – for throughput and latency, on various models. In tasks related to language models and reasoning, as well as speech and image recognition, the partners demonstrated leading performance among the published participants.

The performance on Llama 3.1 405B deserves special attention. This model has 405 billion parameters, and even on flagship hardware, achieving both a fast first response and high throughput at the same time is non-trivial. Nevertheless, the results on this model were among the best of all official submissions published for this benchmark.
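A back-of-the-envelope calculation gives a sense of why a model of this size is demanding. The sketch below uses only the parameter count from the model's name and two common rules of thumb: the weights occupy parameters times bytes per parameter, and a dense transformer needs roughly two floating-point operations per parameter for each generated token. The article does not state the numeric precision used in the submission, so the byte widths are assumptions, and the estimate ignores the key-value cache and activations.

```python
PARAMS = 405_000_000_000  # parameter count taken from the name "405B"


def weight_memory_gb(params: int, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in gigabytes (10^9 bytes)."""
    return params * bytes_per_param / 1e9


def forward_flops_per_token(params: int) -> int:
    """Rule-of-thumb compute per generated token for a dense transformer."""
    return 2 * params


if __name__ == "__main__":
    # Byte widths are illustrative: 2 bytes (16-bit) and 1 byte (8-bit).
    for width in (2, 1):
        print(f"{width} byte(s)/param: {weight_memory_gb(PARAMS, width):.0f} GB")
    print(f"approx. FLOPs per token: {forward_flops_per_token(PARAMS):.2e}")
```

Under these assumptions the weights alone take about 810 GB at 16-bit width, far more than a single accelerator holds, which is why serving such a model means spreading it across multiple devices.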

The Broader Impact of MLPerf Benchmarks on AI Adoption

MLPerf is more than just a competition. It's a way for the industry to agree on a common language for evaluation. When different teams publish results according to the same rules, customers and developers can compare platforms without marketing distortions.

Red Hat's participation in this round is also notable because the enterprise Linux environment has historically been seen as a neutral foundation rather than an active participant in the race for AI performance. The joint results with NVIDIA are changing this picture: the software stack is becoming as significant a factor as the hardware.

This is especially relevant in the context of growing interest in open models like Llama. Companies are increasingly deploying them on their own, rather than through cloud APIs. And in this case, the question of how efficiently a specific software-hardware stack handles the load becomes very practical – it directly impacts the cost of operation.
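The link between throughput and operating cost is simple arithmetic. In the sketch below, the hourly infrastructure cost and the tokens-per-second figures are invented for illustration. They are not MLPerf results or vendor pricing.

```python
def cost_per_million_tokens(hourly_cost: float, tokens_per_second: float) -> float:
    """Cost of generating one million tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost * 1_000_000 / tokens_per_hour


if __name__ == "__main__":
    # Hypothetical system: 90 currency units per hour, two throughput levels.
    baseline = cost_per_million_tokens(hourly_cost=90.0, tokens_per_second=2500)
    tuned = cost_per_million_tokens(hourly_cost=90.0, tokens_per_second=5000)
    print(f"baseline: {baseline:.2f} per million tokens")
    print(f"tuned:    {tuned:.2f} per million tokens")
```

At a fixed hourly cost, doubling sustained throughput halves the cost per token, which is why stack-level tuning shows up directly in the operating budget of a self-hosted deployment.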

Limitations and Scope of MLPerf Benchmarking

It's worth noting: MLPerf measures performance under strictly defined conditions and on specific models. Real-world usage scenarios are more diverse: they can involve mixed workloads, non-standard configurations, and additional security and reliability requirements. A benchmark is a good guide, but not a universal guarantee.

Nevertheless, publishing official results in MLPerf is a deliberate step towards transparency. And the fact that Red Hat and NVIDIA did it jointly speaks to the serious level of engineering integration between the two platforms.

Link to Original: https://www.redhat.com/en/blog/red-hat-and-nvidia-setting-standards-high-performance-ai-inference

Original Title: Red Hat and NVIDIA: Setting standards for high-performance AI inference

Publication Date: Apr 2, 2026

Red Hat (www.redhat.com): a global company developing open software platforms and infrastructure solutions with AI support.
