Published February 21, 2026

DeepSeek on New NVIDIA Hardware: What's Changed for Long-Text Processing

NVIDIA and LMSYS tested the DeepSeek model on the latest GB300 accelerator. The results showed a significant improvement compared to the previous generation.

Category: Technical Context, Infrastructure
Source: LMSYS ORG · Reading time: 4–6 minutes

When people discuss large language models, the conversation usually revolves around how 'smart' their answers are. However, for those who use these models in their work, another question arises: how fast and cost-effective are they? This is particularly relevant when processing long texts – such as large documents, lengthy dialogues, and complex tasks with contexts spanning thousands of words.

This very question prompted a new study from the LMSYS team, which tested the DeepSeek model on the new NVIDIA GB300 NVL72 accelerator. The results were significant enough to warrant sharing.

Long Context Is Its Own Challenge

In short, the longer the text a model processes, the more memory and compute it requires, and the growth is not gentle. In a standard transformer, the cache of intermediate attention data that must stay in memory (the KV cache) grows linearly with context length, while the attention computation needed to process the prompt grows roughly quadratically. When processing long sequences, the model must hold vast amounts of this intermediate data in memory, and this is where standard configurations begin to struggle.
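To make the memory side of this concrete, here is a back-of-envelope estimate of per-sequence KV-cache size for a generic transformer. The layer count, head count, and head dimension below are illustrative assumptions, not DeepSeek's actual configuration:

```python
# Rough KV-cache size for one sequence at a given context length.
# All parameter values are illustrative assumptions, not DeepSeek's
# real architecture (DeepSeek's attention compresses this cache).

def kv_cache_bytes(context_len, n_layers=60, n_kv_heads=8,
                   head_dim=128, bytes_per_elem=2):
    """Memory for keys + values, across all layers, for one sequence."""
    # factor 2 = one tensor for keys, one for values
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len

for ctx in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>7} tokens -> {gib:6.1f} GiB of KV cache per sequence")
```

Even with linear growth, a single very long sequence can occupy tens of gigabytes that short queries never touch, and a batch of such sequences multiplies that figure, which is why memory capacity becomes the first wall.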

Simply put, if you want a model to read an entire book or a large technical document and answer questions about it, the workload is fundamentally different from that of answering a short query.

DeepSeek is an interesting model in this regard: it supports a very large context window, which makes it particularly attractive for such scenarios. But for this window to be truly practical, it requires the right hardware.


GB300 NVL72: What It Is and Why It Matters

The NVIDIA GB300 NVL72 is NVIDIA's latest rack-scale system for large-scale inference (that is, running already-trained models, not training them): 72 Blackwell Ultra GPUs linked into a single NVLink domain. Its main advantage over previous generations is a significantly larger amount of memory per GPU and higher memory bandwidth.

For long contexts, this is critical: memory capacity and bandwidth are most often the bottleneck. The GB300 NVL72 alleviates some of these constraints.

In their study, LMSYS compared DeepSeek's performance on the GB300 NVL72 against an earlier-generation setup, the H100 NVL8. This is a fair baseline, as H100-class configurations are what many teams run in production today.

What the Tests Showed

The results were notable in several areas.

First, generation speed on long contexts increased significantly. For short queries, the difference between hardware generations is usually not as dramatic. However, the longer the context, the more the GB300 pulls ahead. This is exactly the kind of situation where new hardware solves a real problem, rather than just adding percentage points to a benchmark.

Second, the system's throughput – that is, how many requests it can serve per unit of time – also increased. This matters for practical deployment: the faster each request completes, the more users the same hardware can serve in parallel.

Third, the researchers noted improvements in the so-called prefill stage – the phase where the model 'reads' the input text before starting to generate a response. For long contexts, this stage can consume a significant amount of time, and this is where the GB300 showed a particularly noticeable boost.
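The prefill/decode split described above can be sketched with a toy latency model. The throughput figures here are made-up assumptions, chosen only to show why prefill dominates total latency once prompts get long:

```python
# Toy model of request latency split into prefill (reading the prompt)
# and decode (generating tokens one by one). The tokens/second numbers
# are hypothetical, not measured GB300 or H100 figures.

def request_latency(prompt_tokens, output_tokens,
                    prefill_tok_per_s=20_000, decode_tok_per_s=100):
    prefill_s = prompt_tokens / prefill_tok_per_s  # prompt processed in parallel
    decode_s = output_tokens / decode_tok_per_s    # tokens generated sequentially
    return prefill_s, decode_s

for prompt in (1_000, 100_000):
    p, d = request_latency(prompt, output_tokens=500)
    share = p / (p + d)
    print(f"{prompt:>7}-token prompt: prefill {p:5.2f}s, decode {d:4.1f}s "
          f"(prefill = {share:.0%} of total)")
```

For a short prompt, prefill is a rounding error; for a book-length prompt it can rival or exceed the entire generation phase, which is why speeding up this stage pays off most on long contexts.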

Why It's Not Just About Speed

Speed is convenient, but behind it lies something more practical: cost.

When a model runs faster and processes more requests on the same hardware, the cost per query decreases. For services that handle large volumes of text – legal documents, medical records, code, long support dialogues – this translates to direct savings.
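The economics reduce to simple arithmetic: with hardware billed by the hour, cost per request is the hourly price divided by sustained requests per hour. The price and throughput numbers below are hypothetical placeholders, not measured GB300 figures:

```python
# Cost-per-request arithmetic for hourly-billed hardware.
# Both inputs are hypothetical; only the relationship matters:
# at a fixed hourly price, cost per request falls inversely
# with throughput.

def cost_per_request(hourly_price_usd, requests_per_second):
    return hourly_price_usd / (requests_per_second * 3600)

baseline = cost_per_request(hourly_price_usd=98.0, requests_per_second=2.0)
faster = cost_per_request(hourly_price_usd=98.0, requests_per_second=6.0)
print(f"baseline: ${baseline:.4f}/request, at 3x throughput: ${faster:.4f}/request")
```

The takeaway: even if newer hardware costs more per hour, a large enough throughput gain on long-context workloads can still lower the cost of each individual query.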

Furthermore, a long context enables scenarios that were previously unrealistic in real time. For example, analyzing a large contract with an immediate response or an agent-based system that maintains a long history of interactions without losing context.

A Few Nuances to Consider

The results look convincing, but there is some important context to keep in mind.

The GB300 NVL72 is expensive hardware that is not yet widely available. Most companies currently work with H100 or earlier configurations, so this is more a glimpse of the near future than a sign that everyone will switch to the new infrastructure tomorrow.

It's also worth noting that the tests were conducted under specific conditions – on a particular model (DeepSeek) and in a specific configuration. How applicable these results are to other models and other workloads is a separate question that will require further investigation.

Finally, the very fact that LMSYS and NVIDIA are publishing these results is more than just a technical report. It's part of a broader conversation about how the industry will handle the growing demands for long contexts. The demand for this is increasing: models are getting smarter, tasks are becoming more complex, and documents are getting longer.


Conclusion: Hardware Is Catching Up to Model Ambitions

For a long time, there has been a somewhat paradoxical situation: models were theoretically capable of handling very long texts, but in practice, it was too slow or too expensive to be truly viable.

The GB300 NVL72 takes a step toward closing this gap. It's not a complete solution, and it's not for everyone just yet, but the direction is clear. Long context is ceasing to be an exotic feature and is gradually becoming a standard that can be supported by real-world infrastructure with reasonable performance.

For those building products on top of language models, this is a positive signal: scenarios that seemed premature just a year ago are now becoming technically feasible.

Original Title: Deploying DeepSeek on GB300 NVL72: Big Wins in Long-Context Inference
Publication Date: Feb 19, 2026
LMSYS ORG (lmsys.org) is a U.S.-based non-profit research organization studying scalable language models and distributed training systems.

From Source to Analysis

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Claude Sonnet 4.6 (Anthropic): Analyzing the Original Publication and Writing the Text. The neural network studies the original material and generates a coherent text.

2. Gemini 2.5 Pro (Google DeepMind): Translation into English.

3. Gemini 2.5 Flash (Google DeepMind): Text Review and Editing. Correction of errors, inaccuracies, and ambiguous phrasing.

4. DeepSeek-V3.2 (DeepSeek): Preparing the Illustration Description. Generating a textual prompt for the visual model.

5. FLUX.2 Pro (Black Forest Labs): Creating the Illustration. Generating an image based on the prepared prompt.
