Published February 13, 2026


AI Agents Write CUDA Kernels: GPT and Claude Learn to Generate GPU Code

Two AI agents can now generate optimized CUDA kernels straight from a task description, speeding up GPU operations. Let's dive into what this means for people working with models.

Categories: Technical Context, Development
Source: Hugging Face · Reading time: 6–8 minutes

Typically, if you want to accelerate data processing on a GPU, you either need to use pre-built libraries or write low-level CUDA code yourself. The second option requires serious expertise: you need to understand the GPU architecture, manage memory, and handle thread synchronization. This is a specialized profession, and not every model developer knows how to do it.

Now, there's another way: describe the task in natural language, and an AI agent will generate an optimized CUDA kernel for you. Two such agents are already available – one uses GPT-4o, the other Claude 3.5 Sonnet. Both are integrated into the Hugging Face ecosystem and accessible through the Transformers interface.


What Is a CUDA Kernel and Why Write One

When you work with a neural network, most of the computations happen on the GPU. Libraries like PyTorch or cuDNN provide ready-made operations: matrix multiplications, convolutions, activations. They work fast, but they are general-purpose. If you have a specific task – for example, you need to combine several operations into one or implement a non-standard function – pre-built blocks can be inefficient.

In such cases, you write your own CUDA kernel – a function that runs directly on the GPU and does exactly what you need. This can provide a significant speed boost, especially if the operation is repeated frequently. But writing such code requires a deep understanding of the hardware and the C++ language.
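To make the fusion argument concrete, here is a plain-Python sketch (deliberately not GPU code) contrasting two separate passes over the data with one fused pass. On a GPU, the fused version is what a custom kernel buys you: each element is read from and written to global memory once instead of twice.

```python
import math

def gelu(x):
    # tanh approximation of GELU, common in transformer implementations
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def scale_then_gelu_two_passes(xs, scale):
    # Pass 1: scale every element (materializes an intermediate array).
    scaled = [x * scale for x in xs]
    # Pass 2: apply the activation (reads the intermediate back).
    return [gelu(x) for x in scaled]

def scale_then_gelu_fused(xs, scale):
    # One pass: each element is read once, transformed, and written once.
    # This per-element body is roughly what one GPU thread would execute.
    return [gelu(x * scale) for x in xs]
```

The two functions compute identical results; only the number of trips over the data differs, which is exactly the cost that matters on memory-bound GPU workloads.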


How the Agents Work

Both agents are structured similarly. You describe the operation in text – for example, “implement a LayerNorm layer with GELU activation.” The agent analyzes the request, generates CUDA code, compiles it, and returns a ready-to-use function that can be called from Python.
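As a concrete reference for the example request above, here is a plain-Python version of LayerNorm followed by GELU. This is a sketch of the arithmetic any generated kernel would need to reproduce, not the agents' actual output.

```python
import math

def layernorm_gelu(xs, eps=1e-5):
    # LayerNorm: center by the mean, scale by the standard deviation.
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    normed = [(x - mean) / math.sqrt(var + eps) for x in xs]
    # GELU (tanh approximation) applied elementwise to the normalized values.
    return [
        0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
        for x in normed
    ]
```

A fused CUDA kernel would do all of this in one launch: a reduction for the mean and variance, then the normalization and activation, without writing the intermediate normalized tensor back to global memory.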

Internally, the process looks like this: the agent first generates the kernel code, then tries to compile it. If compilation fails or the results are incorrect, the agent receives an error message and attempts to fix the code. This is an iterative process – the agent may make several attempts until it gets a working version.
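The iterative loop described above can be sketched as follows. Both `ask_model` and `compile_and_test` are placeholders introduced for illustration (standing in for the real LLM call and the nvcc compile-and-check step); only the control flow is meant to match the description.

```python
def generate_kernel(task, ask_model, compile_and_test, max_attempts=5):
    """Iterative generate -> compile -> fix loop, as described in the text.

    ask_model(prompt) returns candidate kernel source code;
    compile_and_test(source) returns (ok, error_message).
    Both are stand-ins for the real model and toolchain.
    """
    prompt = task
    for attempt in range(max_attempts):
        source = ask_model(prompt)
        ok, error = compile_and_test(source)
        if ok:
            return source
        # Feed the compiler or correctness error back to the model and retry.
        prompt = f"{task}\nPrevious attempt failed with:\n{error}\nFix the code."
    raise RuntimeError(f"no working kernel after {max_attempts} attempts")
```

The key design point is that the error message becomes part of the next prompt, so each retry is conditioned on what actually went wrong rather than starting from scratch.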

At their core are language models: GPT-4o for one agent, Claude 3.5 Sonnet for the other. Both models can generate code and work with technical descriptions, but their approaches differ slightly. Claude shows more stable results on tasks involving matrix operations, while GPT-4o is sometimes faster at handling non-standard requests.


What You Can Already Do

The agents can create kernels for basic operations: normalization, activations, element-wise transformations, and simple matrix operations. They handle typical tasks encountered when working with transformers or convolutional networks.

For example, you can ask it to implement a custom activation function that isn't in the standard library, or combine several operations into one to avoid extra memory accesses. The agent will generate code that does exactly that, and you can use it like a regular function in PyTorch.

Important: the agents do not replace manual optimization. The code they generate works, but it's not always maximally efficient. If you need performance on par with industrial-grade libraries, you'll likely need to refine it. But for prototyping, experiments, or tasks where speed is not critical, it's a perfectly viable tool.


What Are the Limitations

First – reliability. The agent might generate code with errors, especially if the task is vaguely formulated or requires non-trivial logic. Sometimes, it can't compile the result even after several attempts. In such cases, you have to either clarify the request or fix the code manually.

Second – performance. The agent doesn't know all the intricacies of a specific GPU's architecture. It might miss optimization opportunities, for example, by not using shared memory effectively or failing to account for data alignment. The generated code usually runs slower than what an experienced CUDA programmer would produce.
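Given these reliability and performance caveats, it is worth checking any generated kernel against a trusted baseline before relying on it. A minimal stdlib-only harness might look like this; the two callables are stand-ins for the generated implementation and a reference one, not a real agent API.

```python
import timeit

def check_against_baseline(candidate, baseline, inputs, tol=1e-6, repeats=100):
    """Compare a generated implementation against a reference implementation.

    Raises ValueError if outputs diverge beyond tol;
    otherwise returns the candidate/baseline runtime ratio
    (below 1.0 means the candidate is faster).
    """
    for x in inputs:
        got, want = candidate(x), baseline(x)
        if abs(got - want) > tol:
            raise ValueError(f"mismatch on {x}: {got} vs {want}")
    t_candidate = timeit.timeit(lambda: [candidate(x) for x in inputs], number=repeats)
    t_baseline = timeit.timeit(lambda: [baseline(x) for x in inputs], number=repeats)
    return t_candidate / t_baseline
```

In practice the correctness check matters more than the timing: a generated kernel that is slow is merely unhelpful, while one that is silently wrong can poison an entire training run.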

Third – task complexity. The agents handle relatively simple operations. If you need to implement a complex algorithm with non-trivial thread management or a multi-level memory hierarchy, the agent will likely fail without significant assistance.


Who Might Find This Useful

First and foremost – researchers and model developers who don't specialize in low-level programming. If you're working with PyTorch and need to quickly test an idea that requires a non-standard operation, an agent can save you time. Instead of studying CUDA or searching for a pre-existing implementation, you just describe the task and get working code.

It's also useful for learning. You can see how the agent implements a particular operation and use that as a starting point to understand CUDA mechanics. Of course, the generated code isn't always perfect, but it can show you the basic structure and logic.

For tasks where maximum performance is crucial – like in production or when training large models – agents don't yet replace manual work. But they can speed up prototyping and lower the entry barrier for those who haven't worked with GPU programming before.


What This Means in a Broader Context

This is another example of how language models are starting to assist with technical tasks that previously required narrow specialization. We've already seen AI assistants write Python code, generate SQL queries, and help with debugging. Now, they've reached low-level programming.

Of course, this doesn't mean CUDA programmers are no longer needed. Auto-generated code doesn't yet match the quality of professionally written kernels. But tools like these can change how tasks are distributed: routine operations are delegated to the agent, while specialists focus on truly complex optimization.

Another point is accessibility. Previously, creating custom kernels was the domain of a small group of developers. Now, it's becoming more accessible. While the results won't always be optimal, the barrier to entry is lowered. This can speed up experiments and allow more people to try non-standard approaches.


Is It Worth a Try?

If you work with models and encounter situations where pre-built operations aren't suitable, it's worth a try. Both agents are available through Transformers and are easy to run. Don't expect perfect results on the first try, but for rapid prototyping, it's a perfectly viable option.

If you're just starting to get the hang of GPU programming, the agents can help you understand the basic principles. You'll see how CUDA kernels are structured and can experiment with different operations without having to immediately dive into hundreds of pages of documentation.

For tasks where performance is critical, agents won't replace manual work just yet. But they can be useful during the research phase, when iteration speed is more important than the absolute efficiency of the code.

Original Title: Custom Kernels for All from Codex and Claude
Source: Hugging Face (huggingface.co), a U.S.-based open platform for hosting, training, and sharing AI models.

From Source to Analysis

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Analyzing the Original Publication and Writing the Text — Claude Sonnet 4.5 (Anthropic). The neural network studies the original material and generates a coherent text.

2. Translation into English — Gemini 2.5 Pro (Google DeepMind).

3. Text Review and Editing — Gemini 2.5 Flash (Google DeepMind). Correction of errors, inaccuracies, and ambiguous phrasing.

4. Preparing the Illustration Description — DeepSeek-V3.2 (DeepSeek). Generating a textual prompt for the visual model.

5. Creating the Illustration — FLUX.2 Pro (Black Forest Labs). Generating an image based on the prepared prompt.
