Anthropic has added an Extended Thinking feature to Claude: a mode in which the model can reason in detail before answering. The Hugging Face team decided to test how this works in practice, asking Claude to write CUDA kernels and, along the way, to teach open models to do the same.
What Is Extended Thinking and Why Is It Needed?
Extended Thinking is a mode of operation where the model doesn't give an answer immediately but first "thinks out loud". It shows intermediate reasoning, tests hypotheses, and returns to the beginning if something goes wrong. Simply put, it is similar to how a human solves a complex problem: not on the first try, but through several approaches.
This approach is particularly useful for tasks where precision is required and where an error is costly. For example, when writing low-level code that works with the GPU. There, every detail matters, and incorrect indexing can break everything.
CUDA Kernels: When Python Falls Short
CUDA is Nvidia's technology for GPU programming. While regular Python code runs on the CPU, CUDA kernels run on the graphics card, where computations can be parallelized across thousands of threads. This is critical for training and running neural networks.
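To see what "one thread per element" means, the execution model can be sketched on the CPU. This is an illustration of the idea, not CUDA code: the Python loop stands in for the thousands of GPU threads that would each run one index in parallel.

```python
# Illustration only: a CUDA-style kernel body expressed as a per-index
# function. On a real GPU, every index would be handled by its own thread.

def vector_add_kernel(i, a, b, out):
    # This is the work a single GPU thread would do for its index i.
    out[i] = a[i] + b[i]

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(a)

# The sequential loop emulates what the GPU does concurrently.
for i in range(len(a)):
    vector_add_kernel(i, a, b, out)

print(out)  # [11.0, 22.0, 33.0, 44.0]
```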
Writing CUDA kernels is difficult. You have to manage memory manually, monitor thread synchronization, and understand GPU architecture. Errors here aren't always obvious: code might compile but work incorrectly or slowly.
The Hugging Face team asked Claude with Extended Thinking to write several CUDA kernels: matrix multiplication, convolution, and softmax. The resulting code worked. The model not only generated the kernels but also explained its decisions, pointed out possible bottlenecks, and suggested optimizations.
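Softmax is a good example of why GPU code is easy to get subtly wrong: a naive version overflows on large inputs. A common practice is to keep a simple CPU reference implementation and check the kernel's output against it. Here is a minimal Python sketch of such a reference (standard library only; this is not the code Claude produced):

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtracting the maximum before exp()
    # prevents overflow on large inputs without changing the result.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# A naive exp(1000.0) would overflow; the stable version is fine.
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]
```

Comparing a GPU kernel's output against a reference like this catches exactly the class of bugs mentioned above: code that compiles but computes the wrong thing.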
How Claude Teaches Other Models
The researchers went even further. They decided to test whether Claude could transfer its knowledge to open models that initially don't know how to write CUDA code.
The idea is simple: let Claude with Extended Thinking generate examples of problems and solutions, and then another model trains on this data. This approach is called knowledge distillation. The large model acts as the teacher, and the small one as the student.
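The data-generation side of this loop can be sketched in a few lines. Everything here is schematic: `teacher_solve` stands in for a call to the teacher model's API, the task list is invented, and none of these names come from the Hugging Face write-up. The resulting records would then be fed to an ordinary supervised fine-tuning script.

```python
# Schematic knowledge-distillation data pipeline (all names hypothetical).

def teacher_solve(task: str) -> str:
    # Placeholder for a call to the teacher model (e.g. Claude with
    # Extended Thinking). Here it just returns a canned string.
    return f"// solution for: {task}"

tasks = [
    "write a CUDA kernel for vector addition",
    "write a CUDA kernel for row-wise softmax",
]

# (prompt, completion) pairs: the usual input format for supervised
# fine-tuning of a student model.
dataset = [{"prompt": t, "completion": teacher_solve(t)} for t in tasks]

for record in dataset:
    print(record["prompt"], "->", record["completion"])
```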
For the experiment, they took Qwen 2.5 Coder, an open model tailored for writing code, and fine-tuned it on synthetic data generated by Claude. Afterward, Qwen handled CUDA tasks better, even though it originally lacked those skills.
Why This Is Important
Training models usually requires labeled data, which is collected manually by hired specialists at a cost of time and money. If a model can generate training examples itself, and they are of sufficiently high quality, the process becomes much simpler.
This is especially useful for niche fields where data is scarce. CUDA programming is just such a field. There are almost no open datasets, and there are few specialists who can write effective code for GPUs.
Extended Thinking helps improve the quality of synthetic data. The model doesn't just give an answer but shows how it arrived at it. This makes the training examples more diverse and substantive.
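Concretely, an example distilled from Extended Thinking can carry the reasoning trace alongside the final answer, rather than just a bare (prompt, answer) pair. The field names and contents below are an assumption for illustration, not the actual dataset schema:

```python
# Hypothetical training record: keeping the teacher's reasoning makes
# the example more informative for the student than the answer alone.
example = {
    "prompt": "Write a CUDA kernel for matrix multiplication.",
    "reasoning": "Tile the matrices into shared memory to reduce global "
                 "memory traffic; each thread block computes one tile.",
    "answer": "__global__ void matmul(...) { /* kernel body */ }",
}

print(sorted(example))  # ['answer', 'prompt', 'reasoning']
```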
What Questions Remain
The approach works, but there are nuances. First, Extended Thinking is slow and expensive. The model spends more tokens on reasoning, which means the cost of requests increases.
Second, the quality of synthetic data still depends on the teacher. If Claude makes a mistake or gives a suboptimal solution, the error will pass into the training set and then into the student model.
Finally, it is unclear how well this method scales. Does it work for other programming languages or other types of tasks? So far there are few examples, and they all come from a single field.
Nevertheless, the direction looks promising. If a model can not only solve problems but also teach others to do so, this opens up new possibilities for creating specialized tools without large costs for data collection.