Anthropic has added an Extended Thinking feature to Claude: a mode in which the model can reason in detail before answering. The Hugging Face team decided to test how this works in practice, asking Claude to write CUDA kernels and, along the way, to teach open models to do the same.
What Is Extended Thinking and Why Is It Needed?
Extended Thinking is a mode of operation where the model doesn't give an answer immediately but first "thinks out loud". It shows intermediate reasoning, tests hypotheses, and returns to the beginning if something goes wrong. Simply put, it is similar to how a human solves a complex problem: not on the first try, but through several approaches.
This approach is particularly useful for tasks where precision is required and where an error is costly. For example, when writing low-level code that works with the GPU. There, every detail matters, and incorrect indexing can break everything.
CUDA Kernels: When Python Falls Short
CUDA is Nvidia's technology for GPU programming. While regular Python code runs on the CPU, CUDA kernels run on the graphics card, where computations can be parallelized across thousands of threads. This is critical for training and running neural networks.
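To see what "one thread per element" means, the execution model can be sketched on the CPU. This is an illustration of the idea, not CUDA code: the Python loop stands in for the thousands of GPU threads that would each run one index in parallel.

```python
# Illustration only: a CUDA-style kernel body expressed as a per-index
# function. On a real GPU, every index would be handled by its own thread.

def vector_add_kernel(i, a, b, out):
    # This is the work a single GPU thread would do for its index i.
    out[i] = a[i] + b[i]

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(a)

# The sequential loop emulates what the GPU does concurrently.
for i in range(len(a)):
    vector_add_kernel(i, a, b, out)

print(out)  # [11.0, 22.0, 33.0, 44.0]
```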
Writing CUDA kernels is difficult. You have to manage memory manually, monitor thread synchronization, and understand GPU architecture. Errors here aren't always obvious: code might compile but work incorrectly or slowly.
The Hugging Face team asked Claude with Extended Thinking to write several CUDA kernels: matrix multiplication, convolution, and softmax. The resulting code worked. The model not only generated the kernels but also explained its decisions, pointed out possible bottlenecks, and suggested optimizations.
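Softmax is a good example of why GPU code is easy to get subtly wrong: a naive version overflows on large inputs. A common practice is to keep a simple CPU reference implementation and check the kernel's output against it. Here is a minimal Python sketch of such a reference (standard library only; this is not the code Claude produced):

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtracting the maximum before exp()
    # prevents overflow on large inputs without changing the result.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# A naive exp(1000.0) would overflow; the stable version is fine.
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]
```

Comparing a GPU kernel's output against a reference like this catches exactly the class of bugs mentioned above: code that compiles but computes the wrong thing.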
How Claude Teaches Other Models
The researchers went even further. They decided to test whether Claude could transfer its knowledge to open models that initially don't know how to write CUDA code.
The idea is simple: let Claude with Extended Thinking generate examples of problems and solutions, and then another model trains on this data. This approach is called knowledge distillation. The large model acts as the teacher, and the small one as the student.
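The data-generation side of this loop can be sketched in a few lines. Everything here is schematic: `teacher_solve` stands in for a call to the teacher model's API, the task list is invented, and none of these names come from the Hugging Face write-up. The resulting records would then be fed to an ordinary supervised fine-tuning script.

```python
# Schematic knowledge-distillation data pipeline (all names hypothetical).

def teacher_solve(task: str) -> str:
    # Placeholder for a call to the teacher model (e.g. Claude with
    # Extended Thinking). Here it just returns a canned string.
    return f"// solution for: {task}"

tasks = [
    "write a CUDA kernel for vector addition",
    "write a CUDA kernel for row-wise softmax",
]

# (prompt, completion) pairs: the usual input format for supervised
# fine-tuning of a student model.
dataset = [{"prompt": t, "completion": teacher_solve(t)} for t in tasks]

for record in dataset:
    print(record["prompt"], "->", record["completion"])
```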
For the experiment, they took Qwen 2.5 Coder, an open model tailored for writing code, and fine-tuned it on synthetic data generated by Claude. Afterward, Qwen handled CUDA tasks better, even though it originally lacked those skills.
Why This Is Important
Training models usually requires labeled data, which is collected manually by hired specialists at a cost of time and money. If a model can generate training examples itself, and they are of sufficiently high quality, the process becomes much simpler.
This is especially useful for niche fields where data is scarce. CUDA programming is just such a field. There are almost no open datasets, and there are few specialists who can write effective code for GPUs.
Extended Thinking helps improve the quality of synthetic data. The model doesn't just give an answer but shows how it arrived at it. This makes the training examples more diverse and substantive.
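Concretely, an example distilled from Extended Thinking can carry the reasoning trace alongside the final answer, rather than just a bare (prompt, answer) pair. The field names and contents below are an assumption for illustration, not the actual dataset schema:

```python
# Hypothetical training record: keeping the teacher's reasoning makes
# the example more informative for the student than the answer alone.
example = {
    "prompt": "Write a CUDA kernel for matrix multiplication.",
    "reasoning": "Tile the matrices into shared memory to reduce global "
                 "memory traffic; each thread block computes one tile.",
    "answer": "__global__ void matmul(...) { /* kernel body */ }",
}

print(sorted(example))  # ['answer', 'prompt', 'reasoning']
```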
What Questions Remain
The approach works, but there are nuances. First, Extended Thinking is slow and expensive. The model spends more tokens on reasoning, which means the cost of requests increases.
Second, the quality of synthetic data still depends on the teacher. If Claude makes a mistake or gives a suboptimal solution, the error will pass into the training set and then into the student model.
Finally, it is unclear how well this method scales. Does it work for other programming languages or other types of tasks? So far there are few examples, and they all come from a single field.
Nevertheless, the direction looks promising. If a model can not only solve problems but also teach others to do so, this opens up new possibilities for creating specialized tools without large costs for data collection.