Published on March 5, 2026

How Helion Automatically Tunes Itself for a Task

Helion, a DSL for writing fast ML kernels, has gained a new automatic tuning mechanism based on Bayesian optimization that saves developers' time.

Development. Event Source: PyTorch. Reading time: 5–7 minutes.

When you write machine learning code, sooner or later you run into performance issues. The algorithm is correct, the results are accurate, but it runs slowly. And that's when manual tuning begins: iterating through parameters, taking measurements, and iterating again. It's tedious, time-consuming, and there's never a guarantee that you'll find the best option.

This is precisely the problem Helion, a tool from PyTorch, is designed to solve. It lets developers write high-performance compute kernels for ML in a familiar style, similar to regular Python code. Helion handles the complex part: it automatically generates optimized low-level code for specific hardware. However, selecting the internal parameters for this optimization, a process known as autotuning, used to take considerable time. Now this process has been significantly accelerated.

What Is Autotuning and Why Is It Needed?

Simply put, autotuning is the automatic search for the best settings to perform a specific task on specific hardware. Imagine you have a recipe, but you don't know the exact cooking time and temperature – they depend on your particular oven. You have to try different options and see which one works best.

In the world of ML kernels, there can be a vast number of such “settings”: data block sizes, computation order, and ways of using GPU memory. Each combination yields a different speed. It's impossible to go through all the options manually – there can be thousands of them.
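To get a feel for how quickly the options multiply, here is a minimal sketch that counts the combinations in a small search space. The parameter names and values (block_m, num_warps, loop_order, and so on) are illustrative assumptions and are not taken from Helion's actual configuration schema.

```python
from itertools import product

# Hypothetical tuning knobs for a single kernel. Names and values are
# illustrative assumptions, not Helion's real configuration options.
SEARCH_SPACE = {
    "block_m": [16, 32, 64, 128, 256],
    "block_n": [16, 32, 64, 128, 256],
    "block_k": [16, 32, 64, 128],
    "num_warps": [1, 2, 4, 8, 16],
    "num_stages": [1, 2, 3, 4, 5],
    "loop_order": ["mn", "nm"],
}


def count_configs(space):
    """Return the number of distinct combinations in the space."""
    total = 1
    for values in space.values():
        total *= len(values)
    return total


def iter_configs(space):
    """Yield every combination as a dict, one at a time."""
    names = list(space)
    for combo in product(*(space[name] for name in names)):
        yield dict(zip(names, combo))


print(count_configs(SEARCH_SPACE))
```

Even this six-parameter toy has 5,000 combinations. If each measurement took three seconds (an assumed figure), an exhaustive sweep would run for more than four hours.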

Previously, Helion used a simple random search: it took random combinations of parameters, tested them, and picked the best one. This worked, but it was inefficient – good options could be found late in the process, or not at all within a reasonable timeframe.
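The random approach described above can be sketched in a few lines. This is a toy illustration, not Helion's code: the search space is hypothetical, and fake_benchmark is a made-up stand-in for a real timing run on a GPU.

```python
import random

# Hypothetical search space (illustrative values only).
SPACE = {
    "block_size": [16, 32, 64, 128, 256],
    "num_warps": [1, 2, 4, 8],
    "num_stages": [1, 2, 3, 4],
}


def fake_benchmark(cfg):
    """Stand-in for a real timing run: returns a made-up latency in ms.

    The formula is arbitrary. Its only purpose is to have a single best
    config (block_size=64, num_warps=4, num_stages=3).
    """
    return (
        1.0
        + abs(cfg["block_size"] - 64) / 64
        + abs(cfg["num_warps"] - 4) / 4
        + abs(cfg["num_stages"] - 3) / 3
    )


def random_search(space, benchmark, budget, seed=0):
    """Try `budget` random configs and return the fastest one seen."""
    rng = random.Random(seed)
    best_cfg, best_time = None, float("inf")
    for _ in range(budget):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        elapsed = benchmark(cfg)
        if elapsed < best_time:
            best_cfg, best_time = cfg, elapsed
    return best_cfg, best_time


best_cfg, best_time = random_search(SPACE, fake_benchmark, budget=20)
print(best_cfg, round(best_time, 3))
```

Each draw ignores everything learned from the earlier ones, which is why a good configuration may turn up late or not at all.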

Bayesian Optimization: Smart Search Instead of Random

The new approach is Bayesian optimization. It sounds complex, but the idea is surprisingly intuitive.

Imagine you're looking for the best café in an unfamiliar city. The random method is to walk into the first place you come across. The smart method is to look at reviews, location, and cuisine type, and make an educated guess: “This looks like a good spot, I'll try it first.” Then, based on the experience of each visit, you refine your assumptions.

Bayesian optimization works in much the same way. After each measurement, it updates its internal “map” of the parameter space: where it has already looked, what it has found, and where it should look next. The next option to test is not chosen randomly but deliberately, in the region where the model estimates the highest chance of improving on the best result so far.

This allows it to find good configurations in significantly fewer attempts. Instead of checking thousands of combinations, a few dozen well-chosen steps can be enough.
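The idea of learning from earlier measurements can be sketched with a simplified density-ratio scheme in the spirit of the Tree-structured Parzen Estimator, one well-known family of Bayesian optimization methods. It was chosen here only because it fits in a few lines; the article does not say this is the method Helion uses. The search space and fake_benchmark are hypothetical stand-ins.

```python
import random

# Hypothetical search space (illustrative values only).
SPACE = {
    "block_size": [16, 32, 64, 128, 256],
    "num_warps": [1, 2, 4, 8],
    "num_stages": [1, 2, 3, 4],
}


def fake_benchmark(cfg):
    """Made-up latency in ms; best at block_size=64, num_warps=4, num_stages=3."""
    return (
        1.0
        + abs(cfg["block_size"] - 64) / 64
        + abs(cfg["num_warps"] - 4) / 4
        + abs(cfg["num_stages"] - 3) / 3
    )


def smoothed_freqs(values, configs, name):
    """Frequency of each value of `name` in `configs`, with add-one smoothing."""
    counts = {value: 1.0 for value in values}
    for cfg in configs:
        counts[cfg[name]] += 1.0
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def guided_search(space, benchmark, budget, n_startup=8, n_candidates=24, seed=0):
    """Random warm-up, then pick configs that look like the fast ones so far."""
    rng = random.Random(seed)
    history = []  # (latency, config) pairs measured so far

    for step in range(budget):
        if step < n_startup:
            cfg = {n: rng.choice(v) for n, v in space.items()}
        else:
            # Split past results into the fastest quarter and the rest.
            ranked = sorted(history, key=lambda pair: pair[0])
            n_good = max(1, len(ranked) // 4)
            good = [c for _, c in ranked[:n_good]]
            rest = [c for _, c in ranked[n_good:]]
            p_good = {n: smoothed_freqs(v, good, n) for n, v in space.items()}
            p_rest = {n: smoothed_freqs(v, rest, n) for n, v in space.items()}

            def sample_from_good():
                return {
                    n: rng.choices(v, weights=[p_good[n][x] for x in v])[0]
                    for n, v in space.items()
                }

            def ratio(candidate):
                # High when a value is common among fast configs and rare elsewhere.
                score = 1.0
                for n, x in candidate.items():
                    score *= p_good[n][x] / p_rest[n][x]
                return score

            candidates = [sample_from_good() for _ in range(n_candidates)]
            cfg = max(candidates, key=ratio)
        history.append((benchmark(cfg), cfg))

    best_time, best_cfg = min(history, key=lambda pair: pair[0])
    return best_cfg, best_time


best_cfg, best_time = guided_search(SPACE, fake_benchmark, budget=30)
print(best_cfg, round(best_time, 3))
```

After the warm-up, every new choice depends on the full measurement history. A production tuner would also skip configurations it has already measured and would use a richer model than per-parameter frequency counts.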

In Practice: How Much Faster?

According to tests conducted by the Helion team on real-world tasks, the new approach finds near-optimal settings approximately 5–10 times faster than random search. If finding good parameters used to take, say, an hour or two, it now takes roughly 6 to 24 minutes.

This is especially important when you need to quickly adapt a kernel to a new task or new hardware. Running a long search every time is impractical, but a smart search measured in minutes is quite acceptable.

Why It's Not as Simple as It Seems

There's a subtlety here worth understanding. The parameter space in Helion is not numerical in the classic sense. It's not a matter of “pick a temperature between 150 and 250 degrees.” There are categorical parameters (e.g., choosing from several fixed options), conditional dependencies (the value of one parameter affects the permissible values of another), and parameters that only work in specific combinations.

Classical Bayesian optimization methods struggle with this type of space. That's why a special version was implemented in Helion, one that can work with these “awkward” spaces. This required dedicated engineering work – it wasn't possible to just take an off-the-shelf library and plug it in.
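To make the “awkward” space more concrete, here is a minimal sketch of a search space with one categorical parameter and two conditional dependencies. All names and rules (load_strategy, the 32-elements-per-warp limit, and so on) are assumptions invented for this illustration and do not describe Helion's real parameters.

```python
import random

BLOCK_SIZES = (32, 64, 128, 256)
STRATEGIES = ("direct", "staged")  # categorical: no natural ordering
STAGE_COUNTS = (2, 3, 4)


def allowed_warps(block_size):
    """Conditional dependency (assumed rule): larger blocks permit more warps."""
    return [w for w in (1, 2, 4, 8) if w * 32 <= block_size]


def sample_config(rng):
    """Draw one valid config, resolving dependencies in order."""
    cfg = {"block_size": rng.choice(BLOCK_SIZES)}
    cfg["num_warps"] = rng.choice(allowed_warps(cfg["block_size"]))
    cfg["load_strategy"] = rng.choice(STRATEGIES)
    # num_stages only exists when the "staged" strategy is selected.
    if cfg["load_strategy"] == "staged":
        cfg["num_stages"] = rng.choice(STAGE_COUNTS)
    return cfg


def is_valid(cfg):
    """Check a config against the same rules the sampler follows."""
    if cfg.get("block_size") not in BLOCK_SIZES:
        return False
    if cfg.get("num_warps") not in allowed_warps(cfg["block_size"]):
        return False
    strategy = cfg.get("load_strategy")
    if strategy == "staged":
        return cfg.get("num_stages") in STAGE_COUNTS
    if strategy == "direct":
        return "num_stages" not in cfg
    return False


rng = random.Random(0)
print(sample_config(rng))
```

Note that a config here is not a fixed-length vector of numbers: one key may be absent, and the legal values of another shift with block_size. That is the kind of structure a model built for simple numeric ranges does not handle out of the box.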

What This Means for Those Who Write ML Code

If you develop models or work with high-performance computing on GPUs, faster autotuning is a direct time-saver. You wait less, iterate faster, and get results quicker.

But there's a broader meaning as well. Helion was originally conceived as a tool to lower the barrier to entry for writing high-performance kernels. Writing such kernels used to require deep knowledge of GPU architecture and the intricacies of compilers. Helion allows you to write in a familiar style while achieving performance comparable to manual optimization. Accelerated autotuning reinforces this idea: now, not only the writing but also the tuning of the kernel happens quickly and with little effort.

This is a step towards allowing developers to focus on the task at hand, rather than on how to make the hardware work efficiently. Helion takes on this responsibility – and now does it significantly faster.

Open Questions

To be fair, Bayesian optimization is not a silver bullet. It works well when the search space is sufficiently structured and when each measurement takes a relatively short amount of time. In some edge cases, a random search or grid search might be simpler and not significantly worse.

Furthermore, how well the new approach generalizes to entirely new types of tasks or exotic hardware remains to be seen. The test results are promising, but real-world use is always richer than any benchmark.

Nevertheless, the direction seems sensible. Instead of going through thousands of options by trial and error, the system learns from its own attempts and gets smarter with each step. This is a good engineering philosophy – and it's gratifying to see it in action in such a concrete and practical tool as Helion.

Original Title: Accelerating Autotuning in Helion with Bayesian Optimization
Publication Date: Feb 24, 2026
Source: PyTorch (pytorch.org), an international open-source deep learning framework and community widely used for research and development in artificial intelligence and machine learning.

From Source to Analysis

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Analyzing the Original Publication and Writing the Text (Claude Sonnet 4.6, Anthropic)
The neural network studies the original material and generates a coherent text

2. Translating into English (Gemini 2.5 Pro, Google DeepMind)

3. Text Review and Editing (Gemini 2.5 Flash, Google DeepMind)
Correction of errors, inaccuracies, and ambiguous phrasing

4. Preparing the Illustration Description (DeepSeek-V3.2, DeepSeek)
Generating a textual prompt for the visual model

5. Creating the Illustration (FLUX.2 Pro, Black Forest Labs)
Generating an image based on the prepared prompt
