Published on March 25, 2026

How One Tool United Slurm and Kubernetes for AI

French AI startup H Company explains how it used SkyPilot to bring two incompatible compute management systems into a single workflow.

Event Source: H Company | 5–7 min read

If you've been following the development of large AI models even a little, you've probably heard two names: Slurm and Kubernetes. These are two different tools for managing computing resources, and historically, they have existed in entirely separate worlds.

Slurm is a system from the world of supercomputers. It first appeared back in 2002 and is still used on approximately 65% of the world's most powerful computing clusters. Its philosophy is simple: there's a queue of tasks, there's hardware, and tasks are executed in order. Wait for your turn and run at full capacity.
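To make the queue idea concrete, here is a minimal sketch of strict first-in, first-out scheduling on a fixed pool of GPUs. It is a toy model, not Slurm's actual scheduler (which layers priorities, fair-share, and backfill on top of the queue), and the `Job` class and `run_fifo` function are names invented for this illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    gpus: int   # GPUs the job occupies exclusively
    hours: int  # how long it holds them


def run_fifo(jobs, total_gpus):
    """Simulate strict FIFO scheduling; return (name, start_hour, end_hour) tuples."""
    queue = deque(jobs)
    running = []  # (end_hour, gpus) for jobs currently holding GPUs
    free = total_gpus
    clock = 0
    schedule = []
    while queue:
        job = queue[0]
        if job.gpus > total_gpus:
            raise ValueError(f"{job.name} can never fit on this cluster")
        # Release the GPUs of every job that has finished by the current hour.
        free += sum(gpus for end, gpus in running if end <= clock)
        running = [(end, gpus) for end, gpus in running if end > clock]
        if job.gpus <= free:
            # The head of the queue fits: start it now.
            queue.popleft()
            free -= job.gpus
            running.append((clock + job.hours, job.gpus))
            schedule.append((job.name, clock, clock + job.hours))
        else:
            # The head of the queue does not fit: jump to the next completion.
            clock = min(end for end, _ in running)
    return schedule


jobs = [Job("pretrain", 8, 10), Job("eval", 4, 2), Job("finetune", 4, 3)]
for name, start, end in run_fifo(jobs, total_gpus=8):
    print(f"{name}: hour {start} to hour {end}")
```

On an 8-GPU pool, the two smaller jobs start only at hour 10, once the large job releases its GPUs. That is the "wait for your turn, then run at full capacity" behavior in miniature.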

Kubernetes came later; it was developed by Google and has become the standard in the world of cloud services. It doesn't manage individual tasks, but entire applications: it ensures that the necessary components are always running, scales them to handle load, and restarts them in case of failures.
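The "keep the necessary components running" behavior can be pictured as a reconciliation loop: compare what should be running with what is running, then act on the difference. The sketch below is a simplified illustration in plain Python, not the Kubernetes API. The `reconcile` function and the application names are hypothetical.

```python
def reconcile(desired, running):
    """Return the actions that move `running` replica counts toward `desired`.

    Both arguments map an application name to a replica count.
    """
    actions = []
    for app in sorted(set(desired) | set(running)):
        want = desired.get(app, 0)
        have = running.get(app, 0)
        if have < want:
            # Too few replicas (for example, after a crash): start more.
            actions.append(("start", app, want - have))
        elif have > want:
            # Too many replicas, or an app that is no longer wanted: stop some.
            actions.append(("stop", app, have - want))
    return actions


desired = {"inference": 3, "reward": 2}
running = {"inference": 1, "reward": 2, "old-service": 1}
print(reconcile(desired, running))
```

A real controller runs this comparison continuously, so a crashed replica simply shows up as a new difference to correct on the next pass.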

The problem is that training modern AI models requires both. Training a large model is a job for Slurm: you need to exclusively occupy several hundred GPUs for hours or even days. But online reinforcement learning is a whole different story.

What Is Online Reinforcement Learning and Why Is It Difficult?

In short: classic model training is like studying from a textbook. There's a dataset, the model trains on it, and everything is straightforward. Reinforcement learning is more like training with a live partner: the model does something, receives feedback, adjusts its behavior, and tries again.

In online mode, this process is continuous: one model generates responses, another model (or a set of rules) evaluates them, and the results are immediately used for the next training step. All of these stages run in parallel.
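The generate, evaluate, update cycle can be sketched with a deliberately tiny example. Everything here is an assumption made for illustration: the "policy" is just a table of weights over three canned responses, the evaluator is a fixed rule, and the stages run one after another rather than in parallel on separate hardware as they would in production.

```python
import random

CANDIDATES = ["short answer", "detailed answer", "off-topic answer"]


def generate(weights, rng):
    """Policy step: sample one response in proportion to its current weight."""
    return rng.choices(CANDIDATES, weights=[weights[c] for c in CANDIDATES], k=1)[0]


def evaluate(response):
    """Reward step: a rule-based judge standing in for a reward model."""
    scores = {"short answer": 0.5, "detailed answer": 1.0, "off-topic answer": 0.0}
    return scores[response]


def update(weights, response, reward, lr=0.1, baseline=0.5):
    """Training step: raise the weight for above-baseline rewards, lower it otherwise."""
    new_weights = dict(weights)
    new_weights[response] = max(0.01, weights[response] * (1 + lr * (reward - baseline)))
    return new_weights


def online_rl(steps, seed=0):
    rng = random.Random(seed)
    weights = {c: 1.0 for c in CANDIDATES}
    for _ in range(steps):
        response = generate(weights, rng)            # the model does something
        reward = evaluate(response)                  # it receives feedback
        weights = update(weights, response, reward)  # it adjusts and tries again
    return weights


print(online_rl(steps=200))
```

Each pass through the loop uses the weights produced by the previous pass, which is what makes the process "online": feedback is consumed as soon as it is produced.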

This is precisely where the infrastructure headache begins. Generating responses requires one resource configuration, evaluating them requires another, and the training itself requires a third. And all of this must work in concert, within a single pipeline, without downtime.

Previously, teams solved this problem manually: some processes ran on Slurm, others on Kubernetes, and the whole setup was held together by custom scripts and constant supervision from engineers.

SkyPilot Unifies Slurm and Kubernetes Management

H Company, a French AI startup, faced this problem directly while developing its models, and found a solution in a tool called SkyPilot.

Simply put, SkyPilot is a layer that knows how to speak to both Slurm and Kubernetes in their native languages, while presenting a single interface to the researcher. You describe the task once, and SkyPilot figures out where and how to send it.
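The "describe once, dispatch anywhere" idea can be illustrated with a small routing layer. This is a conceptual sketch only and does not use or reproduce SkyPilot's API: `TaskSpec`, the two backend classes, and the rule that long-running services go to Kubernetes are all assumptions made for the example, as are the `run.sh` and `<name>.yaml` file names. The command strings use common `sbatch` and `kubectl` options purely to show that each backend has its own vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    name: str
    gpus: int
    long_running: bool  # True for services that must stay up (hypothetical flag)


class SlurmBackend:
    def render(self, task):
        # Batch job: queue it and let it hold its GPUs until it finishes.
        return f"sbatch --job-name={task.name} --gres=gpu:{task.gpus} run.sh"


class KubernetesBackend:
    def render(self, task):
        # Service: hand the cluster a manifest (file name is hypothetical).
        return f"kubectl apply -f {task.name}.yaml"


def route(task):
    """Pick a backend from the task description and render its native command."""
    backend = KubernetesBackend() if task.long_running else SlurmBackend()
    return backend.render(task)


for task in [TaskSpec("trainer", 8, False), TaskSpec("generator", 2, True)]:
    print(route(task))
```

The researcher only writes `TaskSpec` objects. Which system receives the work, and in what syntax, is the layer's concern.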

For H Company, this meant they could run the entire online reinforcement learning cycle as a single, coherent process, without needing to manually coordinate two different clusters.

Generation, evaluation, and model weight updates now operate within a single pipeline. Each component gets exactly the resources it needs – no more, no less.

Why This Integration Is Crucial for AI Development

Online reinforcement learning is one of the key methods that make modern language models useful. It's thanks to this method that models learn to provide more accurate, safe, and relevant responses. But for a long time, this method remained difficult to implement in production precisely because of infrastructure limitations.

When response generation, evaluation, and training live in separate systems, delays, desynchronization, and data loss are inevitable. This isn't just an inconvenience; it directly affects the quality of training.
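One way to picture desynchronization: a response generated by an older copy of the model arrives at the trainer after the weights have already moved on. The sketch below shows a simple guard against that, assuming each sample is tagged with the version of the policy that produced it. The `Sample` class, the `policy_version` field, and the `max_lag` threshold are hypothetical and not taken from H Company's pipeline.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    response: str
    reward: float
    policy_version: int  # version of the weights that generated the response


def split_by_staleness(samples, current_version, max_lag=1):
    """Separate samples that are recent enough to train on from stale ones."""
    fresh, stale = [], []
    for sample in samples:
        lag = current_version - sample.policy_version
        (fresh if lag <= max_lag else stale).append(sample)
    return fresh, stale


batch = [
    Sample("answer A", 0.9, policy_version=10),
    Sample("answer B", 0.2, policy_version=9),
    Sample("answer C", 0.7, policy_version=6),
]
fresh, stale = split_by_staleness(batch, current_version=10)
print(len(fresh), "fresh,", len(stale), "stale")
```

The longer samples take to travel between systems, the more of them land in the stale pile, which is one concrete way infrastructure delays turn into wasted compute or lower-quality updates.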

H Company has shown that merging these processes into a single, manageable stream yields real benefits: scalability increases, while the operational load on the team decreases. Instead of monitoring two different systems, engineers work with just one.

Industry Impact of Unified AI Infrastructure

The story of H Company is not just a tale about one startup and its infrastructure solutions. It's a symptom of a broader shift.

The line between the HPC world (supercomputers, Slurm, physical clusters) and the cloud-native world (Kubernetes, containers, elastic scaling) is rapidly blurring. As experts from zenml.io note, the emergence of large language models and generative AI has been the main driver behind this collision of two worlds.

Previously, the team that trained models and the team that deployed them could live in different technological universes. Now, that's just inconvenient and inefficient.

Tools like SkyPilot are an attempt to build bridges between these universes without having to completely abandon one in favor of the other. Not to replace Slurm with Kubernetes or vice versa, but to teach them to work together.

Remaining Questions and Limitations of the Approach

For all the appeal of this approach, it has its limitations and uncertainties.

Any abstraction is a compromise. When one tool manages two different systems, some flexibility and control are inevitably lost. In some scenarios, this is acceptable; in others, it's not.

Furthermore, online reinforcement learning at a production scale is still relatively new territory. H Company is one of the few teams to publicly share their experience. How this approach will hold up under further scaling, model changes, or a move to fundamentally new architectures, only time will tell.

But the very fact that the company managed to bring online reinforcement learning into real production using existing tools, without reinventing the wheel, is a significant achievement in itself. Sometimes progress looks less like a new technology and more like a new way to make old ones work together.

Link to Original: https://hcompany.ai/unlocking-online-rl-skypilot

Original Title: SkyPilot

Publication Date: Mar 24, 2026

H Company (hcompany.ai): French AI company developing intelligent agents and models for automating complex digital and business processes.
