When people talk about training the most advanced AI models, an image immediately comes to mind: huge data centers, thousands of GPUs, and multi-billion-dollar investments. This very notion has become something of an axiom in the industry: if you want to create powerful models, you must build a large cluster.
Where did the idea of megaclusters come from?
The basic logic in recent years has been something like this: to train a better model, you need more data and more computation. This rule worked well in the era of so-called pre-training, when models were drilled on vast amounts of text, and quality improvements were directly linked to scale.
This is when the culture of megaclusters was formed. The largest companies began competing not only in the quality of their models but also in the size of their computing infrastructure. Thousands, and even tens of thousands, of GPUs came to be seen as a prerequisite for being on the cutting edge.
But the situation is changing – and it's changing right now.
Reinforcement Learning (RL) – Another Way to Make Models Smarter
If pre-training is when a model reads vast amounts of text and learns to predict the next word, then reinforcement learning (RL) is something different. Simply put, the model tries to do something, receives feedback – right or wrong – and gradually learns to perform better.
This is precisely how modern 'thinking' models work – the ones that can reason, self-correct, and break down tasks into steps. And this approach has fundamentally different computational requirements.
The key point is this: RL doesn't require the same scale as pre-training. Tasks are solved iteratively – in small sessions with frequent updates to the model's weights. This means that even a relatively small cluster can participate in cutting-edge training, provided its infrastructure is properly configured.
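The loop described above can be sketched in a few lines: the model tries an action, receives a reward signal, and its parameters are nudged in small, frequent steps. This is a toy illustration under invented assumptions (a three-action bandit with fixed rewards and an epsilon-greedy policy), not any lab's actual training stack:

```python
import random

# Toy "environment": three possible actions with fixed reward signals.
# The numbers are invented for illustration; action 2 is the best choice.
REWARDS = [0.1, 0.4, 0.9]

def generate(q_values, epsilon=0.2):
    """Generation phase: pick an action (in real RL this is light, inference-only work)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=q_values.__getitem__)       # exploit

def update(q_values, counts, action, reward):
    """Update phase: fold the feedback into the policy (the heavy step in real RL)."""
    counts[action] += 1
    q_values[action] += (reward - q_values[action]) / counts[action]  # running mean

random.seed(0)
q, n = [0.0, 0.0, 0.0], [0, 0, 0]
for _ in range(500):          # many short sessions with frequent updates,
    a = generate(q)           # ...each generating a response,
    update(q, n, a, REWARDS[a])  # ...scored and folded back into the weights

best = max(range(3), key=q.__getitem__)
print(best)
```

The structure, not the scale, is the point: each iteration is cheap, and improvement comes from the tight generate-score-update cycle rather than from one enormous pass over a corpus.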
But There's a Catch: The Infrastructure Must Be Different
This is where it gets interesting. Fireworks AI points out that standard large clusters – for all their power – are not well-suited for RL training. The reason lies in the workload architecture.
In pre-training, everything is quite uniform: data is loaded, the model computes, and weights are updated. With RL, the picture is different: the model spends part of its time generating responses (a relatively light load) and part of its time updating based on feedback (a heavy load). These phases alternate, and if the cluster can't switch flexibly between them, expensive GPUs simply sit idle for much of the time.
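A back-of-the-envelope calculation makes the idle-GPU problem concrete. The percentages below are invented for illustration, not measurements from any real cluster:

```python
# Hypothetical phase profile of one RL step -- all numbers are assumptions.
GEN_TIME_FRAC = 0.70   # fraction of wall-clock time spent generating responses
GEN_UTIL = 0.25        # GPU utilization during generation (light, inference-like load)
UPD_TIME_FRAC = 0.30   # fraction of time spent on weight updates
UPD_UTIL = 0.95        # GPU utilization during updates (heavy, training-like load)

# A static cluster sized for the heavy phase idles through the light one:
avg_util = GEN_TIME_FRAC * GEN_UTIL + UPD_TIME_FRAC * UPD_UTIL
print(f"average utilization: {avg_util:.0%}")
```

Under these assumed numbers, average utilization lands around 46% – more than half the paid-for capacity does nothing, which is the inefficiency the alternating phases create.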
Simply put, a cluster purchased for pre-training will run RL workloads at low efficiency – while still carrying a megacluster's price tag.
What Does This Change in Practice?
If RL training truly becomes the primary method for developing frontier models (and the trend points in this direction – just look at the success of models like DeepSeek R1 or OpenAI's series of 'thinking' models), then it changes the economics of the entire industry.
First, the barrier to entry is lowered. A team without the resources to build a giant data center can still train powerful models – if they properly organize their computational process for RL tasks.
Second, the focus shifts from 'hardware' to algorithms. The ability to skillfully structure the reinforcement learning process – selecting tasks, correctly evaluating model responses, managing computational phases – becomes more important than simply having a lot of GPUs.
Third, it changes how we should think about investment. Building a megacluster for the sake of RL is not the best idea. It's far more effective to have a flexible infrastructure that can dynamically allocate workloads between the generation and update phases.
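One way to picture "flexible infrastructure" is a scheduler that moves GPUs between a generation pool and an update pool as the phases alternate. This is a simplified sketch of the idea, not Fireworks AI's actual system; the class, pool names, and the `heavy_share` knob are all invented for illustration:

```python
class PhaseScheduler:
    """Toy scheduler that reassigns a fixed pool of GPUs between the
    generation and update phases of an RL step (illustrative only)."""

    def __init__(self, total_gpus):
        self.total = total_gpus
        self.allocation = {"generate": total_gpus, "update": 0}

    def enter_phase(self, phase, heavy_share=0.9):
        """Give most GPUs to whichever phase is currently active.
        heavy_share is an invented knob, not a real parameter."""
        active = max(1, int(self.total * heavy_share))
        other = "update" if phase == "generate" else "generate"
        self.allocation = {phase: active, other: self.total - active}
        return self.allocation

sched = PhaseScheduler(total_gpus=64)
print(sched.enter_phase("generate"))  # most GPUs serve rollouts
print(sched.enter_phase("update"))    # then they flip to the weight update
```

The design point is that the same hardware does double duty: instead of sizing a static cluster for the worst-case phase and letting it idle the rest of the time, allocation follows the workload.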
This Doesn't Mean Large Clusters Are Dead
It's important to clarify: this isn't to say that scale is no longer needed. Pre-training hasn't gone away, and large clusters still make sense for it. And RL tasks themselves can also be scaled if desired.
But Fireworks AI's thesis is different: if you want to be on the cutting edge, specifically in terms of reasoning and agentic capabilities, you don't necessarily need to build a megacluster. For this type of workload, it is an expensive and comparatively inefficient solution.
In other words, the industry is beginning to bifurcate. The race for the 'biggest' cluster is one story. The ability to efficiently train models with reinforcement learning is another. And the second one, it seems, is becoming increasingly important.
Why Is This Important to Know?
If you're following the developments in the AI market, this idea challenges several established notions.
First: 'The best AI belongs to whoever spent the most on hardware' is an oversimplification that's ceasing to be true. Training strategy and computational architecture are starting to play a comparable role.
Second: small and medium-sized teams are getting a real chance to compete in certain niches – not because they've suddenly become rich, but because the rules of the game are changing.
Third: the expected market 'consolidation' around the five largest players with the biggest clusters is not as certain a scenario as it seemed just a couple of years ago.
Of course, this idea has its limitations. Frontier RL is still complex and expensive, though not to the same degree as pre-training at a comparable scale. And the question of how far one can go without a high-quality pre-trained foundation remains open.
But on the whole, this is one of those ideas worth keeping in mind as we watch events unfold in the AI industry in the near future.