Chinese company MiniMax, known for its developments in generative AI, has released Forge – an open platform for training intelligent agents. Simply put, it's a tool that helps teach models not just to generate text, but to perform tasks, including reasoning, planning actions, and interacting with their environment.
Forge is built around the idea of reinforcement learning – an approach where a model learns by trial and error, receiving feedback for its actions. This is the same principle used to train AlphaGo or ChatGPT in dialogue mode. The key difference here is the emphasis on making this process scalable: running it on hundreds or thousands of graphics processing units (GPUs) simultaneously.
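The trial-and-error principle can be shown in miniature. The sketch below is purely illustrative (it contains no Forge code): a toy agent tries two actions, receives feedback from a simulated environment, and gradually learns to prefer the better-rewarded one.

```python
# Trial-and-error learning in miniature: the agent does not know which
# action is better, but repeated feedback shifts its value estimates.
import random

random.seed(0)
values = {"a": 0.0, "b": 0.0}        # the agent's learned value of each action
true_reward = {"a": 0.2, "b": 0.8}   # hidden environment: "b" is the better action

for _ in range(200):
    action = random.choice(list(values))                  # try an action
    feedback = true_reward[action]                        # environment responds
    values[action] += 0.1 * (feedback - values[action])   # learn from the feedback

best = max(values, key=values.get)
print(best)  # after enough trials, the agent prefers "b"
```

The same loop, scaled from two actions to the full output space of a language model and run across many machines, is the process Forge is built to orchestrate.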
Why Another Platform?
Training agents is not the same as training a language model in the traditional sense. An agent must not only understand text but also make decisions: which function to call, what request to send, or how to interpret the result. This requires a different approach to training.
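What "making decisions" means in practice can be sketched as a simple agent loop. All names below are illustrative, not Forge's actual API: the agent inspects an observation, decides whether to call an external tool, and interprets the result.

```python
# Toy agent decision step: choose a tool based on the observation and
# interpret its result. Real agents would use a trained model here.

def lookup_weather(city: str) -> str:
    """Stand-in for an external function the agent can call."""
    return f"Sunny in {city}"

TOOLS = {"lookup_weather": lookup_weather}

def agent_step(observation: str) -> str:
    """Decide which function to call, send the request, read the result."""
    if "weather" in observation.lower():
        result = TOOLS["lookup_weather"]("Paris")
        return f"Called lookup_weather -> {result}"
    return "No tool needed; answering directly."

print(agent_step("What is the weather today?"))
```

Training teaches the model to make each of these choices well, which is a different objective than next-token prediction alone.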
Existing solutions are either tailored for small-scale experiments or require significant modifications to run on large clusters. According to its developers, Forge was specifically created to allow agents to be trained on thousands of GPUs without the need to rewrite code or reinvent the wheel for task distribution.
The platform supports popular reinforcement learning algorithms and allows for the integration of custom methods. Its code is open, giving researchers and developers the ability to adapt the system to their specific needs.
What's Inside: Algorithm and Architecture
Along with the platform, MiniMax released a training algorithm of the same name, Forge. It is based on a method similar to PPO – one of the standard approaches in reinforcement learning – but with enhancements that, according to the team, make it more stable and efficient when working with language models.
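The stabilizing trick at the heart of PPO fits in a few lines. What follows is the textbook clipped surrogate objective, not MiniMax's enhanced variant (whose details are not described here): the update is limited once the new policy drifts too far from the old one.

```python
# PPO's clipped surrogate objective for a single action.
# Real systems average this over batches of trajectories.
import math

def ppo_clip_loss(logp_new: float, logp_old: float, advantage: float,
                  eps: float = 0.2) -> float:
    ratio = math.exp(logp_new - logp_old)          # how much the policy changed
    clipped = max(min(ratio, 1 + eps), 1 - eps)    # keep the ratio near 1
    # PPO maximizes the minimum of the two terms; as a loss we negate it.
    return -min(ratio * advantage, clipped * advantage)

# Raising the probability of a high-advantage action helps only until the
# ratio exceeds 1 + eps; beyond that the clipped term caps the update.
print(ppo_clip_loss(logp_new=-0.5, logp_old=-1.0, advantage=2.0))
```

The clipping is what makes the method comparatively stable, which matters when the "policy" is a large language model whose outputs can shift sharply between updates.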
The core idea is to divide the process into several stages: data collection (the model tries different actions), results evaluation (how well each action performed), and model weight updates. All of this happens in parallel across multiple devices, which can speed up the process tenfold.
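The three stages above can be sketched as a single cycle. This is a deliberately toy version (scalar "weights", threads instead of GPUs, invented function names), intended only to show the shape of the pipeline, not Forge's implementation.

```python
# Collect -> evaluate -> update, with collection fanned out in parallel.
from concurrent.futures import ThreadPoolExecutor

def collect_rollout(seed: int) -> dict:
    """Stage 1: a worker tries an action and records the outcome."""
    return {"action": seed % 3, "outcome": seed * 0.1}

def evaluate(rollout: dict) -> float:
    """Stage 2: score how well the action performed (toy reward)."""
    return rollout["outcome"] - 0.5

def update_weights(weights: float, rewards: list, lr: float = 0.01) -> float:
    """Stage 3: move the (scalar, toy) weights toward higher average reward."""
    return weights + lr * sum(rewards) / len(rewards)

with ThreadPoolExecutor(max_workers=4) as pool:          # parallel collection
    rollouts = list(pool.map(collect_rollout, range(8)))
rewards = [evaluate(r) for r in rollouts]
weights = update_weights(0.0, rewards)
print(round(weights, 4))
```

Because stage 1 dominates the cost and each rollout is independent, it parallelizes well, which is exactly where running on hundreds or thousands of devices pays off.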
Forge supports various types of tasks, from simple text-based ones to complex scenarios where the agent interacts with external systems, databases, or APIs. Developers can define their own reward functions – in other words, describe what constitutes success and what is considered an error.
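A custom reward function might look like the following. The signature is hypothetical (the article only says developers describe what counts as success and what counts as an error); the point is that "success" becomes an ordinary function the platform can call.

```python
# Hypothetical user-defined reward: encode success, failure, and
# everything in between as numbers the training loop can optimize.

def reward(task: str, agent_answer: str, expected: str) -> float:
    """Score one agent answer for one task."""
    if not agent_answer.strip():
        return -0.5   # error: the agent produced nothing
    if agent_answer.strip() == expected:
        return 1.0    # success: exact match with the expected result
    return 0.0        # neither success nor an outright error

print(reward("2+2?", "4", "4"))
```

Richer tasks would replace the exact-match check with, say, a test suite run or an API response validation, but the contract stays the same: actions in, a score out.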
Open Source and Availability
Forge's code has been made publicly available. This means anyone can download the platform, run it on their own servers, and start experimenting. MiniMax has also provided documentation and usage examples, which lowers the barrier to entry.
Openness is a key point. In the field of agent training, there are no established standards yet, and many teams develop their own solutions from scratch. Forge could become a common foundation that allows them to save time and focus on the algorithms themselves, rather than on the infrastructure.
Moreover, the platform is not tied to specific MiniMax models. It can be used with any language models that support the required interaction format.
Who Is This For?
First and foremost, it's for research teams and companies developing agents for real-world tasks: process automation, document handling, and user interaction through complex scenarios.
Forge can also be useful for those studying reinforcement learning as it applies to language models. This is an active area of research, and having ready-made infrastructure simplifies conducting experiments.
The platform may also be helpful for teams looking to train models for specific tasks that require not just text generation, but executing a sequence of actions with result verification.
What's Next?
The release of Forge is another step towards agents becoming a practical tool rather than an experimental technology. For now, training such systems remains a complex and resource-intensive process, and not all teams can afford to allocate thousands of GPUs for experiments.
An open platform lowers this barrier. But questions remain: How well will Forge perform with different types of tasks? How will it handle tasks where feedback is not obvious or is delayed over time? And most importantly, will the community truly adopt it as a common foundation, or will each team continue to build its own solutions anyway?
Time and practical use will provide the answers to these questions. For now, developers have another tool that's worth trying.