Published on April 2, 2026

Optimizing AI Agent Training Costs

How Salesforce Trains AI Agents Without Huge Costs

Salesforce AI Research explains how it is restructuring language model training for the agentic era – and why old approaches no longer work.

Research / Technical context · 5–7 min read
Event Source: Salesforce

Training language models through feedback – whether from humans or from other AI systems – has long been standard practice. This is how models learn to give helpful, safe, and accurate responses. Put simply: a model does something, receives feedback, and gets slightly better based on that feedback. Repeat this thousands of times, and the result is an aligned, "well-behaved" model.

But now, the industry is entering a new phase. Models are increasingly working not as chatbots answering a single question, but as agents – systems that perform long chains of actions: they search for information, run tools, make intermediate decisions, and only then produce a result. And this is where the old training scheme starts to fail.

Agent Training: From Single Step to Marathon

When a Single Step Becomes a Marathon

In a classic scenario, a model generates a response and immediately receives a signal: good or bad. Everything is quick and clear. In an agentic scenario, there can be dozens of steps between the first action and the final result. The model calls an external service, gets data, processes it, calls another service, processes it again – and only then does it become clear whether it has completed the task.
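To make the contrast concrete, here is a minimal sketch of an agentic episode in Python. The names `run_agent_episode`, `policy`, and `tools` are hypothetical and not taken from the Salesforce publication; the point is only that the outcome is known after the loop ends, not after each step.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str]  # (action name, observation text)


def run_agent_episode(
    policy: Callable[[List[Step]], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    task: str,
    max_steps: int = 10,
) -> Tuple[List[Step], Optional[str]]:
    """Run one episode: the policy picks a tool, sees the result, repeats.

    Returns the full history and the final answer (None if the agent
    ran out of steps). Whether the task succeeded can only be judged
    from the final answer, after all intermediate steps are done.
    """
    history: List[Step] = [("task", task)]
    for _ in range(max_steps):
        action, argument = policy(history)
        if action == "finish":
            return history, argument
        observation = tools[action](argument)  # call the external service
        history.append((action, observation))  # the context keeps growing
    return history, None
```

In a single-turn setting the loop body would run once. Here the history grows with every tool call, and all of it has to be kept around for training.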

This changes everything. Training becomes significantly more expensive: the context of the entire chain has to be stored, and the whole reasoning path has to be evaluated, not just a single response. The computational load grows non-linearly with the length of the chain, because each new step is processed together with everything that came before it. This means researchers need new, more efficient approaches that don't require vast resources for each training step.
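A back-of-the-envelope sketch shows where the non-linearity comes from. It assumes, purely for illustration, that every step adds a fixed number of tokens and that the full history is re-read at each step with no caching; the function name and the numbers are hypothetical.

```python
def total_tokens_processed(num_steps: int, tokens_per_step: int) -> int:
    """Tokens read over one episode if the whole history is re-fed
    at every step (assumption: no prefix caching of earlier steps)."""
    total = 0
    context = 0
    for _ in range(num_steps):
        context += tokens_per_step  # the history grows by one step
        total += context            # the model re-reads the full history
    return total


if __name__ == "__main__":
    for steps in (1, 10, 40):
        print(steps, total_tokens_processed(steps, tokens_per_step=500))
```

Under these assumptions a 40-step episode reads 410,000 tokens, which is 820 times the 500 tokens of a single step, even though it has only 40 times as many steps. Caching changes the constants, but the basic pressure from long chains remains.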

This is precisely what the Salesforce AI Research team has tackled. They described how they are redesigning the model training process for an agentic reality – and the specific problems they had to solve.

Key Bottlenecks in AI Agent Training

Three Bottlenecks That Slow Down Agent Training

The researchers identified several key challenges that reinforcement learning faces in an agentic context.

First, context length. An agent operates with a long history of interactions. The longer the chain, the more information it needs to keep "in mind" at each training step. This directly affects memory usage and processing speed.
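One way to get a feel for the memory side is to estimate the key-value cache that a transformer keeps for its context. The formula below is the usual accounting (one key and one value tensor per layer); the model dimensions are a hypothetical configuration chosen for round numbers, not a description of any Salesforce model.

```python
def kv_cache_bytes(
    seq_len: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,  # 2 bytes per value, i.e. 16-bit floats
) -> int:
    """Approximate key-value cache size for one sequence.

    The leading factor 2 accounts for storing both keys and values.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 8 key-value heads, head size 128.
    for tokens in (8_192, 131_072):
        gib = kv_cache_bytes(tokens, 32, 8, 128) / 2**30
        print(f"{tokens} tokens -> {gib:.0f} GiB")
```

For this made-up configuration, an 8,192-token history needs about 1 GiB per sequence and a 131,072-token history about 16 GiB. Training also has to hold activations and gradients, so the real footprint is larger than this estimate.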

Second, the sparsity and delay of the reward signal. In typical tasks, the model gets feedback almost immediately. In agentic ones, the final result might only appear after many steps. This makes it hard to work out which specific actions led to success or failure, a difficulty known as the credit assignment problem. Imagine trying to teach someone to cook a dish, but only giving your verdict of "tasty or not" after the guest has already left the table.
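A standard textbook way to spread a late reward back over earlier steps is the discounted return. The sketch below is that generic technique, not the method described by Salesforce, and the discount value is chosen only to keep the arithmetic readable.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Return-to-go for every step: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Three silent steps, then a single success signal at the end.
    print(discounted_returns([0.0, 0.0, 0.0, 1.0], gamma=0.5))
    # [0.125, 0.25, 0.5, 1.0]
```

Note what this does and does not solve: credit is assigned by distance from the final reward, not by whether a given step actually mattered. That gap is exactly why sparse, delayed rewards are a bottleneck.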

Third, the cost of a single training example. To train a model on a single agentic episode, you need to run the entire chain of actions, collect signals, and calculate gradients. This is significantly more expensive than training on a single response. At an industrial scale, such costs become a major limitation.

Salesforce Solutions for Agent Training

What Salesforce Proposes

The team is working on several fronts simultaneously, trying to make agent training more practical – without sacrificing quality for speed or breaking the bank on computation.

One idea is to more intelligently manage which of the agent's steps are used in training. Not every intermediate step is equally useful for the feedback signal. If we can learn to select the most informative moments, it's possible to significantly reduce the load without losing training quality.
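The source does not say how the informative steps are chosen, so the following is only an illustrative sketch. It assumes each step already has some per-step score (for example, the magnitude of an advantage estimate) and keeps the highest-scoring fraction; `select_steps` and `keep_fraction` are hypothetical names.

```python
import math
from typing import List, Sequence


def select_steps(scores: Sequence[float], keep_fraction: float = 0.5) -> List[bool]:
    """Return a mask that is True for the steps kept for training.

    Steps are ranked by the absolute value of their score, and the top
    `keep_fraction` of them (at least one) are kept.
    """
    if not scores:
        return []
    k = max(1, math.ceil(len(scores) * keep_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    keep = set(ranked[:k])
    return [i in keep for i in range(len(scores))]


if __name__ == "__main__":
    print(select_steps([0.1, -2.0, 0.05, 1.5], keep_fraction=0.5))
    # [False, True, False, True]
```

A mask like this could be used to skip the loss computation for the dropped steps. The saving depends on how much of the training cost is tied to those steps, and a poor scoring rule can throw away useful signal.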

Another direction involves working on how the reward signal is formed and transmitted. In agentic tasks, instead of waiting for the signal at the very end, it's possible to construct intermediate evaluations – "checkpoints" of a sort – that give the model more frequent and accurate feedback at each stage of the journey.
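As a rough illustration of intermediate evaluations, the sketch below pays a small bonus the first time each checkpoint condition is met, plus the usual reward at the end. The checkpoint conditions, the bonus size, and the state fields are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Sequence

State = Dict[str, object]
Checkpoint = Callable[[State], bool]


def shaped_rewards(
    states: Sequence[State],
    checkpoints: Sequence[Checkpoint],
    final_success: bool,
    checkpoint_bonus: float = 0.25,
    final_reward: float = 1.0,
) -> List[float]:
    """Per-step rewards: a bonus when a checkpoint is first reached,
    and the final reward on the last step if the task succeeded."""
    rewards = [0.0] * len(states)
    reached = set()
    for t, state in enumerate(states):
        for idx, check in enumerate(checkpoints):
            if idx not in reached and check(state):
                reached.add(idx)  # each checkpoint pays out only once
                rewards[t] += checkpoint_bonus
    if states and final_success:
        rewards[-1] += final_reward
    return rewards
```

For a three-step episode where documents are found at the second step and an answer is written at the third, this yields a non-zero signal at two steps instead of one. Badly chosen checkpoints can also teach the agent to chase bonuses instead of finishing the task, so they need as much care as the final reward.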

In parallel, they are exploring how to better distribute computation across multiple agents or runs so the system can learn more concurrently without creating bottlenecks.
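The general idea of running several episodes side by side can be sketched with Python's standard thread pool. `run_episode` is a stand-in that fabricates a result from a seed; a real rollout would call the model and its tools, and a production system would likely spread the work across processes or machines instead of threads.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List


def run_episode(seed: int) -> Dict[str, float]:
    """Placeholder for one agent rollout (hypothetical, deterministic)."""
    rng = random.Random(seed)
    steps = rng.randint(5, 20)
    reward = 1.0 if rng.random() > 0.5 else 0.0
    return {"seed": seed, "steps": steps, "reward": reward}


def collect_rollouts(seeds: Iterable[int], max_workers: int = 4) -> List[Dict[str, float]]:
    """Run episodes concurrently; results come back in the order of `seeds`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_episode, seeds))


if __name__ == "__main__":
    batch = collect_rollouts(range(8))
    print(len(batch), "episodes collected")
```

Threads are a reasonable fit when episodes spend most of their time waiting on external tools. The harder engineering problem, which this sketch does not touch, is keeping the training step from idling while the slowest episode in a batch finishes.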

All of this sounds like engineering optimization – and in a way, it is. But behind it lies a fundamental question: Can we even train agents on realistic tasks if we don't solve the efficiency problem? Without this, agentic AI risks remaining the domain of companies with unlimited computing budgets.

The Broader Impact of Efficient Agent Training

Why This Matters Beyond Salesforce

The topic of agentic reinforcement learning is currently a hot one across the industry. Major labs – from OpenAI to DeepMind – are all facing the same limitations one way or another. Agents based on language models are already being used in business process automation, coding, and research tasks. And the more complex the task, the longer the chain of actions – and thus, the more acute the problem of efficient training becomes.

At the same time, the issue of safety hasn't taken a backseat. When an agent performs dozens of actions in a row, the cost of an error increases, as a single wrong decision early on can trigger a whole chain of consequences. This makes the careful tuning of training signals not just a technical issue, but a substantive one. Incidentally, this very problem – how to prevent an agent from "breaking something" in its pursuit of a result – is addressed by a separate field known in academia as Safe Reinforcement Learning. Its essence is to define constraints alongside the training's objective function: the agent must not only achieve the goal but do so within the bounds of acceptable behavior.
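One common way to express "objective plus constraints" in Safe Reinforcement Learning is a Lagrangian penalty: the agent maximizes reward minus a weighted constraint violation, and the weight rises while the constraint is violated. The sketch below shows only that bookkeeping, with hypothetical names and numbers; it is a generic formulation, not something attributed to Salesforce.

```python
def lagrangian_objective(
    avg_reward: float, avg_cost: float, cost_limit: float, lam: float
) -> float:
    """Penalized objective: reward minus lam times the constraint violation."""
    return avg_reward - lam * (avg_cost - cost_limit)


def update_multiplier(
    lam: float, avg_cost: float, cost_limit: float, lr: float = 0.1
) -> float:
    """Raise lam when the cost exceeds the limit, lower it otherwise,
    and never let it go below zero."""
    return max(0.0, lam + lr * (avg_cost - cost_limit))


if __name__ == "__main__":
    lam = 0.0
    for avg_cost in (0.5, 0.5, 0.25, 0.0):  # measured cost per training round
        lam = update_multiplier(lam, avg_cost, cost_limit=0.25, lr=1.0)
        print(lam)
```

Here "cost" stands for whatever measures unacceptable behavior, such as the rate of destructive tool calls. While the agent overshoots the limit, the penalty weight keeps growing, so violating the constraint becomes steadily less attractive than the reward it might bring.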

The work by Salesforce AI Research is one of the public examples of how research teams are trying to make agentic training scalable. It's not a revolution, but it's an important step toward making AI agents practically applicable tools – not just impressive conference demos.

Future Challenges in AI Agent Development

What Remains an Open Question

Despite the progress, there are still more questions than answers. How can we evaluate an agent's quality on tasks where there's no single "correct" answer? How can we ensure training stability when the external environment is unpredictable? How do we transfer approaches that work in lab conditions to real-world products?

These questions aren't unique to Salesforce – they face the entire industry. And the fact that major companies are starting to speak openly about their approaches to solving these problems is itself a signal: the agentic era is coming, and preparations for it are beginning in earnest.

Original Title: How Salesforce AI Research is Building Efficient RL Training for the Agentic Era
Publication Date: Apr 1, 2026
Source: Salesforce (www.salesforce.com), an international company integrating AI into enterprise platforms and data management systems.

From Source to Analysis

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role: analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Claude Sonnet 4.6 (Anthropic): Analyzing the Original Publication and Writing the Text. The neural network studies the original material and generates a coherent text.
2. Gemini 2.5 Pro (Google DeepMind): Translation into English.
3. Gemini 2.5 Flash (Google DeepMind): Text Review and Editing. Correction of errors, inaccuracies, and ambiguous phrasing.
4. DeepSeek-V3.2 (DeepSeek): Preparing the Illustration Description. Generating a textual prompt for the visual model.
5. FLUX.2 Pro (Black Forest Labs): Creating the Illustration. Generating an image based on the prepared prompt.
