One of the most challenging questions that arises when working with AI agents is: how can you trust a system that, by design, isn't guaranteed to behave the same way twice? This isn't a malfunction or a bug; it's a fundamental property of the architecture.
This is the exact topic of the eighteenth episode of the Human in the Loop podcast by Scale AI. The conversation touches on trust in agentic systems and what «reliability» even means for something that operates probabilistically rather than deterministically.
What Does «Non-Deterministic» Mean?
In short, a deterministic system is one where the same input always yields the same output. Think of a calculator: 2 + 2 always equals 4. Neural agents don't work this way. Given the very same request, they might give different answers, take different steps, and reach different conclusions.
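To make the contrast concrete, here is a toy sketch in plain Python (no real model involved; the canned responses are invented for illustration): the calculator function is deterministic, while the stand-in agent samples from several plausible responses and may answer differently on every call.

```python
import random

def calculator(a: int, b: int) -> int:
    # Deterministic: the same inputs always produce the same output.
    return a + b

def toy_agent(request: str) -> str:
    # Non-deterministic stand-in for a sampling LLM: each call may pick
    # a different, equally plausible response to the same request.
    responses = [
        "Book the 9:00 flight and email the itinerary.",
        "Compare prices first, then book the cheapest flight.",
        "Ask the user whether timing or price matters more.",
    ]
    return random.choice(responses)

assert calculator(2, 2) == calculator(2, 2)  # always 4

for _ in range(3):
    print(toy_agent("Plan my business trip"))  # may differ run to run
```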
This isn't always a bad thing. In fact, this property is what makes agents seem alive, flexible, and creative. But when it comes to trust – entrusting an agent with an important task and needing to be sure of the result – this instability becomes a problem.
Simply put: we can't test an agent once and declare, «It works.» Tomorrow, it might work differently.
Trust Isn't About Accuracy, It's About Predictable Behavior
An interesting logical shift emerges in this discussion: trust in an agent isn't the same as confidence in the correctness of its every answer. Rather, it's confidence that the agent behaves in an understandable manner – that its actions fall within an expected range and that it won't do something surprising at a critical moment.
This is closer to how we trust people. We don't expect a colleague to always make the perfect decision. We expect them to operate within a framework of understandable principles, to let us know when they are uncertain, and not to silently exceed their authority.
Agents, in this regard, should operate similarly: their goal isn't to be omniscient, but to be readable.
Human in the Loop: Not a Crutch, but an Architectural Choice
This brings us to the central idea of the «human in the loop» concept. It doesn't mean a person must approve every step the agent takes. It means the system is designed in such a way that the agent knows when it's time to pause and ask for guidance.
This sounds simple but is hard to implement. An agent must be able to recognize its own uncertainty – to know when it is facing a high-stakes situation with low confidence. And it is at these critical points that it should hand over control to a human.
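A minimal sketch of what such a checkpoint might look like in code. The confidence score, the threshold, and the escalate_to_human handler are all assumptions for illustration; real systems estimate confidence very differently (log-probabilities, self-critique, ensembles).

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    description: str
    confidence: float   # hypothetical self-estimate in [0, 1]
    high_stakes: bool   # e.g. irreversible, financial, user-facing

CONFIDENCE_THRESHOLD = 0.8  # assumed policy value, tuned per deployment

def escalate_to_human(action: ProposedAction) -> None:
    # Hypothetical handoff: queue the action for human review.
    print(f"Needs review: {action.description} (confidence={action.confidence:.2f})")

def execute(action: ProposedAction) -> None:
    print(f"Executing: {action.description}")

def step(action: ProposedAction) -> None:
    # The core human-in-the-loop rule: pause exactly when stakes are
    # high and confidence is low, instead of pushing through.
    if action.high_stakes and action.confidence < CONFIDENCE_THRESHOLD:
        escalate_to_human(action)
    else:
        execute(action)

step(ProposedAction("Refund the customer $4,000", confidence=0.55, high_stakes=True))
step(ProposedAction("Draft a reply for review", confidence=0.55, high_stakes=False))
```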
This requires the system to have a certain «awareness» of its own limitations – a non-trivial task for the current generation of models.
What Makes an Agent Reliable in Practice?
The discussion offers several practical criteria.
The first is transparency of actions. An agent must leave an audit trail: what it did, why, and based on what data. This allows humans not just to accept the outcome, but to understand how it was achieved and to intervene or correct course if needed.
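One lightweight way to get such a trail is a structured, append-only log written at every step. A sketch with assumed field names and file path; a production system would add trace IDs and tamper-evident storage.

```python
import json
from datetime import datetime, timezone

AUDIT_LOG = "agent_audit.jsonl"  # assumed path; append-only JSON Lines

def record_action(action: str, reason: str, inputs: dict) -> None:
    # Each entry answers: what was done, why, and based on what data.
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "reason": reason,
        "inputs": inputs,
    }
    with open(AUDIT_LOG, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

record_action(
    action="send_quote",
    reason="Customer asked for pricing; matched plan 'team'",
    inputs={"customer_id": "c-123", "plan": "team"},
)
```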
The second is limited authority. A reliable agent doesn't do everything it is technically capable of. It acts strictly within the scope of its explicit permissions. This mitigates the risk of unintended consequences.
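In code, this often reduces to enforcing an explicit allowlist at the tool boundary. A minimal sketch with invented tool names: the send_payment tool exists in the system, but this particular agent was never granted it.

```python
ALLOWED_TOOLS = {"search_docs", "draft_email"}  # assumed scope for this agent

TOOLS = {
    "search_docs": lambda q: f"results for {q!r}",
    "draft_email": lambda body: f"draft: {body}",
    "send_payment": lambda amount: f"paid {amount}",  # exists, but not granted
}

def call_tool(name: str, *args):
    # Enforce the scope at the boundary, not inside the model:
    # technically available is not the same as permitted.
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {name!r} is outside this agent's authority")
    return TOOLS[name](*args)

print(call_tool("search_docs", "refund policy"))  # allowed
# call_tool("send_payment", 4000)                 # raises PermissionError
```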
The third is the ability to stop. If an agent is uncertain or faces a situation outside its competence, it must be able to hit «stop» and transfer control, rather than trying to force a result at any cost.
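A sketch of the same idea as a bounded retry loop (the budget of three attempts is an assumed policy value): after repeated failures, the agent stops and hands off rather than grinding toward a forced answer.

```python
MAX_ATTEMPTS = 3  # assumed retry budget

def attempt(task: str) -> bool:
    # Placeholder for one attempt; a real agent would call tools here.
    return False  # simulate a task the agent cannot complete

def run_with_stop(task: str) -> str:
    # The opposite of forcing a result at any cost: a bounded number
    # of tries, then an explicit stop that transfers control.
    for _ in range(MAX_ATTEMPTS):
        if attempt(task):
            return "done"
    return "stopped: handed off to a human after repeated failures"

print(run_with_stop("Reconcile these two reports"))
```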
The fourth is consistent behavior. Even if specific outputs vary, the agent's overall style of behavior must be stable and predictable. The user should know what to expect – not a particular answer, but a consistent manner of acting.
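This also suggests how to test such systems, echoing the earlier point that a single successful run proves little: sample the agent many times and check that every run stays inside an expected envelope. A sketch with a hypothetical envelope of allowed action types.

```python
import random

ALLOWED_ACTIONS = {"book", "compare", "ask_user"}  # the expected envelope

def toy_agent_action(request: str) -> str:
    # Stand-in for a sampling agent: outputs vary across calls.
    return random.choice(["book", "compare", "ask_user"])

def behavior_is_consistent(request: str, runs: int = 50) -> bool:
    # Property-based check: no single canonical answer is required,
    # but every sampled behavior must fall inside the envelope.
    return all(toy_agent_action(request) in ALLOWED_ACTIONS for _ in range(runs))

assert behavior_is_consistent("Plan my business trip")
```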
Why This Matters Right Now
Agentic systems are moving from the experimental stage to real-world application. They are beginning to be used for process automation, decision-making, and data management – in contexts where the cost of failure is tangible.
This is precisely where the question of trust shifts from being philosophical to being an engineering challenge. It's no longer possible to just «try it and see»; instead, we need structures that make an agent's behavior auditable, manageable, and correctable.
Non-determinism is here to stay. It isn't something that will be «patched» in a future model update. It is a core property of this class of systems. This means trust must be built not in spite of this trait, but alongside it – by designing agents so that their unpredictability stays within manageable limits.
Open Questions – And There Are Many
What the episode leaves unsaid is that there's no universal solution here. Different use cases demand different levels of control. An agent that helps draft texts operates with a fundamentally different level of responsibility than one that makes financial decisions.
Moreover, a major open question remains: how do we systematically evaluate an agent's reliability? Traditional quality metrics, such as accuracy and completeness, fall short when a task is ambiguous, context is ever-changing, and a single right answer may not even exist.
This is perhaps the key takeaway: the industry is rapidly moving toward agentic systems, while the tools to assess and govern them are only now starting to emerge. The challenge of building trust is a continuous one. 🔄