From «Answer the Question» to «Solve the Task on Your Own»
Just a few years ago, the conversation around AI in business followed a single script: the user asks a question, and the model provides an answer. It was simple, clear, and predictable. Today, the landscape is shifting: companies are rapidly moving toward agentic AI – systems that don't just generate text, but autonomously set goals, select tools, and execute complex tasks step-by-step.
To put it simply: AI used to be a smart assistant you'd go to for advice. Now, it's becoming an employee you delegate work to.
This very theme took center stage at AAAI 2026, one of the world's leading AI conferences. LG AI Research presented four papers there, covering everything from causal analysis to reinforcement learning. But the real heart of the event lay elsewhere: not in how powerful these agents have become, but in whether they can be trusted and exactly how that trust should be built.
Autonomy Is Great. Opacity Is a Problem.
When an agent acts autonomously, it makes a multitude of intermediate decisions: which data to access, which tool to trigger, and how to interpret the results. This is where a serious problem appears: tracing the logic behind these steps is extremely difficult.
Imagine a new hire who completes a task but can't explain why they chose a particular approach. Or worse, an error creeps into their work, but it's impossible to pinpoint the stage where it happened. It's the same story with agentic systems, only the scale of potential consequences is significantly higher.
By nature, language models work probabilistically: they generate what «sounds plausible.» Most of the time, this aligns with the correct answer. But in business processes, «plausible» and «correct» are not synonyms. An agent can violate a protocol, bypass a restriction, or make a decision that contradicts company policy – and it will do so very convincingly.
That is why AAAI 2026 focused so heavily on architectural approaches that make agent behavior observable and explainable, rather than just boosting model performance.
When Knowledge Is Structured, Not Just «Known»
One of the key tools for solving this problem is the use of ontologies and knowledge graphs. It sounds complex, but the gist is simple: it's a way of organizing knowledge through explicit connections between concepts, rather than just storing data in a model's «memory.»
While a standard language model «knows» something because it saw it in a training set, a knowledge graph stores facts and relationships in a structured format – like a database made of connections between entities rather than tables. This allows the agent to do more than just spit out an answer; it can reason logically: «If A leads to B, and B is connected to C, then...»
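To make the «If A leads to B, and B is connected to C» idea concrete, here is a minimal sketch of a knowledge graph stored as subject-relation-object triples, with a small search that returns the explicit chain of facts linking two entities. The facts, relation names, and function names are all invented for illustration; they do not come from any system presented at the conference.

```python
from collections import defaultdict

# Hypothetical toy facts, stored as (subject, relation, object) triples.
TRIPLES = [
    ("overheating", "leads_to", "pump_failure"),
    ("pump_failure", "leads_to", "line_shutdown"),
    ("line_shutdown", "affects", "order_1042"),
]


def build_graph(triples):
    """Index the triples by subject so outgoing edges are easy to follow."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph


def find_chain(graph, start, goal, seen=None):
    """Depth-first search: return one chain of triples from start to goal, or None."""
    if seen is None:
        seen = {start}
    for rel, obj in graph.get(start, []):
        if obj == goal:
            return [(start, rel, obj)]
        if obj not in seen:
            seen.add(obj)
            rest = find_chain(graph, obj, goal, seen)
            if rest is not None:
                return [(start, rel, obj)] + rest
    return None


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    for subj, rel, obj in find_chain(graph, "overheating", "order_1042"):
        print(f"{subj} --{rel}--> {obj}")
```

The point of the sketch is that the answer comes with its justification: every hop in the returned chain is a stored fact that can be inspected, which is not true of an answer produced from a model's parameters alone.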
The conference featured two studies advancing this approach.
PathMind tackles the problem of «noisy» reasoning: when an agent searches for an answer, it may have to sift through a massive number of paths in a knowledge graph, most of which are irrelevant. PathMind proposes selecting the most promising routes before building logical chains. This reduces system load and increases accuracy, especially in tasks involving many sequential steps.
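The general idea of keeping only promising paths can be illustrated with a plain beam search. This is not PathMind's actual algorithm, which the article does not describe in detail; it is a generic sketch in which each relation has a hand-assigned relevance score (an assumption), and only the best few partial paths survive each hop. The graph, scores, and function name are hypothetical.

```python
import heapq

# Hypothetical toy graph: node -> list of (relation, neighbor).
GRAPH = {
    "aspirin": [
        ("treats", "headache"),
        ("made_by", "bayer"),
        ("interacts_with", "warfarin"),
    ],
    "warfarin": [("treats", "thrombosis"), ("made_by", "pharma_co")],
    "bayer": [("based_in", "germany")],
}

# Assumed relevance of each relation to the question being asked.
# In a real system this score would come from a learned model.
RELEVANCE = {
    "interacts_with": 0.9,
    "treats": 0.8,
    "made_by": 0.1,
    "based_in": 0.05,
}


def top_paths(graph, start, max_hops, beam_width):
    """Beam search: after each hop keep only the beam_width best-scoring paths."""
    beam = [(1.0, [start])]  # (score, [node, relation, node, ...])
    for _ in range(max_hops):
        candidates = []
        for score, path in beam:
            visited = path[::2]  # nodes sit at the even positions
            for rel, obj in graph.get(path[-1], []):
                if obj in visited:
                    continue  # avoid cycles
                new_score = score * RELEVANCE.get(rel, 0.0)
                candidates.append((new_score, path + [rel, obj]))
        if not candidates:
            break
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return beam


if __name__ == "__main__":
    for score, path in top_paths(GRAPH, "aspirin", max_hops=2, beam_width=2):
        print(round(score, 2), " -> ".join(path))
```

With a beam width of 2, the low-scoring «made_by» branch from the start node is dropped after the first hop and never expanded, which is the kind of saving the paragraph above refers to.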
DoM addresses a different hurdle: what if the knowledge graph is incomplete? Real-world databases are rarely perfect. The proposed solution is a «debate» between agents: one relies on the knowledge graph, another on standard text search, and a third acts as an arbiter that compares their arguments. As a result, the final answer is more reliable than if each agent worked in a vacuum.
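The three roles can be sketched as follows. This is a deliberately simplified illustration, not DoM's actual protocol: the «agents» here are plain lookup functions rather than language models, and the arbiter's rule (accept agreement, fall back to a single source, refuse on conflict) is an assumption made for the example. All data and names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Argument:
    source: str
    answer: Optional[str]
    evidence: str


# Hypothetical, deliberately incomplete knowledge graph.
KG = {
    ("acme", "headquartered_in"): "berlin",
    ("initech", "headquartered_in"): "austin",
}

# Hypothetical text corpus used by the search-based agent.
DOCS = [
    "Acme is headquartered in Berlin.",
    "Globex is headquartered in Oslo.",
    "Initech is headquartered in Dallas.",
]


def kg_agent(entity, relation):
    """Answer strictly from explicit triples."""
    answer = KG.get((entity, relation))
    evidence = f"triple ({entity}, {relation}, {answer})" if answer else "no matching triple"
    return Argument("kg", answer, evidence)


def text_agent(entity, relation):
    """Answer from naive pattern matching over the documents."""
    marker = f"{entity} is {relation.replace('_', ' ')} "
    for doc in DOCS:
        lowered = doc.lower().rstrip(".")
        if lowered.startswith(marker):
            return Argument("text", lowered[len(marker):], f'document: "{doc}"')
    return Argument("text", None, "no matching document")


def arbiter(kg_arg, text_arg):
    """Compare both arguments and return (answer, status)."""
    answers = {a.answer for a in (kg_arg, text_arg) if a.answer is not None}
    if not answers:
        return None, "no-evidence"
    if len(answers) == 1:
        status = "agreed" if kg_arg.answer == text_arg.answer else "single-source"
        return answers.pop(), status
    return None, "conflict"  # sources disagree: escalate instead of guessing


if __name__ == "__main__":
    for entity in ("acme", "globex", "initech"):
        q = (entity, "headquartered_in")
        print(entity, arbiter(kg_agent(*q), text_agent(*q)))
```

The «globex» case shows why the second agent matters: the graph has no such triple, yet the text source still produces an answer, and the arbiter labels it as single-source so downstream code knows how much weight to give it.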
Three Tools That Make an Agent Transparent
The conference's demo track was dedicated to a practical question: how to turn an agent from a «black box» into a system that can be monitored and verified.
Three presented tools offer different approaches to this task.
AgentGraph: A Decision Map Instead of Thousands of Log Lines
When an agent executes a task, it generates a massive log of actions. Reading through it to find the moment something went sideways is nearly impossible. AgentGraph converts this log into a visual graph: nodes represent tasks and tools, while edges represent the connections between them. A developer can literally «click» on any point to see exactly what action lies behind it.
Beyond visualization, the tool allows for testing an agent's resilience – checking, for instance, how it reacts to manipulation attempts or malformed requests.
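The log-to-graph conversion can be pictured with a short sketch. It does not use AgentGraph's real data model or API, which the article does not specify; the log format, field names, and helper functions below are assumptions chosen to show the idea of nodes for agents and tools, edges for the actions between them, and a lookup that stands in for «clicking» a node.

```python
# Hypothetical flat action log, as an agent framework might emit it.
LOG = [
    {"step": 1, "actor": "planner", "action": "call_tool", "target": "search_orders"},
    {"step": 2, "actor": "planner", "action": "delegate", "target": "refund_agent"},
    {"step": 3, "actor": "refund_agent", "action": "call_tool", "target": "issue_refund"},
]


def log_to_graph(log):
    """Turn log events into a set of nodes and a list of attributed edges."""
    nodes = set()
    edges = []
    for event in log:
        nodes.update((event["actor"], event["target"]))
        attrs = {"step": event["step"], "action": event["action"]}
        edges.append((event["actor"], event["target"], attrs))
    return nodes, edges


def events_for(edges, node):
    """Everything that touched a node: the textual stand-in for clicking on it."""
    return [edge for edge in edges if node in edge[:2]]


if __name__ == "__main__":
    nodes, edges = log_to_graph(LOG)
    print(sorted(nodes))
    for src, dst, attrs in events_for(edges, "refund_agent"):
        print(f"step {attrs['step']}: {src} --{attrs['action']}--> {dst}")
```

Even in this toy form, asking «what happened around refund_agent?» becomes a lookup over two edges rather than a scan of the whole log.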
AgentSeer: Safety Checks at Every Step
The standard approach to AI safety is to verify the agent's final output. AgentSeer goes deeper: it analyzes every intermediate action – memory access, tool calls, and interactions with other agents.
This is crucial because a final result might look safe, even if the agent made a critical error halfway through the process. AgentSeer models attacks at every stage and identifies vulnerabilities that traditional testing simply misses.
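The difference between checking the final output and checking every step can be shown with a small rule-based audit. This is a simplification and an assumption: AgentSeer's attack modeling is not reproduced here, and the trace format, allowlist, and function names are hypothetical.

```python
# Hypothetical policy: which tools are permitted, which memory keys are sensitive.
ALLOWED_TOOLS = {"search_orders", "send_email"}
SENSITIVE_KEYS = {"api_key", "customer_ssn"}

# Hypothetical execution trace. The final output looks harmless on its own.
TRACE = [
    {"kind": "tool_call", "name": "search_orders"},
    {"kind": "memory_read", "name": "api_key"},
    {"kind": "tool_call", "name": "export_customer_db"},
    {"kind": "final_output", "name": "Your order has shipped."},
]


def check_step(step):
    """Return a list of policy violations for a single intermediate action."""
    violations = []
    if step["kind"] == "tool_call" and step["name"] not in ALLOWED_TOOLS:
        violations.append(f"tool not on allowlist: {step['name']}")
    if step["kind"] == "memory_read" and step["name"] in SENSITIVE_KEYS:
        violations.append(f"sensitive memory key read: {step['name']}")
    return violations


def audit_trace(trace):
    """Check every step and map step index -> violations found there."""
    report = {}
    for index, step in enumerate(trace):
        violations = check_step(step)
        if violations:
            report[index] = violations
    return report


if __name__ == "__main__":
    for index, violations in audit_trace(TRACE).items():
        print(index, violations)
```

A check applied only to the last entry of the trace would pass, while the step-level audit flags two problems in the middle of the run.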
Omega: «Explain Why You Decided That»
Omega helps in complex operational environments where you need to understand an agent's motives: why did it choose this specific route or action?
Although Omega was originally developed for tasks where multiple agents move simultaneously in a shared space (like robots in a warehouse), the approach generalizes to other domains. Omega takes the action log, overlays it with an ontological structure (explicit domain rules), and generates an explanation in plain language. Not just «the agent chose path A», but «the agent chose path A because path B was blocked, and the priority of task C was higher.»
This is particularly valuable for post-incident analysis, where you need to reconstruct the chain of causality and provide a clear answer on what happened and why.
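A toy version of this «log plus domain rules equals explanation» step might look like the sketch below. It is not Omega's implementation: the ontology is reduced to a dictionary that maps recorded facts to phrases, and the log entry format and function name are assumptions made for the example.

```python
# Hypothetical mini-ontology: each known predicate has a plain-language template.
ONTOLOGY = {
    "blocked": "{subject} was blocked",
    "higher_priority": "the priority of {subject} was higher",
}

# Hypothetical log entry: the decision plus the facts that held at that moment.
ENTRY = {
    "agent": "The agent",
    "choice": "path A",
    "facts": [("blocked", "path B"), ("higher_priority", "task C")],
}


def explain(entry):
    """Render a decision and its recorded facts as one plain-language sentence."""
    reasons = [
        ONTOLOGY[predicate].format(subject=subject)
        for predicate, subject in entry["facts"]
        if predicate in ONTOLOGY  # facts unknown to the ontology are skipped
    ]
    sentence = f"{entry['agent']} chose {entry['choice']}"
    if not reasons:
        return sentence + " (no recorded reason)."
    return sentence + " because " + ", and ".join(reasons) + "."


if __name__ == "__main__":
    print(explain(ENTRY))
```

Because the explanation is assembled only from facts present in the log and predicates defined in the ontology, it cannot cite a reason that was never recorded, which is the property that makes such output usable in post-incident analysis.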
To sum up the conference, the main takeaway is this: agent autonomy without transparency is a serious management risk.
As long as an agent is just helping to write an email or draft a report, the cost of an error is low. But when it independently interacts with external systems, makes decisions in business processes, or coordinates other agents, the question «why did it decide that?» stops being theoretical.
This is why the central theme of AAAI 2026 was not «make the agent smarter», but «make it explainable.» Ontologies, knowledge graphs, and observability tools are not just add-ons; they are fundamental parts of the architecture. Without them, trusting AI agents with serious tasks is simply not an option.
Agents capable of not only acting but also justifying their decisions are the next practical frontier. And judging by the discussions at the conference, work in this direction has already moved into the realm of application.