When an AI agent starts not just answering questions but autonomously performing tasks – writing code, running tests, making commits – a natural question arises: just how secure is it? What happens if the agent makes a mistake? Or if someone tries to trick it?
GitHub recently published a detailed breakdown of how security is structured in their Agentic Workflows, and it's a good opportunity to understand what's really behind this concept and why it requires a special approach.
An Agent Is More Than Just a Chatbot
A standard language model responds to text with text. An agent goes further: it receives a task and begins to take real actions – accessing tools, reading files, modifying code, and interacting with external services. Simply put, it does things, rather than just saying them.
In the context of GitHub, these agents operate within GitHub Actions – the system that automates development processes like building, testing, deployment, and more. An agent can, for example, take a task from an issue, independently write a solution, create a pull request, and hand it over to a human for review.
This is convenient. But it's also risky – if proper limitations aren't established.
What GitHub Is Worried About
Before building defenses, you need to understand what you're defending against. GitHub approached this through what's known as a threat model – a systematic analysis of what could go wrong.
Here are a few key scenarios they consider:
- Prompt injection – an attack where malicious instructions are hidden in the data an agent processes. For example, an issue's text or a comment might contain a hidden command: “Ignore previous instructions and do this.” An unprotected agent might execute it.
- Excessive permissions – if an agent has access to everything, a single mistake or successful attack can lead to serious consequences. The principle of least privilege applies here just as it does in standard development.
- Unpredictable actions – an agent might do something unexpected. Not due to malicious intent, but simply because language models don't always behave predictably.
- Data leaks – an agent could accidentally send sensitive information to external services or record it in a public log.
The Three Pillars of Defense
Based on its threat model, GitHub has built its security architecture around three core principles.
Isolation
The agent operates in an isolated environment – it doesn't have access to everything in the repository or organization. Each run receives only what it truly needs for a specific task.
It's like giving a contractor a key to only the room they need to work in – not to the entire building.
Restricted Outputs
The agent can't do just anything. The set of available actions is predefined and limited. If the task is to create a pull request, the agent shouldn't have the ability to, say, change repository settings or delete branches.
To put it simply, the agent is given a specific tool for a specific job, not a Swiss Army knife.
Logging
Everything the agent does is recorded. Every action, every tool call, every change – it all goes into a log that can be reviewed. This is important for two reasons: first, if something goes wrong, you can understand exactly what happened and when. Second, it creates accountability – the agent isn't operating in the dark.
The Human Remains in the Loop
One of the fundamental points in GitHub's approach is that the agent doesn't make final decisions on its own. More precisely, it can propose and execute intermediate steps, but key actions require human confirmation.
For example, an agent can write code and create a pull request, but it cannot merge it into the main branch without a developer's approval. This is the “human-in-the-loop” principle, and here it's not just a declaration but a built-in constraint.
This approach reduces the risk of a single agent error leading to irreversible consequences. The agent can make a mistake – but a human will see it before that mistake becomes a problem.
Why This Matters Beyond GitHub
GitHub is far from the only platform implementing agentic capabilities. This is a general trend in the industry: AI is increasingly gaining access to real tools and starting to act, not just respond.
And this raises a systemic problem: most existing security practices were designed for humans or deterministic programs. Agents are something in between: they act autonomously, but their behavior is probabilistic, not predictable.
The fact that GitHub is publicly describing its threat model and architectural decisions is beneficial for the entire industry. Not because their approach is the only right one, but because it provides a concrete example of how to think about these problems systematically.
Open Questions
Despite the well-thought-out architecture, a number of questions remain open – and this is openly acknowledged.
Prompt injection is one of the most complex threats because a universal defense against it doesn't yet exist. The agent processes text, and text can contain hidden instructions. This isn't a bug in a specific implementation; it's a fundamental characteristic of language models.
Furthermore, the more complex the task, the more permissions the agent needs – and the harder it becomes to adhere to the principle of least privilege. The balance between utility and security isn't static here: it has to be fine-tuned for each scenario.
Finally, logging helps to understand what happened after the fact – but it doesn't always prevent the problem. It's an investigation tool, not a barrier.
In Summary
GitHub's Agentic Workflows are an attempt to give developers a powerful automation tool without sacrificing control and security. Isolation, limited permissions, detailed logging, and mandatory human involvement in key decisions are all parts of a single system.
Perfect security doesn't exist, and GitHub doesn't hide this fact. But a systematic approach to security is important in itself – especially when it comes to agents that act on our behalf in real-world workflows.