There's one thing developers are reluctant to do, even when they're eager to write good code: write tests. It's not that they don't understand their value – they understand it perfectly. It's just that once a feature is ready and working, sitting down to methodically cover it with tests is psychologically tough. Especially when the deadline was yesterday.
Mistral decided to look at this problem through the lens of AI agents and described how to build a system that takes on this work. The focus is on Rails projects – that is, applications written in Ruby on Rails, a popular web framework.
Why Tests, and Why It's More Complicated Than It Seems
Writing a test isn't just about reproducing a function's logic. A good test checks the code's behavior under various conditions: what happens with valid data, with invalid data, and in edge cases. You need to understand what the function is supposed to do, how it interacts with the rest of the system, and what dependencies it has.
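To make the three categories concrete, here is a minimal sketch in plain Ruby with Minitest, the library that Rails' default test framework builds on. The `normalize_email` function and its rules are hypothetical, invented only for illustration; they are not taken from Mistral's article.

```ruby
require "minitest/autorun"

# Hypothetical function under test: normalizes a signup email.
# Returns the cleaned address, or nil when the input is not usable.
def normalize_email(raw)
  return nil unless raw.is_a?(String)

  email = raw.strip.downcase
  email.match?(/\A[^@\s]+@[^@\s]+\.[^@\s]+\z/) ? email : nil
end

class NormalizeEmailTest < Minitest::Test
  # Valid data: the happy path.
  def test_valid_address_is_lowercased_and_trimmed
    assert_equal "ann@example.com", normalize_email("  Ann@Example.COM ")
  end

  # Invalid data: malformed input is rejected instead of passed through.
  def test_address_without_domain_is_rejected
    assert_nil normalize_email("ann@")
  end

  # Edge cases: empty and non-string input.
  def test_blank_and_nil_inputs_are_rejected
    assert_nil normalize_email("")
    assert_nil normalize_email(nil)
  end
end
```

Each test name states the expected behavior, so a failure points at the broken condition rather than at the function as a whole.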
Simply put, writing tests requires understanding the context. And this is precisely where AI agents become truly useful – not as an autocomplete tool, but as a system capable of reasoning about the code.
An Agent Is Not Just a Model
It's important to clarify one thing from the start: an AI agent is not the same as a language model that answers questions in a chat window. An agent is a system that can perform a sequence of actions: study the code, run commands, observe the result, adjust its behavior, and try again.
In the case of tests, this means something like this: the agent reads the existing application code, understands its structure, generates tests, runs them – and if something goes wrong, it tries to figure out why and fix the situation. It's a cycle, not a one-off response.
This is precisely the architecture that Mistral describes in its article. At its core is the Mistral Small 3.1 model, which manages this process: it analyzes the codebase, decides what tests are needed, generates them, and interacts with the environment through a set of tools.
How the Agent «Sees» a Project
One of the non-trivial tasks here is to give the agent enough context about the project so that the tests are meaningful rather than a mere formality. Rails applications are structured according to certain conventions: models, controllers, routes, and relationships between database tables. The agent must be able to navigate all of this.
To do this, the system uses a set of tools: it can read project files, study the database schema, look at the application's routes, and analyze existing tests – if there are any. In essence, the agent first «gets acquainted» with the project before it starts writing.
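The four capabilities listed above could be exposed to a model roughly as follows. This is a sketch under assumptions, not Mistral's implementation: the `ProjectTools` class, its method names, and the dispatch scheme are all hypothetical. Only the file locations (`db/schema.rb`, `config/routes.rb`, the `test/` directory) follow standard Rails conventions.

```ruby
# Hypothetical tool set for a Rails-aware agent. Class and tool names are
# illustrative; only the file locations follow Rails conventions.
class ProjectTools
  TOOL_NAMES = %w[read_file schema_tables routes_source existing_tests].freeze

  def initialize(root)
    @root = File.expand_path(root)
  end

  # Read one project file, refusing paths that escape the project root.
  def read_file(relative_path)
    path = File.expand_path(relative_path, @root)
    unless path.start_with?(@root + File::SEPARATOR)
      raise ArgumentError, "path outside project: #{relative_path}"
    end

    File.read(path)
  end

  # Table names declared in db/schema.rb.
  def schema_tables
    read_file("db/schema.rb").scan(/create_table "([^"]+)"/).flatten
  end

  # Raw route definitions; a real agent might run `bin/rails routes` instead.
  def routes_source
    read_file("config/routes.rb")
  end

  # Existing test files, as paths relative to the project root.
  def existing_tests
    Dir.glob("test/**/*_test.rb", base: @root).sort
  end

  # Dispatch a tool call by name, the way a model's tool request might arrive.
  def call(name, *args)
    raise ArgumentError, "unknown tool: #{name}" unless TOOL_NAMES.include?(name.to_s)

    public_send(name, *args)
  end
end
```

Keeping the tools read-only and confined to the project root is a design choice for this sketch: the exploration phase gathers context and should not be able to change anything.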
This is a crucial point. Without understanding the application's structure, a test might be technically correct but useless – it would either test something that will never break or simply fail because it doesn't account for real dependencies.
Run It and See What Happens
Another key feature is that the agent doesn't just generate a test file and stop. It runs the tests and analyzes the results. If a test fails with an error, that's a signal: something went wrong, and it needs to figure it out.
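One way to turn raw runner output into a signal the agent can act on is to parse the summary line that Minitest prints at the end of a run. The sketch below assumes the agent captures that output as a string; `RunSummary` and `parse_summary` are hypothetical names, not part of Minitest or of the system Mistral describes.

```ruby
# Summary of one test run, parsed from Minitest's final line, e.g.
# "3 runs, 5 assertions, 1 failures, 0 errors, 0 skips".
RunSummary = Struct.new(:runs, :assertions, :failures, :errors, :skips, keyword_init: true) do
  # A run is green only when nothing failed and nothing raised.
  def green?
    failures.zero? && errors.zero?
  end
end

SUMMARY_PATTERN = /(\d+) runs, (\d+) assertions, (\d+) failures, (\d+) errors, (\d+) skips/

# Returns a RunSummary, or nil when the output has no summary line
# (for example, when the test file failed to load at all).
def parse_summary(output)
  match = SUMMARY_PATTERN.match(output)
  return nil unless match

  runs, assertions, failures, errors, skips = match.captures.map(&:to_i)
  RunSummary.new(runs: runs, assertions: assertions, failures: failures,
                 errors: errors, skips: skips)
end
```

A `nil` result is itself informative: if no summary line was printed, the problem is probably a load or syntax error rather than a failing assertion.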
The agent sees the error output, tries to understand its cause, and makes corrections. It's an iterative process – much like how a developer works when writing tests manually. The difference is that the agent doesn't get tired and doesn't put it off for later.
This «write → run → fix» cycle makes the result significantly more reliable than if the model simply generated code in a single pass without any feedback from the real environment.
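The cycle can be sketched as a small loop. This is an illustration under stated assumptions, not the code from Mistral's article: `generate` and `run` are injected callables standing in for the model call and the test runner, and the attempt limit of three is arbitrary.

```ruby
# Result of one test run: did it pass, and what did the runner print?
RunResult = Struct.new(:passed, :output)

# Sketch of the write -> run -> fix loop. `generate` and `run` are
# hypothetical stand-ins for the model call and the test runner.
def write_run_fix(generate:, run:, max_attempts: 3)
  feedback = nil

  max_attempts.times do |i|
    # The first call receives nil; later calls receive the last error output.
    test_code = generate.call(feedback)
    result = run.call(test_code)
    return { status: :green, attempts: i + 1, test_code: test_code } if result.passed

    feedback = result.output
  end

  { status: :gave_up, attempts: max_attempts, last_output: feedback }
end

# Usage with stubs: the first draft asserts the wrong value, the second is fixed.
drafts = ["assert_equal 5, add(2, 2)", "assert_equal 4, add(2, 2)"]
stub_generate = ->(_feedback) { drafts.shift }
stub_run = ->(code) { RunResult.new(code.include?("4, add"), "Expected: 5\n  Actual: 4") }

outcome = write_run_fix(generate: stub_generate, run: stub_run)
puts "#{outcome[:status]} after #{outcome[:attempts]} attempts"
```

The attempt limit matters: without it, an agent that cannot fix a test would loop forever. In a real Rails project the `run` callable would presumably shell out to something like `bin/rails test` on the generated file and wrap the captured output.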
What the Developer Gets in the End
The idea isn't to completely replace the developer in writing tests. Rather, it's to remove the most painful barrier: the need to start from scratch and spend time on the routine task of covering obvious logic.
The agent handles the foundational layer: covering models, controllers, and typical scenarios. The developer can then refine the result, add specific cases, and account for business logic the agent couldn't know. But the starting point is already there – and that changes the entire feel of the task.
There's also a practical aspect: even imperfect tests are better than no tests at all. If the agent covers, say, 70% of the logic, that's already a real safety net for future code changes.
Current Limitations
The system works best under fairly controlled conditions: a standard Rails project structure, clear dependencies, and a predictable environment. The more complex and non-standard the project, the harder it is for the agent to navigate.
Complex business logic, unconventional architectural decisions, and tangled dependencies between components all degrade the quality of the generated tests. The agent might not understand what a test is supposed to check and write something that is technically correct but substantively empty.
Furthermore, the agent doesn't know what's important from a product perspective. It sees the code but doesn't see which scenarios are critical for the business and which are secondary. That remains a task for a human.
Why This Is Interesting Beyond Rails
Rails is a specific example here, but the idea itself is much broader. Testing is a universal pain point in development, regardless of the language or framework. And the approach of «an agent that can read code, run it, and iteratively improve the result» is applicable in many different contexts.
What Mistral is demonstrating with Rails is more of a pattern: how to build agents that operate not in isolation, but in a real environment, with real tools and feedback from code execution.
This is one of the signs of where the practical application of AI in development is heading: from «suggesting the next line» to «taking a task and seeing it through to completion.» For now, it comes with caveats and limitations – but the direction is clear.