Documentation Isn't Boring. It's a Matter of Survival
Imagine you're building a massive city. Hundreds of buildings, thousands of pipes, miles of cables. Everything works. Everything is connected. But you have no blueprints. Not a single one. You know that one building is powered by this substation, and that pipe goes somewhere toward the center – but no one remembers exactly where. What happens when something breaks? Or when an inspector shows up and asks to see the documentation?
This is exactly the situation modern AI system developers find themselves in. Smart cities, autonomous cars, medical platforms, intelligent manufacturing – all of these are ecosystems where dozens of algorithms communicate with each other, exchange data, and make decisions. And they do it with almost no proper documentation. Not because developers are lazy, but because the tools to adequately describe AI systems simply didn't exist.
This is the very problem a group of researchers set out to solve by introducing the RAD-AI framework – a set of extensions and methods that, for the first time, gives the industry real tools for documenting AI ecosystems. And it does so with a very specific deadline in mind: August 2, 2026 – the date from which the European Union will begin to fully apply its AI Act to high-risk systems.
Old Maps for a New World
Software architects have their favorite tools. Two of the most popular are arc42 and the C4 model. They've been around for a long time, are well-understood, and are integrated into thousands of workflows. And they do an excellent job of describing conventional software systems: banking apps, corporate portals, ticket booking services.
The essence of these systems is determinism. You press a button – an action occurs. Always the same, predictable, repeatable. Such systems can be drawn on a diagram, and that diagram will be accurate.
But an AI system is a different beast. Its behavior isn't just complex. It's probabilistic. The same input data can produce different results. It changes over time – not because a programmer changed something in the code, but because the data it's trained on has changed. It has two parallel lifecycles: one like conventional software, and another specific to machine learning, with training, retraining, model versioning, and drift monitoring.
Trying to describe such a system using arc42 or C4 is like trying to draw a map of a living forest that grows and changes every day. You'll technically have a diagram. But it will be outdated before the ink is dry. And most importantly, it won't answer the most critical questions: Where did the data come from? How does the model make a decision? What happens if the data starts to «drift»?
What Is Drift – And Why It's Scarier Than It Sounds
The term «model drift» sounds technical, but the idea behind it is very simple. Imagine you've trained an algorithm to recognize fraudulent transactions on 2022 data. Fraudsters are inventive people. By 2024, their schemes have changed. The data entering the system no longer resembles what it was trained on. The model starts making mistakes – without «knowing» it's making them. That's drift.
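One common way to put a number on this kind of drift is the Population Stability Index (PSI), which compares how a feature was distributed in the training data with how it is distributed now. The sketch below is a minimal illustration, not something the RAD-AI authors prescribe: the function name, the bin count, and the smoothing constant are all assumptions made for the example.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
) -> float:
    """Compare two samples of one numeric feature; larger means more drift."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def bin_shares(sample: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Additive smoothing (an assumption) keeps empty bins from causing log(0).
        return [(c + 0.5) / (len(sample) + 0.5 * bins) for c in counts]

    ref, cur = bin_shares(reference), bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical transaction amounts: the "2024" sample has shifted upward.
amounts_2022 = [float(x) for x in range(100)]
amounts_2024 = [x + 50.0 for x in amounts_2022]
print(population_stability_index(amounts_2022, amounts_2024))
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the right threshold depends on the feature and the cost of a missed alarm, so it belongs in the monitoring documentation rather than in someone's head.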
Now, scale that up to an ecosystem. One model predicts traffic in a smart city. Its results are used by a second model that plans bus routes. Those routes influence a third model that calculates delivery times for goods. If the first model starts to drift, errors cascade through the entire chain, like a domino effect. This is called cascading drift. And it's something standard documentation methods simply miss – because they describe each component in isolation, without showing how they influence one another.
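The chain described above is really a dependency graph, and once it is written down, the question "who is affected if this model drifts?" becomes a simple graph traversal. Here is a minimal sketch under that assumption; the component names and the `CONSUMERS` mapping are hypothetical and only mirror the traffic, bus, and delivery example.

```python
from collections import deque

# Hypothetical component names; "a": ["b"] means b consumes a's output.
CONSUMERS = {
    "traffic_forecast": ["bus_route_planner"],
    "bus_route_planner": ["delivery_time_estimator"],
    "delivery_time_estimator": [],
}


def downstream_of(source: str, consumers: dict[str, list[str]]) -> list[str]:
    """Return every component that directly or indirectly consumes `source`,
    in breadth-first order (nearest consumers first)."""
    seen = {source}
    queue = deque([source])
    affected = []
    while queue:
        node = queue.popleft()
        for nxt in consumers.get(node, []):
            if nxt not in seen:  # guards against cycles and duplicate paths
                seen.add(nxt)
                affected.append(nxt)
                queue.append(nxt)
    return affected


print(downstream_of("traffic_forecast", CONSUMERS))
# → ['bus_route_planner', 'delivery_time_estimator']
```

Documentation that treats each component in isolation has no place to record the `CONSUMERS` mapping at all, which is exactly why the cascade goes unnoticed.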
The Law That's Already Written
In 2024, the European Union adopted Regulation (EU) 2024/1689, the Artificial Intelligence Act. It is widely described as the world's first comprehensive law regulating AI. It divides systems by risk level and, for high-risk systems, sets strict technical documentation requirements listed in Annex IV.
What do these requirements include? Among other things:
- a detailed description of the system and its intended purpose;
- documentation on data provenance and quality – where it was sourced, how it was collected, and how it was labeled;
- a description of the model's architecture, algorithms, and parameters;
- metrics for accuracy and robustness;
- a description of the risk management system;
- requirements for monitoring under real-world operating conditions;
- cybersecurity measures;
- explainability tools – so the system can explain why it made a particular decision;
- post-market monitoring plans.
Sounds reasonable. The problem is that when researchers analyzed popular documentation frameworks to see how well they covered these requirements, the result was disappointing: about 36%. In other words, existing tools leave nearly two-thirds of what the regulator requires undocumented.
Starting in August 2026, this stops being an abstract problem and becomes a very concrete risk: a high-risk system without proper documentation is a system that won't be allowed on the market.
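To make a coverage percentage like the one above less abstract, here is one way such a score could be computed. This is an assumption for illustration only: the article does not say how the researchers scored coverage, and the three-level `Rating` scale, the `coverage` function, and the example ratings are all hypothetical.

```python
from enum import Enum


class Rating(Enum):
    """Hypothetical three-level scale for how well a requirement is covered."""
    NONE = 0.0
    PARTIAL = 0.5
    FULL = 1.0


def coverage(ratings: dict[str, Rating]) -> float:
    """Mean rating across all requirements, expressed as a percentage."""
    if not ratings:
        raise ValueError("no requirements rated")
    return 100.0 * sum(r.value for r in ratings.values()) / len(ratings)


# Placeholder ratings for illustration only; these are not results from the study.
example = {
    "system description and intended purpose": Rating.FULL,
    "model architecture, algorithms, and parameters": Rating.PARTIAL,
    "data provenance and quality": Rating.NONE,
}
print(f"{coverage(example):.1f}%")
```

Whatever scale a team picks, keeping the per-requirement ratings next to the final number makes it obvious which Annex IV items are dragging the score down.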
RAD-AI: An Extension, Not a Revolution
The creators of RAD-AI made a smart move: they didn't invent a new framework from scratch. They extended existing ones – arc42 and C4 – by adding AI-specific blocks. This is fundamentally important: teams already using these tools can gradually adopt the new elements without tossing out what already works.
Eight new sections are added to arc42. Here's what they mean in practice:
- Data Quality and Provenance. Where did the data come from? How was it collected? How was its cleanliness verified? Was there bias, and how was it addressed? These aren't just technical details – they're the basis for understanding whether the model can be trusted at all.
- Model Lifecycle. A description of the entire journey from training to deployment. How is the model retrained? How are its artifacts versioned? Can a training result from six months ago be reproduced?
- Performance and Decision Criteria. How accurate is the model? Under what conditions does it «decide» to act? What are its known limitations?
- Ethical Assessment and Risk Mitigation. Where might the model discriminate or produce unfair results? What has been done to prevent this?
- In-Operation Monitoring and Maintenance. How does the system monitor itself in real time? How is drift detected? What happens in case of an incident?
- Interpretability and Explainability. How does the system explain its decisions? What methods are used – for example, LIME or SHAP? How understandable are these explanations to users or regulators?
- Security and Robustness. How is the system protected from targeted attacks – for instance, when an attacker deliberately crafts inputs to trick the model? How does it behave during component failures?
- Regulatory Compliance. A direct cross-check against Annex IV of the AI Act, plus other applicable standards and audit procedures.
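Because the eight sections form a fixed list, a team could check its architecture document against them mechanically. The sketch below assumes the document is available as a mapping from section title to text; that representation and the `missing_sections` helper are hypothetical, while the section titles are the ones listed above.

```python
# Section titles as listed in the article.
RAD_AI_ARC42_SECTIONS = (
    "Data Quality and Provenance",
    "Model Lifecycle",
    "Performance and Decision Criteria",
    "Ethical Assessment and Risk Mitigation",
    "In-Operation Monitoring and Maintenance",
    "Interpretability and Explainability",
    "Security and Robustness",
    "Regulatory Compliance",
)


def missing_sections(doc: dict[str, str]) -> list[str]:
    """List the extension sections that are absent or contain only whitespace."""
    return [s for s in RAD_AI_ARC42_SECTIONS if not doc.get(s, "").strip()]


# Hypothetical draft: two sections written, one left as an empty stub.
draft = {
    "Model Lifecycle": "Retrained weekly; artifacts versioned by training run.",
    "Security and Robustness": "Adversarial input tests run before each release.",
    "Regulatory Compliance": "   ",
}
for title in missing_sections(draft):
    print("TODO:", title)
```

A check like this only proves a section exists, not that it is any good, but it fits naturally into a build pipeline and supports the gradual adoption described above.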
Three new diagram types are added to the C4 model:
- Machine Learning Pipeline Diagram – a visualization of the entire data journey: from collection through training to deployment and monitoring.
- Data Flow Diagram – a detailed map showing where data originates, how it is transformed, and where it goes, with notes on sensitive data and protection measures.
- AI Component Interaction Diagram – a map of how different models communicate with each other, where dependencies arise, and where cascading drift might originate.
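Diagrams of this kind age best when they are generated from data rather than drawn by hand. As a rough sketch, the function below turns a list of model-to-model links into Graphviz DOT text. The article does not say which notation or tooling RAD-AI uses, so the `to_dot` helper, the choice of DOT, and the component names are all assumptions.

```python
def to_dot(edges: list[tuple[str, str, str]]) -> str:
    """Render (producer, consumer, payload) triples as a Graphviz digraph.

    Names and payloads are assumed not to contain double quotes.
    """
    lines = ["digraph ai_components {", "  rankdir=LR;"]
    for producer, consumer, payload in edges:
        lines.append(f'  "{producer}" -> "{consumer}" [label="{payload}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical links between the models in a smart-city ecosystem.
links = [
    ("traffic_forecast", "bus_route_planner", "predicted congestion"),
    ("bus_route_planner", "delivery_time_estimator", "planned routes"),
]
print(to_dot(links))
```

The printed text can be saved to a file and rendered with the Graphviz `dot` command, so the interaction diagram is rebuilt whenever the list of links changes.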
Testing on Real-World Platforms
The authors didn't stop at theory. They applied RAD-AI to two well-known production machine learning platforms: Uber Michelangelo and Netflix Metaflow. Both platforms are publicly documented, large-scale, and complex – an excellent testing ground.
The analysis revealed eight categories of problems that standard frameworks simply overlook. Among them are the management of feature stores, ensuring model fairness, metadata management, conflict resolution strategies between models, and the reproducibility of experiments.
An important conclusion: these problems are not specific to Uber or Netflix. They don't arise because one company is in the ride-hailing business and the other in streaming. They arise because both platforms are AI systems. The authors termed this a structural documentation deficit: the gaps are built into the very tools used to create the documentation, regardless of the subject domain.
The Smart City as a Stress Test
For a final check, the authors took a hypothetical but realistic ecosystem – a city's smart transportation system. It included several interconnected AI components: traffic prediction, route planning, delivery logistics, and public transport management.
What did the analysis through the lens of RAD-AI reveal?
First, the aforementioned cascading drift – when an error in one model silently propagates through the entire chain. Second, differentiated regulatory obligations: an autonomous car is subject to much stricter legal requirements than an algorithm that recommends subway transfers. Standard documentation doesn't distinguish this – RAD-AI makes these boundaries explicit.
Third – the complexity of data ownership. Data comes from road sensors, user smartphones, and city infrastructure. Who is responsible for what? Who has the right to use what? This is not only a technical but also a legal question, and it must be documented explicitly, not just implied.
Finally, ecosystem-level security: when many components interact, every connection between them is another potential point of attack, so the attack surface is far larger than if each component operated in isolation. This also requires separate architectural attention.
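Two of these findings, differentiated obligations and data ownership, come down to recording a few extra facts per component. The sketch below shows one possible shape for such a record. It is an assumption, not the RAD-AI format: the `Component` fields, the tier names, and every value in the example are hypothetical, and deciding which tier a real system falls into is a legal assessment, not something code can do.

```python
from dataclasses import dataclass
from enum import Enum


class RiskTier(Enum):
    """Simplified, hypothetical labels; the real classification is a legal question."""
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


@dataclass(frozen=True)
class Component:
    name: str
    risk_tier: RiskTier
    data_sources: tuple[str, ...]
    data_owner: str  # who is accountable for the data this component uses


def needs_annex_iv_docs(components: list[Component]) -> list[str]:
    """Names of components tagged high-risk, i.e. those facing Annex IV documentation."""
    return [c.name for c in components if c.risk_tier is RiskTier.HIGH]


# Hypothetical entries echoing the article's contrast between a vehicle and a recommender.
COMPONENTS = [
    Component("autonomous_shuttle_control", RiskTier.HIGH,
              ("road sensors",), "city transport authority"),
    Component("subway_transfer_recommender", RiskTier.MINIMAL,
              ("user smartphones",), "transit app operator"),
]
print(needs_annex_iv_docs(COMPONENTS))
# → ['autonomous_shuttle_control']
```

Writing the tier and the data owner down per component is what turns "implied" into "documented explicitly".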
What the Practitioner Review Showed
The authors enlisted six experienced software architects for an independent evaluation: how well does RAD-AI cover the requirements of Annex IV of the AI Act compared to standard frameworks?
The result: average coverage increased from 36% to 93%, a gain of 57 percentage points. That is the difference between «the system is formally documented» and «the system is documented in a way a regulator can accept.»
Of course, an evaluation by six experts is preliminary data, not a final verdict. The authors honestly call this «preliminary evidence.» But the direction is clear: the existing toolkit leaves large blank spots exactly where the law demands documentation in black and white.
Why This Matters Right Now
This research was published against the backdrop of a very specific regulatory deadline. The AI Act has been passed. The documentation requirements for high-risk systems are written. August 2026 is the date when real enforcement begins.
For companies that develop or deploy high-risk AI systems in the European market – medical diagnostic tools, credit scoring systems, hiring algorithms, autonomous transport components – this isn't an abstract academic discussion. It's a practical question: do you have the tools to prepare the documentation the law requires?
RAD-AI offers a concrete answer: extend what you already use. Don't discard arc42 – add eight sections to it. Don't abandon C4 – add three new diagram types. It's not a revolution for revolution's sake. It's a pragmatic engineering response to a real problem.
The next steps planned by the authors include developing tools for automated generation and verification of specific documentation aspects, as well as applying the framework to a broader range of production scenarios and studying its role in the audit and certification processes for AI systems.
Documentation isn't bureaucracy for bureaucracy's sake. It's the map that allows us to understand what's happening inside a system when something goes wrong. And in a world where algorithms make decisions about loans, diagnoses, and routes, such a map isn't an option – it's a necessity.