Imagine several hospitals that want to train a diagnostic model together, but none of them may hand patient data to another party. Or consider banks that need to fight fraud jointly but are legally barred from sharing customer transaction data. Classical machine learning breaks down here, because it assumes all training data can be gathered in one place. Federated AI inverts that logic: the model "travels" to the data, rather than the data to the model.
The model goes to the data – not the other way around
Simply put, in federated learning each participant trains the model locally on their own data. Only model updates are transmitted outward: numerical adjustments to the weights that do not contain the raw records. Without extra safeguards such as secure aggregation or differential privacy, these updates can still leak some information about the training data, so they should not be treated as fully anonymous. A central coordinator collects the updates from all participants and combines them into a shared, improved version of the model. The data itself never leaves its source.
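The loop described above can be sketched in a few lines of plain Python. This is a toy illustration and not Flower's API: the model is a two-parameter line, and the names `train_locally`, `federated_average`, and `run_round` are hypothetical. The aggregation step is weighted averaging by sample count (the idea behind FedAvg), which is one common choice among several.

```python
from typing import List, Tuple

Weights = List[float]          # toy model: [slope, intercept] of y = slope * x + intercept
Sample = Tuple[float, float]   # one local record: (x, y)


def train_locally(weights: Weights, data: List[Sample],
                  lr: float = 0.05, epochs: int = 20) -> Weights:
    """Full-batch gradient descent on mean squared error.

    The raw samples are only read inside this function; the caller
    receives nothing but the updated weights.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(wts[i] * n for wts, n in updates) / total for i in range(dim)]


def run_round(global_weights: Weights,
              client_datasets: List[List[Sample]]) -> Weights:
    """One federated round: every client trains locally, the coordinator averages."""
    updates = [(train_locally(global_weights, data), len(data))
               for data in client_datasets]
    return federated_average(updates)
```

Calling `run_round` repeatedly, starting from `[0.0, 0.0]`, moves the shared weights toward a line that fits all clients' data, even though the coordinator only ever sees weight lists and sample counts.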
This approach fits regulatory regimes such as the European GDPR and HIPAA in the United States, and it makes collaboration possible where data cannot legally leave a country or institution. This is why federated AI is being actively adopted in healthcare, finance, and other sensitive industries.
Flower: "write once, run anywhere"
Among federated learning tools, Flower stands out as one of the most popular open-source frameworks. Its core idea is simple: a developer writes the model code in the usual way and then wraps it in a federated layer with minimal changes. The same code can then run in simulation mode for experiments or in a production deployment.
This simplicity has attracted a wide range of users. Samsung and Nokia Bell Labs use Flower to train models directly on devices. JP Morgan and Banking Circle use it for privacy-preserving fraud detection. The British National Health Service (NHS) and the medical company Owkin work with Flower in research projects. Stanford, Cambridge, MIT, Harvard, and other universities apply it for inter-institutional collaboration.
The Flower repository on GitHub has garnered over 6,600 stars, with more than 170 contributors working on the project. The framework is distributed under the Apache 2.0 license and supports all major machine learning libraries – so organizations can transition existing projects to federated mode without rewriting the model code.
When it works – and when the challenges begin
Flower's architecture is straightforward: a central server coordinates the training, and clients run on the nodes where the data is stored. The server sends out model weights and tasks; clients train the model on their data and send back only the results. Connections are initiated by the clients, so only the server needs to be exposed externally, which simplifies setup and reduces the attack surface.
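The client-initiated flow can be illustrated with a small in-memory sketch. It is not Flower's wire protocol or API; the `Server`, `Task`, and `client_step` names are hypothetical, and local training is replaced by a stand-in that shifts each weight. The point is only the direction of the calls: the server answers requests and never dials out to a client.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    round_id: int
    weights: List[float]


@dataclass
class Server:
    """Hypothetical coordinator that only responds to client calls."""
    weights: List[float]
    round_id: int = 0
    expected_clients: int = 2
    results: Dict[str, List[float]] = field(default_factory=dict)

    def pull_task(self, client_id: str) -> Optional[Task]:
        # Called BY the client. A client that already reported in this
        # round gets no new work until the round is finished.
        if client_id in self.results:
            return None
        return Task(self.round_id, list(self.weights))

    def push_result(self, client_id: str, round_id: int,
                    weights: List[float]) -> None:
        if round_id != self.round_id:
            return  # stale result from an earlier round: ignore it
        self.results[client_id] = weights
        if len(self.results) == self.expected_clients:
            self._finish_round()

    def _finish_round(self) -> None:
        # Plain unweighted mean, coordinate by coordinate.
        n = len(self.results)
        self.weights = [sum(col) / n for col in zip(*self.results.values())]
        self.results.clear()
        self.round_id += 1


def client_step(server: Server, client_id: str, local_shift: float) -> None:
    """Stand-in for local training: shift every weight by a fixed amount."""
    task = server.pull_task(client_id)
    if task is None:
        return
    new_weights = [w + local_shift for w in task.weights]
    server.push_result(client_id, task.round_id, new_weights)
```

In a real deployment the two method calls would be network requests from the client to the server's public endpoint, which is why only that endpoint has to be reachable from outside.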
This is sufficient for small-scale experiments. However, when it comes to real-world corporate deployment – hundreds of devices or data nodes, multiple parallel projects, strict connection security requirements, and automatic recovery from failures – operational complexities arise that Flower cannot handle on its own.
Certificates for secure connections have to be managed, the nodes that should run a given task have to be selected automatically, the state of the whole device fleet has to be monitored, and failures have to be handled. These are infrastructure tasks rather than machine learning tasks.
Where Flower meets infrastructure ☁️
This is where Open Cluster Management (OCM) enters the picture – an open-source tool for managing multiple Kubernetes clusters, which serves as the foundation for the Red Hat Advanced Cluster Management for Kubernetes product. Notably, the architecture of OCM follows the same logic as Flower: there is a central hub and peripheral nodes that connect to the hub themselves. This makes them natural partners.
The integration is called the flower-addon. In essence, it is a bridge between two systems: OCM takes over everything related to infrastructure – deploying Flower agents to the necessary nodes, issuing and renewing certificates for secure connections, automatically selecting devices for specific tasks (for example, only those with GPUs), monitoring system health, and scaling as needed.
Meanwhile, Flower continues to do what it does best: coordinating the federated learning process and aggregating model updates. The division of responsibility is clear: one system is responsible for how the infrastructure is set up, while the other handles what happens within it.
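The node selection mentioned above (for example, "only clusters with GPUs") boils down to matching labels against a set of requirements. The sketch below shows that idea in plain Python; it does not use OCM's actual API, and the `FLEET` data, the label keys, and the `select_clusters` function are all hypothetical.

```python
from typing import Dict, List

# Hypothetical fleet: cluster name -> labels attached to that cluster.
FLEET: Dict[str, Dict[str, str]] = {
    "hospital-a": {"gpu": "true", "region": "eu"},
    "hospital-b": {"gpu": "false", "region": "eu"},
    "hospital-c": {"gpu": "true", "region": "us"},
}


def select_clusters(clusters: Dict[str, Dict[str, str]],
                    required: Dict[str, str]) -> List[str]:
    """Return the names of clusters whose labels include every required pair."""
    return sorted(
        name
        for name, labels in clusters.items()
        if all(labels.get(key) == value for key, value in required.items())
    )
```

With this data, requiring `{"gpu": "true"}` selects `hospital-a` and `hospital-c`, and adding `"region": "eu"` narrows the result to `hospital-a`. An infrastructure layer applies the same kind of filter continuously as clusters join, leave, or change labels.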
What this changes in practice
For teams looking to run federated learning in real-world conditions, this matters for three reasons.
First, it lowers the barrier to entry. Organizations already using Red Hat Advanced Cluster Management don't need to build infrastructure from scratch – they can plug Flower in on top of what they already have.
Second, it addresses the very problems that usually stall the transition from experiment to full-scale implementation: manual configuration of hundreds of nodes, certificate complexities, and the lack of centralized monitoring.
Third, it opens up federated AI for the industries that need it most – healthcare and finance – where security and regulatory compliance requirements are particularly high.
Of course, open questions remain. Federated learning is inherently more complex than the classical kind: one must consider how to aggregate updates from nodes with varying data volumes and distributions, how to ensure resilience against individual participant failures, and how to verify the quality of updates. Infrastructure tools do not eliminate these questions – they simply remove the operational barriers that previously made it impossible even to approach these tasks.
But that in itself is significant: once infrastructure problems are solved, teams can focus on what truly matters – model quality and federation architecture.