Imagine you have data on sales, weather, and store traffic. How do you tell what influences what? Does weather affect sales, or do sales affect weather? The answer is obvious to a person, but for a machine, recovering the order of causes and effects from observations alone is a hard problem.
LG AI Research has released SciNO, a new model that learns to find causal relationships in data. Simply put, it tries to work out which variable affects which, rather than just which variables are correlated.
Why is this needed?
Causality is not the same as correlation. Two phenomena can rise and fall together without one causing the other. A classic example: ice cream sales and the number of drownings both rise in summer, but ice cream does not cause drownings. Hot weather drives both: more people buy ice cream, and more people go swimming.
Understanding causality is critical in many fields: medicine, economics, climatology. If we know that A causes B, we can intervene on A to influence B. If A and B are merely correlated, intervening on A may change nothing.
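The ice cream example can be made concrete with a small simulation. This is only an illustrative sketch: the function names, coefficients, and noise levels are invented for the demonstration and are not from SciNO or any real dataset. Temperature drives both quantities, so they are strongly correlated in observational data, yet forcing ice cream sales to a fixed value leaves drownings unchanged.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n, rng, forced_ice_cream=None):
    """Toy world (assumed): temperature causes both ice cream sales and drownings.

    If forced_ice_cream is given, we intervene and set sales to that value,
    ignoring temperature. Drownings never depend on ice cream.
    """
    ice_cream, drownings = [], []
    for _ in range(n):
        temperature = rng.uniform(0, 35)
        if forced_ice_cream is None:
            sales = 2.0 * temperature + rng.gauss(0, 5)
        else:
            sales = forced_ice_cream
        ice_cream.append(sales)
        drownings.append(0.3 * temperature + rng.gauss(0, 2))
    return ice_cream, drownings


rng = random.Random(0)

# Observational data: the two series move together.
ice_cream, drownings = simulate(5000, rng)
observed_corr = pearson(ice_cream, drownings)

# Interventions: force sales low, then high, and compare average drownings.
_, drownings_low = simulate(5000, rng, forced_ice_cream=0.0)
_, drownings_high = simulate(5000, rng, forced_ice_cream=100.0)
intervention_effect = (
    sum(drownings_high) / len(drownings_high)
    - sum(drownings_low) / len(drownings_low)
)

print(f"observed correlation: {observed_corr:.2f}")
print(f"change in drownings after forcing sales up: {intervention_effect:.2f}")
```

The correlation comes out strongly positive, while the effect of the intervention stays close to zero. That gap is exactly what separates a causal link from a shared cause.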
Existing methods often assume a causal graph is already available, or require large volumes of data and prior knowledge about the system's structure. SciNO offers a different approach.
What does SciNO do?
SciNO is a diffusion model that works with functions rather than individual data points. Instead of analyzing specific variable values, it looks at their behavior as continuous functions.
The main idea: the model learns to generate an order of variables so that it matches the causal structure. If variable X influences Y, then X must precede Y in this order. The model does this through a diffusion process – gradually transforming noise into a meaningful order of variables.
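To make "X must precede Y" concrete, here is a small sketch of what a valid causal order means for a known graph. It does not show how SciNO finds the order. The graph itself is a made-up assumption built from the sales, weather, and traffic example, and `is_valid_causal_order` is a hypothetical helper written for this illustration.

```python
from graphlib import TopologicalSorter

# Assumed toy graph: each variable maps to the set of its direct causes.
# weather -> traffic -> sales, and weather -> sales.
parents = {
    "weather": set(),
    "traffic": {"weather"},
    "sales": {"weather", "traffic"},
}


def is_valid_causal_order(order, parents):
    """True if every cause appears before each of its effects in `order`."""
    position = {variable: i for i, variable in enumerate(order)}
    return all(
        position[cause] < position[effect]
        for effect, causes in parents.items()
        for cause in causes
    )


# A topological sort of the graph yields one valid causal order.
one_order = list(TopologicalSorter(parents).static_order())
print(one_order)  # ['weather', 'traffic', 'sales']

print(is_valid_causal_order(["weather", "traffic", "sales"], parents))  # True
print(is_valid_causal_order(["sales", "weather", "traffic"], parents))  # False
```

Here the graph is known and the order is derived from it. Causal ordering methods face the reverse problem: the graph is unknown, and an order with this property has to be inferred from data.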
Neural operators are what let the model work with functions directly. This allows it to capture patterns at the level of trajectories and dynamics, rather than just comparing individual numbers.
How does it work in practice?
The model is trained on data where causal ties are known. It learns to recognize patterns: how functions behave when one variable influences another. After training, it can be applied to new data where the order is unknown.
SciNO showed good results on synthetic data and on real-world tasks. For example, it successfully recovered the causal order in systems with nonlinear dependencies and time delays, where traditional methods often make mistakes.
Importantly, the model does not need to know the system's structure or the number of connections in advance. It infers the order from the data itself.
Limitations and questions
Like any machine learning method, SciNO requires high-quality training data. If the examples are few or noisy, the model may make mistakes.
Furthermore, the model recovers the order of variables, but it cannot always determine every detail of the causal structure, such as the strength of a connection or the presence of hidden variables. It is better seen as a first step toward a full understanding of the system.
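A quick count shows why an order alone leaves the structure open. Given an order, any edge pointing from an earlier variable to a later one is allowed, and every subset of those edges is a graph consistent with the order. The sketch below uses hypothetical helper names and the article's three example variables to enumerate the candidate edges and count the graphs.

```python
from itertools import combinations


def candidate_edges(order):
    """All (earlier, later) pairs: the only edges an order permits."""
    return list(combinations(order, 2))


def count_consistent_dags(order):
    """Each candidate edge is either present or absent, independently."""
    return 2 ** len(candidate_edges(order))


order = ["weather", "traffic", "sales"]
print(candidate_edges(order))
# [('weather', 'traffic'), ('weather', 'sales'), ('traffic', 'sales')]
print(count_consistent_dags(order))  # 8

# The number of compatible graphs grows very quickly with the variable count.
print(count_consistent_dags(list(range(10))))  # 2**45 = 35184372088832
```

The order narrows the search considerably, but a separate step is still needed to decide which of the permitted edges actually exist and how strong they are.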
Another point: diffusion models usually require significant computational resources. It is not yet entirely clear how well SciNO scales to very large systems with tens or hundreds of variables.
What does this mean for the industry?
If the approach proves stable and scalable, it could simplify the analysis of complex systems in science and business. Instead of manually building hypotheses about connections between variables, analysts could use the model to search for causal structure automatically.
This is particularly useful in fields where data exists, but understanding of mechanisms is limited: biology, economics, climatology. The model can suggest what to pay attention to, where to look for causes, and which interventions might be effective.
For now, SciNO is a research project, and its widespread application is still a long way off. But the approach itself – using diffusion models and neural operators for causality – looks promising.