Published on April 25, 2026

Limits of Large Deviation Theory in Optimal Transport with Heavy Tails

When the Map Fails at the Edges: The Limits of Large Deviation Theory in Optimal Transport

Mathematicians have discovered that the standard tool for estimating rare events 'breaks down' where distributions stretch to infinity – and that changes everything.

Mathematics & Statistics, 13–19 min read

Author: Professor Lars Nielsen

«While working on this text, I caught myself thinking: we've grown so accustomed to trusting tools that have 'always worked,' we rarely ask where exactly they stop working. This result seems particularly honest to me: it doesn't tear down the theory; it clarifies its boundaries. I find myself wondering how many applied models with heavy tails are silently relying on estimates that are no longer accurate – and no one has yet peeked at those very edges of the map.» – Professor Lars Nielsen

The Map That 'Lies' at the Edges

Imagine a city map. In the center, it's precise: every street, every intersection is plotted with meticulous accuracy. But the farther you go toward the edges, the more blank spots you find. At some point, the map just ends, and beyond it lies terra incognita. You're confident the map works in the center. But would you be just as confident heading to the outskirts?

This is precisely the situation mathematicians found themselves in when working with a tool called the large deviation principle. This principle is one of the most powerful ways to estimate the probability of rare events. It works brilliantly as long as you stay in the 'center' of the mathematical space, where everything is neatly bounded and compact. But once you venture to the 'edges' – where distributions have 'long tails' and stretch to infinity – the map starts to 'lie.'

This is exactly what the study in question demonstrates. The authors constructed a concrete example where the tool, which worked reliably on 'compact' territories, fails as soon as we step beyond them. And this isn't an abstract mathematical curiosity – it's a warning for everyone who uses similar methods in real-world problems.

What Is Optimal Transport – And What Does Dust Have to Do with It?

Before we dive into what exactly 'broke,' let's first understand what optimal transport is. It's a mathematical problem best explained through the analogy of moving goods – or, in its classic formulation, through the imagery of dust and pits.

Imagine you have a pile of sand in one place and a pit in another. You need to move all the sand into the pit, and you want to do it with minimal total effort – that is, so that the cost of moving every grain, summed over the whole pile, is as small as possible. Mathematically, this is known as the Monge–Kantorovich problem, and it has numerous applications, from logistics and economics to computer vision and image generation.

Now, let's make it more complex. In real life, we rarely know the exact location of every grain of sand and every hollow in the pit. We only know their probability distributions – roughly speaking, a density map of where the sand is and a map of the pits. The optimal transport problem in this case seeks the best 'plan' – a description of which part of the sand moves where – to minimize the total cost of transportation.
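To make the idea of a 'plan' concrete, here is a minimal sketch for a handful of sand grains on a line. It assumes equal-weight grains and a squared-distance cost (both are illustrative choices, not taken from the study), and all function names are hypothetical. For such a convex cost in one dimension, matching the i-th smallest source to the i-th smallest target is known to be optimal, and the brute-force search below confirms it on this small instance.

```python
from itertools import permutations

def transport_cost(sources, targets, assignment):
    """Total cost when sources[i] is sent to targets[assignment[i]].

    Assumption: the cost of one move is the squared distance.
    """
    return sum((s - targets[j]) ** 2 for s, j in zip(sources, assignment))

def brute_force_plan(sources, targets):
    """Try every one-to-one assignment and keep the cheapest one."""
    n = len(sources)
    return min(permutations(range(n)),
               key=lambda a: transport_cost(sources, targets, a))

def sorted_plan(sources, targets):
    """Monotone matching: i-th smallest source goes to i-th smallest target."""
    src_order = sorted(range(len(sources)), key=lambda i: sources[i])
    tgt_order = sorted(range(len(targets)), key=lambda j: targets[j])
    assignment = [0] * len(sources)
    for i, j in zip(src_order, tgt_order):
        assignment[i] = j
    return tuple(assignment)

# Hypothetical positions of four grains and four holes.
sources = [0.0, 2.0, 5.0, 1.0]
targets = [4.0, 0.5, 3.0, 7.0]

best = brute_force_plan(sources, targets)
mono = sorted_plan(sources, targets)
print("brute force:", best, transport_cost(sources, targets, best))
print("sorted plan:", mono, transport_cost(sources, targets, mono))
```

Brute force is only feasible for tiny examples, since the number of assignments grows factorially. It is used here purely as a check.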

But there's a subtlety. The pure optimal transport problem sometimes yields overly 'rigid' solutions: all sand grains move strictly along a single route, without the slightest deviation. Such a solution is mathematically beautiful but computationally inconvenient and physically unrealistic. That's why entropic regularization was introduced, an idea that traces back to the work of Erwin Schrödinger in the early 1930s.

Entropic Regularization: Let's Add a Little Chaos

The idea is simple: a penalty term is added to the original cost minimization problem, which 'punishes' overly deterministic plans. Mathematically, this looks like adding a measure of randomness – entropy – to the objective function.
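For readers who want the formula, one common way to write the regularized problem is shown below. The notation is an assumption made for illustration (μ and ν for the two marginals, c for the cost, Π(μ, ν) for the set of admissible plans, KL for relative entropy); the study may use a different normalization.

```latex
\pi_\varepsilon \;=\; \operatorname*{arg\,min}_{\pi \in \Pi(\mu,\nu)}
\left\{ \int c(x,y)\,\mathrm{d}\pi(x,y)
\;+\; \varepsilon\,\mathrm{KL}\bigl(\pi \,\|\, \mu\otimes\nu\bigr) \right\},
\qquad
\mathrm{KL}(\pi \,\|\, \rho) \;=\; \int \log\frac{\mathrm{d}\pi}{\mathrm{d}\rho}\,\mathrm{d}\pi .
```

The first term is the classical transport cost; the second penalizes plans that are far from the fully 'blurred' product plan μ⊗ν.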

Using our sand analogy: imagine you're not transporting perfectly ordered grains of sand, but rather a cloud of dust, with each particle scattering slightly. The larger the regularization parameter (let's call it ε), the greater the 'blurriness' of the plan. As ε approaches zero, the plan becomes rigid again and converges to the classical optimal one.
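The 'blur' controlled by ε can be seen on a toy problem. The sketch below uses Sinkhorn's alternating scaling iterations, a standard way to compute entropic plans, on five equally weighted points with a squared-distance cost. The grid, the cost, the iteration count, and the function name are all illustrative assumptions. Because source and target coincide here, the classical plan leaves every grain where it is, so the share of mass on the diagonal measures how 'rigid' the entropic plan has become.

```python
import math

def sinkhorn_plan(mu, nu, cost, eps, n_iter=2000):
    """Entropic transport plan via Sinkhorn's alternating scaling.

    Plain (non-log-domain) iterations: fine for this toy example,
    but they would underflow for much smaller eps.
    """
    n, m = len(mu), len(nu)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Rescale rows so the plan has the right source marginal...
        u = [mu[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        # ...then columns so it has the right target marginal.
        v = [nu[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical toy data: identical source and target on a five-point grid.
points = [0.0, 0.25, 0.5, 0.75, 1.0]
cost = [[(a - b) ** 2 for b in points] for a in points]
mu = [0.2] * 5
nu = [0.2] * 5

diagonal_mass = {}
for eps in (1.0, 0.1, 0.02):
    plan = sinkhorn_plan(mu, nu, cost, eps)
    diagonal_mass[eps] = sum(plan[i][i] for i in range(5))
    print(f"eps={eps}: mass on the diagonal = {diagonal_mass[eps]:.3f}")
```

As ε shrinks, more of the mass should sit on the diagonal, which is the toy version of the plan 'becoming rigid again'.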

This approach – entropic optimal transport – became extremely popular in machine learning in the 2010s, primarily in tasks related to generative models and comparing distributions. It is computationally efficient, mathematically elegant, and allows for the construction of smooth, 'blurry' transport plans that are convenient to optimize.

But now a question arises: how well do we understand the behavior of these plans as ε → 0? How accurately can we describe how quickly and how reliably they converge to the classical solution? This is where the large deviation principle comes in.

The Large Deviation Principle: A Thermometer for Rare Events

The large deviation principle is a mathematical tool that allows us to estimate the probabilities of rare events with exponential precision. In simple terms, it answers the question, 'How unlikely is it that the system will end up far from its typical behavior?'

A classic example is flipping a coin. If you flip a fair coin 1,000 times, the number of heads will be close to 500 in the vast majority of cases. The probability of getting, say, 900 heads is vanishingly small. But how small, exactly? The large deviation principle provides a precise exponential answer: this probability decays like e^(−n·I), where n is the number of flips and I is the so-called 'rate function,' which characterizes the 'cost' of the deviation.
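The coin example can be checked numerically. The sketch below uses the textbook rate function for a coin, I(x) = x·ln(x/p) + (1 − x)·ln((1 − x)/(1 − p)), and compares −n·I with the exact logarithm of the probability of at least 900 heads in 1,000 flips. The function names are hypothetical. Working with logarithms avoids underflow, since the probability itself is far too small to print meaningfully.

```python
import math

def rate_function(x, p=0.5):
    """Rate function for a fraction x of heads when each flip lands heads
    with probability p (valid for 0 < x < 1)."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

def log_tail_probability(n, k, p=0.5):
    """Exact log P(at least k heads in n flips), via log-sum-exp
    over the binomial terms."""
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        + j * math.log(p) + (n - j) * math.log(1 - p)
        for j in range(k, n + 1)
    ]
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))

n, k = 1000, 900
exact = log_tail_probability(n, k)
estimate = -n * rate_function(k / n)   # about -368 for these values

print(f"exact log-probability: {exact:.2f}")
print(f"-n * I(0.9):           {estimate:.2f}")
```

The two numbers differ by a few units out of several hundred. That gap comes from sub-exponential correction factors, which a large deviation estimate deliberately ignores: it captures only the exponential rate.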

In the context of entropic optimal transport, for a small ε, the entropic minimizer π_ε should, with high probability, be close to the classical optimal plan π₀. The large deviation principle describes how quickly the probability of being 'far' from π₀ decreases.

For the principle to work correctly, the upper bounds for these probabilities must be accurate – and not just for 'nice,' bounded (compact) regions, but for any closed set in the space of measures. And this is precisely where the problem begins.
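In symbols, the textbook definition of a large deviation principle with rate function I and speed 1/ε consists of two bounds. The notation P_ε below is generic and is an assumption made here: it stands for whatever family of probabilities is being studied, and the study's precise setup may differ.

```latex
% Upper bound, required for every closed set F:
\limsup_{\varepsilon \to 0} \; \varepsilon \log P_\varepsilon(F) \;\le\; -\inf_{z \in F} I(z),
\\[4pt]
% Lower bound, required for every open set G:
\liminf_{\varepsilon \to 0} \; \varepsilon \log P_\varepsilon(G) \;\ge\; -\inf_{z \in G} I(z).
```

When the first inequality is known only for compact F, one speaks of a 'weak' large deviation principle. The gap between the weak and the full version is what the rest of this article is about.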

Compact vs. Closed: What's the Difference and Why Is It Critical?

To grasp the essence of the discovery, we need to briefly pause on two mathematical concepts: compact and closed sets. They sound similar, but the difference between them is fundamental – and the entire problem is hidden within this difference.

A compact set is, roughly speaking, a set that is 'bounded and doesn't lose its boundary points.' On the number line, a compact set is, for example, the interval [0, 1]. It's bounded, closed, and nothing 'escapes' to infinity.

A closed set is a broader concept. It also 'contains its boundary points,' but it can be unbounded. For example, the entire number line is a closed set. The ray [0, +∞) is also closed. The line x = y in a two-dimensional space is closed but not compact.

An analogy from daily life: imagine a compact set is a park in the city center. It's enclosed by a fence, clearly defined, and you know exactly where it begins and ends. A closed set, on the other hand, is like a riverbank that stretches to the horizon and beyond. You know for sure that every point on the bank belongs to the set, but the set itself is infinite and extends into the distance.

In large deviation theory, it has long been known that upper bounds on probabilities work well on 'parks' – compact sets. But extending them to 'riverbanks' – closed but unbounded sets – is only possible under special conditions. Without these conditions, the estimates can be incorrect.

For a long time, this was considered more of a theoretical subtlety that didn't arise in 'reasonable' problems. The study we are discussing has shown the opposite.

The Counterexample: When 'Tails' Break All the Rules

The authors constructed a specific model – not an abstract one, but one with an explicitly defined cost function and marginal distributions – in which everything goes wrong according to standard theory.

The essence of the construction is as follows. Consider the problem of transporting 'sand' on a number line, where the 'cost' of moving a particle from point x to point y is determined by the distance between them. The marginal distributions – that is, the initial and final 'density maps' of the sand – are chosen to have heavy tails: a significant portion of the 'sand' is concentrated far from the origin, at large values.
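What 'heavy tails' means quantitatively can be illustrated with two textbook distributions. The sketch below compares an exponential law (light tail) with a Pareto law (heavy tail) by looking at log P(X > t) / t. These two distributions and their parameters are assumptions chosen for illustration; they are not the marginals used in the study.

```python
import math

def exponential_tail(t, rate=1.0):
    """P(X > t) for an exponential law: a light, exponentially thin tail."""
    return math.exp(-rate * t)

def pareto_tail(t, alpha=2.0):
    """P(X > t) for a Pareto law with scale 1 and shape alpha:
    a heavy, polynomially decaying tail."""
    return t ** (-alpha) if t >= 1.0 else 1.0

thresholds = (10.0, 100.0, 500.0)
light_rates = [math.log(exponential_tail(t)) / t for t in thresholds]
heavy_rates = [math.log(pareto_tail(t)) / t for t in thresholds]

for t, light, heavy in zip(thresholds, light_rates, heavy_rates):
    print(f"t={t:6.0f}  exponential: {light:+.4f}   Pareto: {heavy:+.4f}")
```

For the exponential law the ratio stays at −1, a genuine exponential decay rate. For the Pareto law it drifts toward 0: the tail thins out more slowly than any exponential, which is the regime in which exponential estimates are most at risk.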

In such a situation, the classical optimal plan π₀ – the 'rigid' one that the entropic minimizers approach as ε → 0 – has a non-compact support. In other words, a significant part of the transport plan 'lives' arbitrarily far out: the pairs of points (x, y) along which mass is moved do not fit inside any bounded region.

Picture it this way: you're organizing package deliveries across a country where some senders and recipients live in extremely remote regions – so far away that no finite map can contain them. The optimal delivery plan must account for these extreme routes, and therefore the plan itself doesn't 'fit' into any bounded region.

Now, the authors asked a question: do the upper bounds of the large deviation principle hold for arbitrary closed sets in the space of transport plans? The answer turned out to be no. They found a specific closed set F – roughly speaking, the set of all plans with 'enough mass in the tails' – for which the upper bound is violated.

What does this mean in practice? The upper bound of the large deviation principle says, 'The probability of π_ε ending up in F decays at least as fast as e^(−I/ε).' But in this counterexample, the probability decays more slowly than the theory predicts. In other words, the system 'leaks' into the tails much more often than it should according to the standard estimate.

It's as if you predicted that a downpour in a specific region happens once every 100 years – but it happens once every 10. The formula worked for 'normal' regions but proved incorrect for those lying at the edge of the map.

The Tail Criterion: When the Leap from Compact to Closed Is Still Possible

It would be unfair to end on a pessimistic note. The authors didn't just show that something 'breaks.' They also figured out under what conditions the transition from compact to closed sets still works.

The key tool is the so-called 'tail criterion.' Its idea, setting aside the formulas, is this: the upper bounds of the large deviation principle extend to a closed set F if and only if the probability of π_ε being in the 'distant' part of F (that is, outside any predefined compact set) decays sufficiently quickly.

Another analogy. Imagine you are studying a bird population in a certain country. You can accurately count the birds in the central regions – where you have observers. But how can you be sure your estimates are correct for the entire country, including the most remote and little-studied corners? If the birds almost never fly to these remote areas, then everything is fine: your estimates remain accurate. But if the birds regularly and unpredictably migrate to the edge of the map, your central observations cease to provide a reliable overall picture.

Similarly, in the transport problem: if the entropic minimizer π_ε 'migrates' into the tails with increasing probability as ε decreases, no estimate built solely on its 'central' behavior will be accurate for closed sets.

Formally, the criterion looks like this: we take larger and larger 'compact cores' of the space (say, balls of radius R) and see how the probability of the plan being in F, but outside this core, behaves. If this probability decays faster than any exponential as R → ∞, the transition is valid. If not, the upper bounds for F may not hold.
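For comparison, the standard textbook condition in the same spirit is called exponential tightness. It is stated below in generic notation (P_ε for the family of probabilities, K_M for a compact set). This is an assumption-laden point of reference, not the study's criterion, whose exact formulation may differ.

```latex
\text{For every } M < \infty \text{ there exists a compact set } K_M \text{ such that}
\qquad
\limsup_{\varepsilon \to 0} \; \varepsilon \log P_\varepsilon\bigl(K_M^{\,c}\bigr) \;\le\; -M .
```

In words: the probability of leaving a sufficiently large compact core can be made exponentially small at any prescribed rate. Under this condition, upper bounds proved for compact sets are known to carry over to all closed sets.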

In the constructed counterexample, this criterion is violated: the tail probability decays insufficiently fast precisely because the optimal plan π₀ itself has a non-compact support and 'lives' in the tails.

A Full-Fledged Large Deviation Principle: Why It's Impossible

An even stronger conclusion follows from all of the above. A full-fledged large deviation principle – one that works simultaneously for all closed and open sets with a single rate function – simply does not exist in this situation.

This sounds like a harsh verdict. But what does it mean in substance?

The rate function I(π) is a 'penalty list': it tells you how 'costly' each specific deviation from the optimum is. A full-fledged LDP requires this list to work correctly for both upper estimates (not too optimistic) and lower ones (not too pessimistic), while also being lower semi-continuous – that is, behaving 'nicely' at the limit.

The authors showed that when the optimal plan has a non-compact support, it's impossible to construct such a single, correct 'penalty list.' Any candidate rate function either underestimates the probability of tail events (making the upper bound incorrect) or has unsuitable analytical properties.

This is like trying to create a single delivery tariff for an entire country, including the most remote islands. Any single tariff would be either unfair to central residents (too high) or economically unfeasible for remote areas (too low). There is no universal solution – only approximations that work in limited contexts.

It's important to clarify: this does not mean the large deviation principle is useless or inapplicable to optimal transport problems in general. It works perfectly in 'compact' territories – where the supports of the distributions are bounded. It can also work in weaker topologies or on special subspaces of measures. But it cannot claim universality in the full space of measures with non-compact supports.

Why This Matters Beyond Pure Mathematics

You might ask: so what? This is beautiful mathematics, but does it have any practical significance?

It does, and quite a bit. Entropic optimal transport is not just a theoretical construct. Since around 2013–2015, it has become a workhorse tool in machine learning: transport-based distances between distributions, used in some Generative Adversarial Networks (GANs) and related architectures, are computed by solving transport problems. The large deviation principle is applied to assess algorithm stability, error probabilities, and behavior with rare input data.

If the distributions an algorithm works with have heavy tails – and this is exactly the case with financial data, medical images, and climate time series – then standard estimates of rare event probabilities can be systematically wrong. The system will appear more stable than it actually is. The risk will be underestimated.

An analogy from insurance: if an actuary assesses risks only based on the central part of a loss distribution – say, from 'normal' years – and ignores the tails, they will inevitably underprice insurance for catastrophic events. The formula worked. But not where it was needed most.
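How large can such an underestimate be? The sketch below compares the probability of exceeding a threshold under a heavy-tailed Student's t distribution with 3 degrees of freedom (using its closed-form tail) with the same probability under a normal distribution of equal variance. Both distributions and the thresholds are illustrative assumptions, not data from the study or from any real portfolio.

```python
import math

def normal_tail(t, sd):
    """P(X > t) for a centered normal distribution with standard deviation sd."""
    return 0.5 * math.erfc(t / (sd * math.sqrt(2.0)))

def student_t3_tail(t):
    """P(T > t) for Student's t with 3 degrees of freedom (closed form)."""
    s = t / math.sqrt(3.0)
    return 0.5 - (math.atan(s) + s / (1.0 + s * s)) / math.pi

# The t distribution with 3 degrees of freedom has variance 3,
# so the matching normal model has standard deviation sqrt(3).
sd = math.sqrt(3.0)

underestimation = {}
for t in (5.0, 10.0, 20.0):
    heavy = student_t3_tail(t)
    light = normal_tail(t, sd)
    underestimation[t] = heavy / light
    print(f"threshold {t:4.0f}: heavy-tailed {heavy:.2e}, "
          f"normal model {light:.2e}, ratio {underestimation[t]:.1e}")
```

The ratio grows rapidly with the threshold: the farther out the event, the more a thin-tailed model understates it, even though both models agree on the variance.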

In statistical physics, non-compact supports appear everywhere – in problems of diffusion, the behavior of particles in unbounded potential fields, and phase transitions in infinite systems. The large deviation principle is a standard tool there for estimating fluctuation probabilities. Understanding its limitations is critical for correctly interpreting the results.

Finally, in stochastic optimization – a field where optimal transport meets decision-making problems under uncertainty – the study's results highlight the need to carefully check compactness conditions before applying standard theoretical tools.

What's Next: Open Questions

The authors outline several directions for future mathematical research.

First is the search for weaker topologies or special subspaces where a full-fledged large deviation principle still holds even with non-compact supports. Perhaps by relaxing some requirements on the rate function, some of the lost universality can be restored.

Second is the development of more refined 'tail' estimates that explicitly account for the behavior of distributions at infinity. This will require a new mathematical apparatus that goes beyond classical large deviation theory.

Third is the extension of these results to dynamic models of entropic optimal transport, where the 'transport' occurs not between two fixed distributions but unfolds over time. This is directly related to stochastic differential equations and Schrödinger bridges – models describing how physical systems transition from one state to another in the most 'economical' way.

Fourth is applied research in finance and climatology, where heavy tails are the norm, not the exception. Understanding exactly when and by how much standard estimates 'break' will allow for the construction of more honest and robust risk models.

A Lesson from the Edges of the Map

This research is a reminder that mathematical tools, even the most powerful ones, have a scope of applicability. The large deviation principle works magnificently where the space doesn't 'escape' to infinity, where distributions are well-behaved, and where tails are sufficiently thin. But once you venture to the 'edges of the map' – where the optimal transport plan 'lives' on non-compact sets and tails carry real weight – the familiar estimates cease to be accurate.

This is not a catastrophe. It is knowledge. Knowing where a tool works and where it doesn't is the very foundation of using mathematics wisely.

Data doesn't 'lie.' But it knows how to whisper in a language we must learn to hear – especially when that whisper comes from the farthest edges of the space, where we've grown accustomed to not looking.

Source: https://arxiv.org/abs/2604.20827v1

Original Title: Failure of ambient closed-set large-deviation upper bounds in entropic optimal transport

Article Publication Date: Apr 22, 2026

Original Article Author: Maja Gwozdz


