Can you make the relevance of this research more tangible by giving an informal example of Alice and Bob "observing" two different correlated variables, as well as a shared "latent variable"? I have difficulty imagining what those variables would model in practice.
First post in a planned cluster on exact results for natural latents. Here, I connect some established results in classical information theory to natural latents.
Suppose Alice observes
It turns out "the thing they share" has two classical formalizations in information theory, they disagree with each other, and they both disagree with mutual information. The specific pattern of this disagreement is (I claim at the end) exactly the subject matter of natural latents.
This post is a quick introduction.
What can both parties extract? (Gács–Körner, 1973)
The most literal reading of "the thing they share" is a random variable that Alice can compute from alone, and Bob can compute from alone, with certainty. Both parties, looking at their own observation, write down the same .
is the entropy of the largest such .
The Gács–Körner common information
There's a nice picture of what can be. For finite-valued , draw the bipartite graph with an edge wherever a pair has positive probability. Any common variable must be constant on connected components of this graph (follow the edges: if and are both possible, then Bob's function must give the same answer on and , and so on). And the component label itself is a common variable.
The only extractable common randomness is the name of the connected component they already know they are in. This immediately reveals the property that makes simultaneously rigorous and useless: it only sees the zero pattern of the joint distribution. If the support graph is connected, that is, if every pair of values is reachable from every other, then , no matter how strong the correlation.
Worked Example: Brittleness
Let be a fair bit, , and except that with probability it's replaced by an independent fair bit.
ε
0
1 bit
1.000 bits
0.01
0 bits
0.955 bits
0.1
0 bits
0.714 bits
At ε = 0 the shared bit is extractable and bit. At , every combination is possible and the support graph becomes complete. crashes to 0, while mutual information degrades slowly[1]. One percent noise destroys extractable common structure entirely, while destroying almost none of the statistical common structure.
This is essentially the tiny-mixtures problem. It's why the natural latents framework had to be built on approximations. Exact common variables are measure-zero objects.
What does it take to simulate the correlation? (Wyner, 1975)
Wyner approaches "the thing they share" from the opposite direction. Instead of asking what can be extracted from the correlation, ask what it takes to simulate it: find a latent such that and are conditionally independent given , so that is the entire explanation of their dependence, and make as simple as possible.
The Wyner common information is
In natural-latents language, this should look familiar: the constraint is exactly zero mediation error. Wyner's quantity is the minimum complexity of an exact mediator.
Notice is always true: extractable structure is at most the mutual information, and any full explanation of the dependence must carry at least the mutual information. Also, the right inequality is typically strict. Explaining a correlation completely costs more bits than the correlation contains.
Worked Example: Binary
Let be a fair bit and flipped with probability .
The optimal mediator is pretty: let be a fair bit and set
where and are independent with , which comes to . Given , the two views independent by construction, and the construction reproduces the joint distribution exactly. Its complexity (that Wyner proved is optimal) is:
quantity
value
0 bits
0.531 bits
0.873 bits
So for a 10%-noisy bit: nothing is extractable, ~half a bit is shared statistically, and ~7/8 of a bit is needed to explain the sharing.
Worked Example: Gaussian
Consider unit-variance jointly Gaussian with correlation . Here, = 0 always holds for : Witsenhausen showed a non-constant common variable forces maximal correlation 1, and Gaussians have maximal correlation .
The optimal mediator is a standard normal with
Then the mediation is exact by construction, and the complexity works out to
At : , bits, bits. Explaining the correlation costs more than twice the correlation.
As , both and diverge, but the gap stays put: bit[2].
The sandwich, and when there's actually a "thing"
So we have, for every pair of variables,
with both gaps typically open. When does the sandwich collapse? Exactly when there's a variable that is simultaneously a common part (computable from each view) and a mediator (explains all the dependence): with extractable from both sides. In that case, and only in that case, "the thing they share" is unambiguous: all three notions name the same object. The shared-bit-plus-private-noise example at is the canonical case.
The collapse condition is structurally fragile. It requires the support to decompose and the dependence to be carried entirely by the decomposition. Generic distributions, and all nondegenerate Gaussian ones, fail it. "The thing two variables share" does not, in general, exist; what exist are the two ends of a sandwich and the gap between them.
The sandwich is the natural latents problem
Recall the natural latent conditions on a latent over , written as conditional mutual informations: mediation error , and redundancy errors and . Then:
I think this demonstrates why the natural latents framework is necessarily a theory of approximation: exact objects require the collapse of an inequality that is generically open. I claim this sharpens the question the framework needs answered. If exact naturality means sitting at both ends at once, then approximate naturality is about how close you can get to both ends at once, and the two gaps become two error floors:
Is half a bit actually the floor, or just what this particular construction gives? Is there a whole tradeoff curve between the two errors, and what does it look like? Can the floor be beaten by allowing a little of both errors at once? Those are the questions addressed in the next post. It turns out that in the Gaussian case, the entire tradeoff curve has a closed form. The half-bit floor is real, and the curve has some surprises in it (it never touches zero, and its minimax point is about a fifth of a bit). The post after that does vector-valued views.
Pointers
Gács & Körner (1973) and Wyner (1975) of course; Witsenhausen (1975) for the maximal-correlation characterization of common variables; the bivariate Gaussian is due to Xu, Liu & Chen; the double-Markov structure theorem is Problem 16.25 in Csiszár–Körner's textbook. The natural latent conditions are from Wentworth & Lorell (arXiv:2509.03780). And the verification script for every number in this post.
Next post: Approximate Natural Latents have Exact Prices.
I share a short script (linked at the end) that reproduces the numerics of all worked examples in this post.
Convention note: in this post, logarithms are base 2 throughout. All quantities are in bits.