Two Classical Answers to "What do Two Variables Share?"

Haru

Read the introduction for the motivation.

This is the first content post in a planned cluster on exact results for natural latents. Here, I connect some established results in classical information theory to natural latents.

Suppose Alice observes and Bob observes , and the two variables are correlated. We'd like to talk about the thing they share—the common ingredient, the shared concept, the latent that both of them can see. Mutual information tells us how much they share, in bits. It does not tell us whether there is any thing—any actual random variable—that carries those shared bits.

It turns out "the thing they share" has two classical formalizations in information theory, they disagree with each other, and they both disagree with mutual information. The specific pattern of this disagreement is (I claim at the end) exactly the subject matter of natural latents.

This post is a quick introduction.

What can both parties extract? (Gács–Körner, 1973)

The most literal reading of "the thing they share" is a random variable that Alice can compute from alone, and Bob can compute from alone, with certainty. Both parties, looking at their own observation, write down the same .

The Gács–Körner common information is the entropy of the largest such .

There's a nice picture of what can be. For finite-valued , draw the bipartite graph with an edge wherever a pair has positive probability. Any common variable must be constant on connected components of this graph (follow the edges: if and are both possible, then Bob's function must give the same answer on and , and so on). And the component label itself is a common variable.

is the entropy of the connected-component label of the support graph.

The only extractable common randomness is the name of the connected component they already know they are in. This immediately reveals the property that makes simultaneously rigorous and useless: it only sees the zero pattern of the joint distribution. If the support graph is connected, that is, if every pair of values is reachable from every other, then , no matter how strong the correlation.

Worked Example: Brittleness

Let be a fair bit, , and except that with probability it's replaced by an independent fair bit.

ε
0	1 bit	1.000 bits
0.01	0 bits	0.955 bits
0.1	0 bits	0.714 bits

At ε = 0 the shared bit is extractable and bit. At , every combination is possible and the support graph becomes complete. crashes to 0, while mutual information degrades slowly.^[1]

One percent noise destroys extractable common structure entirely, while destroying almost none of the statistical common structure.

This is essentially the tiny-mixtures problem. It's why the natural latents framework had to be built on approximations. Exact common variables are measure-zero objects.

What does it take to simulate the correlation? (Wyner, 1975)

Wyner approaches "the thing they share" from the opposite direction. Instead of asking what can be extracted from the correlation, ask what it takes to simulate it: find a latent such that and are conditionally independent given , so that is the entire explanation of their dependence, and make as simple as possible.

The Wyner common information is

In natural-latents language, this should look familiar: the constraint is exactly zero mediation error. Wyner's quantity is the minimum complexity of an exact mediator.

Notice is always true: extractable structure is at most the mutual information, and any full explanation of the dependence must carry at least the mutual information. Also, the right inequality is typically strict. Explaining a correlation completely costs more bits than the correlation contains.

Worked Example: Binary

Let be a fair bit and flipped with probability .

The optimal mediator is pretty: let be a fair bit and set

where and are independent with , which comes to . Given , the two views independent by construction, and the construction reproduces the joint distribution exactly. Its complexity (that Wyner proved is optimal) is:

quantity	value
	0 bits
	0.531 bits
	0.873 bits

So for a 10%-noisy bit: nothing is extractable, ~half a bit is shared statistically, and ~7/8 of a bit is needed to explain the sharing.

Worked Example: Gaussian

Consider unit-variance jointly Gaussian with correlation . Here, = 0 always holds for : Witsenhausen showed a non-constant common variable forces maximal correlation 1, and Gaussians have maximal correlation .

The optimal mediator is a standard normal with

Then the mediation is exact by construction, and the complexity works out to

At : , bits, bits. Explaining the correlation costs more than twice the correlation.

As , both and diverge, but the gap stays put: bit.^[2]

The sandwich, and when there's actually a "thing"

So we have, for every pair of variables,

with both gaps typically open. When does the sandwich collapse? Exactly when there's a variable that is simultaneously a common part (computable from each view) and a mediator (explains all the dependence): with extractable from both sides. In that case, and only in that case, "the thing they share" is unambiguous: all three notions name the same object. The shared-bit-plus-private-noise example at is the canonical case.

The collapse condition is structurally fragile. It requires the support to decompose and the dependence to be carried entirely by the decomposition. Generic distributions, and all nondegenerate Gaussian ones, fail it. "The thing two variables share" does not, in general, exist; what exist are the two ends of a sandwich and the gap between them.

The sandwich is the natural latents problem

Recall the natural latent conditions on a latent over , written as conditional mutual informations: mediation error , and redundancy errors and . Then:

Zero mediation error is Wyner's constraint. The minimal complexity of any latent achieving it is , and the excess is an unavoidable surcharge.
Zero redundancy errors say is conditionally independent of each view given the other: the classical "double Markov" conditions. A structure theorem (Csiszár–Körner, Problem 16.25) says all of 's information about factors through the Gács–Körner common part. So zero-redundancy latents can carry at most bits about the system.
An exact natural latent demands both at once. So an exact natural latent exists iff the sandwich collapses: . Which, per the brittleness example, is an idealized condition that one percent of noise destroys.

I think this demonstrates why the natural latents framework is necessarily a theory of approximation: exact objects require the collapse of an inequality that is generically open. I claim this sharpens the question the framework needs answered. If exact naturality means sitting at both ends at once, then approximate naturality is about how close you can get to both ends at once, and the two gaps become two error floors:

at zero redundancy, the minimal mediation error is (everything, in the Gaussian case).
at zero mediation, redundancy errors are forced. In the Gaussian example above, the optimal Wyner latent can be checked directly: its redundancy error from each view is : half a bit per view as . The views can be almost identical, and the best exact mediator still can't be pinned down from one view to better than half a bit.
Is half a bit actually the floor, or just what this particular construction gives? Is there a whole tradeoff curve between the two errors, and what does it look like? Can the floor be beaten by allowing a little of both errors at once? Those are the questions addressed in the next post. It turns out that in the Gaussian case, the entire tradeoff curve has a closed form. The half-bit floor is real, and the curve has some surprises in it (it never touches zero, and its minimax point is about a fifth of a bit). The post after that does vector-valued views.

Pointers

Gács & Körner (1973) and Wyner (1975) of course; Witsenhausen (1975) for the maximal-correlation characterization of common variables; the bivariate Gaussian is due to Xu, Liu & Chen; the double-Markov structure theorem is Problem 16.25 in Csiszár–Körner's textbook. The natural latent conditions are from Wentworth & Lorell (arXiv:2509.03780). And the verification script for every number in this post.

Next post: Approximate Natural Latents have Exact Prices.

^{^}
I share a short script (linked at the end) that reproduces the numerics of all worked examples in this post.
^{^}
Convention note: in this post, logarithms are base 2 throughout. All quantities are in bits.

[-]cubefox2mo30

Can you make the relevance of this research more tangible by giving an informal example of Alice and Bob "observing" two different correlated variables, as well as a shared "latent variable"? I have difficulty imagining what those variables would model in practice.

[-]Haru1mo30

Thanks for the question, I should have been thinking more about pedagogy. Consider a situation where Alice and Bob have two different camera feeds observing the same room, as below.

Alice's feed is , Bob's is . They are correlated, and the shared latent variable is whatever's actually going on inside the room.

Gács–Körner asks if Alice and Bob can each apply some function to their own camera feed—without communicating—and be certain they get the same answer? The answer is basically no. Alice sees the front of the elephant; Bob sees the back. Any image Alice sees is compatible with many possible images on Bob's screen, and vice versa, because the feeds are a lossy projection of the underlying latent. The only functions guaranteed to agree are constant ones which carry no information. That's what means here.

Mutual Information measures how much one feed tells you about the other statistically. In this example, MI is large: if Alice see the front of an elephant through her feed, she can infer a lot about what Bob's feed should look like. It quantifies how many bits of uncertainty about Bob's feed get eliminated by seeing Alice's.

Wyner asks: what's the simplest explanation that fully accounts for the correlation? I.e., what variable do you need such that once you know , Alice's feed tells you nothing further about Bob's? Here, I think you need the full data-generating process (exact position of the elephant to a high degree of precision) which includes information hidden from both feeds. This is way more information than the two feeds share, because the explanation has to cover everything that could have generated correlated observations, not just what did.

I just published another post, the introduction to my research direction, where I talk about why I think this is relevant!

[-]cubefox1mo20

Thank you, this clarifies things a lot!

24