This post is a comment on Natural Latents: Latent Variables Stable Across Ontologies by John Wentworth and David Lorell. It assumes some familiarity with that work and does not attempt to explain it. Instead, I present an alternative proof that was developed as an exercise to aid my own understanding. While the original theorem and proof are written in the language of graphical models, mine instead uses the language of information theory. My proof has the advantage of being algebraically succinct, while theirs has the advantage of developing the machinery to work directly with causal structures. Very often, seeing multiple explanations of a fact helps us understand it, so I hope someone finds this post useful.
Specifically, we are concerned with their Theorem 1 (Mediator Determines Redund): both the older Iliad 1 version for stochastic latents, and the newer arXiv version for deterministic latents. I will translate each theorem into the language of information theory: Wentworth & Lorell's assumptions will imply mine, while their conclusions will be equivalent to mine. The equivalences follow from the d-separation criterion and the fact that independence is equivalent to zero mutual information.
In the restated new theorem, the latent variable $\Lambda$ is a mediator between subsets $X_A$ and $X_B$ of the data, meaning that it contains essentially all of the information in common between $X_A$ and $X_B$, whereas the latent variable $\Lambda'$ is a redund between $X_A$ and $X_B$, meaning it essentially only contains information that is in common between $X_A$ and $X_B$.[1]
Theorem (deterministic latents, newer version). Let $A, B$ be disjoint subsets of $\{1, \dots, n\}$, and write $X_A = (X_i)_{i \in A}$ and $X_B = (X_i)_{i \in B}$.

Suppose the random variables $X = (X_1, \dots, X_n)$, $\Lambda$, and $\Lambda'$ satisfy the following:

Mediation: $I(X_A; X_B \mid \Lambda) \le \varepsilon$,

Redundancy: $H(\Lambda' \mid X_A) \le \delta_A$ and $H(\Lambda' \mid X_B) \le \delta_B$.

Then, $H(\Lambda' \mid \Lambda) \le \varepsilon + \delta_A + \delta_B$.

Proof.
$$\begin{aligned}
H(\Lambda' \mid \Lambda) &= I(\Lambda'; X_B \mid \Lambda) + H(\Lambda' \mid X_B, \Lambda) && \text{by definition of conditional mutual information,} \\
&\le I(X_A; X_B \mid \Lambda) + H(\Lambda' \mid X_A) + H(\Lambda' \mid X_B) && \text{by information theory inequalities,} \\
&\le \varepsilon + \delta_A + \delta_B && \text{by Redundancy and Mediation.}
\end{aligned}$$
(The middle step unpacks as $I(\Lambda'; X_B \mid \Lambda) \le I(X_A, \Lambda'; X_B \mid \Lambda) = I(X_A; X_B \mid \Lambda) + I(\Lambda'; X_B \mid X_A, \Lambda) \le I(X_A; X_B \mid \Lambda) + H(\Lambda' \mid X_A)$, together with $H(\Lambda' \mid X_B, \Lambda) \le H(\Lambda' \mid X_B)$.) $\square$
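To make the bound concrete, here is a quick biased-coin sanity check in Python. This is my own toy example, not one from Wentworth & Lorell's post; the batch size, the prior over biases, and the choice of $\Lambda'$ as the majority vote of $X_A$ are all illustrative assumptions.

```python
# Toy check of H(L'|L) <= I(X_A;X_B|L) + H(L'|X_A) + H(L'|X_B).
# Lambda is a coin's bias, X_A and X_B are two batches of m flips of that coin,
# and Lambda' is the majority vote of X_A. By construction I(X_A;X_B|Lambda) = 0
# (flips are i.i.d. given the bias) and H(Lambda'|X_A) = 0 (Lambda' is a function
# of X_A), so the bound reduces to H(Lambda'|Lambda) <= H(Lambda'|X_B).
from math import comb, log2

m = 9                           # flips per batch (odd, so the majority is well defined)
prior = {0.2: 0.5, 0.8: 0.5}    # P(Lambda = theta)

def binom(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def h2(p):                      # binary entropy in bits
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def p_majority(theta):          # P(majority of m flips is heads | bias = theta)
    return sum(binom(k, m, theta) for k in range(m // 2 + 1, m + 1))

# H(Lambda' | Lambda): entropy of X_A's majority once the bias is known.
H_redund_given_mediator = sum(pr * h2(p_majority(theta)) for theta, pr in prior.items())

# H(Lambda' | X_B): X_B matters only through its head count, so condition on that.
H_redund_given_XB = 0.0
for k in range(m + 1):
    p_k = sum(pr * binom(k, m, theta) for theta, pr in prior.items())
    p_maj_and_k = sum(pr * binom(k, m, theta) * p_majority(theta) for theta, pr in prior.items())
    H_redund_given_XB += p_k * h2(p_maj_and_k / p_k)

print(f"H(L'|L)   = {H_redund_given_mediator:.4f} bits")
print(f"H(L'|X_B) = {H_redund_given_XB:.4f} bits")
assert H_redund_given_mediator <= H_redund_given_XB + 1e-12
```

Here the mediator (the bias) pins down the redund (the majority vote) up to a small entropy, exactly as the theorem predicts.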
The older theorem treats stochastic latents: the redund $\Lambda'$ need not be a function of the data. Instead, redundancy says that $\Lambda'$ is approximately independent of each $X_j$ given the rest of the data, mediation says that $\Lambda$ approximately screens each $X_j$ off from the rest, and Independent Latents says that the two latents carry no information about each other beyond what is already in the data. Write $X_{\bar{j}} = (X_1, \dots, X_{j-1}, X_{j+1}, \dots, X_n)$ for the data with $X_j$ removed.

Theorem (stochastic latents, older version). Suppose the random variables $X = (X_1, \dots, X_n)$, $\Lambda$, and $\Lambda'$ satisfy the following:

Independent Latents: $I(\Lambda; \Lambda' \mid X) = 0$,

Mediation: $I(X_j; X_{\bar{j}} \mid \Lambda) \le \varepsilon_j$ for all $j$,

Redundancy: $I(\Lambda'; X_j \mid X_{\bar{j}}) \le \delta_j$ for all $j$.

Then, $I(\Lambda'; X \mid \Lambda) \le \sum_{j=1}^{n} (\varepsilon_j + \delta_j)$.

Proof. First, we have, for each $j$,
$$\begin{aligned}
I(\Lambda'; X_j \mid X_{\bar{j}}, \Lambda) &= I(\Lambda'; X_j \mid X_{\bar{j}}) - I(\Lambda'; X_j; \Lambda \mid X_{\bar{j}}) && \text{by definition of 3-way interaction information,} \\
&= I(\Lambda'; X_j \mid X_{\bar{j}}) - I(\Lambda'; \Lambda \mid X_{\bar{j}}) + I(\Lambda'; \Lambda \mid X) && \text{by symmetry of 3-way interaction information,} \\
&\le I(\Lambda'; X_j \mid X_{\bar{j}}) && \text{by Independent Latents and nonnegativity.}
\end{aligned}$$
Therefore, writing $X_{<j} = (X_1, \dots, X_{j-1})$ and $X_{>j} = (X_{j+1}, \dots, X_n)$,
$$\begin{aligned}
I(\Lambda'; X_j \mid \Lambda, X_{<j}) &\le I(X_{>j}; X_j \mid \Lambda, X_{<j}) + I(\Lambda'; X_j \mid \Lambda, X_{\bar{j}}) && \text{by the mutual information chain rule, applied to } I(\Lambda', X_{>j}; X_j \mid \Lambda, X_{<j}), \\
&\le I(X_j; X_{\bar{j}} \mid \Lambda) + I(\Lambda'; X_j \mid X_{\bar{j}}) && \text{by the chain rule again and the above derivation,} \\
&\le \varepsilon_j + \delta_j && \text{by Mediation and Redundancy.}
\end{aligned}$$
The result now follows by summing over all $j = 1, \dots, n$, since $\sum_{j=1}^{n} I(\Lambda'; X_j \mid \Lambda, X_{<j}) = I(\Lambda'; X \mid \Lambda)$ by the chain rule.
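As a sanity check of the stochastic bound, the following Python snippet draws random joint distributions in which Independent Latents holds exactly by construction (here $\Lambda'$ is sampled from a conditional $P(\Lambda' \mid X)$), then verifies the inequality numerically. Again, this is my own check with made-up alphabet sizes, not anything from the original post.

```python
# Randomized check of I(L';X|L) <= sum_j [ I(X_j;X_{-j}|L) + I(L';X_j|X_{-j}) ],
# on joints where Lambda' depends on (X, Lambda) only through X, so that
# I(Lambda; Lambda' | X) = 0 exactly (the Independent Latents condition).
import numpy as np

rng = np.random.default_rng(0)

def H(p, axes):
    """Shannon entropy (bits) of the marginal of the joint array p on `axes`."""
    other = tuple(ax for ax in range(p.ndim) if ax not in axes)
    marg = p.sum(axis=other) if other else p
    marg = marg[marg > 0]
    return float(-(marg * np.log2(marg)).sum())

def I(p, a, b, c=()):
    """Conditional mutual information I(a; b | c) in bits, arguments are axis tuples."""
    a, b, c = tuple(a), tuple(b), tuple(c)
    return H(p, a + c) + H(p, b + c) - H(p, a + b + c) - H(p, c)

X_AXES, LAM, REDUND = (0, 1, 2), 3, 4      # axes: X_1, X_2, X_3, Lambda, Lambda'
sizes = (2, 2, 2, 3, 2)                    # small alphabets keep the joint tiny

for trial in range(5):
    p_x_lam = rng.random(sizes[:4])                      # random P(X, Lambda)
    p_x_lam /= p_x_lam.sum()
    p_red_given_x = rng.random(sizes[:3] + (sizes[4],))  # random P(Lambda' | X)
    p_red_given_x /= p_red_given_x.sum(axis=-1, keepdims=True)
    p = p_x_lam[..., None] * p_red_given_x[:, :, :, None, :]   # full joint

    lhs = I(p, (REDUND,), X_AXES, (LAM,))
    rhs = sum(I(p, (j,), tuple(k for k in X_AXES if k != j), (LAM,))       # mediation errors
              + I(p, (REDUND,), (j,), tuple(k for k in X_AXES if k != j))  # redundancy errors
              for j in X_AXES)
    print(f"trial {trial}: I(L';X|L) = {lhs:.4f}  <=  {rhs:.4f}")
    assert lhs <= rhs + 1e-9
```

Because the mediation and redundancy "errors" on the right-hand side are measured directly from the joint, the bound holds for arbitrary distributions so long as the Independent Latents condition is exact.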
Since probabilistic models are often only defined in terms of a latent structure, you might find it philosophically suspect to impose a joint distribution on all variables including the latents. If so, feel free to replace the random variables with their specific instantiations: the derivations go through almost identically with Kolmogorov complexity and algorithmic mutual information replacing the Shannon entropy and mutual information, respectively.
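For instance, here is my own sketch (not a claim from the original post) of how the deterministic chain would read algorithmically, writing $K(\cdot \mid \cdot)$ for conditional prefix complexity and $I(x : y \mid z) = K(x \mid z) + K(y \mid z) - K(x, y \mid z)$ for conditional algorithmic mutual information:
$$\begin{aligned}
K(\lambda' \mid \lambda) &= I(\lambda' : x_B \mid \lambda) + K(\lambda' \mid x_B, \lambda) + O(\log) \\
&\le I(x_A : x_B \mid \lambda) + K(\lambda' \mid x_A) + K(\lambda' \mid x_B) + O(\log),
\end{aligned}$$
where the $O(\log)$ terms, logarithmic in the complexities of the strings involved, absorb the slack from the chain rule for prefix complexity, which holds only up to such additive terms.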