Daniel C's Shortform

Daniel C

LESSWRONG
LW

Daniel C's Shortform — LessWrong

Daniel C's Shortform

by Daniel C

22nd Dec 2025

1 min read

3

This is a special post for quick takes by Daniel C. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

3 comments, sorted by

top scoring

Click to highlight new comments since: Today at 5:10 PM

[-]Daniel C2mo150

Natural latent maximizes sum of mutual information with each observable

Claim: Given constraints on the entropy of a latent variable , the redundancy and mediation errors are optimized exactly when the sum of mutual information $\sum_{i} I (X_{i}; Λ)$ is maximized.

Why is this useful?

Early works on natural abstractions often talk about throwing away information while maintaining predictive power, but how should predictive power be measured? If you have multiple observables, you want to maximize predictive power for each of the observables you care about. In particular, you want your latent variable to include “shared information” that allows you to predict multiple variables at once. This is similar to maximizing $\sum_{i} I (X_{i}; Λ)$ while minimizing $H (Λ)$ (as we’re trying to throw away information).
Maximizing the sum of mutual information is likely computationally easier than directly optimizing for the mediation and redundancy conditions, so this framing could be helpful for anyone trying to find natural latents computationally

Mediation

Mediation is one of the main conditions for naturality of latent variables. Intuitively, an (approximate) mediator $Λ$ captures (approximately) all the correlation between a collection of observables $X_{1} \dots X_{n}$ . We will prove that a latent variable $Λ$ is an (optimal) approximate mediator $⟺$ it maximizes the sum of mutual information $\sum_{i} I (X_{i}; Λ)$ given constraint on entropy $H (Λ) \leq k$

Intuition: To maximize the sum of mutual information $\sum_{i} I (X_{i}; Λ)$ , it is beneficial to include information that is shared among multiple $X_{i}$ in $Λ$ (as that would increase multiple mutual information term). A mediator is a latent variable that captures all the shared information among the $X_{i}$ s, so it will be selected for when maximizing $\sum_{i} I (X_{i}; Λ)$ .

Proof:

Note that the mediation error of $Λ$ is equivelant to the conditional total correlation $T C (X | Λ) = D_{K L} (P (X_{1} \dots X_{n} | Λ) ∥ P (X_{1} | Λ) \dots P (X_{n} | Λ))$
We have $\sum_{i} I (X_{i}; Λ) = \sum_{i} H (X_{i}) - H (X_{i} | Λ)$ , since each $H (X_{i})$ term is fixed relative to $Λ$ , maximizing the sum of mutual information is equivalent to minimizing the sum of conditional entropies $\sum_{i} H (X_{i} | Λ)$
We have $D_{K L} (P (X_{1} \dots X_{n} | Λ) ∥ P (X_{1} | Λ) \dots P (X_{n} | Λ)) = T C (X | Λ) = \sum_{i} H (X_{i} | Λ) - H (X | Λ)$ , which means
$\sum_{i} H (X_{i} | Λ) = H (X | Λ) + T C (X | Λ) = H (X) - I (X; Λ) + T C (X | Λ) \geq H (X) - H (Λ) + T C (X | Λ)$ . For the last inequality we used $H (X) \geq I (X; Λ)$ , we can obtain equality if $Λ$ is a deterministic function of $X$ .
Hence, if we have a constraint on entropy $H (Λ) \leq k$ , then we have the lower bound $\sum_{i} H (X_{i} | Λ) \geq H (X) - H (Λ) + T C (X | Λ) \geq H (X) - k + T C (X | Λ)$ .
We can achieve equality on the lower bound by choosing a deterministic (w.r.t $X$ ) latent $Λ$ with entropy $H (Λ) = k$ . For such $Λ$ we have $\sum_{i} H (X_{i} | Λ) = H (X) - k + T C (X | Λ)$ , which means $\sum_{i} H (X_{i} | Λ)$ is minimized exactly when $T C (X | Λ)$ is minimized (as that's the only term which depends on $Λ$ )
Since $T C (X | Λ)$ is nonnegative and equivalent to the mediation error, we conclude that $\sum_{i} H (X_{i} | Λ)$ is minimized exactly when the mediation error is minimized, and an exact mediator ( $T C (X | Λ) = 0$ ) with entropy $k$ always achieves minimal $\sum_{i} H (X_{i} | Λ)$ (hence maximal $\sum_{i} I (X_{i}; Λ)$ ) among latent variables with entropy $\leq k$

Redundancy

The correspondence with the redundancy condition is quite simple: The sum of the redundancy errors are $\sum_{i} H (Λ | X_{i}) = n H (Λ) - \sum_{i} I (X_{i}; Λ)$ , so if we have a constraint of the form $H (Λ) \geq k$ , then we have $\sum_{i} H (Λ | X_{i}) = n H (Λ) - \sum_{i} I (X_{i}; Λ) \geq n k - \sum_{i} I (X_{i}; Λ)$ , and the sum of redundancy errors are minimized exactly when $\sum_{i} I (X_{i}; Λ)$ is maximized and $H (Λ) = k$ .

Putting it together

We've shown that given constraints on $H (Λ)$ , both the mediation and the redundancy conditions are minimized exactly when the sum of mutual information $\sum_{i} I (X_{i}, Λ)$ is maximized, we can use this to simplify the search for natural latents, and while optimizing for this quantity there is no tradeoff between the redundancy and mediation errors.

However, note that mediation error increases as $H (Λ)$ decreases (the mediation error for the empty latent is simply total correlation), while the redundancy error increases with $H (Λ)$ (which is why we imposed $H (Λ) \leq k$ for mediation but $H (Λ) \geq k$ for redundancy). So the entropy of the latent is exactly the parameter that represents the tradeoff between the mediation and redundancy errors.

In summary, we can picture a pareto-frontier of latent variables with maximal $\sum_{i} I (X_{i}; Λ)$ and different entropies, by ramping up the entropy of $Λ$ along the pareto-frontier we gradually increase the redundancy errors while reducing the mediation errors, and these are the only parameters relevant for latent variables that are pareto-optimal w.r.t the naturality conditions.

[-]Daniel C1mo*30

Preserving mutual information terms ( Stochastic $⟹$ Deterministic Natural latent)

(See this post for background about the stochastic $\to$ deterministic natural latent conjecture)

We've shown that given fixed $H (Λ)$ , both the redundancy and mediation errors of a latent $Λ$ are minimized when $\sum_{i} I (X_{i}, Λ)$ is maximized, while $H (Λ)$ is exactly the parameter the determines the tradeoff between redundancy and mediation errors (among pareto-optimal latent). We'll discuss how this could open up new angles of attack for the stochastic $\to$ deterministic natural latent conjecture.

Suppose that we have a stochastic natural latent $Λ$ that satisfies:

$I (X_{2}; Λ | X_{1}) \leq ϵ$

$I (X_{1}; Λ | X_{2}) \leq ϵ$

$T C (X | Λ) = I (X_{1}; X_{2} | Λ) \leq ϵ$

From our result, we know that to construct a deterministic natural latent $Λ^{'}$ , all we have to do is to determine the entropy $H (Λ^{'})$ and then select the latent that maximizes $\sum_{i} I (X_{i}, Λ^{'})$ . The latter ensures that the latent is pareto-optimal w.r.t the mediation and determinism conditions, while the former selects a particular point on the pareto-frontier.

Now suppose that our stochastic natural latent has a particular amount of mutual information with the joint observables $I (X_{1}, X_{2}; Λ)$ . If the stochastic natural latent was a deterministic function of the observables, then we would have:

$H (Λ) = I (X_{1}, X_{2}; Λ)$ (as that would imply $H (Λ | X_{1}, X_{2}) = H (Λ) - I (X_{1}, X_{2}; Λ) = 0$ )

So one heuristic for constructing a deterministic natural latent is to just set $H (Λ^{'}) = I (X_{1}, X_{2}; Λ)$ and maximize $\sum_{i} I (X_{i}, Λ^{'})$ given the entropy constraint (so that $Λ^{'}$ hopefully captures all the mutual info between $Λ$ and $X$ ). We will show that if $Λ^{'}$ preserves the mutual information with each observable (i.e. $I (X_{i}; Λ^{'}) = I (X_{i}; Λ), i = 1, 2$ ), then the mediation condition is conserved and the stochastic redundancy conditions implies the deterministic redundancy conditions

Preserving mutual information terms $⟹$ Mediation is conserved

Note that the mediation error is $T C (X | Λ) = \sum_{i} H (X_{i} | Λ) - H (X | Λ) = \sum_{i} H (X_{i}) - I (X_{i}; Λ) - H (X) + I (X; Λ)$

Since all $H (X_{i})$ and $H (X)$ terms are fixed relative to $Λ$ , the mediation error is completely unchanged if we replace $Λ$ with a deterministic latent $Λ^{'}$ that satisfies $I (X; Λ^{'}) = H (Λ^{'}) = I (X; Λ)$ and $I (X_{i}; Λ^{'}) = I (X_{i}, Λ)$ for each $i$ .

Preserving mutual information terms $⟹$ Redundancy is conserved

Note that using partial information decomposition^[1], we can decompose the stochastic redundancy errors as the following:

$I (X_{1}; Λ | X_{2}) = Syn (X_{1}, X_{2}; Λ) + Uniq (X_{1}; Λ) < ϵ ⟹ Uniq (X_{1}; Λ) < ϵ$

$I (X_{2}; Λ | X_{1}) = Syn (X_{1}, X_{2}; Λ) + Uniq (X_{2}; Λ) < ϵ ⟹ Uniq (X_{2}; Λ) < ϵ$

where $Syn (X_{1}, X_{2}; Λ)$ represents synergistic information of $X_{2}$ and $Λ$ w.r.t $Λ$ while $Uniq (X_{1}; Λ)$ represents unique information of $X_{1}$ w.r.t $Λ$ . Intuitively, $I (X_{1}; Λ | X_{2})$ represents the information that $X_{1}$ has about $Λ$ when we have access to $X_{2}$ , which should include unique information that we can only derive from $X_{1}$ but not $X_{2}$ , but also synergistic information that we can only derive when we have both $X_{1}$ and $X_{2}$ .

We also have:

$I (X_{1}; Λ) = Red (X_{1}, X_{2}; Λ) + Uniq (X_{1}; Λ)$

$I (X_{2}; Λ) = Red (X_{1}, X_{2}; Λ) + Uniq (X_{2}; Λ)$

Intuitively, this is because $I (X_{1}; Λ)$ contains both the unique information about $Λ$ that you can only derive from $X_{1}$ but not $X_{2}$ , and also the redundant information that you can derive from either $X_{1}$ or $X_{2}$ . Note that since $0 \leq Uniq (X_{1}; Λ) \leq ϵ$ and $0 \leq Uniq (X_{2}; Λ) \leq ϵ$ , we have

$Red (X_{1}, X_{2}; Λ) \leq I (X_{1}; Λ) \leq Red (X_{1}, X_{2}; Λ) + ϵ$

$Red (X_{1}, X_{2}; Λ) \leq I (X_{2}; Λ) \leq Red (X_{1}, X_{2}; Λ) + ϵ$

Similarly, we have:

$I (X_{1}, X_{2}; Λ) = Red (X_{1}, X_{2}; Λ) + Uniq (X_{1}; Λ) + Uniq (X_{2}; Λ) + Syn (X_{1}, X_{2}; Λ)$

where

$Uniq (X_{1}; Λ) + Uniq (X_{2}; Λ) + Syn (X_{1}, X_{2}; Λ) \leq 2 ϵ ⟹ Red (X_{1}, X_{2}; Λ) \leq I (X_{1}, X_{2}; Λ) \leq Red (X_{1}, X_{2}; Λ) + 2 ϵ$

As a result, both $I (X_{1}, X_{2}; Λ) - I (X_{1}; Λ)$ and $I (X_{1}, X_{2}; Λ) - I (X_{2}; Λ)$ are bounded by $2 ϵ$ . This means that if we can find a deterministic latent $Λ^{'}$ that conserves all the relevant mutual information terms $I (X_{1}, X_{2}; | Λ)$ , $I (X_{1}; | Λ)$ and $I (X_{2}; | Λ)$ , then we can bound the deterministic redundancy errors:

$H (Λ^{'} | X_{1}) = H (Λ^{'}) - I (X_{1}; Λ^{'}) = I (X_{1}, X_{2}; Λ) - I (X_{1}; Λ) < 2 ϵ$ $H (Λ^{'} | X_{2}) = H (Λ^{'}) - I (X_{2}; Λ^{'}) = I (X_{1}, X_{2}; Λ) - I (X_{2}; Λ) < 2 ϵ$

Conclusion

We've shown that a sufficient condition for mediation and redundancy to transfer from the stochastic to deterministic case is if the deterministic latent preserves the mutual information of the stochastic latent with both the joint observable $X$ as well as the individual observables $X_{1}$ and $X_{2}$ . Given this, the remaining task would be to prove that such a deterministic latent always exists, or that it can preserve the mutual information terms up to some small error. In particular, if existence is guaranteed, then a tractable way to find the deterministic latent $Λ^{'}$ given a stochastic latent $Λ$ is to just set $H (Λ^{'}) = I (X; Λ)$ and maximize $\sum_{i} I (X_{i}; Λ^{'})$

^{^}
Note that PID depends on a choice of redundancy measure, but our proof holds for any choice that guarantees non-negativity of PID atoms

[-]Daniel C2mo10

Maximum redund=Maximal redund, Minimum mediator=Minimal mediator, & Naturality=Shrinking regimes of tradeoffs

Previously, we've shown that given constraints on the entropy of a natural latent variable , the mediation and redundancy errors are minimized exactly when the sum of mutual information with observables $\sum_{i} I (X_{i}; Λ)$ is maximized. In addition, the entropy $H (Λ)$ of the latent variable is exactly the parameter that represents the tradeoff between the mediation and redundancy condition. In particular, the mediation error can only reduce with $H (Λ)$ while the redundancy errors can only increase with $H (Λ)$ .

However, there may be regimes where changes in $H (Λ)$ can reduce the mediation error without increasing the redundancy errors or vice versa. For instance:

If we gradually increase $H (Λ)$ while perfectly preserving the redundancy conditions $H (Λ | X_{i}) = 0, \forall i$ , then we can reduce the mediation error (as it can only reduce with increasing $H (Λ)$ ) without increasing the redundancy errors (as they stay 0). Increasing $H (Λ)$ therefore becomes a weak pareto-improvement over the naturality conditions
Similarly, if we gradually reduce $H (Λ)$ while perfectly preserving the mediation condition $T C (X | Λ) = 0$ , then we can reduce the redundancy errors without increasing the mediation error (as it stays 0)

If we define a maximum redund $Λ_{*}$ as a latent variable that satisfies the redundancy conditions $H (Λ_{*} | X_{i}) = 0$ and has the maximum entropy among redunds, then $H (Λ) \leq H (Λ_{*})$ represents the regime where we can increase $H (Λ)$ without increasing the redundancy errors, since increasing $H (Λ)$ beyond $H (Λ_{*})$ would necessarily violate the redundancy condition given our assumption of maximum entropy

Similarly, define a minimum mediator $Λ^{*}$ as a mediator with minimal entropy (among mediators). Then $H (Λ) \geq H (Λ^{*})$ represents the regime where we can reduce entropy without increasing the mediation error, since reducing $H (Λ)$ below $H (Λ^{*})$ necessarily violates the mediation condition.

Combining these ideas, $H (Λ_{*}) \leq H (Λ) \leq H (Λ^{*})$ represents the regime where changing $H (Λ)$ actually presents a tradeoff between the mediation and redundancy errors; the minimum mediator and maximum redund marks the boundaries for when weak pareto-improvements are possible.

Maximal redunds and minimal mediators

In natural latents we care about the uniqueness of latent variables, which is why we have concepts like minimal mediators and maximal redunds:

A minimal mediator is a mediator $Λ$ such that for any other mediator $Λ^{'}$ we have $H (Λ | Λ^{'}) < ϵ$ . So a minimal mediator is an approximately deterministic function of any other mediator
A maximal redund is a redund $Λ$ such that for any other redund $Λ^{'}$ we have $H (Λ^{'} | Λ) < ϵ$ . So any redund is approximately a deterministic function of the maximal redund

Through a universal-property-flavored proof, we can show approximate isomorphism among any pair of minimal mediators $X, Y$ : Since $X$ is a minimal mediator and $Y$ is a mediator, $Y$ approximately determines $X$ , and using a similar argument we conclude $X$ determines $Y$ . The same reasoning also allows us to derive uniqueness of any pair of maximal redunds. Naturality occurs when the maximal redund converge with the minimal mediator.

However, note that the concepts of minimal mediators and maximal redunds are at least conceptually distinct from minimum mediators and maximum redunds. We shall therefore prove that these concepts are mathematically equivalent. This can be useful because it's much easier to find minimum mediators and maximum redunds computationally, but we ultimately care about the unqiueness property offered by minimal mediators and maximal redunds, proving an equivalence enables the former to have the uniqueness guarantees of the latter.

Minimum mediator = Minimal mediator (when minimal mediator exists)

Proof:

Let $Y$ be a minimal mediator and $Z$ be a minimum mediator
Since $Y$ is a minimal mediator and $Z$ is a mediator, we have $H (Y | Z) < ϵ$ which means $I (Y; Z) \geq H (Y) - ϵ$
$H (Z | Y) = H (Z) - I (Z; Y) \leq H (Z) - H (Y) + ϵ$
Since $Z$ has minimal entropy, we have $H (Z) - H (Y) < 0$ which means $H (Z | Y) < ϵ$
For any other latent $Λ$ , we have $H (Z | Λ) \leq H (Z | Y) + H (Y | Λ) < 2 ϵ$ , which means $Z$ is also a minimal mediator (up to error $2 ϵ$ )

In addition, we have $H (Y) \leq H (Y, Z) = H (Z) + H (Y | Z) \leq H (Z) + ϵ$ so $Y$ is also an approximate minimum mediator

Maximum redund= Maximal redund

Proof:

Suppose that $Z$ is maximum redund of $X_{1} \dots X_{n}$ and $Y$ is any other redund
$(Z, Y)$ is a redund since both $Z$ and $Y$ are deterministic functions of any $X_{i}$ , since $Z$ is maximum redund we have $H (Z) \geq H (Z, Y)$
$H (Z, Y) = H (Z) + H (Y | Z)$ hence we have $H (Z) \geq H (Z) + H (Y | Z)$ . Since $H (Y | Z)$ is nonegative, we must have $H (Y | Z) = 0$
As a result, $H (Z)$ is also a maximal redund

Similarly, suppose that $Y$ is a maximal redund, then $H (Z) \leq H (Z, Y) = H (Y) + H (Z | Y) \leq H (Y) + ϵ$ , which means $H (Y) \geq H (Z) - ϵ$ and $Y$ is also an approximate maximum redund.

Naturality as shrinking regime of tradeoffs

Recall that $H (Λ_{*}) \leq H (Λ) \leq H (Λ^{*})$ (where $Λ^{*}$ is the minimum mediator and $Λ_{*}$ is the maximum redund) represents the regime where changing $H (Λ)$ actually presents a tradeoff between the mediation and redundancy errors. Due to the equivalence we proved, we can also think of $Λ^{*}$ as the minimal mediator and $Λ_{*}$ as the maximal redund.

We also know that naturality occurs when the minimal mediator converges with the maximal redund (as a natural latent satisfies both mediation and redundancy, and mediator determines redund); we can picture this convergence as if we're shrinking the gap between $H (Λ_{*})$ and $H (Λ^{*})$ . In other words, naturality occurs exactly when the regime of tradeoff ( $H (Λ_{*}) \leq H (Λ) \leq H (Λ^{*})$ ) between the redundancy and mediation error is small. If we have exact naturality $H (Λ_{*}) = H (Λ^{*})$ , then pareto-improvements on the naturality conditions can always be made by nudging $H (Λ)$ closer to $H (Λ^{*})$ .

Combining this with our previous result, we conclude that that maximizing $\sum_{i} I (X_{i}, Λ)$ represents strong pareto-improvements over the naturality conditions; $H (Λ) \leq H (Λ_{*})$ and $H (Λ) \geq H (Λ^{*})$ represents the regime where we can have weak pareto-improvements by nudging $H (Λ)$ closer to the boundary of $H (Λ_{*})$ or $H (Λ_{*})$ ; whereas $H (Λ_{*}) \leq H (Λ) \leq H (Λ^{*})$ represents the regime of real tradeoffs between naturality conditions. An approximate natural latent exist exactly when the regime of real tradeoffs is small and we can pareto-improve towards naturality

Moderation Log

Natural latent maximizes sum of mutual information with each observable

Claim: Given constraints on the entropy of a latent variable , the redundancy and mediation errors are optimized exactly when the sum of mutual information $\sum_{i} I (X_{i}; Λ)$ is maximized.

Why is this useful?

Early works on natural abstractions often talk about throwing away information while maintaining predictive power, but how should predictive power be measured? If you have multiple observables, you want to maximize predictive power for each of the observables you care about. In particular, you want your latent variable to include “shared information” that allows you to predict multiple variables at once. This is similar to maximizing $\sum_{i} I (X_{i}; Λ)$ while minimizing $H (Λ)$ (as we’re trying to throw away information).
Maximizing the sum of mutual information is likely computationally easier than directly optimizing for the mediation and redundancy conditions, so this framing could be helpful for anyone trying to find natural latents computationally

Mediation

Proof:

Note that the mediation error of $Λ$ is equivelant to the conditional total correlation $T C (X | Λ) = D_{K L} (P (X_{1} \dots X_{n} | Λ) ∥ P (X_{1} | Λ) \dots P (X_{n} | Λ))$
We have $\sum_{i} I (X_{i}; Λ) = \sum_{i} H (X_{i}) - H (X_{i} | Λ)$ , since each $H (X_{i})$ term is fixed relative to $Λ$ , maximizing the sum of mutual information is equivalent to minimizing the sum of conditional entropies $\sum_{i} H (X_{i} | Λ)$
We have $D_{K L} (P (X_{1} \dots X_{n} | Λ) ∥ P (X_{1} | Λ) \dots P (X_{n} | Λ)) = T C (X | Λ) = \sum_{i} H (X_{i} | Λ) - H (X | Λ)$ , which means
$\sum_{i} H (X_{i} | Λ) = H (X | Λ) + T C (X | Λ) = H (X) - I (X; Λ) + T C (X | Λ) \geq H (X) - H (Λ) + T C (X | Λ)$ . For the last inequality we used $H (X) \geq I (X; Λ)$ , we can obtain equality if $Λ$ is a deterministic function of $X$ .
Hence, if we have a constraint on entropy $H (Λ) \leq k$ , then we have the lower bound $\sum_{i} H (X_{i} | Λ) \geq H (X) - H (Λ) + T C (X | Λ) \geq H (X) - k + T C (X | Λ)$ .
We can achieve equality on the lower bound by choosing a deterministic (w.r.t $X$ ) latent $Λ$ with entropy $H (Λ) = k$ . For such $Λ$ we have $\sum_{i} H (X_{i} | Λ) = H (X) - k + T C (X | Λ)$ , which means $\sum_{i} H (X_{i} | Λ)$ is minimized exactly when $T C (X | Λ)$ is minimized (as that's the only term which depends on $Λ$ )
Since $T C (X | Λ)$ is nonnegative and equivalent to the mediation error, we conclude that $\sum_{i} H (X_{i} | Λ)$ is minimized exactly when the mediation error is minimized, and an exact mediator ( $T C (X | Λ) = 0$ ) with entropy $k$ always achieves minimal $\sum_{i} H (X_{i} | Λ)$ (hence maximal $\sum_{i} I (X_{i}; Λ)$ ) among latent variables with entropy $\leq k$

Redundancy

Putting it together

[-]Daniel C1mo*30

Preserving mutual information terms ( Stochastic $⟹$ Deterministic Natural latent)

(See this post for background about the stochastic $\to$ deterministic natural latent conjecture)

Suppose that we have a stochastic natural latent $Λ$ that satisfies:

$I (X_{2}; Λ | X_{1}) \leq ϵ$

$I (X_{1}; Λ | X_{2}) \leq ϵ$

$T C (X | Λ) = I (X_{1}; X_{2} | Λ) \leq ϵ$

$H (Λ) = I (X_{1}, X_{2}; Λ)$ (as that would imply $H (Λ | X_{1}, X_{2}) = H (Λ) - I (X_{1}, X_{2}; Λ) = 0$ )

Preserving mutual information terms $⟹$ Mediation is conserved

Note that the mediation error is $T C (X | Λ) = \sum_{i} H (X_{i} | Λ) - H (X | Λ) = \sum_{i} H (X_{i}) - I (X_{i}; Λ) - H (X) + I (X; Λ)$

Preserving mutual information terms $⟹$ Redundancy is conserved

Note that using partial information decomposition^[1], we can decompose the stochastic redundancy errors as the following:

$I (X_{1}; Λ | X_{2}) = Syn (X_{1}, X_{2}; Λ) + Uniq (X_{1}; Λ) < ϵ ⟹ Uniq (X_{1}; Λ) < ϵ$

$I (X_{2}; Λ | X_{1}) = Syn (X_{1}, X_{2}; Λ) + Uniq (X_{2}; Λ) < ϵ ⟹ Uniq (X_{2}; Λ) < ϵ$

We also have:

$I (X_{1}; Λ) = Red (X_{1}, X_{2}; Λ) + Uniq (X_{1}; Λ)$

$I (X_{2}; Λ) = Red (X_{1}, X_{2}; Λ) + Uniq (X_{2}; Λ)$

$Red (X_{1}, X_{2}; Λ) \leq I (X_{1}; Λ) \leq Red (X_{1}, X_{2}; Λ) + ϵ$

$Red (X_{1}, X_{2}; Λ) \leq I (X_{2}; Λ) \leq Red (X_{1}, X_{2}; Λ) + ϵ$

Similarly, we have:

$I (X_{1}, X_{2}; Λ) = Red (X_{1}, X_{2}; Λ) + Uniq (X_{1}; Λ) + Uniq (X_{2}; Λ) + Syn (X_{1}, X_{2}; Λ)$

where

$Uniq (X_{1}; Λ) + Uniq (X_{2}; Λ) + Syn (X_{1}, X_{2}; Λ) \leq 2 ϵ ⟹ Red (X_{1}, X_{2}; Λ) \leq I (X_{1}, X_{2}; Λ) \leq Red (X_{1}, X_{2}; Λ) + 2 ϵ$

$H (Λ^{'} | X_{1}) = H (Λ^{'}) - I (X_{1}; Λ^{'}) = I (X_{1}, X_{2}; Λ) - I (X_{1}; Λ) < 2 ϵ$ $H (Λ^{'} | X_{2}) = H (Λ^{'}) - I (X_{2}; Λ^{'}) = I (X_{1}, X_{2}; Λ) - I (X_{2}; Λ) < 2 ϵ$

Conclusion

^{^}
Note that PID depends on a choice of redundancy measure, but our proof holds for any choice that guarantees non-negativity of PID atoms

[-]Daniel C2mo10

Maximum redund=Maximal redund, Minimum mediator=Minimal mediator, & Naturality=Shrinking regimes of tradeoffs

However, there may be regimes where changes in $H (Λ)$ can reduce the mediation error without increasing the redundancy errors or vice versa. For instance:

If we gradually increase $H (Λ)$ while perfectly preserving the redundancy conditions $H (Λ | X_{i}) = 0, \forall i$ , then we can reduce the mediation error (as it can only reduce with increasing $H (Λ)$ ) without increasing the redundancy errors (as they stay 0). Increasing $H (Λ)$ therefore becomes a weak pareto-improvement over the naturality conditions
Similarly, if we gradually reduce $H (Λ)$ while perfectly preserving the mediation condition $T C (X | Λ) = 0$ , then we can reduce the redundancy errors without increasing the mediation error (as it stays 0)

Maximal redunds and minimal mediators

In natural latents we care about the uniqueness of latent variables, which is why we have concepts like minimal mediators and maximal redunds:

A minimal mediator is a mediator $Λ$ such that for any other mediator $Λ^{'}$ we have $H (Λ | Λ^{'}) < ϵ$ . So a minimal mediator is an approximately deterministic function of any other mediator
A maximal redund is a redund $Λ$ such that for any other redund $Λ^{'}$ we have $H (Λ^{'} | Λ) < ϵ$ . So any redund is approximately a deterministic function of the maximal redund

Minimum mediator = Minimal mediator (when minimal mediator exists)

Proof:

Let $Y$ be a minimal mediator and $Z$ be a minimum mediator
Since $Y$ is a minimal mediator and $Z$ is a mediator, we have $H (Y | Z) < ϵ$ which means $I (Y; Z) \geq H (Y) - ϵ$
$H (Z | Y) = H (Z) - I (Z; Y) \leq H (Z) - H (Y) + ϵ$
Since $Z$ has minimal entropy, we have $H (Z) - H (Y) < 0$ which means $H (Z | Y) < ϵ$
For any other latent $Λ$ , we have $H (Z | Λ) \leq H (Z | Y) + H (Y | Λ) < 2 ϵ$ , which means $Z$ is also a minimal mediator (up to error $2 ϵ$ )

In addition, we have $H (Y) \leq H (Y, Z) = H (Z) + H (Y | Z) \leq H (Z) + ϵ$ so $Y$ is also an approximate minimum mediator

Maximum redund= Maximal redund

Proof:

Suppose that $Z$ is maximum redund of $X_{1} \dots X_{n}$ and $Y$ is any other redund
$(Z, Y)$ is a redund since both $Z$ and $Y$ are deterministic functions of any $X_{i}$ , since $Z$ is maximum redund we have $H (Z) \geq H (Z, Y)$
$H (Z, Y) = H (Z) + H (Y | Z)$ hence we have $H (Z) \geq H (Z) + H (Y | Z)$ . Since $H (Y | Z)$ is nonegative, we must have $H (Y | Z) = 0$
As a result, $H (Z)$ is also a maximal redund

Naturality as shrinking regime of tradeoffs

Moderation Log

Daniel C's Shortform

3

Natural latent maximizes sum of mutual information with each observable

Why is this useful?

Mediation

Redundancy

Putting it together

Preserving mutual information terms ⟹ Mediation is conserved

Preserving mutual information terms ⟹ Redundancy is conserved

Conclusion

Maximum redund=Maximal redund, Minimum mediator=Minimal mediator, & Naturality=Shrinking regimes of tradeoffs

Maximal redunds and minimal mediators

Minimum mediator = Minimal mediator (when minimal mediator exists)

Maximum redund= Maximal redund

Naturality as shrinking regime of tradeoffs

Natural latent maximizes sum of mutual information with each observable

Why is this useful?

Mediation

Redundancy

Putting it together

Preserving mutual information terms ⟹ Mediation is conserved

Preserving mutual information terms ⟹ Redundancy is conserved

Conclusion

Maximum redund=Maximal redund, Minimum mediator=Minimal mediator, & Naturality=Shrinking regimes of tradeoffs

Maximal redunds and minimal mediators

Minimum mediator = Minimal mediator (when minimal mediator exists)

Maximum redund= Maximal redund

Naturality as shrinking regime of tradeoffs

Preserving mutual information terms $⟹$ Mediation is conserved

Preserving mutual information terms $⟹$ Redundancy is conserved

Preserving mutual information terms $⟹$ Mediation is conserved

Preserving mutual information terms $⟹$ Redundancy is conserved