Claim: Given constraints on the entropy of a latent variable $\Lambda$, the redundancy and mediation errors are optimized exactly when the sum of mutual information $\sum_i I(X_i;\Lambda)$ is maximized.
Mediation is one of the main conditions for naturality of latent variables. Intuitively, an (approximate) mediator captures (approximately) all the correlation between a collection of observables $X_1, \dots, X_n$. We will prove that a latent variable $\Lambda$ is an (optimal) approximate mediator if and only if it maximizes the sum of mutual information $\sum_i I(X_i;\Lambda)$ given a constraint on its entropy $H(\Lambda)$.
Intuition: To maximize the sum of mutual information $\sum_i I(X_i;\Lambda)$, it is beneficial to include in $\Lambda$ information that is shared among multiple $X_i$, since that information increases several mutual information terms at once. A mediator is a latent variable that captures all the shared information among the $X_i$, so it will be selected for when maximizing $\sum_i I(X_i;\Lambda)$.
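To make the intuition concrete, here is a minimal Python sketch on a hypothetical toy world that is not from the original argument: a fair bit `s` seen by all three observables and a fair bit `p` seen only by $X_1$. The helper names (`entropy`, `pushforward`, `mutual_information`, `sum_mi`) are illustrative assumptions. Both candidate latents have the same entropy budget of one bit, so the comparison isolates the effect of storing shared versus private information.

```python
from collections import defaultdict
from itertools import product
from math import log2


def entropy(dist):
    """Shannon entropy (in bits) of a dict mapping outcomes to probabilities."""
    return sum(-p * log2(p) for p in dist.values() if p > 0)


def pushforward(joint, f):
    """Distribution of f(W) when W is drawn from `joint`."""
    out = defaultdict(float)
    for w, p in joint.items():
        out[f(w)] += p
    return dict(out)


def mutual_information(joint, f, g):
    """I(f(W); g(W)) = H(f) + H(g) - H(f, g)."""
    both = pushforward(joint, lambda w: (f(w), g(w)))
    return entropy(pushforward(joint, f)) + entropy(pushforward(joint, g)) - entropy(both)


# Hypothetical world W = (s, p): s is shared by every observable, p is private to X1.
world = {(s, p): 0.25 for s, p in product([0, 1], repeat=2)}
observables = [lambda w: w, lambda w: w[0], lambda w: w[0]]  # X1=(s,p), X2=s, X3=s


def shared(w):
    return w[0]  # latent that stores the shared bit s


def private(w):
    return w[1]  # latent that stores the private bit p


def sum_mi(latent):
    """sum_i I(X_i; Lambda) over the hypothetical observables above."""
    return sum(mutual_information(world, x, latent) for x in observables)


for name, latent in [("shared", shared), ("private", private)]:
    print(name, entropy(pushforward(world, latent)), sum_mi(latent))
```

Here storing `s` gives $\sum_i I(X_i;\Lambda) = 3$ bits while storing `p` gives only 1 bit, even though both latents have exactly 1 bit of entropy.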
Proof:
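The correspondence with the redundancy condition is quite simple: the sum of the redundancy errors is $\sum_i H(\Lambda \mid X_i) = \sum_i \left(H(\Lambda) - I(X_i;\Lambda)\right) = nH(\Lambda) - \sum_i I(X_i;\Lambda)$. So if we have a constraint of the form $H(\Lambda) \ge c$, then $\sum_i H(\Lambda \mid X_i) \ge nc - \sum_i I(X_i;\Lambda)$, and the sum of redundancy errors is minimized exactly when $\sum_i I(X_i;\Lambda)$ is maximized and $H(\Lambda) = c$.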
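The identity behind the redundancy argument can be checked numerically. Below is a minimal Python sketch, assuming the redundancy error for observable $X_i$ is $H(\Lambda \mid X_i)$; the random joint distribution over $(X_1, X_2, \Lambda)$ and all helper names are hypothetical. It compares the direct sum of conditional entropies with the rewritten form $nH(\Lambda) - \sum_i I(X_i;\Lambda)$.

```python
import random
from collections import defaultdict
from itertools import product
from math import log2


def entropy(dist):
    """Shannon entropy (in bits) of a dict mapping outcomes to probabilities."""
    return sum(-p * log2(p) for p in dist.values() if p > 0)


def pushforward(joint, f):
    """Distribution of f(W) when W is drawn from `joint`."""
    out = defaultdict(float)
    for w, p in joint.items():
        out[f(w)] += p
    return dict(out)


def cond_entropy(joint, f, g):
    """H(f(W) | g(W)) = H(f, g) - H(g)."""
    both = pushforward(joint, lambda w: (f(w), g(w)))
    return entropy(both) - entropy(pushforward(joint, g))


def mutual_information(joint, f, g):
    """I(f(W); g(W)) = H(f) - H(f | g)."""
    return entropy(pushforward(joint, f)) - cond_entropy(joint, f, g)


def redundancy_error_sum(joint, observables, latent):
    """Direct form: sum_i H(Lambda | X_i)."""
    return sum(cond_entropy(joint, latent, x) for x in observables)


def redundancy_error_sum_via_mi(joint, observables, latent):
    """Rewritten form: n * H(Lambda) - sum_i I(X_i; Lambda)."""
    n = len(observables)
    h_latent = entropy(pushforward(joint, latent))
    return n * h_latent - sum(mutual_information(joint, x, latent) for x in observables)


# Hypothetical random joint distribution over outcomes W = (x1, x2, lam).
rng = random.Random(0)
weights = {w: rng.random() for w in product(range(2), range(3), range(2))}
total = sum(weights.values())
joint = {w: v / total for w, v in weights.items()}

observables = [lambda w: w[0], lambda w: w[1]]


def latent(w):
    return w[2]


direct = redundancy_error_sum(joint, observables, latent)
via_mi = redundancy_error_sum_via_mi(joint, observables, latent)
print(direct, via_mi)
```

The two printed values agree up to floating-point error for any seed, because the rewrite is an exact identity; the entropy constraint is what turns it into the inequality used in the proof.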
We've shown that, given constraints on $H(\Lambda)$, both the mediation and the redundancy errors are minimized exactly when the sum of mutual information $\sum_i I(X_i;\Lambda)$ is maximized. We can use this to simplify the search for natural latents, and while optimizing for this quantity there is no tradeoff between the redundancy and mediation errors.
However, note that the mediation error increases as $H(\Lambda)$ decreases (the mediation error for the empty latent is simply the total correlation $TC(X) = \sum_i H(X_i) - H(X_1, \dots, X_n)$), while the redundancy error increases with $H(\Lambda)$ (which is why we imposed $H(\Lambda) \le c$ for mediation but $H(\Lambda) \ge c$ for redundancy). So the entropy of the latent is exactly the parameter that controls the tradeoff between the mediation and redundancy errors.
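The two endpoints of this tradeoff can be illustrated with a minimal Python sketch, assuming the mediation error is the conditional total correlation $\sum_i H(X_i \mid \Lambda) - H(X \mid \Lambda)$ and the summed redundancy error is $\sum_i H(\Lambda \mid X_i)$. The joint distribution (two bits that agree 80% of the time) and the helper names are hypothetical. The empty latent should have mediation error $TC(X)$ and zero redundancy error, while the latent that copies all of $X$ should have the opposite profile.

```python
from collections import defaultdict
from math import log2


def entropy(dist):
    """Shannon entropy (in bits) of a dict mapping outcomes to probabilities."""
    return sum(-p * log2(p) for p in dist.values() if p > 0)


def pushforward(joint, f):
    """Distribution of f(W) when W is drawn from `joint`."""
    out = defaultdict(float)
    for w, p in joint.items():
        out[f(w)] += p
    return dict(out)


def cond_entropy(joint, f, g):
    """H(f(W) | g(W)) = H(f, g) - H(g)."""
    both = pushforward(joint, lambda w: (f(w), g(w)))
    return entropy(both) - entropy(pushforward(joint, g))


def joint_observable(observables):
    """Map an outcome to the full observable vector X = (X_1, ..., X_n)."""
    return lambda w: tuple(x(w) for x in observables)


def total_correlation(joint, observables):
    """TC(X) = sum_i H(X_i) - H(X)."""
    marginals = sum(entropy(pushforward(joint, x)) for x in observables)
    return marginals - entropy(pushforward(joint, joint_observable(observables)))


def mediation_error(joint, observables, latent):
    """Conditional total correlation: sum_i H(X_i | Lambda) - H(X | Lambda)."""
    per_obs = sum(cond_entropy(joint, x, latent) for x in observables)
    return per_obs - cond_entropy(joint, joint_observable(observables), latent)


def redundancy_error(joint, observables, latent):
    """Summed redundancy error: sum_i H(Lambda | X_i)."""
    return sum(cond_entropy(joint, latent, x) for x in observables)


# Hypothetical joint distribution of (X1, X2): two bits that agree 80% of the time.
joint = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}
observables = [lambda w: w[0], lambda w: w[1]]


def empty_latent(w):
    return 0  # constant latent, H(Lambda) = 0


def full_latent(w):
    return w  # copies all of X, H(Lambda) = H(X)


tc = total_correlation(joint, observables)
print("TC(X) =", tc)
for name, latent in [("empty", empty_latent), ("full", full_latent)]:
    print(name, mediation_error(joint, observables, latent),
          redundancy_error(joint, observables, latent))
```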
In summary, we can picture a Pareto frontier of latent variables with maximal $\sum_i I(X_i;\Lambda)$ and different entropies. Ramping up the entropy of $\Lambda$ along the Pareto frontier gradually increases the redundancy errors while reducing the mediation errors, and the entropy $H(\Lambda)$ together with $\sum_i I(X_i;\Lambda)$ are the only parameters relevant for latent variables that are Pareto-optimal w.r.t. the naturality conditions.
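The frontier can be explored by brute force on a tiny example. This minimal Python sketch is restricted to deterministic latents $\Lambda = f(X)$ on a hypothetical two-bit distribution, and assumes the mediation error is $\sum_i H(X_i \mid \Lambda) - H(X \mid \Lambda)$ and the summed redundancy error is $\sum_i H(\Lambda \mid X_i)$; helper names are illustrative. For a deterministic latent $I(X;\Lambda) = H(\Lambda)$, so the mediation error equals $TC(X) + H(\Lambda) - \sum_i I(X_i;\Lambda)$ and the redundancy error equals $nH(\Lambda) - \sum_i I(X_i;\Lambda)$. At each entropy level, the latent with the largest $\sum_i I(X_i;\Lambda)$ should therefore minimize both errors, which the script checks by enumerating every lookup table.

```python
from collections import defaultdict
from itertools import product
from math import log2


def entropy(dist):
    """Shannon entropy (in bits) of a dict mapping outcomes to probabilities."""
    return sum(-p * log2(p) for p in dist.values() if p > 0)


def pushforward(joint, f):
    """Distribution of f(W) when W is drawn from `joint`."""
    out = defaultdict(float)
    for w, p in joint.items():
        out[f(w)] += p
    return dict(out)


def cond_entropy(joint, f, g):
    """H(f(W) | g(W)) = H(f, g) - H(g)."""
    both = pushforward(joint, lambda w: (f(w), g(w)))
    return entropy(both) - entropy(pushforward(joint, g))


# Hypothetical joint distribution of X = (X1, X2): two bits that agree 80% of the time.
joint = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}
outcomes = sorted(joint)
observables = [lambda w: w[0], lambda w: w[1]]


def whole(w):
    """The full observable vector X."""
    return tuple(x(w) for x in observables)


def make_latent(labels):
    """Deterministic latent Lambda = f(X), given as a lookup table over outcomes."""
    table = dict(zip(outcomes, labels))
    return lambda w: table[w]


rows = []  # (rounded H(Lambda), sum_i I(X_i; Lambda), mediation error, redundancy error)
for labels in product(range(len(outcomes)), repeat=len(outcomes)):
    latent = make_latent(labels)
    h = entropy(pushforward(joint, latent))
    sum_mi = sum(entropy(pushforward(joint, x)) - cond_entropy(joint, x, latent)
                 for x in observables)
    mediation = (sum(cond_entropy(joint, x, latent) for x in observables)
                 - cond_entropy(joint, whole, latent))
    redundancy = sum(cond_entropy(joint, latent, x) for x in observables)
    rows.append((round(h, 9), sum_mi, mediation, redundancy))

# For each entropy level, keep the latent with the largest sum_i I(X_i; Lambda).
best = {}
for h, sum_mi, mediation, redundancy in rows:
    if h not in best or sum_mi > best[h][0]:
        best[h] = (sum_mi, mediation, redundancy)

for h in sorted(best):
    sum_mi, mediation, redundancy = best[h]
    print(f"H={h:.3f}  sum_I={sum_mi:.3f}  "
          f"mediation={mediation:.3f}  redundancy={redundancy:.3f}")
```

Restricting to deterministic latents keeps the enumeration finite (256 lookup tables here); stochastic latents would need a continuous search instead.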