Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Integrating Hidden Variables Improves Approximation

16th Apr 2020


I am usually reasonably good at translating from math to non-abstract intuitive examples... but I didn't have much success here. Do you have an "in English, for simpletons" example to go with this? :-) (You know, something that uses apples and biscuits rather than English-but-abstract words like "there are many hidden variables mediating the interactions between observables" :D.)

Otherwise, my current abstract interpretation of this is something like: "There are detailed models, and those might vary a lot. And then there are very abstract models, which will be more similar to each other... well, except that they might also be totally useless." So I was hoping that a more specific example would clarify things a bit and tell me whether there is more to this (and also whether I got it all wrong or not :-).)

Bit of an accidental pun here. "Integrating additional information" (in the usual sense of the phrase) has exactly the opposite meaning of "integrating out a variable" - when we integrate over the variable (in the mathy sense of the phrase), we're throwing out whatever information it contains.

So, yes - it does mean that we can't expect an approximation to improve when we integrate in additional information (in the layman's sense of the phrase).

Fun fact: the KL divergence of distribution $P[X]$ from distribution $Q[X]$ is convex in the pair $(P, Q)$. Writing it out:

$$D_{KL}(\lambda P_1[X] + (1-\lambda) P_2[X] \,\|\, \lambda Q_1[X] + (1-\lambda) Q_2[X]) \le \lambda D_{KL}(P_1[X] \,\|\, Q_1[X]) + (1-\lambda) D_{KL}(P_2[X] \,\|\, Q_2[X])$$

with $0 \le \lambda \le 1$.
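This convexity inequality is easy to sanity-check numerically. A minimal sketch (the distribution sizes and trial count are arbitrary choices, not from the post):

```python
import numpy as np

def kl(p, q):
    """KL divergence D_KL(p || q) for discrete distributions given as arrays summing to 1."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    # Terms with p == 0 contribute 0 by convention.
    return float(np.sum(np.where(p > 0, p * np.log(p / q), 0.0)))

rng = np.random.default_rng(0)

def random_dist(n):
    """A random discrete distribution on n outcomes."""
    x = rng.random(n)
    return x / x.sum()

# Check D_KL(lam*P1 + (1-lam)*P2 || lam*Q1 + (1-lam)*Q2)
#       <= lam*D_KL(P1 || Q1) + (1-lam)*D_KL(P2 || Q2)
for _ in range(1000):
    p1, p2, q1, q2 = (random_dist(5) for _ in range(4))
    lam = rng.random()
    lhs = kl(lam * p1 + (1 - lam) * p2, lam * q1 + (1 - lam) * q2)
    rhs = lam * kl(p1, q1) + (1 - lam) * kl(p2, q2)
    assert lhs <= rhs + 1e-12
```

Every random trial satisfies the inequality, as convexity guarantees.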

This is particularly interesting if we take $P$ and $Q$ to be two different models, and take the indices 1, 2 to be different values of another random variable $Y$ with distribution $P[Y]$ given by $(\lambda, 1-\lambda)$. In that case, the above inequality becomes:

$$D_{KL}(P[X] \,\|\, Q[X]) \le \mathbb{E}_Y[D_{KL}(P[X|Y] \,\|\, Q[X|Y])]$$

In English: the divergence between our models of the X-distribution ignoring Y is at least as small as the average divergence between our models of the X-distribution given Y. This is true regardless of what the two models are - any approximation of the observable distribution improves (or gets no worse) when we integrate out a hidden variable, compared to fixing the value of the hidden variable.

Of course, this doesn't say anything about how much the approximation improves. Presumably for bad approximations, the divergence will not converge to anywhere near zero as we integrate out more and more hidden variables. And if the hidden variable doesn't actually interact with the observables significantly, then presumably the divergence decrease will be near-zero.

So when would we expect this to matter?
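The same kind of numerical check works for the hidden-variable form. A sketch under illustrative assumptions (a binary Y, four outcomes for X, random conditionals): build a true conditional model P[X|Y] and an approximate one Q[X|Y], integrate Y out of both, and compare divergences. Note that both marginals are taken under the same Y-distribution P[Y], as the inequality requires:

```python
import numpy as np

rng = np.random.default_rng(1)

def kl(p, q):
    """KL divergence for discrete distributions (numpy arrays summing to 1)."""
    return float(np.sum(np.where(p > 0, p * np.log(p / q), 0.0)))

def random_dist(n):
    x = rng.random(n)
    return x / x.sum()

# Hidden variable Y takes two values with P[Y] = (lam, 1 - lam).
lam = 0.3
p_y = np.array([lam, 1 - lam])

# Conditional models of X given Y: row y is the distribution of X given Y = y.
p_x_given_y = np.stack([random_dist(4), random_dist(4)])  # "true" model P
q_x_given_y = np.stack([random_dist(4), random_dist(4)])  # approximate model Q

# Integrate out Y (both marginals use the same weights P[Y]).
p_x = p_y @ p_x_given_y
q_x = p_y @ q_x_given_y

marginal = kl(p_x, q_x)
conditional_avg = sum(p_y[y] * kl(p_x_given_y[y], q_x_given_y[y]) for y in range(2))
assert marginal <= conditional_avg + 1e-12
```

The divergence after integrating out Y never exceeds the Y-averaged conditional divergence, though how much smaller it is depends on how strongly Y interacts with X.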

I'd expect it to matter mainly when the observable X consists of multiple variables which are "far apart" in a large model - i.e. there are many hidden variables mediating the interactions between observables. In other words, I'd expect this phenomenon to mainly be relevant to information at a distance. It's a hint that information at a distance, in complex systems, converges to some sort of universal behavior/properties, which is simpler in some sense than the full fine-grained model.