Natural Latents: Latent Variables Stable Across Ontologies

David Lorell

I'm curious about your sense of the path towards AI safety applications, if you have a more specific and/or opinionated view than the conclusion/discussion section.

[-]johnswentworth3mo237

My main opinionated view relative to current discourse is that if someone is trying to apply any of this directly to LLMs, then they are probably very deeply confused about natural abstractions and also language and also agency/intelligence/etc in general.

The path we're optimizing for right now is to figure out the whole damn core of the theory of agency, get it across the theory-practice gap, and then not be so clueless about everything. See The Plan. Possibly with AI acceleration of the research at some point; our decisions right now are basically the same regardless of whether the research will be accelerated by AI later.

[-]Sodium3mo100

People might be interested in the results from this paper.

[-]Thane Ruthenis3mo*60

Just skimmed for now...

We say a latent is a "redund" over observables $X_{1}, . . ., X_{n}$ if and only if $Λ^{'}$ is fully determined by each $X_{i}$ individually, i.e. there exist functions $f_{i}$ such that $Λ^{'} = f_{i} (X_{i})$ for each $i$ . In the approximate case, we weaken this condition to say that the entropy $H (Λ^{'} | X_{i}) \leq ϵ$ for all $i$ , for some approximation error $ϵ$ .

I see your latest result has allowed you to streamline the definitions of redundancy and redunds.

I think attempting to require $ϵ_{red}$ to be small in terms of $ϵ_{{red}^{'}}$ would still run into my counterexample, right? (Setups could be constructed such that requiring $ϵ_{red}$ to be $ϵ_{{red}^{'}}$ -small would cause $ϵ_{med}$ to scale arbitrarily with $i$ , and vice versa. So in the general case, there may exist valid redunds with the redundancy error $ϵ_{{red}^{'}}$ such that the maximal redund's $ϵ_{red}$ and $ϵ_{med}$ (and therefore $ϵ_{med} + 2 ϵ_{{red}^{'}}$ ) cannot both be $ϵ_{{red}^{'}}$ -small.)

[-]Lucius Bushnaq3mo51

(Mediation)

Wait, so it's enough for the agents to just believe the observables are independent given the state of their latents? We only need the $X_{j}$ to be independent conditional on $λ_{i}$ under a particular model $M^{i}$ ?

I didn't realise that. I thought the observables had to be 'actually independent' after conditioning in some sort of frequentist sense.

Getting a version of this that works under approximate Agreement on Observables sounds like it would be very powerful then. It'd mean that even if Alice is much smarter than Bob, with her model e.g. having more FLOP which she can use to squeeze more bits of information out of the data, there'd still need to be a mapping between the concepts Bob and Alice internally use in those domains where Bob doesn't do very much worse than Alice on predictive accuracy.

So, if a superintelligence isn't that much better than humanity at modelling some specific part of reality, there'd need to be an approximate mapping between humanity's latents and (some of) the superintelligence's latents for that part of reality. If the the theorems approximately hold under approximate agreement on observables.

[-]johnswentworth3mo30

Yup, that is correct.

If the the theorems approximately hold under approximate agreement on observables.

Yeah, there is still the issue that the theorems aren't always robust to approximation on the Agreement on Observables condition, though the Solomonoff version is and there's probably other ways to sidestep the issue.

[-]Daniel C3mo50

I think a subtle point is that this is saying we merely have to assume predictive agreement of distributions marginalized over the latent variables $Λ_{A} / Λ_{B}$ , but once we assume that & the naturality conditions, then even as each agent receive more information about $X$ to update their distributions & latent variables $Λ_{i}$ , the deterministic constraints between the latents will continue to hold.

Or if a human and AI start out with predictive agreement over some future observables, & the AI's latent satisfy mediation while human's latent satisfy redundancy, then we could send the AI out to update on information about those future observables, and humans can (in principle) estimate the redundant latent variable they care about from the AI's latent without observing the observables themselves. The remaining challenge is that humans often care about things that are not approximately deterministic w.r.t observables from typical sensors.

[-]johnswentworth3mo40

Yes, though I'll flag that we don't have robustness with respect to approximation on the agreement condition (though we do have other ways around that to some extent, e.g. using the Solomonoff version of natural latents), and those sorts of updates are the kind of thing which I'd expect to run into that robustness problem.

[-]Simon Pepin Lehalleur3mo40

Which formal properties of the KL-divergence do the proofs of your result use? It could be useful to make them all explicit to help generalize to other divergences or metrics between probability distributions.

[-]johnswentworth3mo40

The appendices make heavy use of additivity across independent variables (and across factorizations more generally), which is the main thing I'd expect to need to work around in order to use other divergences/metrics.

[-]MalcolmMcLeod3mo30

Really gorgeous stuff with philosophically significant and plausibly practical implications. Great work. I assume you've also looked at this from a categorial perspective? It would surprise me if treating latents as limits didn't simplify some of the arguments (or at least the presentation, which is already admirably clear). And I can't help but wonder whether "bigger" uniqueness/isomorphism/stability results for world-models or other features of agents might result from comparing Bayes net categories. If you haven't tried categorial abstractions (I dunno the specifics---there are a few categorification choices that I could see being useful here), play around with them.

[-]johnswentworth3mo31

@Lorxus translated the proofs here into categorical language IIRC (and found these deterministic versions much nicer than the older stochastic versions).

[-]Gurkenglas3mo20

oh good to know, i was thinking about how one might get redund and mediator formally dual to each other. is this written up?

[-]Satya Benson2mo10

Graphical statement of Theorem 2

I find this picture pretty misleading, because it seems to say that if is determined by $Λ^{B}$ , then $Λ^{A}$ is a mediator, when really this is false, and it's stated explicitly in the text above that Alice's latent satisfying mediation is assumed.

[-]johnswentworth2mo20

I think you might have misread something? The graphical statement of theorem 2 does not say that if is determined by $Λ^{B}$ , then $Λ^{A}$ is a mediator; that would indeed be false in general. It says that:

If $Λ^{B}$ is a mediator and we have agreement on observables, then...
... naturality of $Λ^{A}$ implies that $Λ^{A}$ is determined by $Λ^{B}$ .

In particular, the theorem says that under some conditions $Λ^{A}$ is determined by $Λ^{B}$ . Determination is in the conclusion, not the premises. On the flip side, $Λ^{A}$ being a mediator is in the premises, not the conclusion.

[-]Satya Benson2mo10

This was all clear to me, but only from reading the text; my comment is just to say that the graphical statement doesn't show being a mediator in the premises, so in isolation it gives the wrong idea; this led to a little confusion.

To be clear, I am talking about the reverse direction, as pictured here:

I understand that you have already set up $Λ^{A}$ as a mediator immediately above the image. Your text is perfectly clear:

In other words, we want to show: if Alice' latent $Λ^{A}$ satisfies Mediation, and for any latent $Λ^{B}$ Bob could choose (i.e. any other mediator) we have $Λ^{A} \leftarrow Λ^{B} \to Λ^{A}$ , then Alice' latent must be natural.

[-]Cristian-Curaba3mo10

Great work! I have a technical question.

My current understanding is as follows:

1. If we have even one observable variable with agreement observation and for which the latent variables satisfy the exact naturality condition, we can then build the transferability function exactly.
2. In the approximation case, if we have multiple observable variables that meet these same conditions, we can choose the specific variable (or set of variables, in the proofs you used a couple) that will minimize the errors. We would not need to use all of them.

Is this correct?

Additionally, I was wondering if you have attempted to implement the algorithm derived from the proof to construct the isomorphism. It seems that some effort could be dedicated to developing an algorithm that minimizes or reduces these errors. It could one day be helpful for interpreting and aligning different ontological frameworks, like mapping an alien Bayesian network to a human one.

[-]johnswentworth3mo20

I do not understand the statement of your current understanding, in particular point 1. Could you please state in different words and/or state it formally and/or give an example?

[-]Cristian-Curaba2mo10

By scanning the graphical proof, I don't see any issue on the following generalization of the Mediator Determines Redund Theorem:
Let  and $Λ^{'}$ be random variables and let $X_{1}, \dots, X_{m}$ be any not-empty subset of $X_{1}, \dots, X_{n}$ that satisfy the following conditions:
    - $Λ$ Mediation: $X_{1}, \dots, X_{m}$ are independent given $Λ$
    - $Λ^{'}$ Redundancy: $\forall j \in {1, \dots, m} Λ^{'} \leftarrow X_{j} \to Λ^{'}$
Then $Λ^{'} \leftarrow Λ \to Λ^{'}$ .

In the above, I've weaken the $Λ^{'}$ Redundancy hypothesis, requiring that the redundancy of any subset of random variables is enough to conclude the thesis.
Does the above generalization work (if don't, why?).

If the above stands true, then just one observational random variable (with agreement) is enough to satisfy the Redundancy condition (Mediation is trivially true with one variable), an therefore $Λ^{A}$ is determined by $Λ^{B} .$ Moreover, in the general approximation case, if we have various sets of random variables that meet the naturality condition, we can choose the one that will minimize the errors (there's some kind of trade-off between $ϵ_{m e d}$ and $ϵ_{r e d}$ errors).

[-]johnswentworth2mo30

Ah, yes, that is almost correct. You need redundancy over TWO distinct observables (i.e. the subset must be at least size two), not just one, but otherwise yes. With just one observable, you don't have two branches to dangle a off of in the graphical proof, so we can't get $Λ$ between two copies of $Λ^{'}$ .

[+][comment deleted]3mo10

^{^}

W. V. O. Quine, On empirically equivalent systems of the world, Erkenntnis 9, 313 (1975).

^{^}

H. Cunningham, A. Ewart, L. Riggs, R. Huben, and L. Sharkey, Sparse autoencoders find highly interpretable features in
language models, (2023), arXiv:2309.08600 [cs.LG].

^{^}

Marvik, Model merging: Combining different fine-tuned LLMs, Blog post (2024), retrieved from https://marvik.com/
model-merging.

^{^}

M. Huh, B. Cheung, T. Wang, and P. Isola, The platonic representation hypothesis, (2024), arXiv:2405.07987 [cs.LG].

^{^}

J. Pearl, Causality: Models, Reasoning and Inference, 2nd ed. (Cambridge University Press, USA, 2009).

LESSWRONG
LW

LESSWRONG
LW

120

Natural Latents: Latent Variables Stable Across Ontologies

120

Ω 50

120

Ω 50

Abstract

Background

The Math

Setup & Objective

Notation

Foundational Concepts: Mediation, Redundancy & Naturality

Core Theorems

Naturality $⟹$ Minimality Among Mediators

Naturality $⟹$ Maximality Among Redunds

Isomorphism of Natural Latents

Application To Translatability

Motivating Question

Guaranteed Translatability

Natural Latents: Intuition & Examples

When Do Natural Latents Exist? Some Intuition From The Exact Case

Worked Quantitative Example of the Mediator Determines Redund Theorem

Intuitive Examples of Natural Latents

Ideal Gas

Biased Die

Timescale Separation In A Markov Chain

Discussion & Conclusion

Acknowledgements

Appendices

Graphical Notation and Some Rules for Graphical Proofs

Frankenstein Rule

Factorization Transfer

Bookkeeping Rule

The Dangly Bit Lemma

Graphical Proofs

Python Script for Computing $D_{K L}$ in Worked Example

120

Natural Latents: Latent Variables Stable Across Ontologies

120

Ω 50

120

Ω 50

Abstract

Background

The Math

Setup & Objective

Notation

Foundational Concepts: Mediation, Redundancy & Naturality

Core Theorems

Naturality ⟹ Minimality Among Mediators

Naturality ⟹ Maximality Among Redunds

Isomorphism of Natural Latents

Application To Translatability

Motivating Question

Guaranteed Translatability

Natural Latents: Intuition & Examples

When Do Natural Latents Exist? Some Intuition From The Exact Case

Worked Quantitative Example of the Mediator Determines Redund Theorem

Intuitive Examples of Natural Latents

Ideal Gas

Biased Die

Timescale Separation In A Markov Chain

Discussion & Conclusion

Acknowledgements

Appendices

Graphical Notation and Some Rules for Graphical Proofs

Frankenstein Rule

Factorization Transfer

Bookkeeping Rule

The Dangly Bit Lemma

Graphical Proofs

Python Script for Computing DKL in Worked Example

Naturality $⟹$ Minimality Among Mediators

Naturality $⟹$ Maximality Among Redunds

Python Script for Computing $D_{K L}$ in Worked Example