Thermodynamic entropy = Kolmogorov complexity

by Aram Ebtekar
17th Feb 2025
1 min read

This is a linkpost for https://doi.org/10.1103/PhysRevE.111.014118

14 comments, sorted by top scoring
Stephen Fowler · 7mo

A while ago, I watched recordings of the lectures given by Wolpert and Kardes at the Santa Fe Institute*, and I am extremely excited to see you and Marcus Hutter working in this area.

Could you speculate on whether you see this work having any direct implications for AI Safety?

 

Edit:

I was incorrect. The lectures from Wolpert and Kardes were not the ones given at the Santa Fe Institute.

Aram Ebtekar · 7mo

Were those recorded!?

For direct implications, I'd like to speak with the alignment researchers who use ideas from thermodynamics. While Shannon's probabilistic information theory is suited to settings where the law of large numbers holds, algorithmic information theory should bring more clarity in messier settings that are relevant for AGI.

Less directly, I used physics as a testing ground to develop some intuitions on how to apply algorithmic information theory. The follow-up agenda is to develop a theory of generalization (i.e., inductive biases) using algorithmic information theory. A lot of AI safety concerns depend on the specific ways that AIs (mis)generalize beliefs and objectives, so I'd like us to have more precise ideas about which generalizations are likely to occur.

Matt Dellago · 7mo

I would be interested in seeing those talks; could you maybe share links to these recordings?

Stephen Fowler · 7mo

The recordings I watched were actually from 2022 and weren't the Santa Fe ones.

Yoav Ravid · 7mo

How does this relate to cross-entropy? 

Aram Ebtekar · 7mo

They are equal! As discussed in the comments to that post, the difference is at most a constant; it's possible to make that constant vanish by an appropriate choice of reference universal Turing machine.
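To make that concrete (stated roughly, with K the prefix Kolmogorov complexity, P any computable distribution, and m Levin's universal semimeasure), the standard bounds are

K(x) ≤ -log2 P(x) + K(P) + O(1)
-log2 m(x) = K(x) + O(1)   (the coding theorem)

so the cross-entropy under the universal code and the Kolmogorov complexity differ only by an additive constant, which depends on the choice of reference machine.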

Noosphere89 · 7mo

Notably, the difference can become non-constant if we allow for semi-measures on infinite strings:

https://www.lesswrong.com/posts/KcvJXhKqx4itFNWty/k-complexity-is-silly-use-cross-entropy-instead#LNgzYraTeGcEyry9a

Yoav Ravid · 7mo

Isn't that significant?

Aram Ebtekar · 7mo

It is, when dealing with sequences that go on to infinity. In that case you get the "KM complexity", from Definition 4.5.8 of Li & Vitanyi (2019). For online sequence prediction, Solomonoff's prior needs to sum the weights from every program.

No such complications appear in the entropy of a bounded system at a fixed precision. And ultimately, it appears that for entropy to increase, you need some kind of coarse-graining, leading us to finite strings. I discuss this in the Background section and around Corollary 1.
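For reference, roughly following Li & Vitányi's conventions (with U a universal monotone machine, and the sum ranging over all programs p whose output begins with x), the quantities involved are

M(x) = Σ_{p : U(p) starts with x} 2^(-|p|)
KM(x) = -log2 M(x)

so KM credits x with the weight of every program consistent with it, which is what Solomonoff's prior needs for online sequence prediction.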

Matt Dellago · 7mo

Very good work, thank you for sharing!

Intuitively speaking, the connection between physics and computability arises because the coarse-grained dynamics of our Universe are believed to have computational capabilities equivalent to a universal Turing machine [19–22].

I can see how this is a reasonable and useful assumption, but the universe seems to be finite in both space and time and therefore not a UTM. What convinced you otherwise?

Alex_Altair · 2mo

Note that being finite in spatial extent is different from being finite in number of possible states. A Turing machine needs an unbounded number of possible states, but not unbounded spatial extent. Finite spatial extent isn't a blocker if your states can have arbitrarily fine precision. As an intuition pump, you could imagine a Turing machine where the nth cell of the tape has a physical width of 1/2^n. Then the whole tape has length 1 (meter?) but can fit arbitrarily long binary strings on it.
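To spell out the geometric-series arithmetic behind that intuition pump (numbering cells from n = 1, each of width 1/2^n meters):

Σ_{n=1}^∞ 1/2^n = 1, and the first N cells occupy 1 - 1/2^N < 1 meters,

so every finite prefix of the tape fits within one meter, even though there is no bound on how many cells may be used.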

Separately, when I've looked into it, there seemed to be no consensus on whether the universe is spatially infinite or not: measurements of global curvature are still consistent with flatness (= infinite space). Although, I think the "accessible" universe has finite diameter.

Lastly, I claim that whether or not something is best modeled as a Turing machine is a local fact, not a global one. A Turing machine is something that will compute the function if you give it enough tape. If the universe is finite (in the state space way) then it's a Turing machine that just didn't get fed enough tape. In contrast, something is better modeled as a DFA if you can locally observe that it will only ever try to access finitely many states.

Matt Dellago · 2mo

Good point! My intuition was that the Bekenstein bound (https://en.wikipedia.org/wiki/Bekenstein_bound) limits the amount of information in a volume. (Or more precisely, the information enclosed by a given surface area.) Therefore the number of states in a finite volume is also finite.
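As usually stated (for a system of total energy E enclosed in a sphere of radius R), the bound reads

S ≤ 2π k_B R E / (ħ c),

or equivalently at most 2π R E / (ħ c ln 2) bits, so a finite region containing bounded energy can only be in finitely many distinguishable states.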

I must add: since writing this comment, a man called george pointed out to me that, when modeling the universe as a computation, one must take care not to accidentally derive ontological claims from it.

So today I would have a more "whatever works" attitude; UTMs and DFAs are both just models, and neither is likely to be ontologically true.

Aram Ebtekar · 7mo

Thanks. Obviously this claim needs some interpretation, but a UTM still seems a better model of the Universe than, say, any lower automaton in the Chomsky hierarchy. For the purposes of defining entropy, it's important that we can use a small base machine, plus a memory tape that we may think of as expanding in an online fashion.

UnderTruth · 7mo

I am reminded of this paper, on the equivalence of information-theoretic formalisms and those of physics. I am linking the paper here not as an endorsement, but because it may provide some unusual, but useful, lines of thought.

Direct PDF link for non-subscribers

Information theory must precede probability theory, and not be based on it. By the very essence of this discipline, the foundations of information theory have a finite combinatorial character.

- Andrey Kolmogorov

Many alignment researchers borrow intuitions from thermodynamics: entropy relates to information, which relates to learning and epistemology. These connections were first revealed by Szilárd's resolution of Maxwell's famous thought experiment. However, the classical tools of equilibrium thermodynamics are not ideally suited to studying information processing far from equilibrium.

This new work reframes thermodynamics in terms of the algorithmic entropy. It takes an information-first approach, delaying the introduction of physical concepts such as energy and temperature until after the foundations are set. I find this approach more conceptually principled and elegant than the traditional alternatives.

It's based on a 30-year-old workshop paper whose approach was, until now, largely abandoned. Roughly speaking, the algorithmic entropy of a physical state is its Kolmogorov complexity; that is, the length of the shortest program that outputs a precise description of its microscopic configuration (to some bounded precision). This definition does away with probability distributions and macrovariables, and satisfies very general laws!
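In symbols (glossing over the choice of reference universal machine U and the coarse-graining precision), the quantity in question is roughly

K_U(x) = min { |p| : U(p) = x },

the length in bits of the shortest program p that prints the discretized microstate description x; multiplying by k_B ln 2 converts it to the same units as the thermodynamic entropy.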

The paper is long, in part because I tried to make it self-contained. If you find yourself using entropy in a setting that is not described by a large number of identically distributed variables, then consider reframing your intuitions in terms of the algorithmic entropy!
