The Solomonoff prior is malign. It's not a big deal.

Charlie Steiner


8 min read · 25th Aug 2022 · 9 comments


Epistemic status: Endorsed at ~85% probability. In particular, there might be clever but hard-to-think-of encodings of observer-centered laws of physics that tilt the balance in favor of physics. Also, this isn't that different from Mark Xu's post.

Previously, previously, previously

I started writing this post with the intuition that the Solomonoff prior isn't particularly malign, because of a sort of pigeonhole problem - for any choice of universal Turing machine there are too many complicated worlds to manipulate, and too few simple ones to do the manipulating.

Other people have different intuitions.

So there was only one thing to do.

Math.

All we have to do is compare [wild estimates of] the complexities of two different sorts of Turing machines: those that reproduce our observations by reasoning straightforwardly about the physical world, and those that reproduce our observations by simulating a totally different physical world that's full of consequentialists who want to manipulate us.

Long story short, I was surprised. The Solomonoff prior is malign. But it's not a big deal.

Team Physics:

If you live for 80 years and get 10^7 bits/s of sensory signals, you accumulate about 10^16 bits of memory to explain via Solomonoff induction.

In comparison, there are about 10^51 electrons on Earth - just writing their state into a simulation is going to take somewhere in the neighborhood of 10^51 bits^[1]. So the Earth, or any physical system within 35 orders of magnitude of complexity of the Earth, can't be a Team Physics hypothesis for compressing your observations.
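The arithmetic in the two paragraphs above can be checked with a short Python sketch. It assumes a 365.25-day year and, for the raw Earth description, the rough figure of one bit per electron (footnote 1 discusses why this is only a ballpark).

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, an assumption for the estimate
LIFETIME_YEARS = 80
SENSORY_BITS_PER_SECOND = 1e7

# Total sensory data that a lifetime of observation has to explain
lifetime_bits = LIFETIME_YEARS * SECONDS_PER_YEAR * SENSORY_BITS_PER_SECOND

# Raw description of Earth at roughly one bit per electron (assumed)
EARTH_ELECTRONS = 1e51
earth_raw_bits = EARTH_ELECTRONS * 1.0

orders_of_magnitude_gap = math.log10(earth_raw_bits / lifetime_bits)

print(f"lifetime memory: {lifetime_bits:.2e} bits")
print(f"gap to raw Earth: {orders_of_magnitude_gap:.1f} orders of magnitude")
```

This prints about 2.5 × 10^16 bits of memory and a gap of about 34.6 orders of magnitude, matching the "about 10^16" and "35 orders of magnitude" in the text.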

What's simpler than the Earth? Turns out, simulating the whole universe. The universe can be mathematically elegant and highly symmetrical in ways that Earth isn't.

For simplicity, let's suppose that "I" am a computer with a simple architecture, plus some complicated memories. The trick that allows compression is that you don't need to specify the memories - you just need to give enough bits to pick out "me" among all computers with the same architecture in the universe^[2]. This depends only on how many similar computers there are in Hilbert space with different memories^[3], not directly on the complexity of my memories themselves.

So the complexity of a simple exemplar of Team Physics more or less looks like:

Complexity of our universe

+ Complexity of my architecture, and rules for reading observations out from that architecture

+ Length of the unique code for picking me out among things with the right architecture in the universe.

That last one is probably pretty big relative to the first two. The universe might only be a few hundred bits, and you can specify a simple computer with under a few thousand (or a human with ~10^8 bits of functional DNA). The number of humans in the quantum multiverse, past and future, is hard to estimate, but 10^8 bits is only a few person-years of listening to Geiger counter clicks, or being influenced by the single-photon sensitivity of the human eye.
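A rough bit budget for the simple Team Physics exemplar makes the point concrete. The values are illustrative assumptions picked from the ranges in the paragraph above (300 bits for the universe, 3,000 for a simple architecture, and an assumed 10^8 bits for the locator). None of them is a measured quantity.

```python
# Illustrative values only: each is an assumption chosen from the rough ranges in the text.
team_physics_bits = {
    "universe": 300,        # "a few hundred bits"
    "architecture": 3_000,  # a simple computer, "under a few thousand"
    "locator": 10**8,       # assumed size of the code that picks "me" out
}

LIFETIME_MEMORY_BITS = 1e16  # memories to be explained, from the estimate above

total = sum(team_physics_bits.values())
locator_share = team_physics_bits["locator"] / total

print(f"total description: {total:,} bits")
print(f"locator share: {locator_share:.5f}")
print(f"compression factor vs raw memories: {LIFETIME_MEMORY_BITS / total:.1e}")
```

Under these assumptions the locator accounts for almost the entire description. The total is still about eight orders of magnitude shorter than the raw memories it reproduces.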

Team Manipulation:

A short program on Team Manipulation just simulates a big universe with infinite accessible computing power and a natural coordinate system that's obvious to its inhabitants. Life arises, and then some of that life realizes that it could influence other parts of the mathematical multiverse by controlling the simple locations in its universe. Since the shortest program on Team Manipulation dominates the results, I'll refer to it, and to its inhabitants, simply as "Team Manipulation".

So they use the limitless computing power of their universe to simulate a bunch of universes. When they find a universe where people are using the Solomonoff prior to make decisions, they note what string of bits the people are conditioning on, and whether they gain something (e.g. make this other universe more pleasing according to their standards) by influencing the decision. Then, if they want to influence the decision, they manipulate the state of their universe at the simple locations so that it contains a unique "start reading" code, followed by the string the people are conditioning on, followed by the desired continuation.
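A toy sketch of the layout just described may help. The marker and bit strings below are hypothetical stand-ins. In the real argument, what gets counted is the length of the whole code that singles out the observations, not the length of a short marker like this one.

```python
# Toy model: the marker and bit strings are hypothetical stand-ins.
START_CODE = "1101001110"  # the unique "start reading" marker

def write_message(observed: str, continuation: str) -> str:
    """What Team Manipulation writes at a simple location in its universe."""
    return START_CODE + observed + continuation

def read_output(region: str) -> str:
    """Read-out rule: everything after the first occurrence of START_CODE."""
    i = region.find(START_CODE)
    return "" if i < 0 else region[i + len(START_CODE):]

observed = "0110"     # the string the people are conditioning on
continuation = "111"  # the continuation Team Manipulation wants them to predict

region = "0000" + write_message(observed, continuation)
output = read_output(region)

assert output.startswith(observed)  # the program reproduces the observations so far...
print(output[len(observed):])       # ...and then emits the chosen continuation
```

Because the program's output begins with exactly the observed string, it survives conditioning. Whatever follows the observed string is Team Manipulation's choice, which here is `111`.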

As the story goes, at some point they'll discover our universe, and so if we ever use the Solomonoff prior to make decisions they'll be in there trying to sway us. [cue spooky music]

The complexity of this program looks like:

Complexity of a universe with infinite computing power and Schelling points

+ Complexity of finding those Schelling points, and rules for reading data out

+ Length of the unique "start reading" code that encodes the observations you're trying to continue with Solomonoff induction.

There's a problem for applying this to our universe, or ones like it: computation costs! Team Manipulation can't simulate anyone really using the Solomonoff prior, because Team Manipulation's own program is one of the hypotheses inside that prior, and they can't simulate themselves all the way to the end. Or looking at it from the other side, the people using the Solomonoff prior would need to have enough computing power to simulate their own universe many times over inside the Solomonoff prior computation.

Getting around this is possible, though.

The basic idea is to simulate universes that are computable except they have access to a Solomonoff prior oracle (or similar properties^[4]). Then the manipulators just have to simulate that universe up until the first time someone is about to use the Solomonoff oracle, figure out what bit-string they're going to condition on, and write down that bit string in their own universe plus their desired continuation.

In fact, Team Manipulation, despite living in a computable universe, can keep predicting you through multiple queries to a hypercomputer, with no complexity penalty, if they actually are the simplest Turing machine to output your observations and thus can control the output of the Solomonoff oracle.

Even un-hijacked hypercomputation isn't enough to stop Team Manipulation. Sure, they can't predict the hypercomputer, but they can still simulate the aftermath of each possible hypercomputer output, adding one bit to the complexity of the "start reading" code for each bit of hypercomputer output.

Yes, those bits can add up real fast. But it's not like any other Turing machines are doing better! To Turing machines, the hypercomputer output looks like noise, and can't generally be compressed. So the rising tide of hypercomputation lifts all boats equally, not giving an advantage to Team Physics.
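The bookkeeping in the last two paragraphs can be shown with a toy sketch. If Team Manipulation pre-simulates every possible oracle answer, pointing at the right branch with a uniform code costs one bit per oracle bit. The placeholder "aftermath" strings stand in for whole simulations. Since the oracle's bits look incompressible to any Turing machine, a Team Physics program would pay the same k bits to specify them.

```python
import math
from itertools import product

def branch_table(oracle_bits: int) -> dict[str, str]:
    """One pre-simulated aftermath per possible oracle answer (placeholder strings)."""
    answers = ("".join(bits) for bits in product("01", repeat=oracle_bits))
    return {a: f"aftermath if the oracle says {a}" for a in answers}

def selection_cost_bits(table: dict[str, str]) -> float:
    """Bits needed to point at one branch with a uniform code."""
    return math.log2(len(table))

for k in (1, 4, 10):
    print(k, selection_cost_bits(branch_table(k)))
```

The cost grows linearly in the number of oracle bits, while the number of simulated branches grows exponentially. That exponential growth is affordable only because Team Manipulation's universe has limitless compute.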

Comparison:

A key first point is that if we never find a hypercomputer lying around, Team Manipulation doesn't really care about us. So for the rest of the post just take the perspective of someone who's tripped over a Solomonoff oracle and is wondering what will happen.

Just eyeballing it...

The complexity of our universe and that of the Team Manipulation universe are probably fairly similar. The complexity of the rules for reading out data might be a bit different, in favor of Team Manipulation, but not by that much.

Most of the difference is in the unique codes required to pick out me from other agents like me.

So which is it? Is it easier to code for me among people in our universe, or is it easier to code for me among people doing Solomonoff induction in all universes?

The complexity of the latter (picking me out from all universes, given that I'm doing Solomonoff induction) can be factored into the complexity of picking out our universe, plus the complexity of picking me out from everyone doing Solomonoff induction in it^[5].

So if we only use Solomonoff induction once in this entire universe, the cost of picking me out drops to roughly the complexity of the universe, which I'm quite confident is small relative to the complexity of picking me out in Team Physics, and therefore that one use of the Solomonoff oracle is going to be manipulated.

But if every single person like me in the entire universe has used the Solomonoff oracle (or there have been a similar number of non-prefix queries), then picking me out among the oracle's users costs about as much as picking me out among everyone like me. Team Manipulation still has to pay for specifying our universe on top of that, so it's worse-off, and your use of the oracle is probably safe. Unless...

If you are a particularly unique person, such as the first person in your universe to ever stumble over the Solomonoff oracle and use it to make a crucial decision, with large impact on the rest of the universe, then hey presto, the complexity of specifying you in Team Manipulation's "start reading" code is low again, and the Solomonoff oracle turns out to be untrustworthy after all. This goes double if you've never thought of these arguments and are predictably going to trust the oracle.
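The three cases above can be compared with a small sketch. All numbers are hypothetical log2 counts, with 300 bits assumed for the universe and 10^8 for Team Physics' locator. The constant from footnote 5 is ignored.

```python
# All numbers are hypothetical log2 counts; footnote 5's constant is ignored.
def physics_locator(log2_similar_agents: float) -> float:
    """Team Physics: pick me out among all similar agents in our universe."""
    return log2_similar_agents

def manipulation_locator(universe_bits: float, log2_oracle_users: float) -> float:
    """Team Manipulation: pick out our universe, then me among its oracle users."""
    return universe_bits + log2_oracle_users

def looks_manipulated(universe_bits: float, log2_similar: float, log2_users: float) -> bool:
    """True when Team Manipulation's locator is the shorter of the two."""
    return manipulation_locator(universe_bits, log2_users) < physics_locator(log2_similar)

UNIVERSE_BITS = 300  # "a few hundred bits"
LOG2_SIMILAR = 1e8   # assumed size of Team Physics' locator

scenarios = {
    "one use in the whole universe": 0.0,
    "everyone like me uses it": LOG2_SIMILAR,
    "a uniquely notable first user": 20.0,  # hypothetical cost of describing "the first one"
}
for name, log2_users in scenarios.items():
    print(name, looks_manipulated(UNIVERSE_BITS, LOG2_SIMILAR, log2_users))
```

Only the middle scenario, where the oracle's users are as numerous as agents like me, leaves Team Manipulation paying more. The extra universe term is what tips it.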

Does this matter for AI safety?:

No.

We don't have a Solomonoff oracle. So we aren't simulating Team Manipulation, and therefore aren't making decisions based on their communiques.

And this isn't really the sort of problem that has "echoes" in approximations. First, because the approximations we use are things like AIXI-tl or gradient descent on randomly initialized neural nets of fixed size, which don't care about things they can't simulate. Second, because even if we really did try to approximate Team Manipulation's effects using abstract reasoning, it's (even theoretically) hard to figure out what effects those would be without running their Turing machine (and all its similarly-short but similarly-expensive relatives) to find out.
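Here is a toy illustration of the first point. It uses a crude resource-bounded mixture, not the actual AIXI-tl construction. Hypotheses that can't finish within the compute budget get zero weight, so a short but astronomically expensive Team Manipulation program has no influence. The hypotheses, lengths and step counts are all made up.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    name: str
    length_bits: int     # description length; prior weight is 2**-length_bits
    steps_needed: float  # compute needed to produce the next prediction
    predicts_one: bool   # the next bit this hypothesis outputs

def prob_next_is_one(hyps: list[Hypothesis], step_budget: float) -> float:
    """Toy resource-bounded mixture: hypotheses over budget get zero weight."""
    usable = [h for h in hyps if h.steps_needed <= step_budget]
    total = sum(2.0 ** -h.length_bits for h in usable)
    ones = sum(2.0 ** -h.length_bits for h in usable if h.predicts_one)
    return ones / total if total > 0 else 0.5

hyps = [
    Hypothesis("physics", length_bits=40, steps_needed=1e6, predicts_one=False),
    # Shorter, but it must simulate whole universes before it says anything
    Hypothesis("manipulation", length_bits=10, steps_needed=float("inf"), predicts_one=True),
]
print(prob_next_is_one(hyps, step_budget=1e9))           # bounded: manipulation never runs
print(prob_next_is_one(hyps, step_budget=float("inf")))  # unbounded idealization
```

With a finite budget the prediction comes entirely from the physics hypothesis. Only the unbounded idealization hands control to the shorter manipulator.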

Yes, inner optimizers in big neural nets might be a thing, and this is sort of like an inner optimizer problem in the Solomonoff prior. But even though both issues involve agents being inside of things, the underlying reasons are different, and learning about one isn't super informative about the other.

Agenty optimizers show up in both cases because being an agenty optimizer is a great way to get things done. But for Team Manipulation in the Solomonoff prior, their agentiness is merely what we expect to happen if you simulate an ecosystem long enough to get intelligent life - it's not really the active ingredient of the argument; that honor probably goes to having the extreme computing power to simulate entire other universes. Meanwhile, for inner optimizers in neural nets, their agentiness, and the usefulness of agentiness as a problem-solving tool, are the active ingredients.

So.

TL;DR:

I was wrong about the malignity of the Solomonoff prior. If you have a Solomonoff oracle and are making an important decision with it, it's untrustworthy. However, since we don't have Solomonoff oracles, I regard this as mostly a curiosity.

^[1]
This isn't really the K-complexity of Earth, since, as we argue shortly, that complexity is sublinear in the size of the Earth so long as it can be located inside a simulation of the whole universe. It's just how expensive it would be to describe the Earth if it weren't part of a nice simple universe, and instead had to be treated as a raw ingredient.
It might seem obvious to you that such a raw description of Earth takes much more than 1 bit per electron since you have to specify their positions. I claim that it's actually plausibly less than 1 bit per electron since so many electrons on Earth are part of correlated crystals. But either way, the total number is still way, way bigger than our 10^16 bits of memories.
^[2]
This process is sufficient to locate me - you don't need extra bridging rules. Extra bridging rules are necessary if you're pretending to be Cartesian, though.
^[3]
"How many" is a simplification - really there's a continuous measure with an entropy that we can convert into bits.
^[4]
Many forms of hypercomputation work. Or even just extreme patience - a universe with lots of computational power that's willing to put everything on pause to run an approximation of Solomonoff induction big enough to run Team Manipulation's Turing machine counts.
^[5]
Plus or minus a constant depending on what's going on in the other universes.

Acausal Trade · Solomonoff Induction · AI


Comments:

Steven Byrnes

Here’s a “physicalist” scenario I was thinking about that I think is closely analogous, if not identical. Let me know if it makes sense or not.

Some humans are trying to figure out how the Tegmark level 4 multiverse works. If those humans were smarter (as an AGI will be), then maybe they would succeed. And if they succeeded, they would have a very sound prior for anthropic reasoning. And maybe they would do such reasoning and see that the overwhelming majority of observers-like-us are being simulated by other agents, with a googolplex lives on the line who will be happy or tortured depending on whether we do some particular action X.

Some reasons this scenario does not concern me are:

I don't think there is enough information to disambiguate different equally-plausible schemes for weights / priors about the different universes in the multiverse (this is analogous to uncertainty about which universal Turing machine to use for your Solomonoff prior),
I don’t think that we could ever figure out that the basement-universe-people-who-are-simulating-us want us to do X, even if they did in fact want us to do X, and I don't think a superintelligent AGI could figure that out either (this is related to us not having a Solomonoff oracle, as you mention)
Hopefully the AGI will adopt a policy of categorically not giving into threats, even if there are in fact a googolplex lives on the line, and therefore won’t try to figure out whether it’s being simulated or not. Also, by weird decision theory logic, if you make an assumption that you’re not being simulated, and never bother to check it, then it can be kinda self-fulfilling, because the basement-universe-people might be insightful enough to anticipate that and not bother simulating you a bunch of times in the first place.

Again, I haven’t thought about this very much, I may be confused.

Charlie Steiner

I basically agree. There's both the practical issue and the theoretical one.

In our universe, we can't simulate these universes/mathematical objects that are simulating us, and reasoning about properties of computational things without simulating them is often hard to impossible.
Supposing we did have big compute and could simulate this thing that simulates us - can we choose actions such that they're not incentivized to try to manipulate us? Should we even assign measure to the mathematical multiverse such that we care? I think the answers are "yes" and "maybe not," but am unsure.

Signer

Second, because even if we really did try to approximate Team Manipulation’s effects using abstract reasoning, it’s (even theoretically) hard to figure out what effects those would be without running their Turing machine (and all its similarly-short but similarly-expensive relatives) to find out.

Why is it hard? I mean, if we explicitly code "expect simple things" into an AI, wouldn't it figure out the most probable kinds of things for Team Manipulation to do? We can already speculate about what they would do.

Charlie Steiner

The beings inside Team Manipulation aren't going to be simple creatures with simple desires just because they live in a simple universe. I mean, we live in a simple universe, and look at us!

But more importantly, the relationship of what they want to what effect that will have on the output of the Solomonoff oracle is hard to figure out without actually simulating them simulating us. Most of the time it's just a string of bits with unclear long-term impact.

But now that you mention it, maybe there are edge cases. If I built a universe-destroying bomb, and rigged it to go off if the Solomonoff oracle output a 1, maybe we can expect them to not set off the bomb (unless they are actively competitive with other universes, or think it would be funny, or don't gain much from influencing our universe...).

magfrump

I'm confused by your intuition that Team Manipulation's universe has similar complexity to ours.

My prior is that scaling the size of (accessible) things in a universe also requires scaling the complexity of the universe in an unbounded way, probably even a super-linear one. So fully specifying "infinite computing power", or more concretely "sufficient computing power to simulate universes of complexity <= X for time horizons <= Y", requires complexity f(X, Y), which is unbounded in X and Y. That falls apart completely as a practical solution (since our universe is at age 10^62 Planck intervals) unless f(X, Y) is ~O(log(Y)), whereas using a pure counting method (e.g. the description simply counts how many universe states can be simulated) gives O(exp(Y)).

Since my intuition puts the complexity of Team Manipulation's raw universe at >10^(10^62), I'm curious what intuition of yours makes it clearly less than that of Team Physics. There are approximately 10^185 Planck volumes in our observable universe, so it takes only a few hundred bits to specify a specific instance of something inside a universe, plus a hundred or so to specify the Planck timestamp. In particular, this suggests that the third term in Team Physics' complexity is pretty small relative to the 10^8 bits specifying an observer architecture, not overwhelmingly larger.
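Taking the comment's figures at face value, the bit counts are base-2 logarithms, because a uniform code over N locations costs log2 N bits. This quick check treats the universe classically and uses the comment's own numbers.

```python
import math

PLANCK_VOLUMES = 1e185   # observable universe, figure from the comment above
AGE_PLANCK_TIMES = 1e62  # age of the universe, figure from the comment above

# A uniform code over N possibilities needs log2(N) bits
position_bits = math.log2(PLANCK_VOLUMES)
timestamp_bits = math.log2(AGE_PLANCK_TIMES)
print(f"position: {position_bits:.0f} bits, timestamp: {timestamp_bits:.0f} bits")
```

This gives about 615 bits for the position and about 206 for the timestamp. So the timestamp is closer to two hundred bits than one hundred, though both remain tiny next to 10^8.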

Charlie Steiner

One antidote to feeling like simple Turing machines can't contain complicated stuff is to consider Universal Search (I forget its real name if that's not it) - this is a Turing machine that iterates over every Turing machine.

Turing machines can be put in an ordered list (given a choice of programming language), so Universal Search just runs them all. You can't run them in order (because many never halt) and you can't run the first step of each one before moving onto step two (because there's an infinite number of Turing machines, you'd never get to step two). But you can do something in between, kinda like the classic picture of how to make a list of the rational numbers. You run the first Turing machine for a step, then you run the first and second Turing machines for a step, and you run the first, second, and third... at every step you're only advancing the state of a finite number of Turing machines, but for any given Turing machine, you'll simulate it for an arbitrary number of steps eventually. And all of it fits inside Universal Search, which is quite a simple program.
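The interleaving described above is usually called dovetailing. This minimal Python sketch uses generators as hypothetical stand-ins for Turing machines: each `next` call is one step, and exhaustion means halting. In round n it starts machine n and advances every started machine by one step. Every machine therefore eventually gets arbitrarily many steps, even though infinitely many of them never halt.

```python
from itertools import count
from typing import Callable, Iterator

def toy_machine(index: int) -> Iterator[str]:
    """Hypothetical stand-in for the index-th Turing machine.

    Even-numbered machines never halt; odd-numbered ones halt after `index` steps.
    """
    steps = count() if index % 2 == 0 else range(index)
    for k in steps:
        yield f"machine {index}, step {k}"

def dovetail(machines: Callable[[int], Iterator[str]], rounds: int) -> list[tuple[int, str]]:
    """In round n, start machine n, then advance every started machine by one step."""
    running: list[Iterator[str]] = []
    trace: list[tuple[int, str]] = []
    for n in range(rounds):
        running.append(machines(n))
        for i, machine in enumerate(running):
            step = next(machine, None)  # None once a machine has halted
            if step is not None:
                trace.append((i, step))
    return trace

trace = dovetail(toy_machine, rounds=6)
steps_per_machine = [sum(1 for i, _ in trace if i == m) for m in range(6)]
print(steps_per_machine)
```

After six rounds this prints `[6, 1, 4, 3, 2, 1]`. Machine 0 has been advanced every round, and machine 1 halted after its single step. Running more rounds gives every non-halting machine more steps.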

As for finding us in our universe, your estimate makes sense in a classical universe (where you just have to specify where we are), but not in a quantum one (where you have to specify what branch of the universe's wavefunction we're on).

BTernaryTau

I'm assuming that you're using "Universal Search" to refer to Toby Ord's Complete Turing Machine.

magfrump

My issue isn't with the complexity of a Turing machine, it's with the term "accessible." Universal search may execute every Turing machine, but it also takes more than exponential time to do so.

In particular, if there are infinitely many Schelling points in the manipulation universe to be manipulated and referenced, then all of that computation has to causally precede the simplest such Schelling point for any answer that needs to be manipulated!

It's not clear to me what it actually means for there to exist a Schelling point in the manipulation universe that would be used by Solomonoff Induction to get an answer, but my confusion isn't about (arbitrarily powerful computer) or (Schelling point) on their own, it's about how much computation you can do before each Schelling point, while still maintaining the minimality criterion for induction to be manipulated.

Perhaps

I love the Team Physics and Team Manipulation characterization, gives big Pokémon vibes.
