
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I have spent a long time being confused about Paul’s post on consequentialists in the Solomonoff prior. I now think I understand the problem clearly enough to engage with it properly.

I think my confusion was to a large degree a problem of framing. In the course of the discussions I had to deconfuse myself, it seemed to me that other people share similar confusions. In this post, I will attempt to explain the framing that helped clarify the problem for me.

## i. A brief sketch of the Solomonoff prior

The Solomonoff, or Universal, prior is a probability distribution over strings of a certain alphabet (usually over all strings of 1s and 0s). It is defined by taking the set of all Turing machines (TMs) which output strings, assigning to each a weight proportional to $2^{-L}$ (where $L$ is its description length), and then assigning to each string a probability equal to the sum of the weights of the TMs that compute it. The description length is closely related to the amount of information required to specify the machine; I will use description length and amount of information for specification interchangeably.

(The actual formalism is in fact a bit more technically involved. I think this picture is detailed enough, in the sense that my explanation will map onto the real formalism about as well.)
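As a toy illustration of the definition above, the sketch below builds a Solomonoff-style prior over a tiny, hand-written table of "machines". The table `TOY_MACHINES` and the function `toy_prior` are hypothetical names introduced here; the real prior ranges over all TMs and is uncomputable, so this only shows the weighting-and-summing step.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy "machines": (description length in bits, output string).
# The real prior ranges over all Turing machines; this finite table is an
# assumption made purely for illustration.
TOY_MACHINES = [
    (2, "0000"),
    (3, "0101"),
    (3, "0000"),
    (4, "0110"),
]


def toy_prior(machines):
    """Weight each machine by 2**-L, sum the weights per string, normalise."""
    weights = defaultdict(Fraction)
    for length, output in machines:
        weights[output] += Fraction(1, 2 ** length)
    total = sum(weights.values())
    return {string: weight / total for string, weight in weights.items()}


prior = toy_prior(TOY_MACHINES)
for string, probability in sorted(prior.items()):
    print(string, probability)
```

In this table the string `0000` ends up with probability 2/3, because two machines output it and their weights add.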

The above defines the Solomonoff prior. To perform Solomonoff induction, one can also define conditional distributions by considering only those TMs that generate strings beginning with a certain prefix. In this post, we’re not interested in that process, but only in the prior.

## ii. The Malign Prior Argument

In the post, Paul claims that the prior is dominated by consequentialists. I don’t think it is quite dominated by them, but I think the effect in question is plausibly real.

I’ll call the key claim involved the Malign Prior Argument. On my preferred framing, it goes something like this:

Premiss: For some strings, it is easier to specify a Turing Machine that simulates a reasoner which decides to predict that string, than it is to specify the intended generator for that string.

Conclusion: Therefore, those strings’ Solomonoff prior probability will be dominated by the weight assigned to the TM containing the reasoner.
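To see why a shorter description translates into dominance, it may help to write out the weights. The symbols $L_R$ (description length of the TM containing the reasoner) and $L_G$ (description length of the intended generator) are notation introduced here for illustration, assuming for simplicity that these two machines are the main contributors to the string's probability:

$$\frac{2^{-L_R}}{2^{-L_G}} = 2^{L_G - L_R}$$

So if the reasoner's TM is shorter by $k = L_G - L_R$ bits, it contributes $2^k$ times as much weight; a saving of only 10 bits already gives a factor of 1024.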

It’s best to explain the idea of an ‘intended generator’ with examples. In the case of a camera signal as the string, the intended generator is something like a TM that simulates the universe, plus a specification of the point in the simulation where the camera input should be sampled. Approximations to this, like a low-fidelity simulation, can also be considered intended generators.

There isn’t anything special about the intended generator’s relationship to the string - it’s just one way in which that string can be generated. It seems most natural to us as humans, and the Occamian nature of SI feels like it should be biased towards such strings, but nothing in principle stops something less ‘natural’ from being in fact a shorter description.

This idea of ‘naturalness’ is important in understanding what the Malign Prior Argument is about; I will use it roughly to refer to something like ‘the set of Turing Machines that don’t involve reasoners that attempt to influence the prior’, or ‘the set of intended generators’. It’s vague, but I think it gets across the point.

I read most of Paul’s post as an existence argument for the premiss, using consequentialists in other worlds as the reasoners. I don’t think all such reasoners are like Paul describes; I also doubt that all or even most strings are subject to this effect, but find it very plausible that some are.

I think the argument is not, at its core, about these reasoners making the strings they output more likely than the ‘true string’. It is concerning enough that there is any effect at all that these reasoners have on the prior, which is the fact this argument establishes.

As a side note, it’s also worth noting that this is not about these reasoners breaking out of the box and taking over our world, although that is also a related concern one might have.

## iii. The support for premiss 1

Consider a string S’ with very high natural K-complexity (description length of the intended generator) that shares a prefix with a string S that is of high interest to human-like civilisations.

I claim that the prior probability of this string is higher than it ‘naturally’ ‘should’ be, in the sense that a large part of the weight that composes this probability is coming from a TM that simulates a reasoner that is attempting to influence the prior.

The reasons this happens are:

1. A reasoner in a TM can have an arbitrarily long amount of compute time to decide what strings to output.
2. Specifying reasoners is cheap relative to specifying the string S’.
3. There exists a reasoner whose goals are best served by influencing the prior to make S’ more likely.

1 is a crucial property of the Solomonoff prior that allows this to happen. A TM in the Solomonoff prior can think for a very, very long time: enough to, e.g., simulate Ackermann(Ackermann(10)) initial world states, each for Ackermann(Ackermann(10)) timesteps. It can perform something close to an exhaustive search of all possible civilizations and decide to attempt to influence the one that is most susceptible to being influenced, if that’s what it wants to do. This is a ridiculous computation, but we’re talking about a mathematical object, not an actual process that we run. It’s plausible that if the prior were also weighted by speed of computation, these effects would be far less pronounced (and maybe would not arise at all).
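The effect of a runtime penalty can be sketched with a toy comparison. The penalty used below, description length plus the base-2 logarithm of the number of steps (in the spirit of Levin's Kt complexity), is an assumption chosen for illustration rather than something specified in this post, and all the numbers are hypothetical.

```python
def log2_weight(length_bits):
    """Log2 of the plain weight 2**-L."""
    return -length_bits


def log2_penalised_weight(length_bits, log2_steps):
    """Log2 of a runtime-penalised weight 2**-(L + log2(steps))."""
    return -(length_bits + log2_steps)


# Hypothetical figures: the reasoner's TM is 100 bits shorter than the natural
# generator but runs for vastly longer (its step count is 2**10_000).
NATURAL = {"length_bits": 1000, "log2_steps": 100}
REASONER = {"length_bits": 900, "log2_steps": 10_000}

# A positive value means the reasoner's TM outweighs the natural generator.
plain_advantage = (
    log2_weight(REASONER["length_bits"]) - log2_weight(NATURAL["length_bits"])
)
penalised_advantage = (
    log2_penalised_weight(**REASONER) - log2_penalised_weight(**NATURAL)
)

print(plain_advantage)      # 100
print(penalised_advantage)  # -9800
```

Under the plain weighting the reasoner's TM wins by 100 bits; under this penalty it loses by 9800 bits. An Ackermann-scale runtime would be penalised far more heavily still.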

To see that 2 and 3 are plausible, we need to think about S’, which by assumption is a string with high natural K-complexity. This high complexity ‘buys’ us the space to specify a reasoner, and the space to specify values, without making the TM more complex than a natural generator of S’. Now, because S is by assumption of interest to civilisations, there likely exists a TM containing a reasoner that performs its exhaustive search, finds S, and concludes that its values are best served by making S’ more likely (e.g. to influence the decision-making of civilisations that are thinking about what S is, given a prefix of it known to them).

In a way, this agent uses its simplicity to give more simplicity to some other string. That is how the prior gets hijacked.

Note that this reasoner will need to have goals that are simpler than the natural generator of S’ in order to actually contribute to S’ being more likely - otherwise, specifying its TM would be more expensive than specifying the natural generator of S’.
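This constraint can be written as a rough bit budget. The decomposition and the symbols below are notation introduced here for illustration: $L_{\text{reasoner}}$ for the machinery that specifies the reasoner, $L_{\text{goals}}$ for its goals, $L_{\text{out}}$ for reading its output off the tape, and $K_{\text{nat}}(S')$ for the natural K-complexity of S’:

$$L_{\text{reasoner}} + L_{\text{goals}} + L_{\text{out}} < K_{\text{nat}}(S')$$

The larger $K_{\text{nat}}(S')$ is, the more room there is on the left-hand side, which is why the argument focuses on strings with high natural complexity.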

The above is non-constructive (in the mathematical sense), but nevertheless the existence of strings S’ that are affected in this way seems plausible. The spaces of possible TMs and of the strings we (or other users of the Solomonoff prior) could be interested in are simply too vast for there not to be such TMs. Whether there are very many of these, or whether they are so much more complicated than the string S as to make this effect irrelevant to our interests, are different questions.

## iv. Alien consequentialists

In my view, Paul’s approach in his post is a more constructive strategy for establishing 2 and 3 in the argument above. If correct, it suggests a stronger result - not only does his construction cause the probability of S’ to be dominated by the TM containing the reasoner, it also makes the probability of S’ roughly comparable to that of S, for a wide class of choices of S.

In particular, the choice of S that is susceptible to this is something like the camera example I used, where the natural generator of S is a specification of our world together with a location where we take samples from. The alien civilisation is a way to construct a Turing Machine that outputs S’ which has comparable complexity to S.

To do that, we specify a universe, then run it for however long we want, until we get somewhere within it smart agents that decide to influence the prior. Since 1 is true, these agents have an arbitrary amount of time to decide what they output. If S is important, there probably will be a civilisation somewhere in some simulated world which will decide to attempt to influence decisions based on S, and output an appropriate S’. We then specify the output channel to be whatever they decide to use as the output channel.

This requires a relatively modest amount of information - enough to specify the universe, and the location of the output. This is on the same order as the natural generator for S itself, if it is like a camera signal.

Trying to specify our reasoner within this space (reasoners that naturally develop in simulations) does place restrictions on what kind of reasoner we end up with. For instance, there are now some implicit runtime bounds on many of our reasoners, because they likely care about things other than the prior. Nevertheless, the space of our reasoners remains vast, including unaligned superintelligences and other odd minds.

## v. Conclusion. Do these arguments actually work?

I am mostly convinced that there is at least some weirdness in the Solomonoff prior.

A part of me wants to add ‘especially around strings whose prefixes are used to make pivotal decisions’; I’m not sure that is right, because I think scarcely anyone would actually use this prior in its true form - except, perhaps, an AI reasoning about it abstractly and naïvely enough not to be concerned about this effect despite having to explicitly consider it.

In fact, a lot of my doubt about the malign Solomonoff prior is concentrated around this concern: if the reasoners don’t believe that anyone will act based on the true prior, it seems unclear why they should spend a lot of resources on messing with it. I suppose the space is large enough for at least some to get confused into doing something like this by mistake.

I think that even if my doubts are correct, there will still be weirdness associated with the agents that are specified directly, along the lines of section iii, if not those that appear in simulated universes, as described in iv.