The Solomonoff Prior is Malign

by Mark Xu16 min read14th Oct 202034 comments

121

Ω 43

Solomonoff InductionPriorsInner AlignmentAI
Curated
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This argument came to my attention from this post by Paul Christiano. I also found this clarification helpful. I found these counter-arguments stimulating and have included some discussion of them.

Very little of this content is original. My contributions consist of fleshing out arguments and constructing examples.

Thank you to Beth Barnes and Thomas Kwa for helpful discussion and comments.

What is the Solomonoff prior?

The Solomonoff prior is intended to answer the question "what is the probability of X?" for any X, where X is a finite string over some finite alphabet. The Solomonoff prior is defined by taking the set of all Turing machines (TMs) which output strings when run with no input and weighting them proportional to , where  is the description length of the TM (informally its size in bits).

The Solomonoff prior says the probability of a string is the sum over all the weights of all TMs that print that string.

One reason to care about the Solomonoff prior is that we can use it to do a form of idealized induction. If you have seen 0101 and want to predict the next bit, you can use the Solomonoff prior to get the probability of 01010 and 01011. Normalizing gives you the chances of seeing 1 versus 0, conditioned on seeing 0101. In general, any process that assigns probabilities to all strings in a consistent way can be used to do induction in this way.

This post provides more information about Solomonoff Induction.

Why is it malign?

Imagine that you wrote a programming language called python^10 that works as follows: First, it takes all alpha-numeric chars that are not in literals and checks if they're repeated 10 times sequentially. If they're not, they get deleted. If they are, they get replaced by a single copy. Second, it runs this new program through a python interpreter.

Hello world in python^10:

ppppppppprrrrrrrrrriiiiiiiiiinnnnnnnnnntttttttttt('Hello, world!')

Luckily, python has an exec function that executes literals as code. This lets us write a shorter hello world:

eeeeeeeeexxxxxxxxxxeeeeeeeeeecccccccccc("print('Hello, world!')")

It's probably easy to see that for nearly every program, the shortest way to write it in python^10 is to write it in python and run it with exec. If we didn't have exec, for sufficiently complicated programs, the shortest way to write them would be to specify an interpreter for a different language in python^10 and write it in that language instead.

As this example shows, the answer to "what's the shortest program that does X?" might involve using some roundabout method (in this case we used exec). If python^10 has some security properties that python didn't have, then the shortest program in python^10 that accomplished any given task would not have these security properties because they would all pass through exec. In general, if you can access alternative ‘modes’ (in this case python), the shortest programs that output any given string might go through one of those modes, possibly introducing malign behavior.

Let's say that I'm trying to predict what a human types next using the Solomonoff prior. Many programs predict the human:

  1. Simulate the human and their local surroundings. Run the simulation forward and check what gets typed.
  2. Simulate the entire Earth. Run the simulation forward and check what that particular human types.
  3. Simulate the entire universe from the beginning of time. Run the simulation forward and check what that particular human types.
  4. Simulate an entirely different universe that has reason to simulate this universe. Output what the human types in the simulation of our universe.

Which one is the simplest? One property of the Solmonoff prior is that it doesn't care about how long the TMs take to run, only how large they are. This results in an unintuitive notion of "simplicity"; a program that does something  times might be simpler than a program that does the same thing  times because the number  is easier to specify than .

In our example, it seems likely that "simulate the entire universe" is simpler than "simulate Earth" or "simulate part of Earth" because the initial conditions of the universe are simpler than the initial conditions of Earth. There is some additional complexity in picking out the specific human you care about. Since the local simulation is built around that human this will be easier in the local simulation than the universe simulation. However, in aggregate, it seems possible that "simulate the universe, pick out the typing" is the shortest program that predicts what your human will do next. Even so, "pick out the typing" is likely to be a very complicated procedure, making your total complexity quite high.

Whether simulating a different universe that simulates our universe is simpler depends a lot on the properties of that other universe. If that other universe is simpler than our universe, then we might run into an exec situation, where it's simpler to run that other universe and specify the human in their simulation of our universe.

This is troubling because that other universe might contain beings with different values than our own. If it's true that simulating that universe is the simplest way to predict our human, then some non-trivial fraction of our prediction might be controlled by a simulation in another universe. If these beings want us to act in certain ways, they have an incentive to alter their simulation to change our predictions.

At its core, this is the main argument why the Solomonoff prior is malign: a lot of the programs will contain agents with preferences, these agents will seek to influence the Solomonoff prior, and they will be able to do so effectively.

How many other universes?

The Solomonoff prior is running all possible Turing machines. How many of them are going to simulate universes? The answer is probably "quite a lot".

It seems like specifying a lawful universe can be done with very few bits. Conway's Game of Life is very simple and can lead to very rich outcomes. Additionally, it seems quite likely that agents with preferences (consequentialists) will appear somewhere inside this universe. One reason to think this is that evolution is a relatively simple mathematical regularity that seems likely to appear in many universes.

If the universe has a hospitable structure, due to instrumental convergence these agents with preferences will expand their influence. As the universe runs for longer and longer, the agents will gradually control more and more.

In addition to specifying how to simulate the universe, the TM must specify an output channel. In the case of Game of Life, this might be a particular cell sampled at a particular frequency. Other examples include whether or not a particular pattern is present in a particular region, or the parity of the total number of cells.

In summary, specifying lawful universes that give rise to consequentialists requires a very simple program. Therefore, the predictions generated by the Solomonoff prior will have some influential components comprised of simulated consequentialists.

How would they influence the Solomonoff prior?

Consequentialists that find themselves in universes can reason about the fundamental laws that govern their universe. If they find that their universe has relatively simple physics, they will know that their behavior contributes to the Solomonoff prior. To gain access to more resources in other universes, these consequentialists might seek to act in ways that influence the Solomonoff prior.

A contrived example of a decision other beings would want to manipulate is "what program should be written and executed next?" Beings in other universes would have an incentive to get us to write programs that were aligned with their values. A particularly interesting scenario is one in which they write themselves into existence, allowing them to effectively "break into" our universe.

For example, somewhere in the Solomonoff prior there is a program that goes something like: "Simulate this universe. Starting from the year 2100, every hour output '1' if there's a cubic meter of iron on the Moon, else output '0'." By controlling the presence/absence of a cubic meter of iron on the Moon, we would be able to influence the output of this particular facet of the Solomonoff prior.

This example is a very complicated program and thus will not have much weight in the Solomonoff prior. However, by reasoning over the complexity of possible output channels for their universe, consequentialists would be able to identify output channels that weigh heavily in the Solomonoff prior. For example, if I was in the Game of Life, I might reason that sampling cells that were living in the initial conditions of the universe is simpler than sampling other cells. Additionally, sampling cells and reporting their outputs directly is simpler than sampling cells and reversing their values. Therefore, I might choose to control regions close to the initial live cells of the universe for purposes of exerting influence over the Solomonoff prior.

Additionally, consequentialists can send across many plausible channels at once, focus efforts on channels that are easy to control, send through channels that would not unduly decrease their values for other reasons, etc. Since the proportional weight in the Solomonoff prior drops off exponentially with respect to complexity, it’s possible only very small regions of space would need to be used to achieve a large fraction of the maximum possible influence.

How much influence will they have?

A simple application of the no free lunch theorem shows that there is no way of making predictions that is better than the Solomonoff prior across all possible distributions over all possible strings. Thus, agents that are influencing the Solomonoff prior cannot be good at predicting, and thus gain influence, in all possible worlds.

In practice, this is irrelevant because consequentialists do not want to achieve equal predictive accuracy in all worlds; they only care about worlds in which being predictive results in increased influence. If I'm exerting control over the Solomonoff prior, I only care about influencing it in ways that might affect material consequences in other universes. For example, I do not care about gaining influence in universes inhospitable to life. Thus I will be able to trade off predictive power in universes I don't care about for predictive power in universes I do care about. This means that I should restrict my attention to all universes that have resources and the Solomonoff prior is being used to make important decisions, weighting appropriately.

Another interesting thing about the Solomonoff prior is it is actually a collection of priors. The "length" of a TM is defined relative to some universal TM. For any particular TM, it's length with respect to different universal TMs will vary; thus, there are versions of the Solomonoff prior that give more or less weight to any given TM. (Note that you can simulate universal TMs with universal TMs, so the difference across all programs will be bounded by a constant factor. However, this constant factor can be large, so the difference in relative weight between different Solomonoff priors can also be large).

In particular, this suggests a good strategy for consequentialists: find a universe that is using a version of the Solomonoff prior that has a very short description of the particular universe the consequentialists find themselves in.

The combined strategy is thus to take a distribution over all decisions informed by the Solomonoff prior, weight them by how much influence can be gained and the version of the prior being used, and read off a sequence of bits that will cause some of these decisions to result in a preferred outcome.

The question of how much influence any given universe of consequentialists will have is difficult to answer. One way of quantifying this is to think about how many “universes they don't care about” they're trading off for “universes they do care about” (really we should be thinking in terms of sequences, but I find reasoning about universes to be easier).

Since the consequentialists care about exerting maximum influence, we can approximate them as not caring about universes that don't use a version of the Solomonoff prior that gives them a large weight. This can be operationalized as only caring about universes that use universal TM in a particular set for their Solomonoff prior. What is the probability that a particular universe uses a universal TM from that set? I am not sure, but 1/million to 1/billion seems reasonable. This suggests a universe of consequentialists will only care about 1/million to 1/billion universes, which means they can devote a million/billion times the predictive power to universes they care about. This is sometimes called the “anthropic update”. (This post contains more discussion about this particular argument.)

Additionally, we might think about which decisions the consequentialists would care about. If a particular decision using the Solomonoff prior is important, consequentialists are going to care more about that decision than other decisions. Conservatively, perhaps 1/1000 decisions are "important" in this sense, giving another 1000x relative weighting.

After you condition on a decision being important and using a particular version of the Solomonoff prior, it thus seems quite likely that a non-trivial fraction of your prior is being controlled by consequentialists.

An intuition pump is that this argument is closer to an existence claim than a for-all claim. The Solomonoff prior is malign if there exists a simple universe of consequentialists that wants to influence our universe. This universe need not be simple in an absolute sense, only simple relative to the other TMs that could equal it in predictive power. Even if most consequentialists are too complicated or not interested, it seems likely that there is at least one universe that is.

Example

Complexity of Consequentialists

How many bits does it take to specify a universe that can give rise to consequentialists? I do not know, but it seems like Conway’s Game of Life might provide a reasonable lower bound.

Luckily, the code golf community has spent some amount of effort optimizing for program size. How many bytes would you guess it takes to specify Game of Life? Well, it depends on the universal TM. Possible answers include 6, 32, 39, or 96.

Since universes of consequentialists can “cheat” by concentrating their predictive efforts onto universal TMs in which they are particularly simple, we’ll take the minimum. Additionally, my friend who’s into code golf (he wrote the 96-byte solution!) says that the 6-byte answer actually contains closer to 4 bytes of information.

To specify an initial configuration that can give rise to consequentialists we will need to provide more information. The smallest infinite growth pattern in Game of Life has been shown to need 10 cells. Another reference point is that a self-replicator with 12 cells exists in HighLife, a Game of Life variant. I’m not an expert, but I think an initial configuration that gives rise to intelligent life can be specified in an 8x8 bounding box, giving a total of 8 bytes.

Finally, we need to specify a sampling procedure that consequentialists can gain control of. Something like “read <cell> every <large number> time ticks” suffices. By assumption, the cell being sampled takes almost no information to specify. We can also choose whatever large number is easiest to specify (the busy beaver numbers come to mind). In total, I don’t think this will take more than 2 bytes.

Summing up, Game of Life + initial configuration + sampling method takes maybe 16 bytes, so a reasonable range for the complexity of a universe of consequentialists might be 10-1000 bytes. That doesn’t seem like very many, especially relative to the amount of information we’ll be conditioning the Solomonoff prior on if we ever use it to make an important decision.

Complexity of Conditioning

When we’re using the Solomonoff prior to make an important decision, the observations we’ll condition on include information that:

  1. We’re using the Solomonoff prior
  2. We’re making an important decision
  3. We’re using some particular universal TM

How much information will this include? Many programs will not simulate universes. Many universes exist that do not have observers. Among universes with observers, some will not develop the Solomonoff prior. These observers will make many decisions. Very few of these decisions will be important. Even fewer of these decisions are made with the Solomonoff prior. Even fewer will use the particular version of the Solomonoff prior that gets used.

It seems reasonable to say that this is at least a megabyte of raw information, or about a million bytes. (I acknowledge some cart-horse issues here.)

This means that after you condition your Solomonoff prior, you’ll be left with programs that are at least a million bytes. As our Game of Life example shows, it only takes maybe 10-1000 of these bytes to specify a universe that gives rise to consequentialists. You have approximately a million bytes left to specify more properties of the universe that will make it more likely the consequentialists will want to exert influence over the Solomonoff prior for the purpose of influencing this particular decision.

Why might this argument be wrong?

Inaccessible Channels

Argument

Most of the universe is outside of humanity's light-cone. This might suggest that most "simple" ways to sample from our universe are currently outside our influence, meaning that the only portions of the Solomonoff prior we can control are going to have an extremely low weight.

In general, it might be the case that for any universe, consequentialists inside that universe are going to have difficulty controlling simple output channels. For example, in Game of Life, a simple way to read information might sample a cell particular cell starting at t=0. However, consequentialists in Game of Life will not appear until a much later time and will be unable to control a large initial chunk of that output channel.

Counter-argument

Paul Christiano points out that the general form of this argument also applies to other TMs that compose of your Solomonoff prior. For example, when predicting what I'll type next, you would "want" to simulate me and predict what I would type starting at some time T. However, this is a pretty complicated way of sampling. The fact that simple sampling procedures are less predictive doesn't asymmetrically penalize consequentialists. The consequentialists universe and sampling method only have to be simple relative to other programs that are equally good at predicting.

One might also note that large numbers can be produced with relatively few bits, so "sample starting at <large number>" is not much more complicated than "sample starting at 0".

Speedy Channels

Argument

There are many simple ways of sampling from universes very quickly. For example, in Game of Life, one can sample a cell every time-tick. It seems feasible for consequentialists to simulate Earth in the Game of Life, but not feasible to simulate Earth such that they can alter a specific cell every time tick per the simulation.

Counter-argument

Consequentialists in the Game of Life could simply simulate Earth, compute the predictions, then later broadcast them along very fast sampling channels. However, it might be the case that building a machine that alters a cell arbitrarily every time tick is impossible. In our universe, there might be sample procedures that physics does not permit us to exert arbitrary control over, e.g. due to speed of light limitations. If this is the case, consequentialists will direct efforts towards the simplest channel they can control.

Computational Burden

Argument

Determining how to properly influence the Solomonoff prior requires massive computation resources devoted to simulating other universes and how they're going to use the Solomonoff prior. While the Solomonoff prior does not penalize extremely long run-times, from the perspective of the consequentialists doing the simulating, run-times will matter. In particular, consequentialists will likely be able to use compute to achieve things they value (like we are capable of doing). Therefore, it would be extremely costly to exert influence over the Solomonoff prior, potentially to the point where consequentialists will choose not to do so.

Counter-argument

The computational burden of predicting the use of the Solomonoff in other universes is an empirical question. Since it's a relatively fixed cost and there are many other universes, consequentialists might reason that the marginal influence over these other universes is worth the compute. Issues might arise if the use of the Solomonoff prior in other universes is very sensitive to precise historical data, which would require a very precise simulation to influence, increasing the computational burden.

Additionally, some universes will find themselves with more computing power than other universes. Universes with a lot of computing power might find it relatively easy to predict the use of the Solomonoff prior in simpler universes and subsequently exert influence over them.

Malign implies complex

Argument

A predictor that correctly predicts the first N bits of a sequence then switches to being malign will be strictly more complicated than a predictor that doesn't switch to being malign. Therefore, while consequentialists in other universes might have some influence over the Solomonoff prior, they will be dominated by non-malign predictors.

Counter-argument

This argument makes a mistaken assumption that the malign influence on the Solomonoff prior is in the form of programs that have their "malignness" as part of the program. The argument given suggests that simulated consequentialists will have an instrumental reason to be powerful predictors. These simulated consequentialists have reasoned about the Solomonoff prior and are executing the strategy of "be good at predicting, then exert malign influence", but this strategy is not hardcoded so exerting malign influence does not add complexity.

Canceling Influence

Argument

If it's true that many consequentialists are trying to influence the Solomonoff prior, then one might expect the influence to cancel out. It's improbable that all the consequentialists have the same preferences; on average, there should be an equal number of consequentialists trying to influence any given decision in any given direction. Since the consequentialists themselves can reason thus, they will realize that the expected amount of influence is extremely low, so they will not attempt to exert influence at all. Even if some of the consequentialists try to exert influence anyway, we should expect the influence of these consequentialists to cancel out also.

Counter-argument

Since the weight of a civilization of consequentialists in the Solomonoff prior is penalized exponentially with respect to complexity, it might be the case that for any given version of the Solomonoff prior, most of the influence is dominated by one simple universe. Different values of consequentialists imply that they care about different decisions, so for any given decision, it might be that very few universes of consequentialists are both simple enough that they have enough influence and care about that decision.

Even if for any given decision, there are always 100 universes with equal influence and differing preferences, there are strategies that they might use to exert influence anyway. One simple strategy is for each universe to exert influence with a 1% chance, giving every universe 1/100 of the resources in expectation. If the resources accessible are vast enough, then this might be a good deal for the consequentialists. Consequentialists would not defect against each other for the reasons that motivate functional decision theory.

More exotic solutions to this coordination problem include acausal trade amongst universes of different consequentialists to form collectives that exert influence in a particular direction.

Be warned that this leads to much weirdness.

Conclusion

The Solomonoff prior is very strange. Agents that make decisions using the Solomonoff prior are likely to be subject to influence from consequentialists in simulated universes. Since it is difficult to compute the Solomonoff prior, this fact might not be relevant in the real world.

However, Paul Christiano applies roughly the same argument to claim that the implicit prior used in neural networks is also likely to generalize catastrophically. (See Learning the prior for a potential way to tackle this problem).

Addendum

Warning: highly experimental interesting speculation.

Unimportant Decisions

Consequentialists have a clear motive to exert influence over important decisions. What about unimportant decisions?

The general form of the above argument says: "for any given prediction task, the programs that do best are disproportionately likely to be consequentialists that want to do well at the task". For important decisions, many consequentialists would instrumentally want to do well at the task. However, for unimportant decisions, there might be consequentialists that want to make good predictions. These consequentialists would still be able to concentrate efforts on versions of the Solomonoff prior that weighted them especially high, so they might outperform other programs in the long run.

It's unclear to me whether or not this behavior would be malign. One reason why it might be malign is that these consequentialists that care about predictions would want to make our universe more predictable. However, while I am relatively confident that arguments about instrumental convergence should hold, speculating about possible preferences of simulated consequentialists seems likely to produce errors in reasoning.

Hail mary

Paul Christiano suggests that humanity was desperate enough to want to throw a "hail mary", one way to do this is to use the Solomonoff prior to construct a utility function that will control the entire future. Since this is a very important decision, we expect consequentialists in the Solomonoff prior to care about influencing this decision. Therefore, the resulting utility function is likely to represent some simulated universe.

If arguments about acausal trade and value handshakes hold, then the resulting utility function might contain some fraction of human values. Again, this leads to much weirdness in many ways.

Speed prior

One reason that the Solomonoff prior contains simulated consequentialists is that its notion of complexity does not penalize runtime complexity, so very simple programs are allowed to perform massive amounts of computation. The speed prior attempts to resolve this issue by penalizing programs by an additional logarithm of the amount of time for which it's run.

The speed prior might reduce the relative weighting of universes with consequentialists because such programs have to be run for a very long time before they start producing reasonable predictions. The consequentialists have to gain control of their universe, understand their fundamental laws of physics, simulate other universes, then manipulate the speed prior. This might all take a very long time, causing consequentialists to be dominated by other programs.

In general, penalizing slowness might cause programs to "waste" less time on simulating consequentialists, devoting more computation towards performing the prediction task. However, it still might be the case that a universe that has very fast consequentialists might still end up dominating.

Evan Hubinger also provides an argument that even very fast programs are possibly malign. The key insight is that even though your program needs to be fast, it can be running programs that are themselves less fast. For example, one possible fast way to solve a classification problem is to search to find a neural network, then use that network for inference. However, if you wanted your search to find a fast neural network, then the search itself might take longer, resulting in a net increase in speed. Thus, time "waste" can manifest in programs that were explicitly optimized to not "waste" time. This "wasted" time could potentially be used for malign optimization, in this case for gaining influence over the speed prior.

Randomized prior

A potential way to reduce the influence consequentialists have on a decision made by the Solomonoff prior is to randomize the particular version of the prior that gets used. For example, we might make the particular universal TM we use dependent on very precise historical data. Thus, it would be very costly for consequentialists to simulate this history, and thus costly to predict which form of the Solomonoff prior we used.

If consequentialists can’t predict which Solomonoff prior we are going to use, no particular civilization of consequentialists will have the large advantage conferred by the anthropic update. Therefore, one might hope that all civilizations of consequentialists will not care about that particular decision.

This argument makes a couple of assumptions. First, it assumes that simulating very precise histories is difficult; it might not be difficult for all universes. Second, it assumes that the universes through which influence is spread cannot coordinate, which might be possible for through acausal means.

Symmetry considerations

The way that humanity reasons is evidence for the way that consequentialists in other universes will reason. If humanity reasons that the Solomonoff prior is malign and therefore is unwilling to use it to make decisions, then consequentialists in other universes might do likewise. These universes would not use the Solomonoff prior to make decisions.

The resulting state is that everyone is worried about the Solomonoff prior being malign, so no one uses it. This means that no universe will want to use resources trying to influence the Solomonoff prior; they aren’t influencing anything.

This symmetry obviously breaks if there are universes that do not realize that the Solomonoff prior is malign or cannot coordinate to avoid its use. One possible way this might happen is if a universe had access to extremely large amounts of compute (from the subjective experience of the consequentialists). In this universe, the moment someone discovered the Solomonoff prior, it might be feasible to start making decisions based on a close approximation.

Recursion

Universes that use the Solomonoff prior to make important decisions might be taken over by consequentialists in other universes. A natural thing for these consequentialists to do is to use their position in this new universe to also exert influence on the Solomonoff prior. As consequentialists take over more universes, they have more universes through which to influence the Solomonoff prior, allowing them to take over more universes.

In the limit, it might be that for any fixed version of the Solomonoff prior, most of the influence is wielded by the simplest consequentialists according to that prior. However, since complexity is penalized exponentially, gaining control of additional universes does not increase your relative influence over the prior by that much. I think this cumulative recursive effect might be quite strong, or might amount to nothing.

121

Ω 43

34 comments, sorted by Highlighting new comments since Today at 10:35 AM
New Comment

Curated. This post does a good job of summarizing a lot of complex material, in a (moderately) accessible fashion.

+1 I already said I liked it, but this post is great and will immediately be the standard resource on this topic. Thank you so much.

"At its core, this is the main argument why the Solomonoff prior is malign: a lot of the programs will contain agents with preferences, these agents will seek to influence the Solomonoff prior, and they will be able to do so effectively."

First, this is irrelevant to most applications of the Solomonoff prior.  If I'm using it to check the randomness of my random number generator, I'm going to be looking at 64-bit strings, and probably very few intelligent-life-producing universe-simulators output just 64 bits, and it's hard to imagine how an alien in a simulated universe would want to bias my RNG anyway.

The S. prior is a general-purpose prior which we can apply to any problem.  The output string has no meaning except in a particular application and representation, so it seems senseless to try to influence the prior for a string when you don't know how that string will be interpreted.

Can you give an instance of an application of the S. prior in which, if everything you wrote were correct, it would matter?

Second, it isn't clear that this is a bug rather than a feature.  Say I'm developing a program to compress photos.  I'd like to be able to ask "what are the odds of seeing this image, ever, in any universe?"  That would probably compress images of plants and animals better than other priors, because in lots of universes life will arise and evolve, and features like radial symmetry, bilateral symmetry, leafs, legs, etc., will arise in many universes.  This biasing of priors by evolution doesn't seem to me different than biasing of priors by intelligent agents; evolution is smarter than any agent we know.  And I'd like to get biasing from intelligent agents, too; then my photo-compressor might compress images of wheels and rectilinear buildings better.

Also in the category of "it's a feature, not a bug" is that, if you want your values to be right, and there's a way of learning the values of agents in many possible universes, you ought to try to figure out what their values are, and update towards them.  This argument implies that you can get that for free by using Solomonoff priors.

(If you don't think your values can be "right", but instead you just believe that your values morally oblige you to want other people to have those values, you're not following your values, you're following your theory about your values, and probably read too much LessWrong for your own good.)

Third, what do you mean by "the output" of a program that simulates a universe? How are we even supposed to notice the infinitesimal fraction of that universe's output which the aliens are influencing to subvert us?  Take your example of Life--is the output a raster scan of the 2D bit array left when the universe goes static?  In that case, agents have little control over the terminal state of their universe (and also, in the case of Life, the string will be either almost entirely zeroes, or almost entirely 1s, and those both already have huge Solomonoff priors).  Or is it the concatenation of all of the states it goes through, from start to finish?  In that case, by the time intelligent agents evolve, their universe will have already produced more bits than our universe can ever read.

Are you imagining that bits are never output unless the accidentally-simulated aliens choose to output a bit?  I can't imagine any way that could happen, at least not if the universe is specified with a short instruction string.

This brings us to the 4th problem:  It makes little sense to me to worry about averaging in outputs from even mere planetary simulations if your computer is just the size of a planet, because it won't even have enough memory to read in a single output string from most such simulations.

5th, you can weigh each program's output proportional to 2^-T, where T is the number of steps it takes the TM to terminate.  You've got to do something like that anyway, because you can't run TMs to completion one after another; you've got to do something like take a large random sample of TMs and iteratively run each one step.  Problem solved.

Maybe I'm misunderstanding something basic, but I feel like we're talking about many angels can dance on the head of a pin.

Perhaps the biggest problem is that you're talking about an entire universe of intelligent agents conspiring to change the "output string" of the TM that they're running in.  This requires them to realize that they're running in a simulation, and that the output string they're trying to influence won't even be looked at until they're all dead and gone.  That doesn't seem to give them much motivation to devote their entire civilization to twiddling bits in their universe's final output in order to shift our priors infinitesimally.  And if it did, the more likely outcome would be an intergalactic war over what string to output.

(I understand your point about them trying to "write themselves into existence, allowing them to effectively "break into" our universe", but as you've already required their TM specification to be very simple, this means the most they can do is cause some type of life that might evolve in their universe to break into our universe.  This would be like humans on Earth devoting the next billion years to tricking God into re-creating slime molds after we're dead.  Whereas the things about themselves that intelligent life actually care about with and self-identify with are those things that distinguish them from their neighbors.  Their values will be directed mainly towards opposing the values of other members of their species.  None of those distinguishing traits can be implicit in the TM, and even if they could, they'd cancel each other out.)

Now, if they were able to encode a message to us in their output string, that might be more satisfying to them.  Like, maybe, "FUCK YOU, GOD!"

The S. prior is a general-purpose prior which we can apply to any problem. The output string has no meaning except in a particular application and representation, so it seems senseless to try to influence the prior for a string when you don't know how that string will be interpreted.

The claim is that consequentalists in simulated universes will model decisions based on the Solomonoff prior, so they will know how that string will be interpreted.

Can you give an instance of an application of the S. prior in which, if everything you wrote were correct, it would matter?

Any decision that controls substantial resource allocation will do. For example, if we're evaluting the impact of running various programs, blow up planets, interfere will alien life, etc.

Also in the category of "it's a feature, not a bug" is that, if you want your values to be right, and there's a way of learning the values of agents in many possible universes, you ought to try to figure out what their values are, and update towards them. This argument implies that you can get that for free by using Solomonoff priors.

If you are a moral realist, this does seem like a possible feature of the Solomonoff prior.

Third, what do you mean by "the output" of a program that simulates a universe?

A TM that simulates a universe must also specify an output channel.

Take your example of Life--is the output a raster scan of the 2D bit array left when the universe goes static? In that case, agents have little control over the terminal state of their universe (and also, in the case of Life, the string will be either almost entirely zeroes, or almost entirely 1s, and those both already have huge Solomonoff priors). Or is it the concatenation of all of the states it goes through, from start to finish?

All of the above. We are running all possible TMs, so all computable universes will be paired will all computable output channels. It's just a question of complexity.

Are you imagining that bits are never output unless the accidentally-simulated aliens choose to output a bit? I can't imagine any way that could happen, at least not if the universe is specified with a short instruction string.

No.

This brings us to the 4th problem: It makes little sense to me to worry about averaging in outputs from even mere planetary simulations if your computer is just the size of a planet, because it won't even have enough memory to read in a single output string from most such simulations.

I agree that approximation the Solmonoff prior is difficult and thus its malignancy probably doesn't matter in practice. I do think similar arguments apply to cases that do matter.

5th, you can weigh each program's output proportional to 2^-T, where T is the number of steps it takes the TM to terminate. You've got to do something like that anyway, because you can't run TMs to completion one after another; you've got to do something like take a large random sample of TMs and iteratively run each one step. Problem solved.

See the section on the Speed prior.

Perhaps the biggest problem is that you're talking about an entire universe of intelligent agents conspiring to change the "output string" of the TM that they're running in. This requires them to realize that they're running in a simulation, and that the output string they're trying to influence won't even be looked at until they're all dead and gone. That doesn't seem to give them much motivation to devote their entire civilization to twiddling bits in their universe's final output in order to shift our priors infinitesimally. And if it did, the more likely outcome would be an intergalactic war over what string to output.

They don't have to realize they're in a simulation, they just have to realize their universe is computable. Consequentialists care about their values after they're dead. The cost of influncing the prior might not be that high because they only have to compute it once and the benefit might be enormous. Exponential decay + acausal trade make an intergalactic war unlikely.

This is great. I really appreciate when people try to summarize complex arguments that are spread across multiple posts.

Also, I basically do this (try to infer the right prior). My guiding navigation is trying to figure out what (I call) the super cooperation cluster would do then do that.

I liked this post a lot, but I did read it as something of a scifi short story with a McGuffin called "The Solomonoff Prior".

It probably also seemed really weird because I just read Why Philosophers Should Care About Computational Complexity [PDF] by Scott Aaronson and having read it makes sentences like this seem 'not even' insane:

The combined strategy is thus to take a distribution over all decisions informed by the Solomonoff prior, weight them by how much influence can be gained and the version of the prior being used, and read off a sequence of bits that will cause some of these decisions to result in a preferred outcome.

The Consequentialists are of course the most badass (by construction) alien villains ever "trying to influence the Solomonoff prior" as they are wont!

Given that some very smart people seem to seriously believe in Platonic realism, maybe there are Consequentialists malignly influencing vast infinities of universes! Maybe our universe is one of them.

I'm not sure why, but I feel like the discovery of a proof of P = NP or P ≠ NP is the climax of the heroes valiant struggle, as the true heirs of the divine right to wield The Solomonoff Prior, against the dreaded (other universe) Consequentialists.

If it's true that simulating that universe is the simplest way to predict our human, then some non-trivial fraction of our prediction might be controlled by a simulation in another universe. If these beings want us to act in certain ways, they have an incentive to alter their simulation to change our predictions.

I find this confusing. I'm not saying it's wrong, necessarily, but it at least feels to me like there's a step of the argument that's being skipped.

To me, it seems like there's a basic dichotomy between predicting and controlling. And this is claiming that somehow an agent somewhere is doing both. (Or actually, controlling by predicting!) But how, exactly?

Is it that:

  • these other agents are predicting us, by simulating us, and so we should think of ourselves as partially existing in their universe? (with them as our godlike overlords who can continue the simulation from the current point as they wish)
  • the Consequentialists will predict accurately for a while, and then make a classic "treacherous turn" where they start slipping in wrong predictions designed to influence us rather than be accurate, after having gained our trust?
  • something else?

My guess is that it's the second thing (in part from having read, and very partially understood, Paul's posts on this a while ago). But then I would expect some discussion of the "treacherous turn" aspect of it -- of the fact that they have to predict accurately for a while (so that we rate them highly in our ensemble of programs), and only then can they start outputting predictions that manipulate us.

Is that not the case? Have I misunderstood something?

(Btw, I found the stuff about python^10 and exec() pretty clear. I liked those examples. Thank you! It was just from this point on in the post that I wasn't quite sure what to make of it.)

My understanding is the first thing is what you get with UDASSA and the second thing would be what you get is if you think the Solomonoff prior is useful for predicting your universe for some other reason (ie not because you think the likelihood of finding yourself in some situation covaries with the Solomonoff prior's weight on that situation)

It seems to me that using a combination of execution time, memory use and program length mostly kills this set of arguments.

Something like a game-of-life initial configuration that leads to the eventual evolution of intelligent game-of-life aliens who then strategically feed outputs into GoL in order to manipulate you may have very good complexity performance, but both the speed and memory are going to be pretty awful. The fixed cost in memory and execution steps of essentially simulating an entire universe is huge.

But yes, the pure complexity prior certainly has some perverse and unsettling properties.

EDIT: This is really a special case of Mesa-Optimizers being dangerous. (See, e.g. https://www.lesswrong.com/posts/XWPJfgBymBbL3jdFd/an-58-mesa-optimization-what-it-is-and-why-we-should-care). The set of dangerous Mesa-Optimizers is obviously bigger than just "simulated aliens" and even time- and space-efficient algorithms might run into them.

Complexity indeed matters: the universe seems to be bounded in both time and space, so running anything like Solomonoff prior algorithm (in one of its variants) or AIXI may be outright impossible for any non-trivial model. This for me significantly weakens or changes some of the implications.

A Fermi upper bound of the direct Solomonoff/AIXI algorithm trying TMs in the order of increasing complexity: even if checking one TM took one Planck time on one atom, you could only check cca 10^250=2^800 machines within a lifetime of the universe (~10^110 years until Heat death), so the machines you could even look at have description complexity a meager 800 bits.

  • You could likely speed the greedy search up, but note that most algorithmic speedups do not have a large effect on the exponent (even multiplying the exponent with constants is not very helpful).
  • Significantly narrowing down the space of TMs to a narrow subclass may help, but then we need to take look at the particular (small) class of TMs rather than have intuitions about all TMs. (And the class would need to be really narrow - see below).
  • Due to the Church-Turing thesis, any limiting the scope of the search is likely not very effective, as you can embed arbitrary programs (and thus arbitrary complexity) in anything that is strong enough to be a TM interpreter (which the universe is in multiple ways).
  • It may be hypothetically possible to search for the "right" TMS without examining them individually (witch some future tech, e.g. how sci-fi imagined quantum computing), but if such speedup is possible, any TMs modelling the universe would need to be able to contain this. This would increase any evaluation complexity of the TMs, making them more significantly costly than the Planck time I assumed above (would need a finer Fermi estimate with more complex assumptions?).

I am not so convinced that penalizing more stuff will make these arguments weak enough that we don't have to worry about them. For an example of why I think this, see Are minimal circuits deceptive?. Also, adding execution/memory constraints penalizes all hypothesis and I don't think universes with consequentialists are asymmetrically penalized.

I agree about this being a special case of mesa-optimization.

adding execution/memory constraints penalizes all hypothesis

In reality these constraints do exist, so the question of "what happens if you don't care about efficiency at all?" is really not important. In practice, efficiency is absolutely critical and everything that happens in AI is dominated by efficiency considerations.

I think that mesa-optimization will be a problem. It probably won't look like aliens living in the Game of Life though.

It'll look like an internal optimizer that just "decides" that the minds of the humans who created it are another part of the environment to be optimized for its not-correctly-aligned goal.

Great post. I encountered many new ideas here.

One point confuses me. Maybe I'm missing something. Once the consequentialists in a simulation are contemplating the possibility of simulation, how would they arrive at any useful strategy? They can manipulate the locations that are likely to be the output/measurement of the simulation, but manipulate to what values? They know basically nothing about how the input will be interpreted, what question the simulator is asking, or what universe is doing the simulation. Since their universe is very simple, presumably many simulators are running identical copies of them, with different manipulation strategies being appropriate for each. My understanding of this sounds less like malign and more like blindly mischievous.

TLDR How do the consequentialists guess which direction to bias the output towards?

Consequentialists can reason about situations in which other beings make important decisions using the Solomonoff prior. If the multiple beings are simulated them, they can decide randomly (because having e.g. 1/100 of the resources is better than none, which is the expectation of "blind mischievousness").

An example of this sort of reasoning is Newcomb's problem with the knowledge that Omega is simulating you. You get to "control" the result of your simulation by controlling how you act, so you can influence whether or not Omega expects you to one-box or two-box, controlling whether there is $1,000,000 in one of the boxes.

Okay, deciding randomly to exploit one possible simulator makes sense.

As for choosing exactly what to see the output cells of the simulation to... I'm still wrapping my head around it. Is recursive simulation the only way to exploit these simulations from within?

At its core, this is the main argument why the Solomonoff prior is malign: a lot of the programs will contain agents with preferences, these agents will seek to influence the Solomonoff prior, and they will be able to do so effectively.

Am I the only one who sees this much less as a statement that the Solomonoff prior is malign, and much more a statement that reality itself is malign? I think that the proper reaction is not to use a different prior, but to build agents that are robust to the possibility that we live in a simulation run by influence seeking malign agents so that they don't end up like this.

If arguments about acausal trade and value handshakes hold, then the resulting utility function might contain some fraction of human values.

I think Paul's Hail Mary via Solomonoff prior idea is not obviously related to acausal trade. (It does not privilege agents that engage in acausal trade over ones that don't.)

I agree. The sentence quoted is a separate observation.

Such a great post.

Note that I changed the formatting of your headers a bit, to make some of them just bold text. They still appear in the ToC just fine. Let me know if you'd like me to revert it or have any other issues.

Looks better - thanks!

Is the link for the 6-byte Code Golf solution correct? It takes me to something that appears to be 32 bytes.

Nope. Should be fixed now.

I like this post, which summarizes other posts I wanted to read for a long time.

Yet I'm still confused by a fairly basic point: why would the agents inside the prior care about our universe? Like, I have preferences, and I don't really care about other universes. Is it because we're running their universe, and thus they can influence their own universe through ours? Or is there another reason why they are incentivized to care about universes which are not causally related to theirs?

I don't really care about other universes

Why not? I certainly do. If you can fill another universe with people living happy, fulfilling lives, would you not want to?

Okay, it's probably subtler than that.

I think you're hinting at things like the expanding moral circle. And according to that, there's no reason that I should care more about people in my universe than people in other universes. I think this makes sense when saying whether I should care. But the analogy with "caring about people in a third world country on the other side of the world" breaks down when we consider our means to influence these other universes. Being able to influence the Solomonoff prior seems like a very indirect way to alter another universe, on which I have very little information. That's different from buying Malaria nets.

So even if you're altruistic, I doubt that "other universes" would be high in your priority list.

The best argument I can find for why you would want to influence the prior is if it is a way to influence the simulation of your own universe, à la gradient hacking.

I personally see no fundamental difference between direct and indirect ways of influence, except in so far as they relate to stuff like expected value.

I agree that given the amount expected influence, other universes are not high on my priority list, but they are still on my priority list. I expect the same for consequentialists in other universes. I also expect consequentialist beings that control most of their universe to get around to most of the things on their priority list, hence I expect them to influence the Solmonoff prior.

In your section "complexity of conditioning", if I am understanding correctly, you compare the amount of information required to produce consequentialists with the amount of information in the observations we are conditioning on. This, however, is not apples to oranges: the consequentialists are competing against the "true" explanation of the data, the one that specifies the universe and where to find the data within it, they are not competing against the raw data itself. In an ordered universe, the "true" explanation would be shorter than the raw observation data, that's the whole point of using Solomonoff induction after all.

So, there are two advantages the consequentialists can exploit to "win" and be the shorter explanation. This exploitation must be enough to overcome those 10-1000 bits. One is that, since the decision which is being made is very important, they can find the data within the universe without adding any further complexity. This, to me, seems quite malign, as the "true" explanation is being penalized simply because we cannot read data directly from the program which produces the universe, not because this universe is complicated.

The second possible advantage is that these consequentialists may value our universe for some intrinsic reason, such as the life in it, so that they prioritize it over other universes and therefore it takes less bits to specify their simulation of it. However, if you could argue that the consequentialists actually had an advantage here which outweighed their own complexity, this would just sound to me like an argument that we are living in a simulation, because it would essentially be saying that our universe is unduly tuned to be valuable for consequentialists, to such a degree that the existence of these consequentialists is less of a coincidence than it just happening to be that valuable.

In your section "complexity of conditioning", if I am understanding correctly, you compare the amount of information required to produce consequentialists with the amount of information in the observations we are conditioning on. This, however, is not apples to oranges: the consequentialists are competing against the "true" explanation of the data, the one that specifies the universe and where to find the data within it, they are not competing against the raw data itself. In an ordered universe, the "true" explanation would be shorter than the raw observation data, that's the whole point of using Solomonoff induction after all.

The data we're conditioning on has K-complexity of one megabyte. Maybe I didn't make this clear.

So, there are two advantages the consequentialists can exploit to "win" and be the shorter explanation. This exploitation must be enough to overcome those 10-1000 bits. One is that, since the decision which is being made is very important, they can find the data within the universe without adding any further complexity. This, to me, seems quite malign, as the "true" explanation is being penalized simply because we cannot read data directly from the program which produces the universe, not because this universe is complicated.

I don't think I agree with this. Thinking in terms of consequentialists competing against "true" explanations doesn't make that much sense to me. It seems similar to making the exec hello world "compete" against the "true" print hello world.

The "complexity of consequentialists" section answers the question of "how long is the exec function?" where the "interpreter" exec calls is a universe filled with consequentialists.

However, if you could argue that the consequentialists actually had an advantage here which outweighed their own complexity, this would just sound to me like an argument that we are living in a simulation, because it would essentially be saying that our universe is unduly tuned to be valuable for consequentialists, to such a degree that the existence of these consequentialists is less of a coincidence than it just happening to be that valuable.

I do not understand what this is saying. I claim that consequentialists can reason about our universe by thinking about TMs because our universe is computable. Given that our universe supports life, it might thus be valuable to some consequentialists in other universes. I don't think the argument takes a stance on whether this universe is a simulation; it merely claims that this universe could be simulated.

the initial conditions of the universe are simpler than the initial conditions of Earth.

This seems to violate a conservation of information principle in quantum mechanics.

perhaps would have been better worded as "the simplest way to specify the initial conditions of Earth is to specify the initial conditions of the universe, the laws of physics, and the location of Earth."

Right, you're interested in syntactic measures of information, more than a physical one  My bad.

I'm trying to wrap my head around this. Would the following be an accurate restatement of the argument?

  1. Start with the Dr. Evil thought experiment, which shows that it's possible to be coerced into doing something by an agent who has no physical access to you, other than communication.
  2. We can extend this to the case where the agents are in two separate universes, if we suppose that (a) the communication can be replaced with an acausal negotation, with each agent deducing the existence and motives of the other; and that (b) the Earthlings (the ones coercing Dr. Evil) care about what goes on in Dr. Evil's universe.
    • Argument for (a): With sufficient computing power, one can run simulations of another universe to figure out what agents live within that universe.
    • Argument for (b): For example, the Earthlings might want Dr. Evil to write embodied replicas of them in his own universe, thus increasing the measure of their own consciousness. This is not different in kind from you wanting to increase the probability of your own survival - in both cases, the goal is to increase the measure of worlds in which you live.
  3. To promote their goal, when the Earthlings run their simulation of Dr. Evil, they will intervene in the simulation to punish/reward the simulated Dr. Evil depending on whether he does what they (the Earthlings) want.
  4. For his own part, Dr. Evil, if he is using the Solomonoff prior to predict what happens next in his universe, must give some probability to the hypothesis that him being in such a simulation is in fact what explains all of his experiences up till that point (rather than him being a ground-level being). And if that hypothesis is true, then Dr. Evil will expect to be rewarded/punished based on whether he carries out the wishes of the Earthlings. So, he will modify his actions accordingly.
  5. The probability of the simulation hypothesis may be non-negligible, because the Solomonoff prior considers only the complexity of the hypothesis and not that of the computation unfolding from it. In fact, the hypothesis "There is a universe with laws A+B+C, which produces Earthlings who run a simulation with laws X+Y+Z which produces Dr. Evil, but then intervene in the simulation as described in #3" may actually be simpler (and thus more probable) than "There is a universe with laws X+Y+Z which produces Dr. Evil, and those laws hold forever".

I think this is wrong, but I'm having trouble explaining my intuitions. There are a few parts;

  1. You're not doing Solomonoff right, since you're meant to condition on all observations. This makes it harder for simple programs to interfere with the outcome.
  2. More importantly but harder to explain, you're making some weird assumptions of the simplicity of meta-programs that I would bet are wrong. There seems to be a computational difficulty here, in that you envision  small worlds trying to manipulate  other worlds, where . That makes it really hard for the simplest program to be one where the meta-program that's interpreting the pointer to our world is a rational agent, rather than some more powerful but less grounded search procedure. If ‘naturally’ evolved agents are interpreting the information pointing to the situation they might want to interfere with, this limits the complexity of that encoding. If they're just simulating a lot of things to interfere with as many worlds as possible, they ‘run out of room’, because .
  3. Your examples almost self-refute, in the sense that if there's an accurate simulation of you being manipulated at time , it implies that simulation is not materially interfered with at time , so even if the vast majority of Solomonoff inductions have attempted adversary, most of them will miss anyway. Hypothetically, superrational agents might still be able coordinate to manipulate some very small fraction of worlds, but it'd be hard and only relevant to those worlds.
  4. Compute has costs. The most efficient use of compute is almost always to do enact your preferences directly, not manipulate other random worlds with low probability. By the time you can interfere with Solomonoff, you have better options.
  5. To the extent that a program  is manipulating predictions so that another other program that is simulating  performs unusually... well, then that's just how the metaverse is. If the simplest program containing your predictions is an attempt at manipulating you, then the simplest program containing you is probably being manipulated.

Wouldn't complexity of earth and conditioning on importance be irrelevant because it would still appear in consequentialists' distribution of strings and in specification of what kind of consequentialists we want? Therefore they will only have the advantage of anthropic update, that would go to zero in the limit of string's length, because choice of the language would correlate with string's content, and penalty for their universe + output channel.