Boltzmann brain decision theory

by Stuart_Armstrong, 11th Sep 2018



Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Suppose I told you that you had many Boltzmann brain copies of you. Is it then your duty to be as happy as possible, so that these copies were also happy?

(Now some people might argue that you can't really make yourself happy through mental effort; but you can certainly make yourself sad, so avoiding doing that also counts.)

So, if I told you that some proportion of your Boltzmann brain copies were happy and some were sad, it might seem that the best thing you could do, to increase the proportion of happy ones, is to be happy all the time - after all, who knows when in your life a Boltzmann brain might "happen"?

But that reasoning is wrong, a standard error of evidential decision theories. Being happy doesn't make your Boltzmann brain copies happy; instead, it ensures that among all the existing Boltzmann brains, only the happy ones may be copies of you.
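A toy illustration of this selection effect, as a minimal Python sketch. The population size, the 30% happy fraction, and the rule that a brain counts as your copy exactly when its mood matches yours are all simplifying assumptions made for illustration; none of them come from the argument itself.

```python
import random

def boltzmann_population(n, p_happy, seed=0):
    """Draw n brain states (True = happy). Nothing here depends on your choice."""
    rng = random.Random(seed)
    return [rng.random() < p_happy for _ in range(n)]

def copies_of_you(population, you_are_happy):
    """Assumption: a brain is a copy of you only if its mood matches yours."""
    return [state for state in population if state == you_are_happy]

population = boltzmann_population(n=10_000, p_happy=0.3)

for you_are_happy in (True, False):
    copies = copies_of_you(population, you_are_happy)
    # Your mood changes which brains are your copies...
    print("you happy:", you_are_happy,
          "| copies:", len(copies),
          "| happy copies:", sum(copies),
          # ...but the number of happy brains overall is fixed.
          "| happy brains in total:", sum(population))
```

In both runs the last number is the same: your choice only moves the label "copy of you" around a population whose happiness was fixed independently of you.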

This is similar to the XOR blackmail problem in functional decision theory. If you pay Omega when they send the blackmail letter ("You currently have a termite problem in your house iff you won't send me £1000"), you're not protecting your house; instead, you're determining whether you live in a world where Omega will send the letter.
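The XOR blackmail point can be checked with a few lines of Python. This is a sketch under stated assumptions: the 1% prior probability of termites and the £1,000,000 damage figure are hypothetical numbers chosen for illustration, and Omega is modelled as a perfect predictor that only sends true letters.

```python
def letter_sent(termites: bool, pays_on_letter: bool) -> bool:
    """Omega sends the letter only when its claim is true:
    termites iff the agent won't pay."""
    return termites == (not pays_on_letter)

def expected_loss(pays_on_letter, p_termites=0.01,
                  damage=1_000_000, demand=1_000):
    """Expected loss in pounds for a fixed policy (hypothetical numbers)."""
    loss = 0.0
    for termites, prob in ((True, p_termites), (False, 1 - p_termites)):
        world_loss = damage if termites else 0
        # Paying only happens in worlds where the letter actually arrives.
        if pays_on_letter and letter_sent(termites, pays_on_letter):
            world_loss += demand
        loss += prob * world_loss
    return loss

print("refuse:", expected_loss(pays_on_letter=False))
print("pay:   ", expected_loss(pays_on_letter=True))
```

The termite probability is the same under both policies; the payer simply adds the demand in every termite-free world, because that is where the letter reaches a payer.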

On the other hand, if there were long-lived identical copies of you scattered around the place, and you cared about their happiness in a total utilitarian style way, then it seems there is a strong argument you should make yourself happy. So, somewhere between instantaneous Boltzmann brains and perfect copies, your decision process changes. What forms the boundaries between the categories?

Duration and causality

If a Boltzmann brain only has time to make a single decision and then immediately vanishes, that decision is irrelevant. So we have to have long-lived Boltzmann brains, where long-lived means a second or more.

Similarly, the decision has to be causally connected to the Boltzmann brain's subsequent experience. It makes no sense if you decide to be happy, and then your brain gets flooded with pain immediately after - or the converse. Your decision only matters if your view of causality is somewhat accurate. Therefore you require long-lived Boltzmann brains who respect causality.

In a previous post, I showed that the evidence seems to suggest that Boltzmann brains caused by quantum fluctuations are generally very short-lived (this seems a realistic result) and that they don't obey causality (I'm more uncertain about this).

In contrast, for Boltzmann brains created by nucleation in an expanding universe, most observer moments belong to Boltzmann brains in Boltzmann simulations: exceptionally long lived, with causality. They are, however, much - much! - less probable than quantum fluctuations.

Decision theory: causal

Assume, for the moment, that you are an unboundedly rational agent (congratulations, btw, on winning all the Clay institute prizes, on cracking all public-key encryption, on registering patents on all imaginable inventions, and for solving friendly AI).

You have decent estimates as to how many Boltzmann brains are long-lived with causality, how many use your decision theory, and how many are exact copies of you.

If you are using a causal decision theory, then only your exact copies matter - the ones where you are unsure of whether you are them or you are "yourself". Let $p$ be the probability that you are a Boltzmann brain at this very moment, let $a$ be an action, and decompose your preferences into $U + H$, where $U$ is some utility function and $H$ is happiness. By an abuse of notation, I'll write $U(a)$ for the expected $U$ given that action $a$ is taken by the "real" you, $U_B(a)$ for the expected $U$ given that action $a$ is taken by a Boltzmann brain copy of you, and similarly for $H(a)$ and $H_B(a)$.

Then the expected utility for action $a$ is:

$$(1-p)\,\big(U(a) + H(a)\big) + p\,\big(U_B(a) + H_B(a)\big).$$

If we restrict our attention to medium-long duration Boltzmann brains, say ten seconds or less (though remember that Boltzmann simulations are an issue!), and assume that $U$ is reasonably defined over the real world, we can neglect $U_B(a)$ (since all actions the Boltzmann brain takes will have little to no impact on $U$), and use the expression:

$$(1-p)\,\big(U(a) + H(a)\big) + p\,H_B(a).$$

This formula seems pretty intuitive: you trade off the small increase in happiness in your Boltzmann brain ($H_B(a)$), weighted by the probability of being a Boltzmann brain ($p$), against the utility and happiness you can expect from your normal life.
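To make the trade-off concrete, here is a minimal Python sketch of the causal calculation: the probability of being a Boltzmann brain weights the Boltzmann-brain terms, and its complement weights the utility and happiness of normal life. The two actions and all numbers are hypothetical, chosen only to show how the preferred action flips as that probability grows.

```python
def causal_expected_utility(p, u_real, h_real, h_boltzmann, u_boltzmann=0.0):
    """(1 - p) * (real utility + real happiness)
       + p * (Boltzmann utility + Boltzmann happiness), for one action.
    u_boltzmann defaults to 0: a short-lived brain barely affects the world."""
    return (1 - p) * (u_real + h_real) + p * (u_boltzmann + h_boltzmann)

# Hypothetical payoffs: "cheer_up" costs some real-world utility
# but raises happiness for whoever takes the action.
actions = {
    "work":     {"u_real": 10.0, "h_real": 1.0, "h_boltzmann": 1.0},
    "cheer_up": {"u_real":  9.0, "h_real": 1.5, "h_boltzmann": 1.5},
}

def best_action(p, actions):
    """Name of the action with the highest causal expected utility."""
    return max(actions, key=lambda name: causal_expected_utility(p, **actions[name]))

print(best_action(1e-6, actions))   # work
print(best_action(0.999, actions))  # cheer_up
```

Under these assumed numbers, Boltzmann-brain happiness only decides the matter when you think you are very probably a Boltzmann brain yourself.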

Decision theory: functional

If you're using a more sensible functional decision theory, and are a total utilitarian altruist where happiness is concerned, the expression is somewhat different. Let $B$ be the set of Boltzmann brains (not necessarily copies of you) that will take decision $a$ iff you do. For any given $b \in B$, let $a_b$ be the fact that $b$ takes action $a$, let $p_b$ be the probability of $b$ existing (not the probability of you being $b$), and let $H_b$ be the happiness of $b$.

Then, with $U$ your utility function and $H$ your own happiness, the expected utility for action $a$ is:

$$U(a) + H(a) + \sum_{b \in B} p_b\,\big(U(a_b) + H_b(a_b)\big).$$

Note that $U$ need not be the utility of $b$ at all - you are altruistic for happiness, not for general goals. As before, if $B$ contains only medium-long duration Boltzmann brains (or if the actions of these agents are independent of $U$), we can simplify to:

$$U(a) + H(a) + \sum_{b \in B} p_b\,H_b(a_b).$$

Because of the summation, the happiness of the Boltzmann brains can come to dominate your decisions, even if you yourself are pretty certain not to be a Boltzmann brain.
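A rough Python sketch of how the summation can dominate, using the simplified form: your own utility and happiness, plus the happiness of each linked Boltzmann brain weighted by its probability of existing. The number of linked brains, their existence probability, and the payoffs are hypothetical values picked for illustration.

```python
import math

def functional_expected_utility(u_real, h_real, linked_brains):
    """Your utility and happiness, plus the sum over linked brains of
    (probability the brain exists) * (its happiness under the action)."""
    return u_real + h_real + math.fsum(p_b * h_b for p_b, h_b in linked_brains)

# Hypothetical: a million brains whose decision is linked to yours,
# each existing with probability 1e-3 (about a thousand expected brains).
N_BRAINS = 1_000_000
P_EXISTS = 1e-3

def linked(h_b):
    """Identical linked brains, each ending up with happiness h_b."""
    return ((P_EXISTS, h_b) for _ in range(N_BRAINS))

eu_work = functional_expected_utility(10.0, 1.0, linked(1.0))
eu_cheer = functional_expected_utility(9.0, 1.5, linked(1.5))

print("work:    ", eu_work)
print("cheer up:", eu_cheer)
```

With no linked brains, "work" wins on the real-world terms alone; with the assumed population, the summed happiness term swamps that difference even though the probability that you are any particular brain never enters the calculation.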

Variations of this, for different levels and types of altruism, should be clear now.

Bounded rationality

But neither of us is unboundedly rational (a shame, really). What should we do, in the real world, if we think that Boltzmann brains are worth worrying about? Assume that your altruism and probabilities point towards Boltzmann brain happiness being the dominant consideration.

A key point of FDT/UDT is that your decisions only make a difference when they make something happen differently. That sounds entirely tautological, but let's think about the moment in which an unboundedly rational agent might be taking a different decision in order to make Boltzmann brains happy. When it is doing this, it is applying FDT, and considering the happiness of the Boltzmann brains, and then deciding to be happy.

And the same is true for you. Apart from personal happiness, you should take actions to make yourself happier only when you are using altruistic UDT and thinking about Boltzmann brain problems. So right now might be a good time.

This might feel perverse - is that really the only time the fact is relevant? Is there nothing else you could do - like make yourself into a consistent UDT agent in the first place?

But remember the point in the introduction - naively making yourself happy means that your Boltzmann brain copies will be happy: but this isn't actually increasing the happiness across all Boltzmann brains, just changing which ones are copies of you. Similarly, none of the following will change anything about other brains in the universe:

  • Becoming more of an FDT/UDT agent.
  • Becoming more happy in general.
  • Becoming more altruistic in general.

They won't change anything, because they don't have any acausal impact on Boltzmann brain copies. More surprisingly, neither will the following:

  • Making it easier for you to make yourself happy when needed.

That will not make any difference; some Boltzmann brains will find it easy to make themselves happy, others will find it hard. But the action a that Boltzmann brains should take in these situations is something like "make yourself happy, as best you can". Changing the "as best you can" for you doesn't change it for Boltzmann brains.

