Revisiting the anthropic trilemma I: intuitions and contradictions

by Stuart_Armstrong · 3 min read · 15th Feb 2011 · 10 comments



tl;dr: in which I apply intuition to the anthropic trilemma, and it all goes horribly, horribly wrong

Some time ago, Eliezer constructed an anthropic trilemma, where standard theories of anthropic reasoning seemed to come into conflict with subjective anticipation. rwallace subsequently argued that subjective anticipation was not ontologically fundamental, so we should not expect it to work out of the narrow confines of everyday experience, and Wei illustrated some of the difficulties inherent in "copy-delete-merge" types of reasoning.

Wei also made the point that UDT shifts the difficulty in anthropic reasoning away from probability and onto the utility function, and ata argued that neither the probabilities nor the utility function are fundamental, that it was the decisions that resulted from them that were important - after all, if two theories give the same behaviour in all cases, what grounds do we have for distinguishing them? I then noted that this argument could be extended to subjective anticipation: instead of talking about feelings of subjective anticipation, we could replace it by questions such as "would I give up a chocolate bar now for one of my copies to have two in these circumstances?"

In this post, I'll start by applying my intuitive utility/probability theory to the trilemma, to see what I would decide in these circumstances, and the problems that can result. I'll be sticking with classical situations rather than quantum, for simplicity.

So assume a (classical) lottery where I have a ticket with million-to-one odds. The trilemma presented a lottery-winning trick: set up the environment so that if I ever did win the lottery, a trillion copies of me would be created, they would experience winning the lottery, and then they would be merged/deleted down to one copy again.

So that's the problem; what's my intuition got to say about it? Now, my intuition claims there is a clear difference between my personal and my altruistic utility. Whether this is true doesn't matter, I'm just seeing whether my intuitions can be captured. I'll call the first my indexical utility ("I want chocolate bars") and the second my non-indexical utility ("I want everyone hungry to have a good meal"). I'll be neglecting the non-indexical utility, as it is not relevant to subjective anticipation.

Now, my intuitions tell me that SIA is the correct anthropic probability theory. They also tell me that having a hundred copies in the future all doing exactly the same thing is equivalent to having just one: therefore my current utility means I want to maximise the average utility of my future copies.

If I am a copy, then my intuitions tell me I want to selfishly maximise my own personal utility, even at the expense of my copies. However, if I were to be deleted, I would transfer my "interest" to my remaining copies. Hence my utility as a copy is my own personal utility, if I'm still alive in this universe, and the average of the remaining copies, if I'm not. This also means that if everyone is about to be deleted/merged, then I care about the single remaining copy that will come out of it, equally with myself.

Now I've set up my utility and probability; so what happens to my subjective anticipation in the anthropic trilemma? I'll use the chocolate bar as a unit of utility - because, as everyone knows, everybody's utility is linear in chocolate, this is just a fundamental fact about the universe.

First of all, would I give up a chocolate bar now for two to be given to one of the copies if I win the lottery? Certainly not: this loses me 1 utility and only gives me 2/(million × trillion) in return, since the win has probability one in a million and the two bars are averaged over a trillion copies. Would I give up a bar now for two to be given to every copy if I win the lottery? No, this loses me 1 utility and only gives me 2/million in return.
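The two pre-draw deals can be checked with exact arithmetic. This is only a minimal sketch under the post's stated setup (million-to-one odds, a trillion copies, utility taken as the average over future copies), and it reads both deals as paying out only in the winning world, which is what the 2/million figure implies. The function names are mine, purely for illustration.

```python
from fractions import Fraction

P_WIN = Fraction(1, 10**6)   # million-to-one lottery ticket
N_COPIES = 10**12            # copies created if the ticket wins

def average_copy_gain(bars_each, recipients, copies):
    """Average utility gain over all copies when `recipients` of them get bars."""
    return Fraction(bars_each * recipients, copies)

def pre_draw_value(cost, bars_each, recipients):
    """Expected net utility, judged before the draw, of paying `cost` now
    for bars that are only handed out in the winning world."""
    return -cost + P_WIN * average_copy_gain(bars_each, recipients, N_COPIES)

# Deal 1: two bars to a single copy if I win.
one_copy = pre_draw_value(cost=1, bars_each=2, recipients=1)
# Deal 2: two bars to every copy if I win.
every_copy = pre_draw_value(cost=1, bars_each=2, recipients=N_COPIES)

print(one_copy, every_copy)
```

Both values are negative, so under these assumptions the pre-draw me turns down both deals.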

So I certainly do not anticipate winning the lottery through this trick.

Would I give up one chocolate bar now, for two chocolate bars to the future merged me if I win the lottery? No, this gives me an expected utility of -1+2/million, same as above.

So I do not anticipate having won the lottery through this trick, after merging.

Now let it be after the lottery draw, after the possible duplication, but before I know whether I've won the lottery or not. Would I give up one chocolate bar now in exchange for two for me, if I had won the lottery (assume this deal is offered to everyone)? The SIA odds say that I should: a trillion copies weighted by a one-in-a-million chance, against a single copy in the losing world, make it roughly a million to one that I have won, so my expected gain is just under 1 chocolate bar.
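To see where the SIA odds come from, here is a rough sketch that weights each world by its lottery probability times the number of copies of me it contains. It assumes a single copy in the losing world and counts the bar given up as a cost in both worlds; the exact figure depends on that accounting, but the sign (positive, so accept) is what matters. The helper name is hypothetical.

```python
from fractions import Fraction

P_WIN = Fraction(1, 10**6)   # million-to-one lottery ticket
N_COPIES = 10**12            # copies of me in the winning world

def sia_probability_of_win(p_win, copies_if_win, copies_if_lose=1):
    """SIA: weight each world by (prior probability) x (number of observers)."""
    win_weight = p_win * copies_if_win
    lose_weight = (1 - p_win) * copies_if_lose
    return win_weight / (win_weight + lose_weight)

p_won = sia_probability_of_win(P_WIN, N_COPIES)

# Pay 1 bar for certain, receive 2 bars only if I have in fact won.
net_gain = -1 + 2 * p_won

print(float(p_won), float(net_gain))
```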

So once the duplication has happened, I anticipate having won the lottery. This causes a preference reversal, as my previous version would pay to have my copies denied that choice.

Now assume that I have been told I've won the lottery, so I'm one of the trillion duplicates. Would I give up a chocolate bar for the future merged copy having two? Yes, I would, the utility gain is 2-1=1.

So once I've won the lottery, I anticipate continuing having won the lottery.

So, to put all these together:

  • I do not anticipate winning the lottery through this trick.
  • I do not anticipate having won the lottery once the trick is over.
  • However, in the middle of the trick, I anticipate having won the lottery.
  • This causes a money-pumpable preference reversal.
  • And once I've won the lottery, I anticipate continuing to have won the lottery once the trick is over.
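The reversal can be made concrete by valuing the same mid-trick deal (every copy pays one bar, and gets two back if the lottery was won) from the two viewpoints. A sketch only, under the assumptions used in the post: the pre-draw me averages over copies within each world and weights worlds by the lottery odds, while the mid-trick me uses SIA weights. The function names are illustrative.

```python
from fractions import Fraction

P_WIN = Fraction(1, 10**6)   # million-to-one lottery ticket
N_COPIES = 10**12            # copies created on a win

def pre_draw_view():
    """Value of 'all my future copies accept the deal', judged before the draw."""
    win_world = Fraction(-1 + 2)   # every copy pays 1 and gets 2: average +1
    lose_world = Fraction(-1)      # the lone copy pays 1 and gets nothing
    return P_WIN * win_world + (1 - P_WIN) * lose_world

def mid_trick_view():
    """Value of accepting the deal, judged by a copy using SIA after duplication."""
    win_weight = P_WIN * N_COPIES
    lose_weight = 1 - P_WIN
    p_won = win_weight / (win_weight + lose_weight)
    return p_won * 1 + (1 - p_won) * (-1)

before = pre_draw_view()
during = mid_trick_view()
print(float(before), float(during))
```

The earlier me values the deal at nearly -1 and would pay almost a full bar to block it; the later me values it at nearly +1 and would pay to take it. That gap is what a money pump exploits.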

Now, some might argue that there are subtle considerations that make my behaviour the right one, despite the seeming contradictions. I'd rather say - especially seeing the money-pump - that my intuitions are wrong, very wrong, terminally wrong, just as non-utilitarian decision theories are.

However, what I started with was a perfectly respectable utility function. So we will need to add other considerations if we want to get an improved consistent system. Tomorrow, I'll be looking at some of the axioms and assumptions one could use to get one.


10 comments

I'm pretty certain at this point that most of the confusion results from mixing up probability and subjective anticipation. Subjective anticipation is a loose heuristic, while probability is much more clearly an element of an (updateless) normative decision criterion. In particular, observations update anticipation, not probability.

Sometimes, anticipation conflicts with probability, and instead of seeing them as opposed, it's better to accept that anticipation is indeed so and so, and if you were treating it as probability, you would make such and such incorrect decisions, while probability is different, and gives the correct decisions.

So you anticipate something other than your anticipation? That seems like a definition went wrong somewhere.

So once the duplication has happened, I anticipate having won the lottery. This causes a preference reversal, as my previous version would pay to have my copies denied that choice.....

....This causes a money-pumpable preference reversal.

I'm not sure this is really a preference reversal, at least as I understand them. Your terminal values have remained the same, what's changed is the knowledge you have about how effectively you'll be able to implement them.

To make my point clearer let's look at a situation that does not involve copies:

Omega approaches you and gives you a few pieces of information:

  1. One hour from now Omega's cousin, Beyonder, is going to offer you a trade. If you give him one chocolate bar now, he'll give you five chocolate bars two hours from then.

  2. In ten minutes you will have a nervous breakdown. This breakdown will have two important effects. First, it will give you temporary amnesia of the previous ten minutes, causing you to forget this entire conversation with Omega. Second, it will cause you to mistakenly believe the odds of your dying in less than an hour are close to .9999. Omega helpfully informs you that he estimates your real probability of dying in less than an hour to be .00000001.

  3. Fortunately, this nervous breakdown will only last an hour. After that you will recover your memory, and have a much more accurate probability estimate of your longevity.

Now, this is obviously bad news. Besides the psychological trauma this breakdown will cause, you'll miss out on a productive trade with Beyonder! After all, since you'll believe that you are almost certain to die in less than an hour you'll certainly be unwilling to give Beyonder a chocolate bar now, in order to receive five after you die!

Fortunately, Omega has a helpful suggestion. If you give him two chocolate bars right now, he'll give one to Beyonder on your behalf (the other bar is to cover administrative fees). You should agree to this. You'll still end up three chocolate bars in the black.
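The arithmetic of the Beyonder trade is easy to lay out. This is just an illustrative sketch using the numbers from the scenario, on the assumption that bars received after death are worth nothing to you; `trade_value` is a made-up helper.

```python
from fractions import Fraction

TRUE_P_DEATH = Fraction(1, 10**8)         # Omega's estimate
DELUDED_P_DEATH = Fraction(9999, 10000)   # what you believe during the breakdown

def trade_value(cost, payoff, p_death):
    """Expected net bars: pay `cost` now, collect `payoff` only if still alive."""
    return -cost + payoff * (1 - p_death)

# How the trade looks to you mid-breakdown: clearly not worth it.
deluded = trade_value(cost=1, payoff=5, p_death=DELUDED_P_DEATH)

# How it looks with accurate beliefs, trading directly or via Omega's service.
direct = trade_value(cost=1, payoff=5, p_death=TRUE_P_DEATH)
via_omega = trade_value(cost=2, payoff=5, p_death=TRUE_P_DEATH)

print(float(deluded), float(direct), float(via_omega))
```

The difference `direct - via_omega` is exactly one bar: the price of binding your future, misinformed self.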

As with the copy lottery, this creates a preference reversal with a money pump. You end up having to give Omega a chocolate bar to deprive your future self of a choice. But I can't think of any way to get out of this money pump. You could develop some sort of axiomatic utility function that would get your crazy self to still make the trade with Beyonder, but that same function would likely cause you to act insane in normal scenarios.

I think all we can do is simply accept that in scenarios where our knowledge about the relevant facts is lost (such as amnesia) we will sometimes make choices our past selves wish we wouldn't, because we lost a valuable piece of knowledge that we needed to make the right choice. The copy-lottery is a particularly weird example of such a scenario (we lose the knowledge of whether we're the copy that's going to survive or not), but it doesn't differ from amnesia scenarios in any important decision-theoretic way.

This might result in us being money pumped in knowledge-loss scenarios, but these aren't infinite money pumps we're talking about, like in the case of a genuine inconsistency in terminal values. Omega's maxes out at four chocolate bars. I'm leery about the conclusion you drew with the axiomatic approach, so I'm willing to accept limited money pumps in amnesia scenarios.

I will suggest that at least part of the problem is your utility function: defining it in terms of the average utility of your future copies means you should spend all your money on lottery tickets and then commit suicide in all the branches where you didn't win. The operation of dividing the total utility by the number of copies is the problematic part, and in my opinion entirely without justification.

(Granted we might be able to use non-indexical utility to patch this particular failure mode, but I put it to you that even for someone without any current close relationships, quantum suicide is still a bad enough idea to serve as a reductio ad absurdum.)
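A toy illustration of the objection, with entirely made-up numbers (a thousand equally weighted branches, one of which wins a jackpot worth a thousand bars): averaging over whichever copies remain rewards deleting the losers, while summing does not. This is only a sketch of the averaging-versus-summing point, not of the post's full utility function.

```python
from fractions import Fraction

N_BRANCHES = 1000   # hypothetical number of equally weighted branches
JACKPOT = 1000      # hypothetical utility (in bars) for the one winner

def average_utility(utilities):
    """Total utility divided by the number of surviving copies."""
    return Fraction(sum(utilities), len(utilities))

everyone = [JACKPOT] + [0] * (N_BRANCHES - 1)   # one winner, the rest lose
survivors = [u for u in everyone if u > 0]       # losers delete themselves

avg_all = average_utility(everyone)
avg_survivors = average_utility(survivors)
total_all, total_survivors = sum(everyone), sum(survivors)

print(avg_all, avg_survivors, total_all, total_survivors)
```

Under the average, deleting the losing copies looks like a large gain; under the sum, it changes nothing.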

I'd actually defined the utility of a universe as the average utility of my copies therein, so, depending on what utility I give to universes empty of copies of me, the suicide idea might not be a good one to follow.

But I freely admit my utility is sub-par, and doesn't work. But the more severe problem is not that it leads to crazy answers; the problem is that it allows money pumping. Crazy answers can be patched, but a vulnerability to constantly losing utility is a great hole in the theory.

I'd actually defined the utility of a universe as the average utility of my copies therein

Why not the sum?

Because I get to say what my intuitions are :-) And they are that if I have a hundred copies all getting exactly the same nice experience, then this is just one positive experience. However, in my next post, I don't get to define my utility based on my intuitions, I have to follow some assumptions, and so I don't end up in the same place...

Right, the fact that dividing by the number of copies of you sometimes gives division by zero is another good reason for not doing it :-)

But your assessment of problem importance is interesting. I would've said it the other way around. In practice, we tend to quickly notice when we are being money pumped, and apply a patch on the fly, so at worst we only end up losing a bit of money, which is recoverable. Crazy policies on the other hand... usually do little damage because we compartmentalize, but when we fail to compartmentalize, the resulting loss may not be recoverable.

This is a valid point for humans (who are not utilitarians at all). But when thinking of an ideal, AI-safe ethics, money pumps are a great flaw: because the AI will get pumped again, again, and again, and lose utility. Alternately, the AI will patch its system on the fly; and if we don't know how it does this, it could end up with a crazy policy - it's unpredictable.

Maybe; honestly, nobody knows yet. We're still too far from being able to build an AI for which ethics would be a relevant concept, to be able to say what such an ethics should look like. For all we really know, perhaps if and when we get to that point, it might become apparent that a system of supervisor modules to perform on-the-fly patching to reliably keep things within sensible bounds is a better solution than trying for logically perfect ethics, for any mind operating under physically realistic constraints of data and computing power.