Resolving the Dr Evil Problem

by Chris_Leong, 10th Jun 2018



Deep in Dr Evil's impregnable fortress (paraphrased):

Dr Evil is just about to complete his evil plan of destroying the Earth, when he receives a message from the Philosophy Defence Force on Mars. They have created a clone in the exact same subjective situation Dr Evil now occupies; he believes he is Dr Evil and is currently in a recreation of the fortress. If the clone of Dr Evil tries to destroy the Earth, they will torture him, otherwise they will treat him well. Dr Evil wants to destroy the Earth, but he would prefer to avoid being tortured much, much more and he is now uncertain about whether he should surrender or not. Should Dr Evil surrender?

The paper then concludes:

I conclude that Dr. Evil ought to surrender. I am not entirely comfortable with that conclusion. For if INDIFFERENCE is right, then Dr. Evil could have protected himself against the PDF’s plan by (in advance) installing hundreds of brains in vats in his battlestation—each brain in a subjective state matching his own, and each subject to torture if it should ever surrender

This article will address two areas of this problem:

  • Firstly, it will argue that Dr Evil should surrender, but by a different path, focusing on what it means to "know" something and how this is a leaky abstraction
  • Secondly, it will argue that hundreds of brains in vats would indeed secure him against this kind of blackmail

I'll note that this problem is closely related to The AI that Boxes You. Several people noted there that you could avoid blackmail by pre-committing to reboot the AI if it tried to threaten you, although my interest is in the rational behaviour of an agent who has failed to pre-commit.

Why Dr Evil Should Surrender

I think there's a framing effect in the question that is quite misleading. Regardless of whether we say, "You are Dr Evil. What do you do?" or "What should Dr Evil do?", we are assuming that you or a third-party Dr Evil knows that they are Dr Evil. However, that's an assumption that needs to be questioned.

If you knew that you were Dr Evil, then learning that a clone has been created and placed in a situation that appears similar wouldn't change what you should do, provided you don't care about the clone. However, strictly speaking, you can't ever actually know that you are Dr Evil. All you can know is that you have memories of being Dr Evil and that you appear to be in Dr Evil's fortress. (Technically we could doubt this too: maybe Dr Evil doesn't actually exist. But we don't have to go that far to prove our point.)

Before this event, you would have placed the probability of you being Dr Evil really high as you had no reason to believe that you might have been a clone or in a simulation. After you receive the message, you have to rate the chance of you being a clone much higher and this breaks the leaky abstraction that normally allows you to say that you know you are Dr Evil.

If we did say that you knew you were Dr Evil, then on receiving the message, you would somehow have to magically come to un-know something without someone erasing your memories or otherwise interfering with your brain. However, since you only know that you have memories of being Dr Evil, you haven't lost information. You've actually gained it, so nothing magical is happening at all.
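
The shift in credence described above can be sketched numerically. This is only an illustration: it assumes the INDIFFERENCE principle quoted from the paper (equal credence across all subjectively indistinguishable subjects), and the utility values and function names are hypothetical, picked only so that torture is "much, much" worse than failing to destroy the Earth.

```python
from fractions import Fraction

# Hypothetical utilities, chosen only for illustration (not from the post).
U_DESTROY = 10       # payoff for pressing the button if you are the original
U_TORTURE = -1000    # payoff for pressing the button if you are a clone
U_SURRENDER = 0      # payoff for surrendering (nobody is tortured)

def credence_original(num_clones: int) -> Fraction:
    """Credence of being the original, assuming equal weight on each
    of the 1 + num_clones subjectively indistinguishable subjects."""
    return Fraction(1, 1 + num_clones)

def eu_destroy(num_clones: int) -> Fraction:
    """Expected utility of pressing the button, given num_clones clones."""
    p = credence_original(num_clones)
    return p * U_DESTROY + (1 - p) * U_TORTURE

before_message = eu_destroy(0)  # no reason to suspect any clone exists
after_message = eu_destroy(1)   # the PDF announces one clone
print(before_message, after_message)
```

Under these assumed numbers, pressing the button beats surrendering before the message and loses to it afterwards. Nothing was "un-known"; the new information simply changed how many candidates "you" might be.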

Why Clones Protect Against the Attack

The idea of creating clones to protect yourself against these kinds of attacks seems weird. However, I will argue that this strategy is actually effective.

I'll first note that the idea of setting up a punishment to prevent you giving in to blackmail isn't unusual at all. It's well known that if you can credibly pre-commit, there's no incentive to blackmail you. If you have a device that will torture you if you surrender, then you have no incentive to surrender unless the expected harm from not surrendering exceeds the torture.

Perhaps, then, the issue is that you would be protecting yourself by intentionally messing up your beliefs about what is true? There's no reason why this shouldn't be effective. If you can self-modify to disbelieve any blackmail threat, no-one can blackmail you. One way to do this would be to self-modify to believe you are in a simulation that will end just before you are tortured. Alternatively, you could protect yourself by self-modifying to believe that you would be tortured worse if you accepted the blackmail. Creating the clones is just another way of achieving this, though a less effective one, as you only believe there is some probability that you will be tortured if you surrender.
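
To make the comparison concrete, here is a rough sketch of how the brains in vats change the calculation. It again assumes equal credence across all indistinguishable subjects, and it further assumes that a vat brain which presses the button gets a neutral outcome. The utilities and function names are hypothetical and only illustrative.

```python
from fractions import Fraction

# Hypothetical utilities, chosen only for illustration (not from the post).
U_DESTROY = 10       # original presses the button
U_TORTURE = -1000    # PDF clone presses, or vat brain surrenders
U_NEUTRAL = 0        # everything else (assumed)

def expected_utilities(pdf_clones: int, vat_brains: int):
    """Return (EU of destroying, EU of surrendering) under equal credence
    over the original, the PDF's clones and Dr Evil's own vat brains."""
    total = 1 + pdf_clones + vat_brains
    p_original = Fraction(1, total)
    p_pdf = Fraction(pdf_clones, total)
    p_vat = Fraction(vat_brains, total)
    destroy = p_original * U_DESTROY + p_pdf * U_TORTURE + p_vat * U_NEUTRAL
    surrender = (p_original + p_pdf) * U_NEUTRAL + p_vat * U_TORTURE
    return destroy, surrender

def best_action(pdf_clones: int, vat_brains: int) -> str:
    """Pick whichever action has the higher expected utility."""
    destroy, surrender = expected_utilities(pdf_clones, vat_brains)
    return "destroy" if destroy > surrender else "surrender"

print(best_action(1, 0))    # the PDF's threat with no vat brains
print(best_action(1, 200))  # the same threat against 200 vat brains
```

With no vat brains the threat works and surrendering is better; with a couple of hundred vat brains the same threat no longer moves him. The protection is probabilistic rather than absolute, which matches the point that it is weaker than a belief that torture is certain.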


8 comments

Push button. Tortured self will take comfort in the fact that free self is coming for the martians next.

"I am a stubborn git who would destroy the Earth and ignore the possibility of cloning, even if such an action produces negative utility for me" is just another way of saying "I have precommitted to destroying the Earth".

In a world where such a threat is credible, one's prior for being such a clone/simulation should already be pretty high. It's not clear exactly how surprising (aka how much information) the announcement by the PDF is.

If Dr. Evil is rational (but then why is he blowing up the earth?), he'll include this likelihood in his plans, presumably by deciding that it's OK to be tortured if he can meet his earth-destruction goals, or otherwise precommitting the deed in order to remove the possibility of changing his mind with this new threat.

Or to pre-empt the threat by announcing "I've made copies of all the personnel at cloning-capable agencies. If I have any reason to believe you've copied me, you're all getting tortured". Or just destroy Mars too.

I don't think the "make my own counter-clones" strategy works, either. Only if Dr. Evil can control the experiences of more copies than all of the people who want him to stop his plan does it make sense, and it pretty much guarantees that a lot of him will be tortured. If the credible message is "we've made a trillion copies of you, each of which will be tortured if they (think they) destroy the earth", his clever counter of creating a trillion and one copies to be tortured if he does NOT destroy the earth is _NOT_ a victory if he values not being tortured.

Yes, pre-committing is a good idea to reduce the incentive for people to clone you. However, this post was about the strategy if you haven't pre-committed.

Making a trillion and one copies only reduces the odds of escaping torture to less than 50%. So to really defend against this strategy, you need to be able to create many more copies than your opponents.

No, my point was that making a trillion and one copies does _NOT_ increase the odds of escaping torture. It makes the odds that a Dr. Evil will be tortured a complete certainty. Dr. Evil himself made copies that will be tortured if and only if other copies are not tortured. This is not a winning tactic if avoiding torture is the goal over all else.

The only way for him to fully prevent torture is to destroy the capability of others to torture him. He _can_ reduce the probability by cooperating with the extortionists, but who's to say they won't torture him anyway for his name (he is Evil, after all), or for some future plan they don't like. The only way for him to carry out his plans is to decide that torture is worth suffering for them.

We're assuming here that Dr Evil doesn't care about his clones. The only reason why his clones being tortured is a threat is that, due to his lack of information, he doesn't know whether it is his clone who is about to be tortured or whether he is the clone.

Some people might object and say that the clones are him. I'm very skeptical of those kinds of arguments, but beyond that we can get around this issue. The clones don't actually have to be completely accurate clones of his mental state; just beings with a good-enough backstory and set of implanted memories so that the game isn't given away. One of the clones could have a completely different set of childhood memories, for example, as long as any inconsistencies were smoothed over. Therefore, even if we say that perfect clones of a person are the person, these clones could be different enough to not count without ruining the thought experiment.

Now I'm confused. I'd expected when you described the scenario that it's obvious that the "real" Dr. Evil cannot tell whether he's the real one or a clone. So his clones _are_ definitely him. If he doesn't care about clones, even if he knows that some version(s) of him are the clone, then he just presses the button, right?

Or are you saying that only the cloned versions of him care about the clones? I don't think that's coherent.

"If he doesn't care about clones, even if he knows that some version(s) of him are the clone, then he just presses the button, right?" - No, because if it turns out that he is actually a clone, he will be tortured.

"Or are you saying that only the cloned versions of him care about the clones?" - Yes, Dr Evil only cares about Dr Evil, Dr Evil Clone #1 only cares about Dr Evil Clone #1, Dr Evil Clone #2 only cares about Dr Evil Clone #2, etc. Why isn't that coherent?

"So his clones _are_ definitely him" - Why? The clones could have completely different backstories from the original Dr Evil, just so long as they (the clones themselves) can't tell. That suffices to create the desired ambiguity.