Today's post, Pascal's Mugging: Tiny Probabilities of Vast Utilities was originally published on 19 October 2007. A summary (taken from the LW wiki):

 

An Artificial Intelligence coded using Solomonoff Induction would be vulnerable to Pascal's Mugging. How should we, or an AI, handle situations in which it is very unlikely that a proposition is true, but if the proposition is true, it has more moral weight than anything else we can imagine?


Discuss the post here (rather than in the comments to the original post).

This post is part of the Rerunning the Sequences series, where we'll be going through Eliezer Yudkowsky's old posts in order so that people who are interested can (re-)read and discuss them. The previous post was "Can't Say No" Spending, and you can use the sequence_reruns tag or rss feed to follow the rest of the series.

Sequence reruns are a community-driven effort. You can participate by re-reading the sequence post, discussing it here, posting the next day's sequence reruns post, or summarizing forthcoming articles on the wiki. Go here for more details, or to have meta discussions about the Rerunning the Sequences series.


I have an idea for how this problem could be approached:

Any sufficiently powerful being with any random utility function may or may not exist. It is perfectly possible that our reality is actually overseen by a god that rewards and punishes us for whether we say an even or odd number of words in our lives, or something equally arbitrary. The likelihood of the existence of each of these possible beings can be approximated using Solomonoff induction.

I assume that most of the hypothetical beings running a simulation in which we could find ourselves in such a situation would either (1) have no interest in us at all (in which case the Mugger would most likely be a human), (2) be interested in something entirely unpredictable arising from their alien culture, or (3) be interacting with us purely to run social experiments. After all, they would have nothing in common with us and we would have nothing they could possibly want. It would therefore, in any case, be virtually impossible to guess at their possible motivations; if option three is true, it would also be a poorly run social experiment if we could.

I would now argue that the existence of Pascal's Mugger does not influence the probability of the existence of a being that would react negatively (for us) to us not giving the $5 any more than it influences the probability of the existence of a being with the opposite motivation. The Mugger is equally likely to punish you for being so gullible as he is to punish you for not giving money to someone who threatens you.

Of course none of this takes into consideration how likely the various possible beings are to actually carry out their threat, but that doesn't change anything important about this argument, I think.

In essence, my argument is that such powerful hypothetical beings can be ignored because we have no real reason to assume they have a certain motivation rather than the opposite. Giving the Mugger $5 is just as likely to save us as shooting the Mugger in the face is. Incidentally, adopting the latter strategy will greatly reduce the chance that somebody actually tries this.

I realize that this argument seems kind of flawed because it assumes that it really is impossible to guess at the being's motivation, but I can't see how such guessing could be done. It's always possible that the being just wants you to think that it wants x, after all. Who can tell what might motivate a mind that is large enough to simulate our universe?
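As a rough sketch of the cancellation I have in mind (every probability below is made up, and HUGE is just a writable stand-in for the disutility of 3^^^^3 lives):

```python
# Made-up numbers; HUGE stands in for the disutility of 3^^^^3 lives.
HUGE = 1e30
P_MAGIC = 1e-20   # assumed probability that the mugger really has matrix powers

# (probability, utility if we pay the $5, utility if we refuse)
hypotheses = [
    (P_MAGIC,           -5.0,        -HUGE),  # mugger is honest: refusing costs HUGE
    (P_MAGIC,           -5.0 - HUGE,  0.0),   # opposite motivation: paying is what gets punished
    (1.0 - 2 * P_MAGIC, -5.0,         0.0),   # mugger is an ordinary human
]

eu_pay    = sum(p * u_pay    for p, u_pay, _ in hypotheses)
eu_refuse = sum(p * u_refuse for p, _, u_refuse in hypotheses)

# The two HUGE terms enter with equal probability, so they cancel out of the
# comparison and the decision reduces to the mundane $5.
print(eu_refuse - eu_pay)   # roughly 5.0: refusing wins by the price of the mugging
```

The point is only that if the punish-if-you-pay and punish-if-you-refuse hypotheses get equal weight, the astronomical terms drop out of the comparison entirely.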

It would therefore, in any case, be virtually impossible to guess at their possible motivations

Virtually impossible is not the same as actually impossible. It's not a question of being 50.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000001% sure. If you're 50+1/sqrt(3^^^^3)% sure, that's enough to dominate your decision. You can't be that unsure.
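To spell out the arithmetic behind that claim (using a writable stand-in for 3^^^^3, since the real number cannot be represented):

```python
from fractions import Fraction

# N is a writable stand-in for 3^^^^3; only the orders of magnitude matter here.
N = 10**100
edge = Fraction(1, 10**50) / 100   # "(50 + 1/sqrt(N))%" sure: a probability edge of 1/(100*sqrt(N))

expected_lives = edge * N          # lives saved in expectation by acting on that edge
print(expected_lives)              # 10**48, which dwarfs a five-dollar stake
```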

[anonymous]

An AI should treat a Pascal's Mugger as an agent trying to gain arbitrary access to its root system without proper authority, or, phrased more simply, as an attack.

To explain why, consider this statement in the original article:

But a silicon chip does not look over the code fed to it, assess it for reasonableness, and correct it if not. An AI is not given its code like a human servant given instructions. An AI is its code. What if a philosopher tries Pascal's Mugging on the AI for a joke, and the tiny probabilities of 3^^^^3 lives being at stake, override everything else in the AI's calculations? What is the mere Earth at stake, compared to a tiny probability of 3^^^^3 lives?

If something is allowed to override EVERYTHING on a computer, it seems functionally identical to saying that it has root access.

Since Pascal's Mugging is commonly known and discussed on the internet, having it be equivalent to a root password would be a substantial security hole, like setting your root password to "password".

An AI would presumably have to have some procedure in case someone was attempting unauthorized access. That procedure would need to trigger FIRST, before considering the argument on the merits. Once that procedure is triggered, the argument is no longer being considered on its merits; it is being considered as an attack. Saying "Well, but what if there REALLY ARE 3^^^^3 lives at stake?" seems to be equivalent to saying "Well, but what if the prince of Nigeria REALLY IS trying to give me 1 million dollars according to the email in my spam box?"

There's probably something that I'm missing, so sorry if this solution has already been posted in the original thread. I don't really have the "oomph" to read them all... Anyway, hasn't this class of problems already been solved in chapter 5 of Jaynes' book?

If the AI assigns some tiny probability that the data it has received originated through some kind of deception, I think it's only sensible that the hypothesis that the mugger is lying steals probability mass in the posterior distribution at a rate at least linear in the number of people he claims he can affect (though I would say exponential).

The expected utility shouldn't really be calculated on the posterior of the hypothesis "the mugger possesses magical powers" but on the posterior of "mugger can affect the Matrix + mugger is lying".

ETA: This allows you to control the posterior probability of the hypothesis independently of the mugger's claim, thereby shielding the AI from acting on an enormous disutility backed only by a slightly less enormous improbability.
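If I understand my own proposal right, a toy version might look like this (the base probability and the specific penalty are numbers I made up purely for illustration):

```python
# Toy numbers of my own; the shape of the penalty is the point, not the values.
def expected_loss_of_refusing(claimed_n, base_prob=1e-10):
    # Posterior that the mugger can affect the Matrix AND is not lying,
    # penalized linearly in the size of his claim.
    p_capable_and_honest = base_prob / claimed_n
    return p_capable_and_honest * claimed_n   # expected lives lost by refusing

for claimed_n in (10**3, 10**9, 10**30):
    print(claimed_n, expected_loss_of_refusing(claimed_n))
# With a linear penalty the expected loss stops growing with the claim (it is
# always about base_prob); with an exponential penalty it would shrink instead.
```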

Will someone who downvoted explain what's wrong with this solution? Feedback is much appreciated, thanks :)

[anonymous]

I did not downvote you, but I suspect the issue might not be the solution itself but rather the opening statement, most likely the bolded section.

There's probably something that I'm missing, so sorry if this solution has already been posted in the original thread. I don't really have the "oomph" to read them all... Anyway, hasn't this class of problems already been solved in chapter 5 of Jaynes' book?

I would guess that someone wanted you to have taken the time to read the previous thread.

Has Eliezer come up with a solution to this?

Why isn't something like this the answer?

The statement "Do X or I will cause maximum badness according to your desires by using magic powers," is so unlikely to be true that I don't know how one can justify being confident that the being uttering the statement would be more likely to do as it says than to do the opposite - if you give the being five dollars as it asked, it creates and painfully kills 3^^^^3 people, if you do not, nothing happens (when it had asked for five dollars as payment for not creating and torturing people).

How can you say that a magic being that either cares about your money or is obviously testing you would likely do as it said it would?

If one attempts to do calculations taking all permutations of Pascal's mugging into account, one gets ∞ − ∞ as the result of all one's expected utility calculations.

What are the consequences of that?

We have no idea how to do expected utility calculations in these kinds of situations. Furthermore, even if the AI figured out some way, e.g., using some form of renormalization, we have no reason to believe the result would at all resemble our preferences.

I was hoping for Eliezer's answer. If you have an answer, I'd advise posting it separately.

As for your answer, suppose it's more likely that he'll torture 3^^^^3 people if you give him the money. Now you can't give him the money. Now he's just Pascal mugging you into not giving him money. It's the same principle.

Also, the same principle could be deployed in infinitely many ways. I'm sure there's some variant in which the "correct" choice comes out as one you wouldn't otherwise have made.

It's the same principle.

It's not at all the same. This is not a problem invoking Omega. If you want that, go to the lifespan dilemma.

If we know Omega has a 3^^^^3-sided die and will kill the people if it lands on the one, then I'd shut up and calculate.

Pascal's wager involves much more uncertainty than that. It involves uncertainty about the character speaking. Once a being is claiming it has magic and wants you to do something, to the extent one believes the magic part, one loses one's base of reference to judge the being as truthful, non-whimsical, etc.
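For the die case above, the calculation is short (using a representable stand-in for 3^^^^3; the cancellation works the same for any N):

```python
# Stand-in N for 3^^^^3; the arithmetic is the same for any N.
N = 3**27                 # number of lives, and the number of sides on Omega's die
p_kill = 1.0 / N          # the die lands on the one
expected_deaths = p_kill * N
print(expected_deaths)    # about 1 expected death, clearly worth more than $5
```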

Are you arguing that he's more likely to torture them if you give him the money, that the probabilities are the same to within one part in 3^^^^3, or that, since it's not a die, probability works fundamentally differently?

My response was assuming the first. The second one is ridiculous, and I don't think anyone would consider that if it weren't for the bias of giving round numbers for probabilities. If it's the third one, I'd suggest reading Probability Is in the Mind. You don't know which side the die will land on; this is no different than not knowing what kind of a person the character is.

suppose it's more likely that he'll torture 3^^^^3 people if you give him the money

That's a different problem than Pascal's Wager. Taking it back to the original, it would be like saying "Convert to Christianity pro forma for a chance at heaven rather than no chance of heaven, ignoring all other magical options." The problem with this isn't the quantities of utility involved, it's the assumption that a god who cares about such conversions to Christianity is the only option for a divine, rather than a God of Islam who would burn Christian converts hotter than atheists, or a PC Christian god who would have a heaven for all who were honest with themselves and didn't go through pro forma conversions. The answer to the wager is that the random assumption that all forms of magic but one have less probability than that one story about magic is a dumb assumption.

It's fine to consider Pascal's Wager*, where Pascal's Wager* is posed under the assumption that our interlocutor is trustworthy, but that's a different problem, and it is well articulated as the lifespan dilemma, which is legitimately posed separately.

As probability is in the mind, when I ask what a magical being of infinite power would be doing if it asked me for something while disguised as a probably-not-magical being, my best guess is that it is a test with small consequences, and I can't distinguish between the chances of "it's serious" and "it's a sadistic being who will do the opposite of what it said."

The problem with this isn't the quantities of utility involved, it's the assumption that a god who cares about such conversions to Christianity is the only option for a divine, rather than a God of Islam who would burn Christian converts hotter than atheists, or a PC Christian god who would have a heaven for all who were honest with themselves and didn't go through pro forma conversions.

Each of these possibilities has some probability associated with it. Taking them all into account, what is the expected utility of being a Christian? One may ignore those to make the question simpler, but unless all the possibilities cancel out nicely, you're still going to end up with something.

The answer to the wager is that the random assumption that all forms of magic but one have less probability than that one story about magic is a dumb assumption.

Perhaps no single possibility outweighs all the rest, but if you add them all together, they'd point in one general direction. The total is so close to zero that if you tried to calculate it, you'd barely be able to do better than chance. You'd still be able to do better, though.
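A toy illustration of that near-but-inexact cancellation, with made-up probabilities and utilities:

```python
# Made-up numbers; only the near-but-inexact cancellation matters.
scenarios = [
    # (probability, utility of converting under that scenario)
    (1.0e-12,  1.0e6),   # a god that rewards the conversion
    (0.9e-12, -1.0e6),   # a god that punishes pro-forma conversion
]
residual = sum(p * u for p, u in scenarios)
print(residual)   # about 1e-7: very close to zero, but still pointing one way
```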

You'd still be able to do better, though.

I think there is a significant chance you are right, but that it is less than .5. I hope others can add to this discussion. I am reminded of this: if you tell me I am seeing an actual banana that I am holding, rather than an image my brain made of a collection of atoms, then... I don't even know anymore.

I have always failed to see what the problem is, and this confuses me greatly. To me it seems obvious that any sane entity would be dominated by considerations like this.

In this particular situation, though, the probability increase from possibly donating the money to SIAI instead, raising the odds of a friendly singularity that can hack out of the matrix and use the computing power to create 4^^^^4 units of fun, far outweighs it. And since the size of the threat correlates with the size of this reward, the reward will always scale so that this remains true.

[XiXiDu]

More here, where I ask about the "solution" put forth by Robin Hanson.

Since then the only attempted "solution" I know of is outlined in this post by Holden Karnofsky from GiveWell.

Also see this comment by Eliezer and the post he mentions. I didn't know about the Lifespan Dilemma before reading that comment. It seems to be much more worrying than Pascal's Mugging. I haven't thought about it much, but to just arbitrarily refuse some step seems to be the best heuristic so far.

There should be a limit to utility based on the pattern theory of identity, a finite number of sentient patterns, and identical patterns counting as one.

I phrased this as confidently as I did in the hopes it would provoke downvotes with attached explanations of why it is wrong. I am surprised to see it without downvotes and, granting that, even more surprised to see it without upvotes.

In truth I am not so certain of some of the above, and would appreciate comments. I'm asking nicely this time! Is identity about being in a pattern? Is there a limit to the number of sentient patterns? Do identical patterns count as one for moral purposes?

Finally: is it truly impossible to infinitely care about a finite thing?

Finally: is it truly impossible to infinitely care about a finite thing?

In a finite universe, I'd say it's impossible, at least from a functional standpoint and assuming an agent with a utility function. The agent can prefer a world where every other bit of matter and energy other than the cared-about thing is in its maximally dis-preferred configuration and the cared-about thing is in a minimally satisfactory configuration to a world where every other bit of matter and energy is in its maximally preferred configuration and the cared-about thing is in a nearly-but-not-quite minimally satisfactory configuration, but that's still a finite degree of difference. (It's a bit similar to how it's impossible to have 'infinite money' in practice, because once you own all the things, money is pointless.)
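One way to cash out "infinite care" concretely is as a lexicographic preference; this is my own framing rather than anything from the question, but it makes the "finite degree of difference" point visible:

```python
# Toy scores; each world is (score of the cared-about thing, score of everything else).
worlds = [
    (0.1,  100.0),
    (0.2, -100.0),
]

# Lexicographic preference: compare on the cared-about thing first,
# break ties on everything else.
best = max(worlds, key=lambda w: (w[0], w[1]))
print(best)   # (0.2, -100.0): any gain in the cared-about thing wins

# Over a finite set of worlds the same ranking is reproduced by an ordinary
# bounded utility, e.g. u = BIG * w[0] + w[1] for a large enough finite BIG,
# which is essentially the "finite degree of difference" point above.
```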

Finally: is it truly impossible to infinitely care about a finite thing?

Yes. Or, equivalently, you can care about it to degree 1 and care about everything else to degree 0. Either way the question isn't a deep philosophical one; it's just a matter of what certain types of trivial utility function imply.

Unfortunately that solution works only for human reasoners, not for AIs.

Physicists deal with this issue daily, and they invented renormalization and cutoff techniques to make divergent things converge. This has been discussed before; I'm not sure why it wouldn't work.

I don't think physicists actually have the right answer when they do that. You can use Feynman path integrals for quantum physics, and it will get the right answer if you cheat like that, but I'd bet that it's actually more related to linear equations, which don't require cheating.

Physicists use renormalization and cutoff techniques. The universe doesn't.

Also, Pascal's mugging seems to be looking at a special case where it does converge. If you actually used Solomonoff induction, it wouldn't converge because of the possibility of this sort of thing, whether or not someone ever actually makes a threat.
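A toy version of the cutoff/divergence point in this exchange (my own construction, not anything from the thread): give hypotheses a Solomonoff-style 2^-k complexity prior but let their stakes grow like 3^k, so each term is (3/2)^k and the expected-utility sum has no cutoff-free value.

```python
def expected_utility(cutoff):
    # Hypothesis k gets prior weight 2**-k but stakes 3**k, so each term is
    # (3/2)**k and the series diverges as the cutoff is raised.
    return sum((2.0 ** -k) * (3.0 ** k) for k in range(cutoff))

for cutoff in (10, 20, 40):
    print(cutoff, expected_utility(cutoff))   # keeps growing: the "answer" is set by the cutoff
```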