# 12

Eliezer Yudkowsky wrote that Robin Hanson solved the Pascal's mugging thought experiment:

Robin Hanson has suggested penalizing the prior probability of hypotheses which argue that we are in a surprisingly unique position to affect large numbers of other people who cannot symmetrically affect us. Since only one in 3^^^^3 people can be in a unique position to ordain the existence of at least 3^^^^3 other people who are not symmetrically in such a situation themselves, the prior probability would be penalized by a factor on the same order as the utility.

I don't quite get it, is there a post that discusses this solution in more detail?

To be more specific, if a stranger approached me, offering a deal saying, "I am the creator of the Matrix. If you fall on your knees, praise me and kiss my feet, I'll use my magic powers from outside the Matrix to run a Turing machine that simulates 3^^^^3 copies of you having their coherent extrapolated volition satisfied maximally for 3^^^^3 years." Why exactly would I penalize this offer by the number of copies being offered to be simulated? I thought the whole point was that the utility of having 3^^^^3 copies of myself experiencing maximal happiness does outweigh the low probability of it actually happening and the disutility of doing what the stranger asks for?

I would love to see this problem being discussed again and read about the current state of knowledge.

I am especially interested in the following questions:

• Is the Pascal's mugging thought experiment a "reduction to the absurd" of Bayes’ Theorem in combination with the expected utility formula and Solomonoff induction?1
• Could the "mugger" be our own imagination?2
• At what point does an expected utility calculation resemble a Pascal's mugging scenario and should consequently be ignored?3

1 When you calculate the expected utility of various outcomes, you imagine impossible alternative actions. The alternatives are impossible because you already precommitted to choosing the outcome with the largest expected utility. Problems: 1.) You swap your complex values for a certain terminal goal with the highest expected utility; indeed, your instrumental and terminal goals converge to become the expected utility formula. 2.) Your decision-making is eventually dominated by extremely small probabilities of obtaining vast utility.

2 Insignificant inferences might exhibit hyperbolic growth in utility: 1.) There is no minimum amount of empirical evidence necessary to extrapolate the expected utility of an outcome. 2.) The extrapolation of counterfactual alternatives is unbounded, logical implications can reach out indefinitely without ever requiring new empirical evidence.

3 Extrapolations work and often are the best we can do. But since there are problems like Pascal's Mugging, which we perceive to be undesirable and which lead to an infinite hunt for ever larger expected utility, I think it is reasonable to ask for some upper and lower bounds regarding the use and scope of certain heuristics. We agree that we are not going to stop pursuing whatever terminal goal we have chosen just because someone promises us even more utility if we do what that agent wants. We might also agree that we are not going to stop loving our girlfriend just because there are many people who do not approve of our relationship and who together would experience more happiness if we divorced than the combined happiness of us and our girlfriend being married. Therefore we have already informally established some upper and lower bounds. But when do we start to take our heuristics seriously and do whatever they prove to be the optimal decision?


The idea behind Pascal's mugging is that the complexity penalty to a theory of the form "A person's decision will have an effect on the well-being of N people" grows asymptotically slower than N. So given weak evidence (like a verbal claim) that narrows down the decision and effect to something specific, and a large enough N, the hugeness of the expected payoff will overcome the smallness of the probability.

Hanson's idea is that, given that "A person's decision will have an effect on the well-being of N people", the prior probability that you, and not someone else, are the person who gets to make that decision is 1/N. This gets multiplied by the complexity penalty, and we have the probability of payoff getting smaller faster than the payoff gets bigger.
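A toy numeric sketch of the two penalties (my own illustration, not from the thread; the `desc_bits` function is a made-up stand-in for Kolmogorov complexity, charging 8 bits per character of a short program naming N). Working in log space, since N = 10^k overflows floats quickly, the naive expected payoff grows without bound while the Hanson-adjusted one stays pinned at the (small) complexity penalty:

```python
import math

def desc_bits(k):
    # crude stand-in for the Kolmogorov complexity of N = 10**k:
    # 8 bits per character of the program text "10**k"
    return 8 * len(f"10**{k}")

for k in (10, 1000, 10**6):
    log2_N = k * math.log2(10)
    # naive log2 expected payoff: log2(N) - K(N); it grows without bound,
    # because K(N) grows only with the length of the *name* of N
    naive = log2_N - desc_bits(k)
    # Hanson's anthropic penalty multiplies the prior by 1/N,
    # i.e. subtracts log2(N) again -- cancelling the payoff exactly
    adjusted = naive - log2_N
    print(k, round(naive), round(adjusted))
```

With the made-up penalty model the adjusted log-payoff is always just minus the description length, so no choice of N makes the offer attractive.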

This is all very convenient for us humans, who value things similar to us. If a paperclip maximizing AGI faced a Pascal's Mugging, with the payoff in non-agenty paperclips, it would assign a much higher probability that it, and not one of the paperclips, makes the crucial decision. (And an FAI that cares about humans faces a less extreme version of that problem.)

Yeah, that was my immediate line of thought too, but... I've never seen Eliezer being that blind in his area of expertise. Maybe he sees more to the asymmetry than just the anthropic considerations? Obviously Robin's solution doesn't work for Pascal's mugging in the general case where an FAI would actually encounter it, and yet Eliezer claimed Robin solved an FAI problem. (?!) (And even in the human case it's silly to assume that the anthropic/symmetry-maintaining update should correlate exactly with how big a number the prankster can think up, and even if it does it's not obvious that such anthropic/symmetry-maintaining updates are decision theoretically sane in the first place.) Aghhh. Something is wrong.

Aghhh. Something is wrong.

I know I'd like to see Eliezer retract the endorsement of the idea. Seems to be very reckless thinking!

I think I remember some very quiet crossover point between like 2007 and 2008 when Eliezer switched from saying 'the infinite utilities cancel each other out' to 'why do you think a superintelligence would use a policy like your artificially neat approximation where all the probabilities just happen to cancel each other out nicely, instead of, say, actually trying to do the math and ending up with tiny differences that nonetheless swamp the calculation?' with respect to some kind of Pascalian problem or class of Pascalian problems. This was in OB post comment threads. That's kind of like an implicit retraction of endorsement of fuzzy Pascalian 'solutions' (if I'm actually remembering it correctly) but admittedly it's not, like, an actual retraction.

I still think I might be missing some detail or intuition that Eliezer isn't missing that could be charitably extracted from Robin's argument... but yeah, if I had to bet I'd say it was a (hopefully rare) slip of the brain on Eliezer's part, and if so it'd be nice to get a clarifying comment from Eliezer, even if I'm not sure it's at all (socially) reasonable to expect one.

even if I'm not sure it's at all (socially) reasonable to expect one.

There is no social debt to be paid in humble recompense. Rather it would be useful to have some form of signal that Eliezer's current thinking is not broken in areas that are somewhat important.

Hanson's idea is that given that "A person's decision will have an effect on the well-being of N people" the prior probability that you, and not someone else, are the person who gets to make that decision is 1/N.

What is the crucial difference between being 1 distinct person of N people making N distinct decisions, and being 1 of N distinct people? In other words, why would the ability to make a decision that is inaccessible to other decision makers penalize the prior probability of its realization more than any other feature of a distinct world-state?

I will probably have to grasp anthropic reasoning first. I am just a bit confused that if only 1 of N people faces a certain choice it becomes 1/N times more unlikely to be factual.

I am just a bit confused that if only 1 of N people faces a certain choice it becomes 1/N times more unlikely to be factual.

That only 1 of N people face the choice doesn't make it less likely that the choice exists; it makes less likely the conjunction of the choice existing and you being the one who makes it.


and that you are the one that makes the choice.

Each of the people in question could be claimed (by the mugger) to be making this exact same choice.

To be more specific, if a stranger approached me, offering a deal saying, "I am the creator of the Matrix. If you fall on your knees, praise me and kiss my feet, I'll use my magic powers from outside the Matrix to run a Turing machine that simulates 3^^^^3 copies of you having their coherent extrapolated volition satisfied maximally for 3^^^^3 years." Why exactly would I penalize this offer by the number of copies being offered to be simulated?

There will be a penalty, and a large one, but it isn't going to be directly dependent on the number of people. Compare the following:

A: There is an avatar from outside the Matrix who is able and possibly willing to simulate 3^^^^3 copies of you.
B: There is an avatar from outside the Matrix who is able and possibly willing to simulate BusyBeaver(3^^^^3) copies of you.

P(A) is greater than P(B). But it is not even remotely BusyBeaver(3^^^^3)/3^^^^3 times greater than P(B). Most of the improbability is in the ridiculous out-of-the-Matrix trickster, not the extent of his trickery. If it is the lack of symmetry that is taken to be the loophole, it is easy enough to modify the scenario to make things symmetrical. Trickster: "I'll simulate this universe up to now a gazillion times, and after now the sims get lots of utility." Concern about how surprising it is for you to be the one with the power becomes irrelevant.

Is the Pascal's mugging thought experiment a "reduction to the absurd" of Bayes’ Theorem in combination with the expected utility formula and Solomonoff induction?

I consider expected utility maximisation to be a preference. A fairly sane sounding preference but still just as 'arbitrary' as whether you prefer to have mass orgasms or eat babies. (The 'expected' part, that is.)

I think the main problem with Hanson's solution is that it relies on the coincidence that the objects with moral worth are the same as the objects that can make decisions. If glargs were some object with moral worth but not the ability to make decisions, then finding yourself able to determine the fate of 3^^^3 glargs wouldn't be unusually asymmetric, and so wouldn't have a probability penalty.

If glargs were some object with moral worth but not the ability to make decisions, then finding yourself able to determine the fate of 3^^^3 glargs wouldn't be unusually asymmetric, and so wouldn't have a probability penalty.

There's more to the asymmetry than just the anthropic considerations though. Like, having that much influence over anything you actually care about is super improbable unless you've already pumped a lot of optimization into the world.

It is perhaps worth noting that this happens a lot in real life and yet people don't seem to have any problem ignoring the asymmetry. Specifically I am thinking about people who think that God has chosen them and only them for some sacred and very important task, and similar delusions. Even ignoring trying to explain why you would find yourself as God's chosen prodigy, which is itself pretty weird, how do you explain why any person roughly as ordinary as you would be singled out in the first place?

Even so, the kicker remains that someone like Eliezer who's trying to take over I mean optimize the universe would still seem to need to grapple with the lingering anthropic improbabilities even if he could justify his apparently-naively asymmetrical expectation of future world optimization by pointing at the causal chain stemming from his past optimization, 'cuz he would also have to explain how he ended up as himself in such a seemingly promising strategic position in the first place. But I don't know who actually tries to reason via anthropics qua anthropics these days, instead of daringly adopting some garbled bastardization of UDT-like reasoning, which I think lets you neatly dissolve what looks like mysterious double counting.

Hm... I've been thinking about it for a while (http://www.gwern.net/mugging) and still don't have any really satisfactory argument.

Now, it's obvious that you don't want to scale any prior probability more or less than the reward because either way involves you in difficulties. But do we have any non-ad hoc reason to specifically scale the size of the reward by 1/n, to regard the complexity as equal to the size?

I think there may be a Kolmogorov complexity argument that size does equal complexity, and welcome anyone capable of formally dealing with it. My intuition goes: Kolmogorov complexity is about lengths of strings in some language which map onto outputs. By the pigeonhole principle, not all outputs have short inputs. But various languages will favor different outputs with the scarce short inputs, much like a picture compressor won't do so hot on music files. 3^^^^3 has a short representation in the usual mathematical language, but that language is just one of indefinitely many possible mathematical languages; there are languages where 3^^^^3 has a very long representation, one 3^^^^3 long even, or longer still! Just like there are inputs to `zip` which compress very well and inputs which aren't so great.

Now, given that our particular language favors 3^^^^3 with such a short encoding rather than many other numbers relatively nearby to 3^^^^3, doesn't Pascal's mugging start looking like an issue with our language favoring certain numbers? It's not necessary to favor 3^^^^3; other languages don't favor it, or actively disfavor it. Those other languages would seem to be as valid and logically true as the current one; the proofs or programs may just be longer and written differently, of course.

So, what do you get when you abstract away from the particular outputs favored by a language? When you add together and average the languages with long encodings and the ones with short encodings for 3^^^^3? What's the length of the average program for 3^^^^3? I suggest it's 3^^^^3, with every language that gives it a shorter encoding counterbalanced by a language with an exactly longer encoding.

What's the length of the average program for 3^^^^3? I suggest it's 3^^^^3, with every language that gives it a shorter encoding counterbalanced by a language with an exactly longer encoding.

For a sufficiently crazy set of languages you could make this true for 3^^^^3, but in general what's simple in one language is still fairly simple elsewhere. If 3+3 takes b bits to describe in language A it takes b+c bits in language B, where c is the length of the shortest interpreter for language A written in language B (edit: or less :P).
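A toy way to see that constant (my own sketch; the `up` helper is a hypothetical stand-in for up-arrow notation): treat "Python plus a fixed preamble" as language B. Translating any B-expression back into plain Python (language A) costs exactly the preamble's length, no matter how large the number being named:

```python
# Language B = Python plus this fixed preamble defining Knuth's up-arrow,
# so hyperoperation towers get short names in B.
PREAMBLE = (
    "def up(a, n, b):\n"
    "    if n == 1: return a ** b\n"
    "    if b == 0: return 1\n"
    "    return up(a, n - 1, up(a, n, b - 1))\n"
)

expr_in_B = "up(3, 4, 3)"            # 3^^^^3, written in language B
expr_in_A = PREAMBLE + expr_in_B     # the same number in plain Python

# The translation overhead is a fixed constant -- len(PREAMBLE) --
# independent of the expression: the c in K_A(x) <= K_B(x) + c.
assert len(expr_in_A) - len(expr_in_B) == len(PREAMBLE)
```

So a language can shift which numbers are cheap to name, but only by a bounded amount: no choice of interpreter makes 3^^^^3 cost anywhere near 3^^^^3 bits in a language where it was once cheap.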

So, I only have a mystical folk understanding of algorithmic probability theory and thus can't formally deal with it. But it seems to be a practical problem that you can't look at all possible languages, or average them, to get a meaningful ultimate low-level language. Currently it seems that the only way to pick out a set of languages as the relevant ones is to look at your environment, which I don't think is a formal concept yet; and even if it were implementable, it seems you'd still end up with a ton of continuously-diverging languages where '3^^^3' had lower Kolmogorov complexity than, say, '1208567128390567103857.42 times the number of nations officially recognized by the government of France, squared'. It would also be weird to expect 'googol minus 17' to have lower Kolmogorov complexity across basically any set of languages than 'googol'. If I horribly misinterpreted your argument somewhere along the way, I apologize.

Well, it may be a practical problem, but it doesn't bother me that this 'averaged' complexity language is uncomputable - most things to do with KC are uncomputable, after all. The question is what justification can we come up with for a proportional prior.

currently it seems that the only way to pick out a set of languages as the relevant ones is to look at your environment, which I think isn't a formal concept yet

We can apparently talk sensibly about different languages and different Turing machines. The language for complexity may be a language underneath whatever the real Turing machine is, and corresponding to starting states of the Turing machine - much like dovetailing runs all possible programs. This is probably one of the things that any formalization would have to grapple with.

you'd still end up with a ton of continuously-diverging languages where '3^^^3' had lower Kolmogorov complexity than, say, '1208567128390567103857.42 times the number of nations officially recognized by the the government of France, squared'.

I don't follow.

It would also be weird to expect 'googol minus 17' to have lower Kolmogorov complexity across basically any set of languages than 'googol'.

I don't see any weirdness about any number smaller than googol having a smaller complexity; that's exactly the property that stops a mugging, the monotonically increasing complexity.

Why would it be weird? I think we have a use-mention problem here. You are confusing A̡͊͠͝ with an fairly complicated procedure which generates the number A̡͊͠͝ using an offset from a random but even larger number you have apparently blessed with the short representation 'googol'. (Why you would favor this 'googol' I have no idea, when A̡͊͠͝ is so much more useful in my idiosyncratic branch of the multiverse.)

Perhaps I should ask a clarifying question. These symbols you're interested in finding the Kolmogorov complexity of across an infinite set of languages. Are they real numbers? Are they real numbers that would be interpreted from their contexts as representing differential reward (however that is objectively defined)? Are they numbers-in-general? Are they quantitative or qualitative symbols in general? Are they any symbols quantitative or qualitative that can be interpreted from their contexts as representing differential reward (however that is objectively defined)? We may be talking about different things.

They would be finite binary strings, I think. Anything else, and I'm not sure how to apply pigeonhole.

Right, but you can convert any of the above into (approximate) finite binary strings by interpreting from their contexts how they would have been represented as such, no? I guess why I'm asking is that I expect a lot of number-like-things to show up a lot across the multiverse, and that some of these numbers are going to be mathematically more interesting than others, just how Graham's number might be discovered by alien mathematicians but neither we nor aliens care about the number 315167427357825136347, which is basically infinitely smaller. An infinite ensemble is infinite but we expect math to be very similar in large swaths of it at least. So my intuitions are reasonably confident that your proposal would only work if we're limiting our search to some set of quantities that doesn't include the very skewing mathematically interesting ones (like 'infinity').

I guess why I'm asking is that I expect a lot of number-like-things to show up a lot across the multiverse, and that some of these numbers are going to be mathematically more interesting than others, just how Graham's number might be discovered by alien mathematicians but neither we nor aliens care about the number 315167427357825136347, which is basically infinitely smaller. An infinite ensemble is infinite but we expect math to be very similar in large swaths of it at least.

What's the smallest un-interesting number? But isn't that a rather interesting number...

Graham's number may be interesting to us and aliens a lot like us, but so what? I doubt it's interesting over all or even most of, say, a Tegmark level IV multiverse.

So my intuitions are reasonably confident that your proposal would only work if we're limiting our search to some set of quantities that doesn't include the very skewing mathematically interesting ones (like 'infinity').

Limiting your search is, I think, exactly why our current abstractions are a bad base for a universal prior like what is being discussed in the Mugging.

As you hint at, any damn decision you could possibly make has at least a 1/3^^^3 chance of causing at least 3^^^3 utility (or -3^^^3 utility). Talking about muggers and simulations and anthropics and stuff is just a distraction from the main problem, which is that as far as I know we don't yet have a very good justification for pretending that there's any particular straightforward symmetry-enforcing (meta-)rule keeping utility from swamping probabilities.

Unrelated observation: That we don't seem to have a problem with probabilities swamping utility might mean that the fact that utility is additive where probabilities are multiplicative is messing with our intuitions somehow.

The solution is flawed. What if he instead says "I will give a single person 3^^^3 utilons of pain"?

Anyway, the idea is that the odds of that situation can't be better than 3^^^3 to one against. You'd be one of the people he creates, not the one he's asking.

XiXiDu, I'd like to point out how positively surprised I am at what I personally see as a sharp upturn in the general quality of your thoughts, comments, and posts since I've been paying attention to them. In some somewhat low-level part of my brain you've gone from "possibly well-intentioned person who unfortunately cannot make a coherent argument" (sorry!) to "person whose posts and ideas often make me pause and think". That doesn't sound like much but given human psychology it probably is.

"I am the creator of the Matrix. If you fall on your knees, praise me and kiss my feet, I'll use my magic powers from outside the Matrix to run a Turing machine that simulates 3^^^^3 copies of you having their coherent extrapolated volition satisfied maximally for 3^^^^3 years."

Relevant to my decision: I don't assign all that much utility to having 3^^^^3 copies of me satisfied. In fact, the utility I assign to 3^^^^3 copies of me being satisfied is, as far as I can tell, less than double the utility I assign to two copies of me being similarly uber-satisfied.

I am actually finding it difficult to come up with a serious Pascal's Offer scenario. As far as I know my utility function doesn't even reach the sort of extremes necessary to make such an offer sound tempting.

As far as I know my utility function doesn't even reach the sort of extremes necessary to make such an offer sound tempting.

I'll have your lottery tickets, then!

I'll have your lottery tickets, then!

If you can offer me something that gives me greater utility than my lottery tickets then I will naturally be willing to trade. This should not be surprising.

The problem with Pascal's Mugging is that in a bounded rationality situation, you cannot possibly have a probability on the order of 1/(3^^^^3) - it's incomprehensibly low. Thus, a naive Bayesian expected utility maximizer might jump at this offer every time it's offered despite it never being true - allowing one to get it to do anything for you, instead of whatever it was supposed to be maximizing.

It occurs to me that this problem shouldn't crop up - you could make the effective limits on calculated utility be the same as the effective limits on calculated probability (and MAX = 1/MIN) - thus VERY HIGH utility times VERY LOW probability = MAX*MIN = 1, which is probably lower than the expected utility for whatever it was doing in the first place.

I'll have to go re-read the original discussion.

ETA: Oh right, the whole point was that there needs to be some justification for the symmetrically-low probability, because it is not obvious from the structure of the problem like in Pascal's Wager. Duh.

To be more specific, if a stranger approached me, offering a deal saying, "I am the creator of the Matrix. If you fall on your knees, praise me and kiss my feet, I'll use my magic powers from outside the Matrix to run a Turing machine that simulates 3^^^^3 copies of you having their coherent extrapolated volition satisfied maximally for 3^^^^3 years." Why exactly would I penalize this offer by the number of copies being offered to be simulated?

Essentially, you just call BS on the promise of huge utility.

Where is the huge utility?!? Show me the huge utility!!!

Essentially, you call BS on the promise of huge utility. Where is the huge utility? Show me the huge utility!

Define 'huge utility'? Rain recently offered everyone on Less Wrong a deal, "For every non-duplicate comment replying to this one praising me for my right action, I will donate $10 to SIAI [...]." This isn't different from the deal I proposed, except insofar as it is much more likely that the dealer will deliver. But the uncertainty in my example is being outweighed by the possibility of a huge payoff.

Where do you draw the line, what are the upper and lower bounds? That is the essential question.

I don't mean to imply a threshold between huge utilities and other kinds - just that the larger the utilities promised you by strangers requesting up-front errands, the more evidence you should look for - to ensure that you are not being taken for a ride.

The whole "extraordinary claims require extraordinary evidence" bit.