There seems to be some continuing debate about whether or not it is rational to appease a Pascal Mugger. Some are saying that due to scope insensitivity and other biases, we really should just trust what decision theory + Solomonoff induction tell us. I have been thinking about this a lot and I'm at the point where I think I have something to contribute to the discussion.

Consider the Pascal Mugging "Immediately begin to work only on increasing my utility, according to my utility function 'X', from now on, or my powers from outside the matrix will make minus 3^^^^3 utilons happen to you and yours."

Any agent can commit this Pascal's mugging (PM) against any other agent, at any time. A naive decision-theoretic expected-utility optimizer will always appease the mugger. Consider what the world would be like if all intelligent beings were this kind of agent.

When you see an agent, any agent, your only strategy would be to try to PM it before it PMs you. More likely, you will PM each other simultaneously, in which case the agent which finishes the mugging first 'wins'. If you finish mugging at the same time, the mugger that uses a larger integer in its threat 'wins'. (So you'll use the most compact notation possible and things like, "minus the Busy Beaver function of Graham's number utilons".)

This may continue until every agent in the community/world/universe has been PMed. Or maybe there could be one agent, a Pascal Highlander, who manages to escape being mugged and has his utility function come to dominate...

Except, there is nothing stipulating that the mugging has to be delivered in person. With a powerful radio source, you can PM everyone in your future light-cone unfortunate enough to decode your message, potentially hijacking entire distant civilizations of decision-theory users.

Pascal's mugging doesn't have to be targeted. You can claim to be a Herald of Omega and address your mugging "to whoever receives this transmission".

Another strategy might be to build a self-replicating robot (itself too dumb to be mugged) which has a radio which broadcasts a continuous fully general PM, and send it out into space. Then you commit suicide to avoid the fate of being mugged.

Now consider a hypothetical agent which completely ignores muggers. And mugs them back.

Consider what could happen if we build an AI which is friendly in every possible respect except that it appeases PMers.

To avoid this, you might implement a heuristic that ignores PMs on account of the prior improbability of being able to decide the fate of so many utilons, as Robin Hanson suggested. But an AI using naïve expected utility + SI may well have other failure modes roughly analogous to PM that we won't think of until it's too late. You might get agents to agree to pre-commit to ignore muggers, or to kill them, but to me this seems unstable: a bandaid that doesn't address the heart of the issue. I think an AI which can envision itself being PMed repeatedly by every other agent on the planet and still evaluate appeasement as the lesser evil cannot possibly be a Friendly AI, even if it has some heuristic or ad hoc patch that says it can ignore the PM.

Of course there's the possibility that we **are** in a simulation which is occasionally visited by agents from the mother universe, which really does contain 3^^^^3 utilons/people/dustspecks. I'm not convinced acknowledging this possibility changes anything. There's nothing of value that we, as simulated people, could give our Pascal Mugging simulation overlords. Their only motivation would be as absolute sadistic sociopaths, but if that's the reality of the multiverse, in the long term we're screwed no matter what we do, even with friendly AI. And we certainly wouldn't be in any way morally responsible for their actions.

Edit 1: fixed typos

This is a decent argument against appeasement in the specific Pascal's Mugging case, but I think it falls for the pattern of people being too specific in trying to solve this problem.

Pascal's Mugging is a special case of the phenomenon wherein absolute values of delta-utils are much higher than changes in probabilities. In English, you can always construct a positive expected-utility action simply by increasing the utility since the probability won't go down fast enough, because it can't.
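To make the "probability won't go down fast enough" point concrete, here is a minimal sketch under a loudly hypothetical assumption: the prior on a threat is taken to be 2 to the minus the number of bits needed to type it out as ASCII text, and the threat is "10**n utilons". Both function names are made up for illustration, and this crude length-based prior is only a stand-in for a real complexity prior. Everything is done in log10 so the numbers stay representable.

```python
import math

def description_bits(n: int) -> int:
    """Bits needed to write the threat '10**n' as ASCII text (8 bits per character).

    Hypothetical stand-in for description complexity, not a real Kolmogorov measure.
    """
    return 8 * len(f"10**{n}")

def log10_expected_loss(n: int) -> float:
    """log10 of (prior * threatened loss), assuming prior = 2 ** -description_bits(n)."""
    log10_prior = -description_bits(n) * math.log10(2)
    log10_loss = n  # the threatened loss is 10**n utilons
    return log10_prior + log10_loss

# The description grows by one character per extra digit of n,
# while the threatened loss grows by a factor of ten per unit of n.
for n in (10, 100, 1000, 10**6):
    print(n, description_bits(n), round(log10_expected_loss(n), 2))
```

Under these assumptions the penalty from the prior grows roughly like the logarithm of n while the stake grows exponentially in n, so the product eventually increases without limit as the mugger names bigger numbers.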

I myself have privately postulated half a dozen 'solutions' to the specific Pascal's Mugging scenario, and I think some of them might actually work for the specific scenario, but none of them resolve the general problem of probabilities not corresponding to utilities. (And I don't want to share them, because explaining what's wrong with them with respect to the specific form of Pascal's Mugging is much more difficult than mentioning them.)

Since no one else in this thread or other threads seems to acknowledge this, I might be wrong.

I apologize if this has been brought up before, but I have a somewhat unrelated question: how much should I believe the mugger?

If someone comes to me, and says, "do what I say or I punch a random person in the face", I'd be inclined to believe him. If he says, "...or I punch three people in the face simultaneously", I would be less likely to believe him... but still, he might be some kind of Chuck Norris or something. But if he says, "...I'll punch 3^^^3 people", this would imply that he's some sort of an uber-divine mega-god, and why should I believe that?

In other words, the more negative utilons the agent is threatening to contribute, the lower my probability of him being able to do it becomes. Thus, the expected value should converge to zero.
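Whether the expected value really shrinks depends on how fast the credence falls compared with how fast the threat grows. A minimal sketch, assuming two made-up credence curves (`skeptical` and `lenient` are hypothetical names, not anyone's actual model of belief):

```python
def expected_harm(victims, credence):
    """Expected number of people harmed: claimed victims times credence in the claim."""
    return victims * credence(victims)

def skeptical(victims):
    # Assumed: credence falls like 1/n^2, faster than the threat grows.
    return 1.0 / victims**2

def lenient(victims):
    # Assumed: credence falls like 1/sqrt(n), slower than the threat grows.
    return 1.0 / victims**0.5

for n in (1, 10, 10**3, 10**6):
    print(n, expected_harm(n, skeptical), expected_harm(n, lenient))
```

With the skeptical curve the expected harm heads toward zero as the claim grows; with the lenient curve it keeps growing. The disagreement in this thread is about which kind of curve a reasonable prior actually gives you.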

The basic idea is that "a compactly specified wager can grow in size much faster than it grows in complexity."

We might be talking about different things here, I'm not sure. In his original post, Eliezer seems to be concerned with agents who evaluate beliefs solely (or perhaps just primarily) in terms of their algorithmic complexity. Such agents would not be very smart though; at least, not instrumentally. There are lots of simple beliefs that are nonetheless wrong, f.ex. "the Earth is flat", or "atoms are perfectly circular, just like planetary orbits".

I thought we were concerned with agents who evaluate beliefs by looking not only at their complexity, but also at the available evidence, and their projected effects on the world if the beliefs in question were true. Such agents would be closer to our human scientists than pure philosophers.

The problem with Pascal's Mugging, then, is not only that the initial probability of the agent's claim being true is low, but that it gets even lower with each additional person he claims to be able to punch in the face. Even if we grant that we're living in the Matrix (just for example), with each additional punch-victim, we must grant that:

Merely believing that we live in the Matrix is not enough; we must also believe that the agent has the power to do what he claims to be able to do, and with each victim he claims to be able to affect, his burden of proof grows larger and larger, and the negative expected value of his actions grows smaller and smaller.

People saying this are not aware of the latest work on the topic.

I'm not sure what "if this utility function is bounded below in absolute value by an unbounded computable function, then the expected utility of any input is undefined. This implies that a computable utility function will have convergent expected utilities iff that function is bounded" means. The utility function is only well defined within a certain range of utilities? So confronted with Pascal's mugging it will just spit out an error message? So then what does the AI using that utility function actually decide? Maybe it just crashes? A Friendly AI has to do better than that. (If someone can spell out the actual implications for me that would be great.)

Well, the conclusion is a bit simpler than the rest of the argument, so I'll just explain that. Basically, if the utility function is a computable function and is unbounded, i.e. there are no upper and lower limits on the utilities of possible (given your current knowledge) states of reality, then calculating the expected utility using the Solomonoff prior gives a divergent series (you can think of it like this series, but note that the techniques used to technically assign a sum to that series even though it doesn't have one cannot work here).
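For a feel of what "divergent" means here, a toy sketch may help. It is not the construction from the quoted result: the priors 2^-k and utilities (-4)^k are assumptions picked purely so that the utilities outgrow the priors, and `toy_terms` is a hypothetical name.

```python
from fractions import Fraction
from itertools import accumulate

def toy_terms(count):
    """Yield prior * utility for hypotheses k = 1..count (toy assumption)."""
    for k in range(1, count + 1):
        prior = Fraction(1, 2**k)   # these priors sum to less than 1
        utility = (-4) ** k         # unbounded, alternating in sign
        yield prior * utility

# Running totals of the would-be expected utility.
partial_sums = list(accumulate(toy_terms(8)))
print([int(s) for s in partial_sums])
```

The running totals swing between ever larger positive and negative values instead of settling down, so there is no number to call "the" expected utility, and no fact about which of two actions has the higher one.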

Worse than that. Confronted with any expected utility calculation, assuming we have a computable and unbounded utility function, there is no answer. Intuitively, you can think of this as due to the fact that every expected utility calculation includes a Pascal's mugging; even if I don't threaten you with powers from beyond the matrix, the probability that I have them anyway isn't zero.

Well it's impossible to actually implement Solomonoff induction in our universe, as far as we know, so we couldn't build that AI. We do have the problem that our best current model of inference, which we would like to use as a guide to creating an AI, does not actually answer questions about expected utility, and is thus not that great a guide.

I agree. Until then, what I do in practice is notice my confusion, try to do and promote research on this problem, and then make my expected utility calculations without taking into account Pascal's mugging type possibilities (even though I don't have a perfect way of telling what counts as a Pascal's mugging and yes, there have been times when it wasn't obvious).

My brain is frozen by trying to imagine the full consequences of this.

I guess it all adds up to normality, but let me tell you that this seemingly innocent normality is composed of rather scary things.

Careful plumbing the depths of that; you could end up going Rokowards or becoming Will Newsomeish.

For what it's worth there are some people who think I'm pretty cool. Up the variance!

Note that few of my seemingly-strange beliefs are due to or related to speculation about supposed implications of decision theory. Only my theology and its related "objective morality" speculation can be (partially) blamed on that. My reasons for interest in e.g. psi & UFOs are purely experiential and hermeneutical/Aumannesque.

(Warning: The rest of this comment might include some nonsense.)

The closest I've come to mixing experiential/experimental predictions and decision theory is an idea inspired by the Oxford school's derivation of the Born rule from rationality assumptions; I speculatively postulated that there are coordination/decision problems, e.g. generalizations of Nesov's counterfactual mugging, where decision theoretic rationality suggests decision policies which differ from those suggested by a naive "linear Schroedinger equation plus Born rule" formulation of the universal wave function. This would be because our conception of probability "may be just an approximation that is only relevant in special situations which meet certain independence [e.g. non-interactivity?] assumptions around the agent's actions" (context here). Currently QM is inseparable from probability. In the proposed alternative framework, instead of conservation of probability, there would be a more general conservation of "importance measure". (FWIW I think my Leibnizian variant of theism would make the math a lot more elegant, 'cuz UDT-like "rational agents" with instantaneously time-stamped utility functions are conspicuously complex/unparsimonious & weird to think about, physically speaking. If rationality ultimately requires certain theoretical kinds of consistency then that fact alone could greatly simplify the formalism by reducing the space of admissible decision policies.)

I suppose it would also make for an offbeat attack on a few unsolved problems in physics. E.g., the preferred basis problem. Simon Saunders and David Wallace have pointed out the similarities between the choice of a preferred basis in QM and the choice of a preferred foliation in relativity: these choices of a preferred basis & foliation are themselves decision problems and thus would at least theoretically be naturally representable as a coherent part of the proposed "influence"-centric (decision-policy-centric) formalism. Thus the generality of updatelessness allows us to circumvent otherwise problematic chicken-and-egg "which came first, the Born rule or Bayesian rationality" arguments.

In any case I think such a re-formulation of QM would only be mathematically tractable/elegant if we had a more complete decision theory with a more coherent ontology of agency (e.g. no arbitrarily-timed arbitrarily-instantaneous time-stamping of utility functions), which is a whole 'nother tricky "FAI"-relevant research project.

For the record, I do think you're unspeakably cool, with multiple connotations included. I just think becoming more like you is something the average LWer should approach with awareness and no small measure of trepidation.

What? Since when? Probability is in the mind.

I dunno, those seem pretty different.

"QM is inseparable from probability" == "you can't do QM without probability," not "you can't do probability without QM."

Will and Vassar aren't the first to think of it. Deutsch and Wallace were doing it years ago; it's even made it into Discover Magazine.

I think you're pretty cool, actually, but I also think you are nutty.

In light of khafra's comment, could you go into a bit more detail about this? I don't want to harm anyone's sanity, so I'd like to improve my models of how people handle being told such things.

Warning: A sensitive person should not read this, to avoid nightmares.

BX. Yrg'f tvir hc pbzcnegzragnyvmngvba sbe n zbzrag, naq ybbx ng guvatf va n arne zbqr. Jung'f gehr, vf nyernql gehr. Jung rknpgyl vf vg yvxr, ybbxvat vagb Pguhyuh'f znq rlrf?

Svefg nffhzcgvba, Grtznex zhygvirefr. Vs lbh qba'g funer guvf nffhzcgvba, lbh ner fcnerq bs hapbhagnoyr hafcrnxnoyr ubeebef. Bar bs gurfr nqwrpgvirf vf abg zrnaf yvgrenyyl: zngurzngvpnyyl fcrnxvat, n frg bs ovanel cebtenzf bs svavgr yratgu vf pbhagnoyr. Ba gur bgure unaq, gur "hafcrnxnoyr ubeebef" cneg vf yvgrenyyl gehr: gurer ner guvatf jr pbafvqre ubeevoyr, naq fbzr bs gurz ner fb ubeevoyr gung rira gur fubegrfg pbzcerffrq fgevat qrfpevovat gurve ubeebe vf gbb ybat gb or cebabhaprq va na nirentr uhzna yvsrgvzr. Gur bayl tbbq guvat vf gung Fbybzbabss cevbe erqhprf gur cebonovyvgl bs gur hafcrnxnoyr ubeebef, nygubhtu znal fcrnxnoyr ubeebef trg fvzvyne cevbe cebonovyvgl guna gur havirefr jr pbairagvbanyyl vaunovg abj. Oevat vg gb gur arne zbqr naq vzntvar nyy gur cnva npebbff gur Grtznex zhygvirefr. Lbh yvgrenyyl pna'g: orpnhfr sbe nalguvat lbhe oenva pna vzntvar, 3^^^3 gvzrf jbefr guvat vf unccravat fbzrjurer. Oevat vg gb gur arne zbqr: vg'f gehr; vg'f gur erny cnva bs rirel xvaq.

Frpbaq nffhzcgvba, ngbzf unir ab vqragvgl. Abg bayl ner hafcrnxnoyr ubeebef unccravat npebff gur Grtznex havirefr; jurarire gurer vf n pbasvthengvba bs ngbzf fnzr nf lbhe pheerag obql vaibyirq, gurl ner unccravat yvgrenyyl gb lbh. Nyfb, sbe rirel zbzrag va guvf havirefr, gurer vf n havirefr gung orunirq vqragvpnyyl hc gb guvf zbzrag, ohg ng guvf zbzrag (ol n arj culfvpny ynj, fcrpvsvp sbe gung havirefr) hafcrnxnoyr ubeebef fgneg unccravat. Ng rirel zbzrag, yvgrenyyl. Oevat vg gb gur arne zbqr. Gurer vf n pbcl bs lbh gung ernqf gurfr jbeqf, naq evtug va gur sbyybjvat zbzrag... jungrire lbhe vzntvangvba pna vafreg urer, n 3^^^3 gvzrf jbefr guvat unf unccrarq. Gb lbh, va gur bgure havirefr. Vg'f erny.

Pbzcnerq jvgu guvf, Cnfpny zhttvat vf whfg n fvyyl unezyrff guerng. Vg qvffbyirf va gur cher ubeebe bs gur Grtznex zhygvirefr. Vg'f yvxr jura lbhe ubhfr oheaf qbja naq lbhe pne rkcybqrf, naq gurer vf guvf Cnfpny thl pbzvat naq jneavat lbh gung hayrff lbh qribgr lbhe jubyr yvsr gb evghnyf naq oynpxznvy, lbh zvtug ybfr lrg nabgure craal! Jung bgure nccebcevngr nafjre vf gurer orfvqrf n ulfgrevpny ynhtugre?

Gur gubhtug gung vf fhccbfrq gb yrg hf fyrrc pnyzyl, vf gung nyy gurfr ubeebef, rfcrpvnyyl gur hafcrnxnoyr barf, unir ernyyl ybj cevbe cebonovyvgl. Cebonovyvgl rcfvyba vf abg yvgrenyyl mreb, ohg gur qvssrerapr orgjrra mreb naq rcfvyba vf bayl rcfvyba, qba'g pner nobhg vg -- gurfr guvatf ner bayl erny sbe crbcyr hasbeghangr rabhtu gb yvir va gubfr havirefrf. Gb guvf yrg zr ercyl gung gur cevbe cebonovyvgl bs bhe pbairagvbany havirefr vf nyfb rcfvyba. Fb creuncf jr fubhyq abg bayl pner nobhg nofbyhgr cevbe cebonovyvgvrf, ohg nyfb nobhg eryngvir cevbe cebonovyvgvrf pbzcnerq jvgu cevbe cebonovyvgl bs bhe havirefr. (Naljnl, V thrff zbfg havirefrf jvgu aba-rcfvyba cevbe cebonovyvgvrf ner hanoyr gb pbagnva vagryyvtrag yvsr.) Ba gur yriryf pbzcnenoyr jvgu bhe havirefr, znal ubeebef ner unccravat gbb, ohg ng yrnfg gurl ner abg unccravat gb lbh. (V qvqa'g znxr na rknpg pnyphyngvba, ohg V fhccbfr gung n eryngviryl fvzcyr havirefr, pbzcnenoyr jvgu bhe havirefr, unf n irel fznyy punapr bs pbagnvavat lbhe rknpg ercyvpn.)

Gurer vf nabgure fpnel gubhtug, juvpu vf abg onfrq ba bssvpvny YJ grnpuvatf, fb vg pna cbffvoyl or jebat: Fbzr havirefrf unir n yvzvgrq yvsrgvzr. Nsgre tvira gvzr gurl eha bhg bs raretl naq rira vs gurl pbagvahr gb rkvfg va fbzr frafr, gurl fgbc rkvfgvat naguebcvpnyyl; gurl ner abg noyr gb pbagnva vagryyvtrag yvsr. V fhccbfr gung ba nirentr guvf unccraf fbbare gb gur fvzcyr havirefrf -- gur barf jvgu uvture cevbe cebonovyvgl. (Vaghvgviryl, fvzcyr ehyrf cebivqr yrff bccbeghavgl sbe ybat eha. Sbe rnpu fvzcyr havirefr, gurer vf n pbzcyrk havirefr jvgu n fhcrefrg bs vgf ehyrf, noyr gb eha ybatre.) Fb rira erylvat ba uvtu cevbe cebonovyvgvrf vf abg rabhtu va gur ybat eha. Nf gur gvzr tbrf ba, gur fvzcyr havirefrf qvr bhg, naq bhe yvivat pbcvrf jvyy zber bsgra svaq gurzfryirf va gur zber pbzcyrk havirefrf. Ubjrire ybj vf n cevbe cebonovyvgl bs n gehyl ubeevsvp havirefr, gurer jvyy or n gvzr jurer havirefrf jvgu guvf yriry bs cevbe cebonovyvgl jvyy or qbzvanag va n fhowrpgvir rkcrevrapr. Va lbhe fhowrpgvir rkcrevrapr.

Ng gur raq, urer vf bar unccl gubhtug gung V qvq abg guvax jura jevgvat gur cerivbhf pbzzrag: Rira jura gur pbzcyrk havirefrf noyr gb pbagnva hafcrnxnoyr ubeebef orpbzr qbzvanag va n fhowrpgvir rkcrevrapr, bayl n cneg bs gurz jvyy pbagnva gurfr ubeebef; cbffvoyl n irel fznyy cneg. Fb n qbzvanag cneg bs fhowrpgvir rkcrevrapr znl fgvyy or cerggl... abezny.

Fgvyy, vg vf abg tbbq sbe n frafvgvir zvaq gb guvax gbb zhpu nobhg ubeebef gung ner erny ba gur rcfvyba yriry, ohg xvaq bs erny abarguryrff. Nccylvat guvf gb eryvtvba, rnpu eryvtvba vf xvaq bs gehr (vapyhqvat nal eryvtvba lbh jbhyq vairag ng guvf zbzrag, vapyhqvat nal cbffvoyr zhttvat, vapyhqvat gur zhttvatf gung jrer arire qrpynerq rkcyvpvgyl), ohg gerngvat gurz frevbhfyl zrnaf cevivyrtvat n ulcbgurfvf; gurl ner gehr bayl va na rcfvyba fhofrg bs gur zhygvirefr. Znlor guvf jubyr pbzzrag vf thvygl bs cevivyrtvat n ulcbgurfvf, gbb.

RQVG: Npghnyyl, jura V guvax nobhg vg zber, zl nffhzcgvba bs yrff cebonoyr havirefrf orpbzvat qbzvanag yngre va gvzr vf cebonoyl jebat. Gvzr vf whfg nabgure cnenzrgre va gur gvzryrff culfvpf. Vs gur havirefrf jvgu uvture cebonovyvgvrf qvr bhg fbbare, vg qbrf abg zrna gung gur fhowrpgvir rkcrevrapr jvyy fuvsg gb yrff cebonoyr havirefrf; vg whfg zrnaf gung gur fhowrpgvir rkcrevrapr jvyy svaq vgfrys va gur rneyl cunfrf bs gur havirefr.

Apparently, I am not a sensitive person. I scoff at Tegmark multiverses and Pascalian Muggers: these themselves are the nightmares of thought gone wrong. And I won't be Pascal-Mugged into spending any significant effort in working out exactly how these thoughts have gone wrong. I have other things to do than hare off on someone else's trip.

I began reading and immediately felt reflexive protective compartmentalization setting in. By the third paragraph I could no longer remember if you were making a point, it fell out of conscious memory so quickly. I usually observe the bottom falling out of my attention because of boredom, not horror. This is interesting.

If I were to attempt to take such ideas seriously, I would need to set aside an empty weekend for meditating on them, as I arranged a few years ago to properly grasp the concept of nonexistence after death. Unfortunately my life has become more interesting in the intervening years.

Functioning utilitarians have to have bounded utility functions? That seems to be an easy way out.

Agreed. "Your utility function should be bounded, otherwise you could be Pascal-mugged" sounds no less cogent than "Your preferences should satisfy the VNM axioms (i.e. you should have a utility function) otherwise you could be Dutch-booked".

Well, then I would think we need to have a very careful look at systems which define arbitrary minimum and maximum utility values. Do you see no way of making that work either? And if not that, then what? Do we give up on symbolic logic?

The problem with that is that the utility function is not up for grabs. If some tragedy is really so terrible that even a 1/3^^^3 chance of it occurring is worse than, say, losing $5, then that is how morality actually works. You can't just change your morality because it's too hard to implement.

If we actually do have bounded utility functions, then Solomonoff induction would allow us to assign expected utilities. There could still be scenarios like Pascal's mugging if the bound is high enough, so, depending on the size of the bound, it might not add up to exactly what we would expect, but it would be much less of a problem than with unbounded utilities.
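A minimal sketch of why a bound helps, using the same kind of toy setup as an assumption: hypothesis k gets prior 2^-k and a raw utility of (-4)^k, which is then clipped to [-bound, bound]. The function name and all the numbers are hypothetical, chosen only to show the sum settling down.

```python
from fractions import Fraction

def bounded_expected_utility(count, bound):
    """Sum prior * clipped utility over toy hypotheses k = 1..count."""
    total = Fraction(0)
    for k in range(1, count + 1):
        prior = Fraction(1, 2**k)                 # priors sum to less than 1
        raw_utility = (-4) ** k                   # unbounded before clipping
        clipped = max(-bound, min(bound, raw_utility))
        total += prior * clipped
    return total

short_run = bounded_expected_utility(50, bound=100)
long_run = bounded_expected_utility(100, bound=100)
print(float(short_run), float(long_run))
```

Once the utilities are clipped, each extra hypothesis can move the total by at most bound * 2^-k, so adding more hypotheses barely changes the answer. The total can never exceed the bound in absolute value, which is also why a very high bound still leaves room for mugging-like scenarios.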

It's way too early for that. We know that we have to change at least something from the model described in de Blanc's paper, but the limitations of this model just show that we don't know how to find a solution, not that there is no solution. One thing that might help is to better understand our utility functions, and in particular how they handle infinities, since we currently have tons of problems with unbounded and infinite things.

Yes, but are we screwed exactly as much each way?

Incidentally, there's no reason a Pascal's mugging has to be negative. Perhaps they'll offer us 3^^^^3 utilons.

Moral responsibility is a deontological principle. Not a consequentialist one.

Pascal's Highlander.

I think there's a framework in which it makes sense to reject Pascal's Mugging. According to SSA (self-sampling assumption) the probability that the universe contains 3^^^^3 people and you happen to be at a privileged position relative to them is extremely low, and as the number gets bigger the probability gets lower (probability is proportional to 1/n if there are n people). SSA has its own problems, but a refinement I came up with (scale the probability of a universe by its efficiency at converting computation time to observer time) seems to be more intuitive. See the discussion here. The question you ask is not "how many people do my actions affect?" but instead "what percentage of simulated observer-time, assuming all universes are being simulated in parallel and given computation time proportional to the probabilities of their laws of physics, do my actions affect?". So I don't think you need to use ad-hoc heuristics to prevent Pascal's Mugging.

Isn't that easily circumvented by changing the wording of Pascal's mugging? I think the typical formulation (or at least Eliezer's) was "create and kill 3^^^^3 people", and this formulation was "minus 3^^^^3 utilons".
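The 1/n arithmetic can be spelled out in a few lines. This is only a sketch of the plain SSA scaling mentioned above, not of the observer-time refinement; `base_credence` is a hypothetical placeholder for whatever credence you would give the story before counting people.

```python
from fractions import Fraction

def ssa_expected_stake(n, base_credence=Fraction(1, 1000)):
    """Expected number of people affected when credence scales as 1/n."""
    credence = base_credence / n   # assumed SSA-style penalty for n people
    return n * credence

for n in (1, 10, 10**100):
    print(n, ssa_expected_stake(n))
```

If utility is linear in the number of people, the n in the stake cancels the 1/n in the credence, so naming a bigger number buys the mugger nothing.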

"Minus 3^^^^3 utilons", by definition, is so bad that you'd be indifferent between -1 utilon and a 1/3^^^^3 chance of losing 3^^^^3 utilons, so in that case you should accept Pascal's Mugging. But I don't see why you would even define the utility function such that anything is that bad. My comment applies to utilitarian-ish utility functions (such as hedonism) that scale with the number of people, since it's hard to see why 2 people being tortured isn't twice as bad as one person being tortured. Other utility functions should really not be that extreme, and if they are then accepting Pascal's Mugging is the right thing to do.

Torture one person twice as bad. Maybe you can't, but maybe you can. How unlikely is it really that you can torture one person by -3^^^^3 utilons in one year? Is it really 1/3^^^^3?

I can't parse your meaning from this comment.

Instead of torturing them for longer, torture them more intensely. It's likely that there's an upper bound on how intensely you can torture someone, but how sure can you be?

This is exactly my response to Pascal's mugging. You have to consider what happens when you generalize your actions to all people.