## LESSWRONGLW

I think there's a framework in which it makes sense to reject Pascal's Mugging. According to SSA (self-sampling assumption) the probability that the universe contains 3^^^^3 people and you happen to be at a privileged position relative to them is extremely low, and as the number gets bigger the probability gets lower (probability is proportional 1/n if there are n people). SSA has its own problems, but a refinement I came up with (scale the probability of a universe by its efficiency at converting computation time to observer time) seems to be more intuitive. See the discussion here. The question you ask is not "how many people do my actions affect?" but instead "what percentage of simulated observer-time, assuming all universes are being simulated in parallel and given computation time proportional to the probabilities of their laws of physics, do my actions affect?". So I don't think you need to use ad-hoc heuristics to prevent Pascal's Mugging.

Isn't that easily circumvented by changing the wording of Pascal's mugging? I think the typical formulation (or at least Eliezer's) was "create and kill 3^^^^3 people. And this formulation was "minus 3^^^^3 utilions".

# 10

There seems to be some continuing debate about whether or not it is rational to appease a Pascal Mugger. Some are saying that due to scope insensitivity and other biases, we really should just trust what decision theory + Solomonoff induction tells us. I have been thinking about this a lot and I'm at the point where I think I have something to contribute to the discussion.

Consider the Pascal Mugging "Immediately begin to work only on increasing my utility, according to my utility function 'X', from now on, or my powers from outside the matrix will make minus 3^^^^3 utilons happen to you and yours."

Any agent can commit this Pascal's mugging (PM) against any other agent, at any time. A naive decision-theoretic expected-utility optimizer will always appease the mugger. Consider what the world would be like if all intelligent beings were this kind of agent.

When you see an agent, any agent, your only strategy would be to try to PM it before it PMs you. More likely, you will PM each other simultaneously, in which case the agent which finishes the mugging first 'wins'. If you finish mugging at the same time, the mugger that uses a larger integer in its threat 'wins'. (So you'll use the most compact notation possible and things like, "minus the Busy Beaver function of Graham's number utilons".)

This may continue until every agent in the community/world/universe has been PMed. Or maybe there could be one agent, a Pascal Highlander, who manages to escape being mugged and has his utility function come to dominate...

Except, there is nothing stipulating that the mugging has to be delivered in person. With a powerful radio source, you can PM everyone in your future light-cone unfortunate enough to decode your message, potentially highjacking entire distant civilizations of decision-theory users.

Pascal's mugging doesn't have to be targeted. You can claim to be a Herald of Omega and address your mugging "to whoever receives this transmission"

Another strategy might be to build a self-replicating robot (itself too dumb to be mugged) which has a radio which broadcasts a continuous fully general PM, and send it out into space. Then you commit suicide to avoid the fate of being mugged.

Now consider a hypothetical agent which completely ignores muggers. And mugs them back.

Consider what could happen if we build an AI which is friendly in every possible respect except that it appeases PMers.

To avoid this, you might implement a heuristic that ignores PMs on account of the prior improbability of being able to decide the fate of so many utilons, as Robin Hanson suggested. But an AI using naïve expected utility + SI may well have other failure modes roughly analagous to PM that we won't think of until its too late. You might get agents to agree to pre-commit to ignore muggers, or to kill them, but to me this seems unstable. A bandaid that's not addressing the heart of the issue. I think an AI which can envision itself being PMed repeatedly by every other agent on the planet and still evaluate appeasement as the lesser evil cannot possibly be a Friendly AI, even if it has some heuristic or ad hoc patch that says it can ignore the PM.

Of course there's the possibility that we are in a simulation which is occasionally visited by agents from the mother universe, which really does contain 3^^^^3 utilons/people/dustspecks. I'm not convinced acknowledging this possibility changes anything. There's nothing of value that we, as simulated people, could give our Pascal Mugging simulation overlords. Their only motivation would be as absolute sadistic sociopaths, but if that's the reality of the multiverse, in the long term we're screwed no matter what we do, even with friendly AI. And we certainly wouldn't be in any way morally responsible for their actions.

Edit 1: fixed typos