*For background, see here.*

In a comment on the original Pascal's mugging post, Nick Tarleton writes:

[Y]ou could replace "kill 3^^^^3 people" with "create 3^^^^3 units of disutility according to your utility function". (I respectfully suggest that we all start using this form of the problem.)

Michael Vassar has suggested that we should consider any number of identical lives to have the same utility as one life. That could be a solution, as it's impossible to create 3^^^^3 distinct humans. But this is also irrelevant to the create-3^^^^3-disutility-units form.

Coming across this again recently, it occurred to me that there might be a way to generalize Vassar's suggestion in such a way as to deal with Tarleton's more abstract formulation of the problem. I'm curious about the extent to which folks have thought about this. (Looking further through the comments on the original post, I found essentially the same idea in a comment by g, but it wasn't discussed further.)

The idea is that the Kolmogorov complexity of "3^^^^3 units of disutility" should be *much higher* than the Kolmogorov complexity of the number 3^^^^3. That is, the utility function should grow only according to the complexity of the scenario being evaluated, and not (say) linearly in the number of people involved. Furthermore, the domain of the utility function should consist of *low-level descriptions* of the state of the world, which won't refer directly to words uttered by muggers, in such a way that a mere discussion of "3^^^^3 units of disutility" by a mugger will not typically be (anywhere near) enough evidence to promote an *actual* "3^^^^3-disutilon" hypothesis to attention.

This seems to imply that the intuition responsible for the problem is a kind of fake simplicity, ignoring the complexity of value (negative value in this case). A confusion of levels also appears implicated (talking about utility does not itself significantly affect utility; you don't suddenly make 3^^^^3-disutilon scenarios probable by talking about "3^^^^3 disutilons").

What do folks think of this? Any obvious problems?

Is your utility function such that there is *some* scenario to which you assign -3^^^^3 utils? If so, then the Kolmogorov complexity of "3^^^^3 units of disutility" can't be greater than K(your brain) + K(3^^^^3), since I can write a program to output such a scenario by iterating through all possible scenarios until I find one to which your brain assigns -3^^^^3 utils.

A prior of 2^-(K(your brain) + K(3^^^^3)) is not nearly small enough, compared to the utility -3^^^^3, to make this problem go away.
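The enumeration argument above can be sketched in a few lines. This is only a toy illustration: `toy_utility` is a hypothetical stand-in for "your brain", and scenarios are assumed to be finite bit strings. The point is that the search routine itself has constant size, so describing the scenario it finds costs no more than describing the utility function plus the target value, plus a constant.

```python
from itertools import count, product


def enumerate_scenarios(alphabet="01"):
    """Yield every finite string over `alphabet`, shortest first."""
    for length in count(0):
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)


def first_scenario_with_utility(utility, target):
    """Return the first enumerated scenario whose utility equals `target`.

    Loops forever if no such scenario exists, which matches the
    "if so" premise of the argument.
    """
    for scenario in enumerate_scenarios():
        if utility(scenario) == target:
            return scenario


def toy_utility(scenario):
    """Hypothetical utility function: minus the number of '1' bits."""
    return -scenario.count("1")


print(first_scenario_with_utility(toy_utility, -3))  # prints 111
```

With a real utility function and a target of -3^^^^3 the search would never finish in practice, but Kolmogorov complexity only counts the length of the program, not its running time.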

Given that there's no definition for the value of a util, arguments about how many utils the universe contains aren't likely to get anywhere.

So let's make it easier. Suppose the mugger asks you for $1, or ey'll destroy the Universe. Suppose we assume the Universe to have 50 quadrillion sapient beings in it, and to last for another 25 billion years ( = 1 billion generations if average aliens have similar generation time to us) if not destroyed. That means the mugger can destroy 50 septillion beings. If we assign an average being's life as worth $100000, the...

One day I would like to open up an inverse casino.

The inverse casino would be full of inverse slot machines. Playing the inverse slot machines costs negative twenty-five cents - that is, each time you pull the bar on the machine, it gives you a free quarter. But once every few thousand bar pulls, you will hit the inverse jackpot, and be required to give the casino several thousand dollars (you will, of course, have signed a contract to comply with this requirement before being allowed to play).

You can also play the inverse lottery. There are ten million inverse lottery tickets, and anyone who takes one will get one dollar. But if your ticket is drawn, you must pay me fifteen million dollars. If you don't have fifteen million dollars, you will have various horrible punishments happen to you until fifteen million dollars worth of disutility have been extracted from you.

If you believe what you are saying, it seems to me that you should be happy to play the inverse lottery, and believe there is literally no downside. And it seems to me that if you refused, I could give you the engineer's answer "Look, (*buys ticket*) - a free dollar, and nothing bad happened to me!"

And if you are willing to play the inverse lottery, then you should be willing to play the regular lottery, unless you believe the laws of probability work differently when applied to different numbers.
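For what it's worth, the expected value of one inverse lottery ticket can be worked out directly from the numbers given above. This sketch assumes exactly one of the ten million tickets is drawn and that every ticket is equally likely to be the one; neither assumption is stated outright in the comment.

```python
from fractions import Fraction

TICKETS = 10_000_000           # number of inverse lottery tickets
PAYOUT_PER_TICKET = 1          # dollars received for taking a ticket
PENALTY_IF_DRAWN = 15_000_000  # dollars owed if your ticket is drawn


def inverse_lottery_expected_value():
    """Expected dollars per ticket, assuming one uniformly drawn ticket."""
    p_drawn = Fraction(1, TICKETS)
    return PAYOUT_PER_TICKET - p_drawn * PENALTY_IF_DRAWN


ev = inverse_lottery_expected_value()
print(float(ev))  # prints -0.5
```

So almost every player walks away with a free dollar, yet each ticket loses fifty cents on average, which is exactly the tail that the "nothing bad happened to me" answer ignores.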

Sorry, I know I said I'd stop, and I will stop after this, but that 3E22 number is just too interesting to leave alone.

The last time humanity was almost destroyed was about 80,000 years ago, when a volcanic eruption reduced the human population below 1,000. So say events that can destroy humanity happen on average every hundred thousand years (conservative assumption, right?). That means the chance of a humanity-destroying event per year is 1/100,000. Say 90% of all humanity-destroying events can be predicted with at least one day's notice by, e.g., asteroid monitoring. This leaves hard-to-detect asteroids, sudden volcanic eruptions, weird things like sudden methane release from the ocean, et cetera. So once every 1 million years we get an unexpected humanity-destroying event. That means the "background rate" of unexpected humanity-destroying events is about 1/300 million per day.
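The arithmetic in the paragraph above is easy to check. The sketch below uses only the figures given there, plus an assumed 365 days per year; the constant names are made up for illustration.

```python
YEARS_BETWEEN_EVENTS = 100_000  # one humanity-destroying event per 100,000 years
UNPREDICTED_ONE_IN = 10         # 90% are predicted, so 1 in 10 is unexpected
DAYS_PER_YEAR = 365             # assumption: ignore leap years

# Average waiting time between *unexpected* events, in years and in days.
years_between_unexpected = YEARS_BETWEEN_EVENTS * UNPREDICTED_ONE_IN
days_between_unexpected = years_between_unexpected * DAYS_PER_YEAR

print(years_between_unexpected)  # prints 1000000
print(days_between_unexpected)   # prints 365000000
```

That gives roughly one unexpected event per 365 million days, the same order of magnitude as the 1/300 million figure quoted above.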

Suppose Omega told you, the day before the LHC was switched on, that tomorrow humankind would be destroyed. If 1/3E22 were your true probability, you would say "there's still vastly less than one in a billion chance the apocalypse has anything to do with the LHC, it must just be a coincidence." Even if you were the LH...

This requirement (large numbers that refer to sets have large Kolmogorov complexity) is a weaker version of my and RichardKenneway's versions of the anti-mugging axiom. However, it doesn't work for all utility functions; for example, Clippy would still be vulnerable to Pascal's Mugging if using this strategy, since he doesn't care whether the paperclips are distinct.

There is an idea here, but it's a little muddled. Why should complexity matter for Pascal's mugging?

Well, the obvious answer to me is that, behind the scenes, you're calculating an expected value, for which you need a probability of the antagonist actually following through. More complex claims are harder to carry out, so they have lower probability.

A separate issue is that of having bounded utility, which is possible, but it should be possible to do Pascal's mugging even then, if the expected value of giving them money is higher than the expected value ...

I think that the more general problem is that if the absolute value of the utility that you attach to a world-state grows faster than its complexity-based probability (given the current situation) shrinks, then the very possibility of that world-state existing will cause it to hijack the entirety of your utility function (assuming that there are no other world-states in your utility function which go FOOM in a similar fashion).

Of course, utility functions are not constructed to avoid this problem, so I think that it's incredibly likely that each unbounded utility function has at least one world-state which would render it hijackable in such a manner.
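A toy calculation shows the hijacking effect. This sketch makes strong simplifying assumptions that are not in the comment: there is exactly one hypothesis at each complexity level k, its prior is 2^-k, and its utility magnitude is a simple function of k. Both `linear` and `fast` are hypothetical utility profiles chosen for illustration.

```python
from fractions import Fraction


def partial_expected_magnitude(utility_magnitude, max_complexity):
    """Sum prior(k) * |U(k)| for k = 1..max_complexity, with prior(k) = 2**-k."""
    return sum(
        Fraction(utility_magnitude(k), 2**k)
        for k in range(1, max_complexity + 1)
    )


def linear(k):
    """Utility magnitude that grows only as fast as the complexity."""
    return k


def fast(k):
    """Utility magnitude that outruns the 2**-k prior."""
    return 4**k


for n in (10, 20, 40):
    print(
        n,
        float(partial_expected_magnitude(linear, n)),
        float(partial_expected_magnitude(fast, n)),
    )
```

The `linear` column settles just below 2, while the `fast` column keeps doubling, so the most complex hypothesis considered always dominates the sum. That is the sense in which a single fast-growing world-state can take over the whole calculation.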

Don't see how your idea defeats this:

A corollary is a necessary condition for friendliness: if the utility function of an AI can take values much larger than the complexity of the input, then it is unfriendly. This kills Pascal's mugging and paperclip maximizers with the same stone. It even sounds simple and formal enough to imagine testing it on a given piece of code.

I stumbled across this fix and unfortunately discovered what I consider to be a massive problem with it - it would imply that your utility function is non-computable.

OK. So in order for this to work, it needs to be the case that your prior has the property that: P(3^^^3 disutility | I fail to give him $5) << 1/3^^^3.

Unfortunately, if we have an honest Kolmogorov prior and utility is computable via a complexity << 3^^^3 Turing machine, this cannot possibly be the case. In particular, it is a Theorem that for any computable function C (whose Tu...

The problem, as stated, seems to me like it can be solved by precommitting not to negotiate with terrorists--this seems like a textbook case.

So switch it to Pascal's Philanthropist, who says "I offer you a choice: either you may take this $5 bill in my hand, or I will use my magic powers outside the universe to grant you 3^^^^3 units of utility."

But I'm actually not intuitively bothered by the thought of refusing the $5 in that case. It's an eccentric thing to do, but it may be rational. Can anybody give me a formulation of the problem where taking the magic powers claim seriously is obviously crazy?

The way around Pascal's mugging is to have a bounded utility function. Even if you are a paperclip-maximizer, your utility function is not the number of paperclips in the universe, it is some bounded function that is monotonic in the number of paperclips but asymptotes out. You are only linear in paperclips over small numbers of paperclips. This is not due to exponential discounting but because utility doesn't mean anything other than the function that we are maximizing the expected value of. It has an unfortunate namespace collision with the other utility...
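As a rough illustration of what such a function might look like, here is one possible shape. The comment does not name a specific function, so the choice of 1 - exp(-n/scale) and the `scale` parameter are assumptions made purely for the example.

```python
import math


def bounded_paperclip_utility(paperclips, scale=1_000_000.0):
    """Hypothetical bounded utility: 1 - exp(-paperclips / scale).

    Roughly linear (about paperclips / scale) while paperclips is much
    smaller than scale, monotonically increasing, and never above 1.
    """
    return -math.expm1(-paperclips / scale)


small = bounded_paperclip_utility(10)        # close to 10 / scale
huge = bounded_paperclip_utility(10**100)    # saturates at the bound

print(small)
print(huge)  # prints 1.0
```

Under a function like this, a mugger who promises an astronomically large number of paperclips with probability p can shift expected utility by at most p, no matter how large the promised number is.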

I really like this suggestion. One esthetic thing it has going for it: complexity should be a terminal value for human-relatable intelligent agents anyway. It seems gauche for simple pleasures (orgasms, paperclips) to yield unbounded utility.

When denizens here say "value is complex" what they mean is something like "the things which humans want have no concise expression". They don't *literally* mean that a utility counter *measuring* the extent to which those values are met is difficult to compress. That would not make any sense.