Pascal's Mugging and One-shot Problems

Donald Hobson (7y)

If you literally maximize the expected number of paperclips using standard decision theory, you will always pay the casino. To refuse the one-shot game, you need a nonlinear utility function, or to be doing something weird like median-outcome maximization.

Choose action $a$ to maximize $m$ such that $P(\text{paperclip count} > m \mid a) = 1/2$.

A well-defined rule that will behave like expected utility maximization in a sufficiently vast multiverse.
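A minimal sketch of the median rule in Python, assuming each action is a finite lottery given as (probability, paperclip change) pairs. Because $P(\text{count} > m \mid a) = 1/2$ has no exact solution for most discrete distributions, the sketch uses the lower median (the smallest $m$ with $P(\text{count} \le m \mid a) \ge 1/2$), which is one standard convention rather than something specified above. The casino numbers (pay 10 paperclips for a 1/100 chance at a trillion) and the names `median`, `expected_value` and `actions` are hypothetical, chosen to match the smaller numbers Dagon suggests further down.

```python
from fractions import Fraction

# Each action is a hypothetical lottery: a list of (probability, paperclip_change) pairs.
Lottery = list[tuple[Fraction, int]]

def expected_value(lottery: Lottery) -> Fraction:
    return sum(p * x for p, x in lottery)

def median(lottery: Lottery) -> int:
    """Lower median: the smallest outcome m with P(outcome <= m) >= 1/2."""
    cumulative = Fraction(0)
    for x, p in sorted((x, p) for p, x in lottery):
        cumulative += p
        if cumulative >= Fraction(1, 2):
            return x
    raise ValueError("probabilities must sum to 1")

actions = {
    "refuse": [(Fraction(1), 0)],
    # hypothetical casino: pay 10, 1/100 chance of a trillion paperclips
    "pay casino": [(Fraction(99, 100), -10), (Fraction(1, 100), 10**12 - 10)],
}

best_by_median = max(actions, key=lambda a: median(actions[a]))
best_by_ev = max(actions, key=lambda a: expected_value(actions[a]))
print(best_by_median, best_by_ev)  # refuse pay casino
```

Under these numbers the median rule refuses the one-shot casino, even though paying has an expected value of $10^{10} - 10$ paperclips.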

Gurkenglas (7y)

What do you mean by a sufficiently large multiverse? If your first choice loses many paperclips in 40% of cases and wins a few in the rest, you would take it and a maximizer wouldn't.
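A minimal sketch of this counterexample in Python. The exact numbers (lose 1000 paperclips with probability 2/5, win 10 otherwise) are hypothetical stand-ins for "loses many in 40% of cases and wins a few in the rest", and the lower-median convention is the same assumption as in the median rule's discrete form.

```python
from fractions import Fraction

def expected_value(lottery):
    return sum(p * x for p, x in lottery)

def lower_median(lottery):
    """Smallest outcome m with P(outcome <= m) >= 1/2."""
    cumulative = Fraction(0)
    for x, p in sorted((x, p) for p, x in lottery):
        cumulative += p
        if cumulative >= Fraction(1, 2):
            return x
    raise ValueError("probabilities must sum to 1")

# Hypothetical numbers for "loses many in 40% of cases, wins a few in the rest".
gamble = [(Fraction(2, 5), -1000), (Fraction(3, 5), 10)]
do_nothing = [(Fraction(1), 0)]

median_rule_takes_it = lower_median(gamble) > lower_median(do_nothing)
maximizer_takes_it = expected_value(gamble) > expected_value(do_nothing)
print(median_rule_takes_it, maximizer_takes_it)  # True False
```

The median only sees that the winning branch holds more than half the probability; the expected value of $-394$ paperclips never enters into it.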

Donald Hobson (7y)

If you were truly alone in the multiverse, this algorithm would take a bet that had a 51% chance of winning 1 paperclip and a 49% chance of losing 1,000,000 of them.

If independent versions of this bet are taking place in 3^^^3 parallel universes, it will refuse.

For any finite bet and all sufficiently large $N$: if the agent is using TDT (timeless decision theory) and is faced with the choice of whether to make this bet in $N$ universes, it will behave like an expected utility maximizer.
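A minimal sketch of the multiverse argument in Python, assuming a TDT-style agent makes one choice that applies to all $N$ independent copies of the 51%/49% bet, so the quantity it takes the median of is the total change in paperclips across all copies. The lower-median convention and the helper names `median_wins` and `median_total` are assumptions for illustration.

```python
from math import comb

WIN_PROB = 0.51           # chance a single copy of the bet wins
WIN_AMOUNT = 1            # paperclips gained on a win
LOSS_AMOUNT = 1_000_000   # paperclips lost on a loss

def median_wins(n: int, p: float) -> int:
    """Lower median of Binomial(n, p): smallest k with P(wins <= k) >= 1/2.
    Uses floats, so keep n modest (comb(n, k) overflows a float for very large n)."""
    cdf = 0.0
    for k in range(n + 1):
        cdf += comb(n, k) * p**k * (1 - p) ** (n - k)
        if cdf >= 0.5:
            return k
    return n

def median_total(n: int) -> int:
    """Median total paperclip change if the bet is made in n independent universes."""
    k = median_wins(n, WIN_PROB)
    # The total is increasing in the number of wins, so the median total
    # is the total at the median number of wins.
    return k * WIN_AMOUNT - (n - k) * LOSS_AMOUNT

for n in (1, 2, 100):
    decision = "take" if median_total(n) > 0 else "refuse"
    print(n, median_total(n), decision)
```

For this particular bet the switch already happens at $N = 2$: one win and one loss is the median outcome, and it costs 999,999 paperclips.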

Dagon (7y)

1) You're correct that "known finite iterations" can be treated as "single-shot" by defining a complete strategy and not caring about intermediate states. "Unknown ending conditions" may or may not be reducible in this way.

2) You can't get away from utility. You have to define how much better a universe with X - 10n + 3^^^^3 paperclips is than a universe with X or a universe with X - 10n (where X is the starting number of paperclips and n is the number of wagers you'll make before giving up or hitting the jackpot).

3) Using ludicrous numbers breaks most people's intuitions (cf. scope insensitivity), and you should explain why you don't use a 100-sided die and a payout of a trillion paperclips.
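A minimal sketch of point 2 using the smaller numbers from point 3. All parameters are hypothetical: each wager costs 10 paperclips (matching the $10n$ term), wins with probability 1/100, and pays a trillion paperclips; the strategy is to keep wagering until the first win or until `max_wagers` wagers have been paid. The function name `summarize` is also illustrative. It reports the chance of ever hitting the jackpot next to the expected net change in paperclips.

```python
from fractions import Fraction

# Hypothetical parameters based on the suggestion in point 3:
P_WIN = Fraction(1, 100)   # chance one wager hits (a 100-sided die)
PAYOUT = 10**12            # paperclips won on a hit (one trillion)
COST = 10                  # paperclips paid per wager (the "10n" term)

def summarize(max_wagers: int) -> tuple[float, float]:
    """Keep wagering until the first win or until max_wagers have been paid.
    Returns (P(jackpot), expected net change in paperclips)."""
    p_never_win = (1 - P_WIN) ** max_wagers
    p_win = 1 - p_never_win
    # Wager i+1 is only paid if the first i wagers all lost.
    expected_wagers = sum((1 - P_WIN) ** i for i in range(max_wagers))
    expected_net = p_win * PAYOUT - COST * expected_wagers
    return float(p_win), float(expected_net)

for n in (1, 10, 100):
    p_win, ev = summarize(n)
    print(f"n={n}: P(jackpot)={p_win:.3f}, expected net paperclips={ev:,.0f}")
```

With a single wager the expected gain is about $10^{10}$ paperclips even though the jackpot arrives only 1% of the time; how to weigh those two numbers against each other is the utility question in point 2.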

Mathilde (7y)

Thank you, this is good to know. I'll have to think about this some more.

Hm, I was working under the assumption that the "utility" with paperclips was just the number of paperclips. A universe with X - 10n + 3^^^^3 paperclips is better than a universe with just X paperclips by 3^^^^3 - 10n. Is this not a proper utility function?

The casino version evolved from repeated alterations to Pascal's Mugging, so it retained the 3^^^^3 from there. I had written a paragraph where I mentioned that for one-shot problems, even a more realistic probability could qualify as a Pascal's Mugging, though I had used a 1/million chance of a trillion paperclips instead of 1/100. I ended up editing that paragraph out, though.

Working with a 1/100 probability, it's less obviously a bad idea to pay up, of course. I don't know where to draw the line between "this is a Pascal's Mugging" and "this is good odds", so I'm less confident that you shouldn't pay up for a 1/100 probability. I think it becomes a more obviously bad idea if we raise the casino's price, for example to 1 million paperclips. Paying still has positive EU, but it is a steep price compared to doing nothing unless you get pretty lucky.
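As a rough check of the arithmetic, assuming the trillion-paperclip payout from the 1/100 version and a price of $10^{6}$ paperclips:

$$
\mathbb{E}[\text{net paperclips}] = \tfrac{1}{100}\cdot 10^{12} - 10^{6} = 9{,}999{,}000{,}000 > 0,
\qquad
P(\text{net loss of } 10^{6}) = \tfrac{99}{100}.
$$

The expected gain comes entirely from the 1% branch, while in 99 cases out of 100 paying simply costs a million paperclips.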

Looking back, I think that one of the factors in my decision to retain such ludicrous numbers was that it seemed more persuasive. I apologise for this.

All that being said, thank you very much for your reply!

Slider (7y)

I think your analysis of "maximise" just compares x > y without regard to how much bigger x is, which is kind of a natural consequence of taking expected utility out of the picture. However, it does highlight that the goal "maximise paperclips" doesn't really say whether "winning harder" is relevant or not: 2 > 1, but so is 1000 > 1. So for cases when an outcome is not a constant amount of paperclips we need more rules than what the object of attention is. So a paperclip maximiser is actually underspecified.

Mathilde (7y)

Very interesting, thank you!

I think "maximising" still makes sense in one-shot problems. 2>1 and 1000>1, but it's also the case that 1000>2, even without expected utility. The way I see it, EU is a method of comparing choices based on their average utility, but the "average" turns out to be a less useful metric when you only have one chance.

> So for cases when an outcome is not a constant amount of paperclips we need more rules than what the object of attention is. So a paperclip maximiser is actually underspecified.

If this is true, it would imply that in a one-shot problem, a utility function is not enough on its own to determine the "optimal" choice when you want to "maximise" (get the highest value you can) on that utility function. This would be a pretty big result, I think.

I think that if there is a part that is underspecified, though, it's not the paperclip maximiser, but the word "optimal". What does it mean for a choice to be "optimal" relative to other choices, when it might turn out better or worse depending on luck? I haven't been able to answer that question.

Slider (7y)

Opinions on how to handle uncertainty often get baked into the utility function. The standard naive construction is to say "be risk neutral" and value paperclips linearly in their number. But I can imagine a policy for which more paperclips is always better, yet which, starting from a default of 2 paperclips with certainty, would not choose an option of a 0.1% chance of 1 paperclip, a 49.9% chance of 2 paperclips and a 50% chance of 3 paperclips. One can construct a "risk-averse" utility function that can then simply be maximised. But does that really mean the new function is not a paperclip-maximising function?
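A minimal numerical check of this example in Python. The utility $u(x) = -e^{-kx}$ with $k = 7$ is a hypothetical choice (the standard constant-absolute-risk-aversion form), not something proposed in the thread: it is strictly increasing, so more paperclips are always better, yet it is concave enough to reject the gamble. Working through the inequality, this particular gamble is rejected exactly when $e^{k} > 500$, i.e. $k > \ln 500 \approx 6.2$.

```python
import math

# The example above: a sure 2 paperclips versus a small gamble.
sure_thing = [(1.0, 2)]
gamble = [(0.001, 1), (0.499, 2), (0.5, 3)]

def expected_utility(lottery, u):
    return sum(p * u(x) for p, x in lottery)

def linear(x):
    """Risk-neutral: utility is just the paperclip count."""
    return x

def risk_averse(x, k=7.0):
    """Hypothetical CARA utility -exp(-k*x): strictly increasing in x, but concave."""
    return -math.exp(-k * x)

for name, u in [("linear", linear), ("risk_averse", risk_averse)]:
    takes_gamble = expected_utility(gamble, u) > expected_utility(sure_thing, u)
    print(name, takes_gamble)
# linear True
# risk_averse False
```

Both agents prefer more paperclips to fewer; they differ only in how they trade a small chance of losing one paperclip against an even chance of gaining one.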

Mathilde (7y)

You're absolutely right. I was starting to get at this idea from another of the comments, but you've laid out where I've gone wrong very clearly. Thank you.

Gurkenglas (7y)

Are you rejecting Pascal's mugging because of the prospect of relying on uncertain models that you do not expect to confirm?

Is all your intuition captured by maximizing utility over all but the extreme billionth of the distribution?
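A minimal sketch of one possible reading of this rule in Python: discard probability mass $10^{-9}$ from each tail of the outcome distribution, renormalize, and take the expectation of what remains. Reading "the extreme billionth" as both tails is an assumption, as are the mugging numbers (pay 5 paperclips for a $10^{-12}$ chance at $10^{30}$) and the casino numbers (pay 10 for a 1/100 chance at $10^{12}$).

```python
from fractions import Fraction

def expected_value(lottery):
    return sum(p * x for p, x in lottery)

def trimmed_expectation(lottery, eps):
    """Expected value after discarding probability mass eps from each tail,
    renormalizing what is left. Assumes probabilities sum to 1 and 2*eps < 1."""
    def cut(outcomes, budget):
        kept = []
        for x, p in outcomes:
            removed = min(p, budget)  # take as much of this outcome as the budget allows
            budget -= removed
            if p > removed:
                kept.append((x, p - removed))
        return kept

    ascending = sorted((x, p) for p, x in lottery)
    kept = cut(ascending, eps)        # drop the worst eps of probability mass
    kept = cut(reversed(kept), eps)   # drop the best eps of probability mass
    mass = sum(p for _, p in kept)
    return sum(p * x for x, p in kept) / mass

EPS = Fraction(1, 10**9)  # "the extreme billionth"
# Hypothetical mugging: pay 5, win 10**30 with probability 10**-12.
mugging = [(Fraction(1, 10**12), 10**30 - 5), (1 - Fraction(1, 10**12), -5)]
# Hypothetical casino: pay 10, win 10**12 with probability 1/100.
casino = [(Fraction(1, 100), 10**12 - 10), (Fraction(99, 100), -10)]

for name, lottery in [("mugging", mugging), ("casino", casino)]:
    print(name, expected_value(lottery) > 0, trimmed_expectation(lottery, EPS) > 0)
# mugging True False
# casino True True
```

Under this reading the rule refuses the mugging, whose entire upside lies inside the trimmed tail, while still paying the 1/100 casino.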

Here's a one-shot problem for your intuition to answer: You get to design the probability distribution to draw the number of paperclips from, except that its expectation must be at most its negative Kolmogorov complexity. What distribution makes for a good choice?

Mathilde (7y)

Thank you for your response!

> Are you rejecting Pascal's mugging because of the prospect of relying on uncertain models that you do not expect to confirm?

My intuition is that in a one-shot problem, gambling everything on an extremely low probability event is a bad idea, even when the reward from that low probability event is very high, because you are effectively certain to lose. This is the basis for me not paying up in Pascal's Mugging and in the casino problem in the post.

I'm trying to keep my reasoning simple, so in my examples I always assume that there are no infinities, no unknown unknowns, every outcome of every choice is statistically independent, and all the assigned probabilities are statistically correct (if there is a 1/6 chance of an outcome and you get to repeat the problem, you will get that outcome on average 1/6 of the time).

> Is all your intuition captured by maximizing utility over all but the extreme billionth of the distribution?

Honestly, I have no idea how to solve the problem. My intuition is hopelessly muddled on this, and every idea I've been able to come up with seems flawed, including the one you've just asked about.

> Here's a one-shot problem for your intuition to answer: You get to design the probability distribution to draw the number of paperclips from, except that its expectation must be at most its negative Kolmogorov complexity. What distribution makes for a good choice?

My first thought is a 1/googolplex chance of losing 3^^^^3 paperclips, with the rest of the probability giving as many paperclips as the Kolmogorov complexity constraint allows. I could do better by increasing the probability of the loss; for example, 1/googol would be a better probability. However, I have no idea where to draw the line: at what point does it stop being a good idea to increase the probability?

Gurkenglas (7y)

In a market of bettors that draw the line of how much risk to take at different points, the early game will be dominated by the most risk-taking folks and as the game grows older, the line that was chosen by the current winners moves. Perhaps your intuition is merely the product of evolution playing this game for as long as it took for the line to reach its current point?

LESSWRONG