

Much of the discussion here has been about how various decision algorithms fare on the Prisoner's Dilemma (PD), Newcomb's Problem, Parfit's Hitchhiker, and Counterfactual Mugging.

There's a reasonable chain of progression there, but, especially with the last one, I've had a concern about the overall pattern. Specifically, while we're optimizing for extreme cases, we want to make sure we aren't hurting our decision algorithm's ability to deal with less bizarre ones.

Part of the reasoning for the last one could be stated as "be/have the type of algorithm that would be expected to do well when a Counterfactual Mugger showed up, i.e., one that would have a net positive expected utility, etc." This is reasonable, especially since there seem to be lines of reasoning (like Timeless Decision Theory) that _automatically_ get this right using the same rules that let them succeed on PD or any other such problem. But I worry about... well, actually, it would be better for me to show an example:

Consider the Pathological Decision Challenge.

Omega shows up and presents a Decision Challenge, consisting of some assortment of your favorite decision theory puzzlers. (Newcomb, etc etc etc...)

Unbeknownst to you, however, Omega also has a secret additional test: if the decisions you make are all something _OTHER_ than the normal rational ones, then Omega will pay you a huge superbonus of utilons, vastly dwarfing any cost of losing all of the individual challenges...

However, Omega also models you, and if you would have willingly "failed" _HAD YOU KNOWN_ about the extra challenge above (but not about this extra-extra criterion), then you get no bonus for failing everything.
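To make the structure of the challenge concrete, here is a minimal Python sketch. It assumes an agent can be modeled as a function of one yes/no question (has it been told about the secret bonus?) that returns whether it gives the normal rational answers to the puzzlers. The names and utility numbers (`CHALLENGE_WIN`, `SUPERBONUS`) are hypothetical illustrations, not part of the puzzle as stated.

```python
from typing import Callable

# Hypothetical model: an agent maps "has it been told about the secret bonus?"
# to True (gives the normal rational answers) or False (deliberately "fails").
Agent = Callable[[bool], bool]

# Illustrative utilities only (assumptions, not from the post).
CHALLENGE_WIN = 10        # value of passing the individual puzzlers
SUPERBONUS = 1_000_000    # the bonus that "vastly dwarfs" everything else


def pathological_payoff(agent: Agent) -> int:
    """Score an agent on the Pathological Decision Challenge as described."""
    answers_normally = agent(False)        # what it actually does, uninformed
    would_fail_if_told = not agent(True)   # Omega's model of the informed agent
    payoff = CHALLENGE_WIN if answers_normally else 0
    # The bonus goes only to agents that fail everything while uninformed
    # and would NOT have failed on purpose had they known about the bonus.
    if not answers_normally and not would_fail_if_told:
        payoff += SUPERBONUS
    return payoff


agents = {
    "always normal": lambda told: True,
    "expected-utility maximizer": lambda told: not told,  # fails only when it knows failing pays
    "always fails": lambda told: False,
    "perverse": lambda told: told,  # fails unprompted, answers normally once told
}

for name, agent in agents.items():
    print(f"{name}: {pathological_payoff(agent)}")
```

Only the "perverse" agent, one that fails unprompted but would answer normally once informed, collects the bonus, and that is exactly the kind of disposition that would do badly in ordinary situations.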

A decision algorithm that would tend to win in this contrived situation would tend to lose in regular situations, right? Again, yes, I can see the argument that being the type of algorithm that can be successfully counterfactually mugged can arise naturally from a simple rule that automatically gives the right answer in many other, more reasonable situations. But I can't help worrying that as we construct more... extreme cases, we'll end up with this sort of thing, where optimizing our decision algorithm to win the latest "decision challenge" stops it from doing as well in more (for lack of a better word) "normal" situations.

Further, I'm not yet sure how to separate pathological cases more precisely from more reasonable "weird" challenges. To clarify, this post isn't a complaint about, or a direct objection to, considering things like Newcomb's problem; it's just a concern about one way we might go wrong.



I'm pretty sure that any time you attach a rider like "if you would choose the option that to your knowledge is best then instead someone pokes you in the eye" rational agents will get poked in the eye. And this isn't a problem, 'cause that sort of rider is far weirder than all those other contrived decision problems.

Yes, I know the explicit negation is what makes it pathological in this case; my problem above was deliberately contrived. The point is that I'm not sure how to separate such pathological cases more precisely from non-pathological ones. I'm pretty sure we wouldn't be able to establish a theorem to the effect of "any situation that lacks an explicit 'inverter' will be well behaved and compatible with the ability to make reasonable decisions in reasonable situations."

I think the important part is that the agent doesn't know about it, not the negation. Throw in information we assume the agent can't fully take into account and the agent fails when the information would have been important. Maybe your point isn't that simple but I'm not seeing it.

Maybe that's sufficient to take care of the issue, but I'm not sure I can prove that it is. I.e., my point was more: "As obvious as the pathology of this one was, can we be sure we won't run into analogous but subtler, harder-to-detect incompatibilities in, say, whatever souped-up extension of counterfactual mugging someone comes up with next week?"

My worry isn't about this specific instance of a pathological decision problem; this instance was simply meant as a trivial illustration of the issue. Maybe I'm being silly and we'll be able to detect any such case right away, but maybe not. I haven't yet thought of a way to state the problem precisely, but basically it's: "While we're having all this fun figuring out how to make a decision algorithm that wins in these weird, mind-bending cases, let's make sure doing so doesn't sacrifice its ability to win in more ordinary cases."


Pathological cases are unlikely. Non-pathological cases are likely. This is because rational agents do whatever makes non-pathologicality likely rather than unlikely. If people are getting Pathologically Mugged on a regular basis, you end up with a more normal decision problem as people adapt to the pathological mugging.

So how often does actual Counterfactual Mugging come up?

Homeowner's insurance.

ETA: Actually, insurance of any kind. I'll explain with a simple dialogue:

"Hey, give me \$1000."

"WTF? Why?"

"What's with the attitude? If I had found out your house was going to burn down in the next year, I was going to give you \$300,000. Well, turns out it's not. But if you weren't willing to give me \$1000 in every safe year, I wouldn't plan on giving you \$300,000 in the years your house burns down! Now, are you going to give me \$1000, or are you going to condemn to poverty all of your copies living in worlds where your house burns down?"

If no one's pointed out this insight before, and it really is an insight, please pass this analogy along.
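A rough sketch of the insurance-as-policy framing in Python. It uses the dialogue's figures (a \$1000 premium, a \$300,000 payout) plus three assumptions for illustration only: a 0.2% annual fire probability, \$350,000 of total wealth of which the house is \$300,000, and a log utility function. The policy is evaluated across fire and no-fire years together, since you can't pay only in the years the house burns.

```python
import math

PREMIUM = 1_000     # paid in every year, fire or not (from the dialogue)
PAYOUT = 300_000    # received only in a year the house burns down (from the dialogue)


def expected_net_dollars(p_fire: float) -> float:
    """Expected yearly dollar change from holding the policy."""
    return -PREMIUM + p_fire * PAYOUT


def expected_log_utility(p_fire: float, insured: bool,
                         wealth: float = 350_000, house: float = 300_000) -> float:
    """Hypothetical risk-averse (log) utility; wealth and house value are assumptions."""
    if insured:
        # The premium is paid in every world; the payout replaces the house if it burns.
        burned = wealth - PREMIUM - house + PAYOUT
        safe = wealth - PREMIUM
    else:
        burned = wealth - house
        safe = wealth
    return p_fire * math.log(burned) + (1 - p_fire) * math.log(safe)


p = 0.002  # assumed annual fire probability
print(PREMIUM / PAYOUT)          # break-even fire probability in pure dollar terms
print(expected_net_dollars(p))   # negative: the policy loses money on average at this p
print(expected_log_utility(p, insured=True) > expected_log_utility(p, insured=False))
```

Below the break-even probability of 1/300 the policy loses money on average, so under these assumptions it is the assumed risk aversion, not the dollar expectation, that makes paying in every safe year worthwhile.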


If insurance companies predict that my house will burn down next year, they do not simulate me to determine whether I would have paid the \$1,000 had they counterfactually mugged me. They just ask me to pay \$1,000 every year, before my house potentially burns. This is a critical difference.

Or are insurance companies more psychic than I thought?

That's not necessary for the parallel to work. In my post, the insurer is stating how things look from your side of the deal, in a way that shows the mapping to the counterfactual mugger. (And by the way, if insurers predict your house will burn down they don't offer you a policy -- not one as cheap as \$1000. If they sell you one at all, they sell at a price equal to the payout, in which case it's just shuffling money around.)

What creates a mapping to Newcomb's problem (and, by transitivity, to the counterfactual mugging) is the inability to set a policy selectively so that it applies only at just the right time to benefit you. With a perfect predictor (Omega), you can't "have a policy of one-boxing" yet conceal your intent to actually two-box.

This same dilemma arises in insurance, without having to assume a perfect, near-acausal predictor: you can't "decide against buying insurance" and then make an exception over the time period where the disaster occurs. All that's necessary for you to be in that situation is that you can't predict the disaster significantly better than the insurer (assuming away for now the problems of insurance fraud and liability insurance, which introduce other considerations).
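The claim that no perfect predictor is needed can be checked with a quick calculation. This Python sketch assumes the conventional Newcomb amounts (\$1,000 in the transparent box, \$1,000,000 in the opaque one), which the thread itself doesn't specify, and scores each fixed policy by its expected winnings against a predictor that is right with probability `accuracy`.

```python
# Conventional Newcomb amounts (an assumption; the thread does not fix them).
BOX_A = 1_000       # transparent box, always available
BOX_B = 1_000_000   # opaque box, filled only if one-boxing was predicted


def expected_value(one_box: bool, accuracy: float) -> float:
    """Expected winnings of a fixed policy against a predictor right `accuracy` of the time."""
    if one_box:
        # Box B is full exactly when the predictor correctly foresaw one-boxing.
        return accuracy * BOX_B
    # Box B is full only when the predictor wrongly expected one-boxing.
    return (1 - accuracy) * BOX_B + BOX_A


# Accuracy at which the two policies have equal expected winnings.
break_even = (BOX_B + BOX_A) / (2 * BOX_B)

for acc in (0.5, 0.6, 0.9, 0.99):
    print(acc, expected_value(True, acc), expected_value(False, acc))
```

Under these assumptions one-boxing pulls ahead as soon as accuracy exceeds 0.5005, so a merely decent predictor is enough for the policy-level argument to bite.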


The analogy is that both situations use expected utility to make a decision. The principal difference is that when you are buying insurance, you do expect your house to burn to the ground, with a small probability. Counterfactual mugging suggests an improbable conclusion that the updated probability distribution is not what you should base your decisions on, and this aspect is clearly absent from normal insurance.

> Counterfactual mugging suggests an improbable conclusion that the updated probability distribution is not what you should base your decisions on, and this aspect is clearly absent from normal insurance.

No, normal insurance has this aspect too: you don't regret buying insurance once you learn the updated probability distribution, so you shouldn't base your decision on it either.

> you don't regret buying insurance once you learn the updated probability distribution

I don't regret it, because I remember that it was the best I could do given the information I had at the time. But if I knew when deciding whether or not to buy insurance that I would not be sued or become liable for large amounts, then I wouldn't buy it. And I don't want to change my decision algorithm to ignore information just because I didn't have it some other time.

Where did I suggest that throwing away information is somehow optimal or of different optimality than in Newcomb's problem or the counterfactual mugging?

I don't know whether or not you intended to say that. But if you didn't, then what did you mean by

> normal insurance has this aspect too: you don't regret buying insurance once you learn the updated probability distribution, so you shouldn't base your decision on it either

To me that looks like you are saying that it doesn't matter if you decide whether to buy insurance before or after you learn what will happen next year.

Please read it in the context of the comment I was replying to. Vladimir_Nesov was trying to show how my mapping of insurance to Newcomb didn't carry over one important aspect, and my reply was that when you consistently carry over the mapping, it does.

That is the context that I read it in. He pointed out that counterfactual mugging is equivalent to insurance only if you fail to update on the information about which way the coin fell before deciding (not) to play. You responded that this made no difference because you didn't regret buying insurance a year later (when you have the information but don't get to reverse the purchase).

I guess I should have asked for clarification on what he meant by the "improbable conclusion" that the counterfactual mugging suggests. I thought he meant that the possibility of being counterfactually mugged implies the conclusion that you should pre-commit to paying the mugger, and not change your action based upon finding that you were on the losing side.

If that's not the case, we're starting from different premises.

In any case, I think the salient aspect is the same between the two cases: it is optimal to precommit to paying, even if it seems like being able to change course later would make you better off.

That's not really a counterfactual mugging, though, is it? I.e., it doesn't fit the template of "I decided to flip a fair coin and give you ten dollars if it came up heads, provided I predicted (and I'm really good at predicting) that if it came up tails you would give me five dollars. I flipped the coin yesterday and it came up tails. So... do you give me five dollars?"

EDIT: To respond to your edit... what insurance company would actually do that? I.e., you first have to sign up with them, etc. And if their actuaries compute that there's a reasonable likelihood they'll have to pay out to you, they avoid you or charge you nastier premiums in the first place. I guess I could see an argument that insurance could be viewed as RELATED to something like an iterated counterfactual mugging, though...
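The coin-flip template above can be scored in two ways, which is the crux of the disagreement about updating elsewhere in the thread. A minimal Python sketch using the template's ten-dollar and five-dollar figures: `policy_value` judges a disposition before the flip, while `value_after_seeing_tails` judges the act after updating on tails. Both function names are just illustrative.

```python
from fractions import Fraction

P_HEADS = Fraction(1, 2)  # fair coin
REWARD = 10               # paid on heads, but only to agents predicted to pay on tails
COST = 5                  # what the mugger asks for on tails


def policy_value(pays_on_tails: bool) -> Fraction:
    """Expected value of a disposition, judged before the coin is flipped."""
    heads = REWARD if pays_on_tails else 0
    tails = -COST if pays_on_tails else 0
    return P_HEADS * heads + (1 - P_HEADS) * tails


def value_after_seeing_tails(pays: bool) -> int:
    """Value of the act, judged after updating on 'the coin came up tails'."""
    return -COST if pays else 0


print(policy_value(True), policy_value(False))                          # 5/2 0
print(value_after_seeing_tails(True), value_after_seeing_tails(False))  # -5 0
```

Being the kind of agent that pays is worth 2.5 in expectation, while the act of paying, viewed after the update, is worth -5; that gap is what makes the mugging feel paradoxical.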

There are a few ways you can look at this to make it seem more relevant. I think you can transform the counterfactual mugging into insurance such that each step shouldn't change your answer.

But let me put it this way instead: Imagine that you're going to insert your consciousness into a random "you" across the multiverse. In some of those your house (or other valuable) burns down (or otherwise descends into entropy). Would you rather be thrown into a "you" who had bought insurance, or hadn't bought insurance?

Remember, it's not an option to only buy insurance in the ones where your house burns down, i.e. to separate the "yous" into a) those whose houses didn't burn down and didn't buy insurance, vs. b) those whose houses did burn down and did buy insurance. This inseparability, I think, captures the salient aspects of the counterfactual mugging because it's (presumably) not an option to "be the type to pay the mugger" only in those cases where the coin flip favors you.

(I daydreamed once about some guy whose house experiences a natural disaster, so he goes to an insurance company with which he has no policy, and when it's explained to him that they only pay out to people who have a policy with them, he rolls his eyes and tries to give them money equal to a month's premium, as if that will somehow make them pay out.)

> A decision algorithm that would tend to win in this contrived situation would tend to lose in regular situations, right?

Yes. There is No Free Lunch. For every possible algorithm it is possible to create a problem in which the algorithm fares poorly. An algorithm optimized for any problem that gives a payoff in utility for being irrational will tend to lose in regular situations. Also, decisions made based on pathological priors will tend to lose. This includes having inaccurate priors about the likely behaviour of the superintelligence that you are playing with.
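The "for every algorithm there is a bad problem" point can be illustrated with a simple diagonal construction in Python. This is not the formal No Free Lunch theorems (which concern performance averaged over all objective functions); it only shows that, for any deterministic policy, one can build an "inverter" problem, like the rider in the original post, that rewards every option except the one the policy picks. `make_inverter` and the sample policies are hypothetical.

```python
from typing import Callable, Sequence

Policy = Callable[[Sequence[str]], str]


def make_inverter(policy: Policy, options: Sequence[str]) -> Callable[[str], int]:
    """Build a hypothetical problem that pays 1 for any option except the one `policy` picks."""
    doomed = policy(options)
    return lambda choice: 0 if choice == doomed else 1


options = ["one-box", "two-box"]
policies = {
    "always one-box": lambda opts: "one-box",
    "always two-box": lambda opts: "two-box",
    "first listed": lambda opts: opts[0],
}

for name, policy in policies.items():
    problem = make_inverter(policy, options)
    best_available = max(problem(o) for o in options)
    # The policy's own choice scores 0, while some other option scores 1.
    print(name, problem(policy(options)), best_available)
```

The construction needs at least two options and a deterministic policy; a policy that randomizes can still be punished in expectation by a problem that penalizes its most likely choice.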

Right, which is why I say it's misguided to search for a truly general intelligence; what you want instead is an intelligence with priors slanted toward this universe, not one that has to iterate through every hypothesis shorter than it.

Making a machine that's optimal across all universe algorithms means making it very suboptimal for this universe.

That's true. At least I think it is. I can't imagine what a general intelligence that could handle this universe and an anti-Occamian one optimally would look like.

What is an "inaccurate prior"? A prior that is not posterior enough, that is, a state of knowledge based on too little information/evidence? That has frequentist connotations.

Good point, Vladimir. What phrase would I use to convey not just having too little evidence, but having evidence that just happens to be concentrated in a really inconvenient way? Perhaps I'll just go with 'bad priors': for example, the sort of prior distribution you would have if you had just drawn three red balls out of a jar without replacement, knew that the five balls left were red or blue, but had no clue that you had just drawn the only three reds. It's not so much lacking evidence as having evidence that is bad/pathological/improbable/inconvenient.
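The jar example can be worked through exactly. This Python sketch assumes eight balls in total (three drawn, five left) and a uniform prior over how many of them were red at the start; the uniform prior is an assumption, not something the comment specifies.

```python
from fractions import Fraction
from math import comb

TOTAL = 8                    # 3 already drawn + 5 left, per the example
DRAWN_RED = 3
LEFT = TOTAL - DRAWN_RED

# Assumption: uniform prior over how many of the 8 balls were red to begin with.
prior = {k: Fraction(1, TOTAL + 1) for k in range(TOTAL + 1)}

# Probability of drawing 3 reds in a row without replacement, given k reds.
# math.comb(k, 3) is 0 when k < 3, so those hypotheses are ruled out.
likelihood = {k: Fraction(comb(k, DRAWN_RED), comb(TOTAL, DRAWN_RED)) for k in prior}

unnormalized = {k: prior[k] * likelihood[k] for k in prior}
evidence = sum(unnormalized.values())
posterior = {k: w / evidence for k, w in unnormalized.items()}

# Predictive probability that the next draw is red: (k - 3) reds among the 5 left.
p_next_red = sum(posterior[k] * Fraction(k - DRAWN_RED, LEFT)
                 for k in posterior if k >= DRAWN_RED)

print(posterior[DRAWN_RED])  # 1/126: credence that the three reds were the only reds
print(p_next_red)            # 4/5, although in the story the true answer is 0
```

Under that prior the evidence leaves only 1/126 credence on the true state of affairs and predicts red with probability 4/5 for the next draw, even though no reds remain: the evidence is perfectly real, just unluckily concentrated.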

> Consider the Pathological Decision Challenge.
>
> Omega shows up and presents a Decision Challenge, consisting of some assortment of your favorite decision theory puzzlers. (Newcomb, etc etc etc...)
>
> Unbeknownst to you, however, Omega also has a secret additional test: if the decisions you make are all something _OTHER_ than the normal rational ones, then Omega will pay you a huge superbonus of utilons, vastly dwarfing any cost of losing all of the individual challenges...
>
> However, Omega also models you, and if you would have willingly "failed" _HAD YOU KNOWN_ about the extra challenge above (but not about this extra-extra criterion), then you get no bonus for failing everything.

That is not Omega. Omega, as presented for the purpose of Newcomblike problems, is known, for the sake of the hypothetical, to be trustworthy. He does not deceive us about our utility payoffs, and yes, that includes being technically truthful while leaving out a whole category of utility payoff. If this is not clear to the audience from the description given in the problem definition, then the problem definition needs to be more pedantic.

Consider, for example, Vladimir's original definition of counterfactual mugging. He throws in "the Omega is also known to be absolutely honest and trustworthy, no word-twisting, so the facts are really as it says". It should be fairly clear to the reader that "unbeknownst to you" is to be considered out of scope for the exercise.

If you want a demigod who plays games that happen to involve him making us have an inaccurate knowledge of his arbitrary utility payoffs then you need to invent a new name.

None of Parfit's Hitchhiker, the Prisoner's Dilemma, Newcomb's Problem, or Counterfactual Mugging relies on the kind of 'payoff for being irrational' difficulty you present. They are all instances where a decision algorithm that wins will also win in the regular situations they caricature.


Omega really needs to stop killing copies of me.

It's just not right.

The reason this seems wrong is that you're leaving out a crucial step: generating a probability distribution over the set of worlds you could be in, conditioned on what you've seen already. When you meet a supposed Omega, you should start off believing that he's a human trying to fool you, which means that you should not pay up in counterfactual mugging and should two-box in Newcomb's problem. It takes an extraordinary amount of evidence to dislodge that conclusion even a little. Only after you have that evidence can you accurately map out the remaining possibilities. Either Omega gives you the outcome based on your decision as he claims, he gives you a fixed outcome regardless of your decision, he deviates from the algorithm he claims to follow in an obvious way (he doesn't pay up when he claims he would), or he deviates from it in a non-obvious way (such as an easter-egg bonus or penalty given when your behavior matches a particular pattern). Once you have probabilities for each of those, you just plug the scenarios into your utility function, multiply by the probability, sum up the expected utility for each decision algorithm, and follow whichever gives the highest expected utility.
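The last step of that procedure (weight each scenario by its probability, sum per policy, pick the maximum) might look like this in Python. The hypotheses, credences, and payoffs are all hypothetical; the payoffs reuse the ten-dollar/five-dollar coin-flip mugging from elsewhere in the thread, scored for each policy as a whole.

```python
from typing import Dict

# Hypothetical ex-ante value of each policy under each hypothesis about "Omega".
PAYOFFS: Dict[str, Dict[str, float]] = {
    "pay":    {"con artist": -5.0, "honest Omega": 2.5, "fixed outcome": -5.0},
    "refuse": {"con artist": 0.0,  "honest Omega": 0.0, "fixed outcome": 0.0},
}


def best_policy(credences: Dict[str, float]) -> str:
    """Pick the policy with the highest probability-weighted payoff."""
    def expected(policy: str) -> float:
        return sum(p * PAYOFFS[policy][h] for h, p in credences.items())
    return max(PAYOFFS, key=expected)


skeptical = {"con artist": 0.98, "honest Omega": 0.01, "fixed outcome": 0.01}
convinced = {"con artist": 0.05, "honest Omega": 0.94, "fixed outcome": 0.01}
print(best_policy(skeptical))  # refuse
print(best_policy(convinced))  # pay
```

With a skeptical prior, refusing wins; only after strong evidence that this Omega is genuine does paying come out ahead, in line with the point about how much evidence it takes.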

I've had similar concerns but haven't gotten around to writing anything about it here - mostly because I figured it had already been addressed in hundreds of comments that have been posted on these topics, and I didn't want to go through them all.

Here are my basic thoughts on the subject. When we are constructing our decision theory, we need to make sure we are setting ourselves up to optimize on the correct domain. If Omega exists and is actually a trickster, specifically good at fooling you into believing he's not Omega but rather Epsilon (the younger, smaller, and more honest of the two brothers, with a taste for your money), then designing your decision theory to be susceptible to counterfactual mugging is a bad idea. Though I suppose I'm splitting hairs, because how would we choose between decision theories? A meta-decision theory? I guess that brings us back to the simultaneously insightful and vague "rational agents win!"

Some situations you have to consider, but not necessarily bind to predetermined actions. If the right balance is to miss out on this one particular situation, that's the solution you assign to the situation once you consider it. The rigor of a theory lies in its ability to mechanistically comprehend all of its domain: if there are dark corners where the answer is unclear, a new theory has to resolve that perplexity, one way or the other.