The sin of updating when you can change whether you exist

(I'm posting this comment under this public throwaway account for emotional reasons.)

What made you think the torture hypothetical was a good idea? When I read a math book, I expect to read about math. If the writer insists on using the most horrifying example possible to illustrate their math for no reason at all, that's extremely rude. Granted, sometimes there might be a good reason to mention horrifying things, like if you're explicitly making a point about scope-insensitive moral intuitions being untrustworthy. But if you're just doing some ordinary mundane decision theory, you can just say "lose $50," or "utility −10." Then your more sensitive readers might actually learn some new math instead of wasting time and energy desperately trying not to ask whether life is worth living for fear that the answer is obviously that it isn't.

[-]Benya12y150

Thank you for the feedback, and sorry for causing you distress! I genuinely did not take into consideration that this choice could cause distress, and it could have occurred to me, and I apologize.

On how I came to think that it might be a good idea (as opposed to missing that it might be a bad idea): While there's math in this post, the point is really the philosophy rather than the math (whose role is just to help with thinking more clearly about the philosophy, e.g. to see that PBDT fails in the same way as NBDT on this example). The original counterfactual mugging was phrased in terms of dollars, and one thing I wondered about in the early discussions was whether thinking in terms of these low stakes made people think differently than they would if something really important was at stake. I'm reconstructing, it's been a while, but I believe that's what made me rephrase it in terms of the whole world being at stake. Later, I chose the torture as something that, on a scale I'd reflectively endorse (as opposed, I acknowledge, to actual psychology), is much less important than the fate of the world, but still important. But I entirely agree that for the purposes of this post, "paying $1" (any small negative effect) would have made the point just as well.

[-]Eliezer Yudkowsky12y90

Yes, I now regret making this mistake at the dawn of the site and regret more having sneezed the mistake onto other people.

[-]Shmi12y10

Had you used small and large numbers instead of the terms torture and dust specks, the whole post would have been trivial. I learned a fair bit about my own thinking in the aftermath of reading that infamous post, and I suspect I am not the only one. I even intentionally used politically charged terms in my own post.

[-]ChrisHallquist12y20

Username explicitly linked to torture vs. dust specks as a case where it makes sense to use torture as an example. Username is just objecting to using torture for general decision theory examples where there's no particular reason to use that example.

[-]Shmi12y10

If hypothetical torture is a trigger for you, you are probably reading the wrong site.

[-]Scott Garrabrant12y90

That doesn't change the fact that there is no reason to involve torture in this thought experiment (or in most of the thought experiments we put it in).

I think as a general rule, we should try to frame problems into positive utility as opposed to negative utility, unless we have a reason not to.

One reason for this is that people feel guilt for participating in a thought experiment where they have to choose between two bad things, and they do not feel the same guilt for choosing between two good things. Another is that people might have a feeling like they have a moral obligation to avoid certain bad scenarios no matter what, and this might interfere with their ability to compare them to other things rationally. I do not think that people often have the same feeling of moral obligation to irrationally always seek a certain good scenario.

[-]Username12y20

Probably.

[-]Squark12y70

Is there a difference of principle between this scenario and smoking lesion?

[-]Wei Dai12y30

There is a much more principled possibility, which I'll call pseudo-Bayesian decision theory, or PBDT. PBDT can be seen as re-interpreting updating as saying that you're indifferent about what happens in possible worlds in which you don't exist as a conscious observer, rather than ruling out those worlds as impossible given your evidence.

I can see why you'd rule out being completely indifferent about what happens in possible worlds in which you don't exist, but what about something in between being fully updateless and fully updateful? What if you cared less (but weren't completely indifferent) about worlds in which you don't exist as a conscious observer, or perhaps there are two parts to your utility function, an other-regarding part which is updateless and a self-regarding part which changes as you make observations?

Suppose you face a counterfactual mugging where the dollar amounts are $101 and $100 instead of the standard $10000 and $100. If you're fully updateless then you'd still pay up, but if you cared less about the other world (or just the version of yourself in the other world) then you wouldn't pay. Being fully updateless seems problematic for reasons I explained in Where do selfish values come from? so I'm forced to consider the latter as a possibility.
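To make the arithmetic concrete, here is a minimal sketch of that comparison. It rests on assumptions that are not spelled out above: a fair coin, utility linear in dollars, and a single hypothetical parameter `other_world_weight` that discounts the world you don't find yourself in (1.0 for fully updateless, 0.0 for fully updateful). The function names are illustrative only.

```python
def policy_value(pays: bool, reward: float, cost: float,
                 other_world_weight: float = 1.0) -> float:
    """Value of a policy, judged from the world where Omega asks for payment.

    Assumes a fair coin and utility linear in dollars.
    other_world_weight is a hypothetical discount on the counterfactual
    world in which Omega would have paid out the reward.
    """
    if not pays:
        # Refusing neither costs anything nor earns the counterfactual reward.
        return 0.0
    return 0.5 * other_world_weight * reward - 0.5 * cost


def should_pay(reward: float, cost: float,
               other_world_weight: float = 1.0) -> bool:
    """Pay exactly when the paying policy scores strictly higher."""
    return (policy_value(True, reward, cost, other_world_weight)
            > policy_value(False, reward, cost, other_world_weight))


if __name__ == "__main__":
    for reward in (10000, 101):
        for w in (1.0, 0.9, 0.0):
            print(reward, w, should_pay(reward, 100, w))
```

Under these assumptions the agent pays exactly when `other_world_weight * reward > cost`. With the standard $10000 reward, any weight above 0.01 is enough; with the $101 reward, the weight has to exceed 100/101 (about 0.99), so even a slight discount on the other world flips the decision.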

[-]Johannes Treutlein9y20

Imagine that Omega tells you that it threw its coin a million years ago, and would have turned the sky green if it had landed the other way. Back in 2010, I wrote a post arguing that in this sort of situation, since you've always seen the sky being blue, and every other human being has also always seen the sky being blue, everyone has always had enough information to conclude that there's no benefit from paying up in this particular counterfactual mugging, and so there hasn't ever been any incentive to self-modify into an agent that would pay up ... and so you shouldn't.

I think this sort of reasoning doesn't work if you also have a precommitment regarding logical facts. Then you know the sky is blue, but you don't know what that implies. When Omega informs you about the logical connection between sky color, your actions, and your payoff, then you won't update on this logical fact. This information is one implication away from the logical prior you precommitted yourself to. And the best policy given this prior, which contains information about sky color, but not about this blackmail, is not to pay: not paying will a priori just change the situation in which you will be blackmailed (hence, what blue sky color means), but not the probability of a positive intelligence explosion in the first place. Knowing or not knowing the color of the sky doesn't make a difference, as long as we don't know what it implies.

(HT Lauro Langosco for pointing this out to me.)

[-]FeepingCreature12y10

I think the "transparent Newcomb" version of the blue-sky scenario is more well-known? Might want to point that out.

There is a much more principled possibility, which I'll call pseudo-Bayesian decision theory, or PBDT. PBDT can be seen as re-interpreting updating as saying that you're indifferent about what happens in possible worlds in which you don't exist as a conscious observer, rather than ruling out those worlds as impossible given your evidence.

I can't follow the math but this feels like it'd inexorably lead to Quantum Suicide.

[-]lmm12y01

The naive answer to that situation still sounds correct to me. (At least, if one is to be consistent with the notion that one-boxing is the correct answer in the original Newcomb paradox).

[-]Squark12y90

To see the difference between these two scenarios, ask the following question: "what policy should I precommit to before the whole story unravels?" In Newcomb, you should clearly precommit to one-boxing: it causes Omega to put lots of money in the first box. Here, precommitting to push the button is BAD: it doesn't influence FOOM vs. DOOM; it only influences in which scenario Omega hands you the button, and whether you get tortured.

[-]Shmi12y-20

I asked a question in the l-zombies thread, but got no reply. Maybe you can answer here.

[-]Scott Garrabrant12y20

It does not seem to me like he is referring to the "finished program" reference class at all. There is a class of l-zombies, and finished programs that are never run are in this class, but so are unfinished programs. The finished programs are just members of the l-zombies reference class that are particularly easy to point to.

[-]Benya12y00

Sorry about that; I've had limited time to spend on this, and have mostly come down on the side of trying to get more of my previous thinking out there rather than replying to comments. (It's a tradeoff where neither of the options is good, but I'll try to at least improve my number of replies.) I've replied there. (Actually, now that I spent some time writing that reply, I realize that I should probably just have pointed to Coscott's existing reply in this thread.)
