Tendencies in reflective equilibrium

[-]Raemon14y210

Fun fact: A fellow Rationalist and I were doing Rejection Therapy. My friend chose to do Pascal's Mugging (the positive version - if you give me $5 now, a package of $50000 will appear at your doorstep tomorrow morning).

The subject came extremely close to actually giving him the $5, even though the subject only had five dollars and needed it to get home. (My friend added that a cab would arrive in five minutes if he waited at a particular intersection and take him home for free). He only stopped when I burst out laughing. (It took maybe a 5-10 minute conversation to build up to that point)

We talked to him about it afterwards to ask about his motivations. He said the logic made sense to him and my friend did a good job of maintaining the persona.

7steven046114y

I hope your friend never planned to actually accept the $5.

3Raemon14y

I'm not sure. It wasn't an event he had planned for.

3Darmani14y

I wouldn't have. The negative feelings from accepting the $5 would greatly outweigh the monetary value, even though I knew I almost certainly would never see the subject again... ...and would have been wrong; I ran into him last week.

3utilitymonster14y

Best rejection therapy ever.

0[anonymous]14y

OMG, how did this get here I am not good with computer.

[-]MBlume14y170

I don't know if anyone knows exactly what Bob is doing, but at a stab, he's seeing how many unpleasant feelings get generated by imagining the crime, then proposing a jail sentence that activates about an equal amount of unpleasant feelings. If the thought of a homeless man makes images of crime more readily available and so increases the unpleasant feelings, things won't go well for the homeless man.

To defend poor Bob for a moment, it's worth noting that we don't respond to numbers well in a vacuum. A theft involving a hedge fund manager invokes a frame in which a million dollars isn't that much. A theft involving a homeless person invokes a frame in which a thousand dollars is a lot. I suspect that this magnitude distortion explains more of Bob's behavior than general negative affect towards homeless people.

ETA: Both to mitigate that annoying LW effect where the top-voted comment on an excellent article is always a correction or quibble, and just because it's plain true, I should add that I'm thoroughly enjoying this sequence, and that my rate-of-checking-LW has risen sharply over the last couple weeks since I've been looking for another installment.

1Alexei14y

One of the ways to be more correct is to use frameworks of reasoning rather than your intuition. When you see a question like: "What sounds like a fair punishment for a homeless man who steals $1,000?", you should quickly create a framework for answering questions like that. Yvain's example for that kind of framework is "jail time = (10 * amount stolen)/net worth". This significantly helps anyone be more consistent.
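
A minimal sketch of such a framework in Python, using the formula quoted above. The function name, the choice of years as the unit, and the net-worth figures are illustrative assumptions, not part of Yvain's example.

```python
def jail_time(amount_stolen: float, net_worth: float) -> float:
    """Sentence under the quoted rule: (10 * amount stolen) / net worth.

    The unit (assumed here to be years) is a guess; the point is only
    that one explicit formula gets applied to every case.
    """
    if net_worth <= 0:
        raise ValueError("the quoted formula is undefined for non-positive net worth")
    return 10 * amount_stolen / net_worth


# Hypothetical inputs: both net-worth figures are made up for illustration.
homeless_man = jail_time(amount_stolen=1_000, net_worth=2_000)
fund_manager = jail_time(amount_stolen=1_000_000, net_worth=20_000_000)
print(homeless_man, fund_manager)
```

Whatever one thinks of the particular rule, writing it down makes it easy to check how it behaves across many hypothetical cases before applying it to the one at hand.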

3JGWeissman14y

If you want to not just be consistant, but consistantly reflect your preferences (or reflective equilibrium of tendencies), you should validate your framework against a wide range of hypotheticals in the domain before actually using it in the specific case that prompted you to create it. (Or try to meet the higher criteria of consistancy, not just that you judgments on a sequence of situations are consistant with each other, but that they are also consistant with the judgements made by a copy of you who sees the situations in a different order.)

2Vladimir_Nesov14y

It's "consistent".

[-]JGWeissman14y220

At least I spelled it the same way every time ;)

2Alexei14y

Absolutely. You start with a framework of reasoning and you make it less wrong. :)

[-]endoself14y150

I have not yet accepted that consistency is always the best course in every situation. For example, in Pascal's Mugging, a random person threatens to take away a zillion units of utility if you don't pay them $5. The probability they can make good on their threat is miniscule, but by multiplying out by the size of the threat, it still ought to motivate you to give the money. Some belief has to give - the belief that multiplication works, the belief that I shouldn't pay the money, or the belief that I should be consistent all the time - and right now, consistency seems like the weakest link in the chain.

No, no, no!

There are an infinite number of possible Pascal's muggings, but people only look at them one at a time. Why don't you keep the $5 in case you need it for the next Pascal's mugger who offers you 2^zillion units of utility? That is a much better bet if you only look at those two possible muggings.

The real problem is that utility functions, as we calculate them now, do not converge. This is a reason to be confused, not a reason to bite such ridiculous bullets.
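
One toy way to see the non-convergence, under assumptions that go beyond anything stated in the thread: suppose the n-th hypothetical mugger offers 2^n units of utility and is assigned probability 2^-n. Every term of the expected-utility sum is then exactly 1, so the partial sums grow without bound.

```python
from fractions import Fraction


def partial_expected_utility(n_muggers: int) -> Fraction:
    """Sum of probability * utility over the first n hypothetical muggers."""
    total = Fraction(0)
    for n in range(1, n_muggers + 1):
        probability = Fraction(1, 2**n)  # assumed prior, halving each step
        utility = 2**n                   # assumed payoff, doubling each step
        total += probability * utility   # each term equals 1
    return total


for n in (10, 100, 1000):
    print(n, partial_expected_utility(n))
```

The specific prior and payoffs are arbitrary; any scheme in which payoffs grow at least as fast as probabilities shrink shows the same behavior.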

6wedrifid14y

(As you acknowledge, but with more emphasis) this is an excuse, not a real reason. You do not really care about having money primarily so that you can be prepared for the next pascal's mugger. (Completing the pattern associated with Pascal's Wager does not fit here.)

4endoself14y

For me, this is an acknowledgement of confusion, not an excuse. I think that finding a decision theory that can make sense of this is extremely important and I try to act accordingly.

1wedrifid14y

I would call the other half of what you had to say the confusing part (I liked the linked paper, by the way). The 'but you need to save it for other possible muggings' part would be straightforward game theory if the confusing part didn't happen before we even got to 'which mugger do we pay?' considerations.

5endoself14y

I agree; that was just an intuition pump to demonstrate the absurdity of only considering one mugger. EDIT: I think of this intuition pump as very persuasive because it is part of how I came to this conclusion in the first place.

[-]Darmani14y141

My tendency is to assume that the homeless man would steal the $1000 via violent means, whereas the hedge fund manager would steal the $1 million using nonviolent deception. In addition to a belief that violent crime is actually worse, there is also the bias that it is easier to visualize. A homeless man stealing $1000 looks like a man pointing a gun at a cashier. A hedge fund manager stealing $1 million looks like a guy at a computer with a spreadsheet open.

Of course, I work at a hedge fund manager right now, so I have additional biases.

[-]MixedNuts14y140

I have not yet accepted that consistency is always the best course in every situation. For example, in Pascal's Mugging, a random person threatens to take away a zillion units of utility if you don't pay them $5. The probability they can make good on their threat is miniscule, but by multiplying out by the size of the threat, it still ought to motivate you to give the money. Some belief has to give - the belief that multiplication works, the belief that I shouldn't pay the money, or the belief that I should be consistent all the time - and right now, consistency seems like the weakest link in the chain.

Not upvoted, for this paragraph. You can't become right by removing beliefs at random until the remaining belief pool is consistent, but if you're right then you must be consistent.

Why does some belief have to give, if you reject consistency? If you're going to be inconsistent, why not inconsistently be consistent as well?

Also, you are attempting to be humorous by including beliefs like "multiplication works", but not beliefs like "at the 3^^^3rd murder, I'm still horrified" or "Solomonoff induction works", right?

We are but humble bounded rationalists, who have to use heuristic soup, so we might have to be inconsistent at times. But to say that even after careful recomputation on perfectly formalized toy problems, we don't have to be consistent? Oh, come on!

5cousin_it14y

Agreed. Here's an idea that just occurred to me: you could replace Solomonoff induction with a more arbitrary prior (interpreted as "degree of caring" like Wei Dai suggests) and hand-tune your degree of caring for huge/unfair universes so Pascal's mugging stops working. Informally, you could value your money more in universes that don't contain omnipotent muggers. This approach still feels unsatisfactory, but I don't remember it suggested before...

1Nisan14y

Is this different from jimrandomh's proposal to penalize the prior probability of events of utility of large magnitude, or komponisto's proposal to penalize the utility?

3Spurlock14y

For weakest implicit belief, I think I would have nominated "That I have the slightest idea how to properly calculate the probability of the mugger following through on his/her threat". Also, Torture vs. Specks seems like another instance where many of us are willing to sacrifice apparent consistency. Most coherent formulations of utilitarianism must choose torture, yet many utilitarians are hesitant to do so. In both cases, it seems like what we're doing isn't abandoning consistency, but admitting to the possibility that our consistent formula (e.g. naive utilitarianism) isn't necessarily the optimal / subjectively best / most reflectively equilibrial one. We therefore may choose to abandon it in favor of the intuitive answer (don't pay the mugger, choose specks, etc), not because we choose to be inconsistent, but because we predict the existence of a Better But Still Consistent Formula not yet known to us. Of course, as Yvain notes, we can take pretty much any set of arbitrary preferences and create a "consistent" formula by adding enough terms to the equation. The difference is that the Better But Unknown formula above is both consistent and something we'd be in reflective equilibrium about.

1TrE14y

By "Dust vs. Specks" you surely mean "torture vs. dust specks", and with "Specks", you want to say "torture", don't you?

0Spurlock14y

Fixed thanks. But no, I meant specks. It seems like utilitarianism (as opposed to just typical intuitive morality) commands you to inflict Torture. You only want to choose specks because your brain doesn't multiply properly, etc. Of course, not everyone agrees that Utilitarianism picks Torture, but the argument for Torture is certainly a utilitarian one. So in this case picking Specks anyway seems like a case of overriding (at least naive versions of) utilitarianism.

0TrE14y

Wait... Are you sure that should be specks? If so, I am confused.

1Spurlock14y

Wow. Sorry, you're obviously right. Brain totally misfired on me I guess.

[-]Jonathan_Graehl14y50

There are several good reasons why I should not assign a probability of 66% to heads and 66% to tails, but one of the clearest is this: you can make me a bet that I will give you $2 if it lands on tails and you give me $1 if it lands on heads, and then a second bet where I give you $2 if it lands on heads and you give me $1 if it lands on tails.

Got it.

Whichever way the coin lands, I owe you $1 and you owe me $2 - I have gained a free dollar.

Huh? You swapped "you" for "I" here (compared to above).
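
The quoted bets can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name is made up, and payoffs are taken from the point of view of the person who holds both 66% beliefs ("I" in the first quoted passage), with positive numbers meaning that person gains.

```python
def believer_net(outcome: str) -> int:
    """Net dollars won by the person who accepted both quoted bets."""
    # Bet 1: believer pays $2 on tails, receives $1 on heads.
    bet1 = 1 if outcome == "heads" else -2
    # Bet 2: believer pays $2 on heads, receives $1 on tails.
    bet2 = -2 if outcome == "heads" else 1
    return bet1 + bet2


for outcome in ("heads", "tails"):
    print(outcome, believer_net(outcome))
```

Either way the believer pays $2 on one bet and collects $1 on the other, so with the bets as first stated it is the bookmaker, not the believer, who ends up a dollar ahead.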

[-]Wei Dai14y40

I'm still confused about what point Yvain might be making by substituting "tendency" for "intuition" in this formulation of reflective equilibrium. I can think of two possibilities, but neither of them seems like something he might endorse.

When we reflect on what we really want, we should take into consideration not just our intuitions, but our behavioral tendencies. (But Yvain previously wrote "NO NEGOTIATION WITH UNCONSCIOUS".)
After we've reached reflective equilibrium, our behavioral tendencies can be said to be our pref

...

0Scott Alexander14y

I didn't have any particular interesting agenda for that word choice. If I had to justify it, I would say that to me "intuition" implies a belief (for example, I have an intuition that people who steal more money ought to be punished) and "tendency" implies an action (for example, when asked how much to punish a thief, I might respond "five years"). I am trying to carefully avoid language that implies the existence of beliefs, not because I have strong opinions on the matter but because I'm unsure.

[-]Wei Dai14y40

Reflective equilibrium is usually described in terms of "considered judgments" or "intuitions". (Your own FAQ uses "intuitions".) Do we gain any new insights (or other benefits) from thinking about reflective equilibrium in terms of "tendencies" instead?

[-]Eliezer Yudkowsky14y30

I don't think that Pascal's Mugging puts pressure on Bayesianism, I think it puts pressure on Solomonoff-type priors - Robin's anthropic answer is the one I currently find most appealing. The Lifespan Dilemma puts a lot more pressure on EU, in my book.

3cousin_it14y

But it doesn't seem to address the case where the mugger threatens to torture 3^^^^3 kittens...

[-]DanielLC14y30

Valuing consistency is silly. If someone suggests putting one thief in jail and letting another go free, you won't object because it's inconsistent. You'll either object because you don't think thieves should go to jail, or because you don't think they should go free. Inconsistency just makes it easier to give a reason why it's wrong. You don't need to know whether or not a given person thinks thieves should be jailed to convince them that that isn't the best thing to do.

If you don't accept Pascal's mugging, you have to have some reason for it. The same goe...

[-]lessdazed14y30

by multiplying out by the size of the threat, it still ought to motivate you to give the money. Some belief has to give - the belief that multiplication works, the belief that I shouldn't pay the money, or the belief that I should be consistent all the time - and right now, consistency seems like the weakest link in the chain.

What gives is the belief that by multiplying out by the size of the threat, it still ought to motivate me to give the money. Multiplication works, I shouldn't pay the money, and I should be consistent.

2Nornagest14y

I think this is probably the sanest answer that doesn't throw out consistency, but there are still some distinctly weird things about it. To motivate you not to give up money, a threat to inflict $RIDICULOUSNUMBER units of disutility has to be proportionately incredible -- but there's no particular reason to think that disutility is even roughly linear in 1/credibility, and a number of reasons not to. Straight multiplication also suggests that for any fixed ridiculous threat there's always some amount of money that a rational agent will be willing to pay to ward it off, but I think I'd be more comfortable biting that bullet.

9lessdazed14y

It's not at all obvious that someone threatening to inflict disutility if I don't comply with certain demands would treat me worse if I don't comply with the demands than if I do. One can't simply say "It is rational to one box on Newcomb's problem", because one might live in a universe in which an entity, say Sampi (if not Omega itself), executes one-boxers painfully and rewards two-boxers. The possibility that someone will inflict $RIDICULOUSNUMBER units of disutility on me is as latent in the question "give me money or I will inflict $RIDICULOUSNUMBER units of disutility on you" as it is in the question "paper or plastic", and not because it's possible bag choice will have a significant impact on my life. If I can't distinguish the credibility of the threat (that the speaker can and will act as they say) from zero, then I can't distinguish it from the opposite outcome, that they will act opposite of as they say, as I can't distinguish the possibility of the opposite outcome from zero. On a personal note, the night before last, I had a wild dream (no laws of physics were violated, not so much laws of congress) that ended similarly to how the movie "The Game" starring Michael Douglas ended. Well, it actually ended with me waking up - which is even more to the point. Things oughtn't be simply accepted at face value. Today I misread the following no fewer than two times, I think three times though I can't swear to that: I read the first sentence as: "Marc Hauser, the primatologist psychologist at Harvard who recently accused me of mistreating evidence and graduate students, has resigned." That made less and less sense as the post went on, so I took it from the top several times until I finally caught my error. It's far more likely that I am misunderstanding someone threatening $RIDICULOUSNUMBER than that they can carry out their threat, and also more likely that I'll misspeak and say "yes" when I mean no, and say "no" when I mean yes, or mistakenly hand over a one dol

0torekp14y

In support of this point, I'd like to point out that the ridiculous powers required to inflict $RIDICULOUSNUMBER sanctions are so far removed from our experience, that we have no idea how such an agent could be expected to act. It could do the opposite of what it claims (perhaps it hates cowards) as easily as fulfill its threats, given that we know nothing of its motives.

[-]Unnamed14y20

I don't know if anyone knows exactly what Bob is doing, but at a stab, he's seeing how many unpleasant feelings get generated by imagining the crime, then proposing a jail sentence that activates about an equal amount of unpleasant feelings.

See the outrage heuristic, Kahneman & Frederick (2002) (pdf).

[-]Vladimir_Nesov14y20

Any tendency that has reached reflective equilibrium in your current state is about as close to a preference as you're going to get.

But if you know your destination, you're already there. In principle, there is no need to wait for a tendency to manifest, or even to require that the conditions making the tendency manifest ever hold, if you know the way it'd go (not that you should just step back and watch). There are also one-off decisions that require knowing what to do this one time, where the intuition about reflective equilibrium applies less, and it...

0Will_Newsome14y

Qualitatively speaking it might be worth making this distinction, but algorithmically speaking---from a superintelligence's perspective, say, or a decision theory researcher's---I can't see any good reasons why there would be discrete changes or fundamental conceptual differences between the levels of abstraction. This lack of rigid partitions might also be desirable e.g. if 95% of your decision algorithm suddenly gets erased and you want to infer as much as possible from the remaining 5%; not only the lost "utility function" terms but also the meta-level implicit patterns as well as the highest level implicit decision theoretic policy, ideally using each of those as information to reconstruct the others even in the event of their complete annihilation. (You'd have to do this anyway if all you had left was some fragment of a time-stamped UDT policy lookup table (branch table?).) ETA: To motivate even thinking about the problem of corrupted hardware a little more, imagine that an agent is running XDT and is trying to make sense of humans' (or humanity's, humanity's ancestors, God's, consciousnesses-trapped-in-rocks's)... we'll call them 'decision policy-ish-like thingies', but the "creator"-bound XDT agent only has partial information about any of many of its potential creators for any of many plausible reasons. Also there is the more philosophical motivation of re-thinking the 'reasoning' that was done by the environment/universe in the process of creating our values---genetic/memetic evolution, atmospheric accidents, falling into "wrong"-in-hindsight attractors generally, and basically all causal chains or logical (teleological) properties of the universe that "resulted" at least partially in humans having their "current" values. Thinking things through from first principles, taking the idea of avoiding lost purposes to its logical conclusion (or non-conclusion)---not just searching for causal validity and not accepting even as an "initial dynamic" whatever poin

-1Will_Newsome14y

(Conservation of expected [moral] evidence.)

[-]quwgri1y10

Pascal's Mugging.
The problem is that the probability "if I don't pay this person five dollars, there will be a zillion sufferings in the world" existed before this person told you about it.
This probability has always existed.
Just as the probability "if I pay this person five dollars, there will be a zillion sufferings in the world" has always existed.
Just as the probability "if I raise my right hand, the universe will disappear" has always existed.
Just as the probability "if I don't raise my right hand, the universe will disappear" has always existed.
You c...

[-]Vaniver14y10

...except that the Dutch book itself assumes consistency. If I believe that there is a 66% chance of it landing on heads, but refuse to take a bet at 2:1 odds - or even at 1.5:1 odds even though I should think it's easy money! - then I can't be Dutch booked. I am literally too stupid to be tricked effectively. You would think this wouldn't happen too often, since people would need to construct an accurate mental model to know when they should refuse such a bet, and such an accurate model would tell them they should revise their probabilities - but time af

...

2Kaj_Sotala14y

People may simultaneously have contradicting beliefs. The mind is not one unified entity: one part of it can believe in X, while another believes in not-X. Refusing the bet may simply mean that the part of them which is in control of behavior at that particular moment doesn't believe there's a 66% chance. It doesn't mean that some other part of them might not genuinely believe there's a 66% chance, and that part may be in control in other situations.

2SarahNibs14y

They have an injunction against explicit gambling, which is not a bad idea when you're inconsistent. And their injunction isn't always explicit, either.

2jimmy14y

I wonder if that's true. Next time I'm in that situation, I'm going to offer up a few dollars on a bet that is genuinely and obviously good for the other person just to see if they're smart enough to take it.

2SarahNibs14y

I finally had this come up naturally. I offered a coin flip, my $15 against his $10. He declined. Then I offered two coinflips - if both were heads, I got his $10, otherwise he got my $15. He declined. He has an explicit injunction against gambling. In the second case, he said "Well that I would accept" and I asked "Okay, do you accept it? Because I'm offering" and he said "...no, I'd feel bad if I took your $15, or if I lost my $10". (paraphrased)
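
For reference, the expected values of the two offers can be worked out as below, assuming fair, independent coin flips and that in the single-flip offer he wins on one face and loses on the other (the comment does not say which). The function name is hypothetical; amounts are from the point of view of the person being offered the bet.

```python
from fractions import Fraction


def expected_value(p_win: Fraction, win: int, loss: int) -> Fraction:
    """Expected dollars for someone who gains `win` with probability
    `p_win` and otherwise loses `loss`."""
    return p_win * win - (1 - p_win) * loss


half = Fraction(1, 2)
# Offer 1: one flip, his $10 against $15.
single_flip = expected_value(half, win=15, loss=10)
# Offer 2: he loses $10 only if both flips are heads (probability 1/4).
double_flip = expected_value(1 - half * half, win=15, loss=10)
print(float(single_flip), float(double_flip))
```

Both offers come out positive for him under these assumptions, which is what makes the refusal look like an injunction rather than a calculation.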

1jimmy14y

Thanks for reminding me! I found a person that claimed an injunction against it as well, but I decided to put it off to see if I could get him to make the bet when he wasn't self primed with his injunction against gambling. He said "I don't like gambling", and then claimed nonlinear utility at bets risking $10, but he accepted $1.5 risking $1. I won :)

0SarahNibs14y

Good point, I should downvote my post for claiming too much generality. I'll do the same and report. :D

1MixedNuts14y

They don't have a single consistent belief it's 0.66. From each decision they make you can infer a belief, but you'll soon notice it's not consistent, though it may be stable within some conditions. Maybe they always act like it's 0.66 when it's the first bet, but like it's 0.33 when they're offered a second one.

[-]Richard_Kennaway14y00

For example, in Pascal's Mugging, a random person threatens to take away a zillion units of utility if you don't pay them $5. The probability they can make good on their threat is miniscule, but by multiplying out by the size of the threat, it still ought to motivate you to give the money.

Why? Hasn't this been gone over before? Tiny number * big number = not determined by the words "tiny" and "big".
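
A quick illustration of the point, with made-up magnitudes: the same two adjectives fit products that differ by thirty orders of magnitude. Exact fractions are used so that floating-point rounding does not get in the way.

```python
from fractions import Fraction

# Both probabilities are "tiny" and both payoffs are "big"; all four
# figures are arbitrary illustrations, not estimates of anything.
case_a = Fraction(1, 10**30) * 10**20   # tiny * big
case_b = Fraction(1, 10**10) * 10**30   # tiny * big

print(float(case_a))  # far below 1
print(float(case_b))  # far above 1
```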

0Nisan14y

Could you explain your position a bit more? ETA: Ah, RichardKennaway expressed his position on Pascal's Mugging here.

-3Alexei14y

This has been gone over before and the result is that you should give the $5, because 3^^^3 is just a ridiculously huge number. EDIT: What I meant to say was the complete opposite of what I said. You should not give the $5, even though 3^^^3 is just a ridiculously huge number.

3CarlShulman14y

It has been gone over before and the result is that one shouldn't give money to such a mugger. See this comment and following discussion. First, you probably have a bounded utility function (to the extent you have a utility function at all), so it's impossible for the mugger to actually offer that much utility (there is a tiny probability of being able to deliver many happy life-years or the like, but that's different from the utility of your personal utility function). Second, if we increase the bound of the utility function, for any particular bound (no matter how high) there will be alternatives more likely to deliver vast utility than giving in to the mugger (conditional on vast utility being attainable, it is very unlikely the mugger's obviously bogus offer is a good use of funds).

0Alexei14y

Hah! Oh the tragedy of a simple typo. I meant to type 'not give', and 'even though'. Wow, I hate when I accidentally say the opposite of what I wanted.

2Richard_Kennaway14y

The result of certain decision theories is to say that. That is a problem with those theories. That it is a problem, is the gist of the posting you cited.

2DanielLC14y

I think it's more accurate to say that it's a ridiculously round number. That is, it's both huge and simple. If someone tried to mug you with a random number between 3^^^3 and 3^^^^3, you wouldn't take it, since that number is as complex, and therefore unlikely, as it is big. Edit: I changed my mind on this. The unlikeliness would come from him stating the number. Once he does that, the number is now very simple. Namely: it's the number he just stated. That said, the paradox from expected utility not converging is just due to the round ones.

0Alexei14y

I don't think it really matters at that point. I would not treat the situation differently if the mugger said "3^^^3" or if he explicitly stated some number "34084549...843".

9wedrifid14y

I would pay $5 to not have to listen to the mugger explicitly state a number that long.

1TheOtherDave14y

I once offered a similar deal to a tuba player on a subway platform.

4JGWeissman14y

I don't think you are appreciating the complexity penalty of the (presumably not very compressible) data hidden behind that ellipsis, if the number is meant to be on the order of 3^^^3.

-2Alexei14y

Well, see, I would disagree with your presumption. The data might look random to you, but I could just point out that all the digits are actually taken from pi, starting with the 3^^3rd digit. That reduces the complexity tremendously. Or I could say I got those digits randomly. That again reduces the complexity, because generating that number was simple.

3JGWeissman14y

If my presumption that the digits are not very compressible is wrong, then you have not really responded to Daniel's point about the ridiculous roundness of the number (where roundness is one way a number can be compressible). No. Getting "random" digits is not simple, or even an available action, for a deterministic generator. Saying to get "random" data can feel simple because you are just pointing at some source of data that you are ignorant about, but really, you have to account for the complexity of that source of data.

0Hul-Gil14y

I wonder how many $5 transfers I would get if I actually tried this sort of mugging on LessWrong. Physical proximity isn't required, after all. Would you, or anyone, actually Paypal me $5 if I made the zillion-units threat? Or is this a case of intellectual acceptance, emotional reluctance?

2endoself14y

Please don't do this. We don't want to drive people who take their beliefs seriously away from the site.

6wedrifid14y

I don't mind if he does. It will encourage those who have silly beliefs to think them through a bit more clearly. Mind you I would advise against making the threat. Because the rational response to threats is not necessarily compliance.

1Hul-Gil14y

You think that would drive people away from the site? You and wedrifid seem to take this astonishingly seriously. I thought it was clear I was merely musing about whether or not people really accepted Alexei's conclusion. The mugging's already been done by someone else, apparently, in any case; and I wasn't saying I'd do it, only that I wondered if people (Alexei) really believed paying up was the most rational response. See his edit, which makes it a moot point.

3wedrifid14y

I think your 'seriousness' evaluator is somewhat broken. It is generally frowned upon to make this kind of thing explicit, even when hypothetical, but try to imagine the kind of actions I would take if I thought you actually represented a zillion-unit threat. Hint: they do not include blog comments.

0Hul-Gil14y

I don't understand the first sentence. Are you saying you were just being facetious in your advice? Unless you were, I think my seriousness evaluation is just fine. Your responses may not be serious compared to, say, an actual zillion-unit threat-response, but I am surprised that you'd bring up the possibility of the latter at all. I understood your advisory quite well; what was somewhat astonishing was that you apparently felt someone on LessWrong might take a fanciful version of a fanciful thought experiment seriously enough to engage in a "non-blog-comment rational response"! (!) It was also surprising to see that endoself felt people would be bothered enough by a Pascal's Mugging to leave the site. These on top of the fact that I had no intention of actually posting a mugging, and meant my post to be a mere musing ("I wonder if ... Would anyone") on intellectual vs actual acceptance. It is I who do not take this seriously enough, it appears! Though I myself have no intent to actually attempt said mugging, as stated before, I will point out that according to CarlShulman, user TimFreeman has already done so, and no shitstorm ensued... AFAIK.

1CarlShulman14y

None, the last time this was done by TimFreeman. And there's no plausible set of assumptions under which paying the $5 is better than alternative uses of the money. See the comment linked to above.

0Hul-Gil14y

Ah, someone beat me to it, I see. Not a single transfer, eh? I agree re: the advisability of paying up; Alexei's comment led me to believe he thought the opposite, but I see from his edit he agrees too.
