Anthropic Decision Theory IV: Solving Selfish and Average-Utilitarian Sleeping Beauty

Manfred (14y)

The altruistic average utilitarian runs into the same collective-decision-making correction that makes Psy-kosh's non-anthropic problem so hard.

What would the non-anthropic problem look like with selfish agents rather than average utilitarian ones?

First, the coordinator flips a coin, and randomly chooses people to be deciders from a group of 10 people.

If heads -> 1 decider is told to decide.

If tails -> 9 deciders are told to decide.

If heads, the one decider has payoffs (yea -> $100, nay -> $700). If tails, each of the 9 deciders has individual payoffs (yea -> $1000, nay -> $700).

If during this game you are told that you are a decider but not how the coin landed, how would you (I'm mostly asking Stuart here) go about using UDT to (note - edited) tell you the value of a ticket to this game? Is this ticket price the same as, or different from, the ticket price in the isomorphic average-utilitarian version?

Stuart_Armstrong (14y)

Yes, selfish agents are more problematic than average utilitarians. They have precommitment issues: what they decide before knowing who they are is not what they would decide afterwards. This does, however, leave them open to money pumps, so you could argue that they should go the UDT or precommitment route.

If they don't have access to precommitments, then the discontinuity that happens when they learn who they are means that they no longer behave the same way as average utilitarians would (whose utility function doesn't change when they find out who they are).

Manfred (14y)

Okay. So, going the UDT route, what are the prices people would pay? (also known as the "correct" route - done by choosing the optimal strategy, not just the best available action)

In the selfish non-anthropic problem, we evaluate the payoff of the strategies "always yea" and "always nay." If heads (0.5), and if picked as decider (0.1), yea gives 100 and nay 700. If tails (0.5) and picked as decider (0.9), yea gives 1000 and nay 700. Adding these expected utilities gives an expected payoff of 455 from "always yea" and an expected payoff of 350 from "always nay." (corrected)
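A minimal Python sketch of this UDT evaluation, assuming the payoffs are exactly as in the problem statement and using exact fractions so the totals come out as whole numbers. The names `WORLDS` and `selfish_value` are hypothetical; the computation is the ex ante expected payoff to one of the 10 people when every decider follows a fixed policy.

```python
from fractions import Fraction

# Hypothetical encoding of the game: for each coin outcome,
# (probability of the outcome, chance of being picked as a decider,
#  payoff for "yea", payoff for "nay").
WORLDS = {
    "heads": (Fraction(1, 2), Fraction(1, 10), 100, 700),   # 1 decider of 10
    "tails": (Fraction(1, 2), Fraction(9, 10), 1000, 700),  # 9 deciders of 10
}


def selfish_value(policy: str) -> Fraction:
    """Expected payoff to one person if every decider follows `policy`."""
    total = Fraction(0)
    for p_coin, p_decider, yea, nay in WORLDS.values():
        payoff = yea if policy == "yea" else nay
        total += p_coin * p_decider * payoff
    return total


if __name__ == "__main__":
    print("always yea:", selfish_value("yea"))  # 455
    print("always nay:", selfish_value("nay"))  # 350
```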

However, in the "isomorphic" altruistic case, you don't actually care about your personal reward - you simply care about the global result. Thus if heads (0.5), "always yea" always gives 100 and "always nay" always gives 700. And if tails, "always yea" always gives 1000 and "always nay" always gives 700. So in that case the payoffs are "yea" 550 and "nay" 700.
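For comparison, a sketch of the altruistic evaluation, assuming (as the comment says) that every decision in a given world produces the same global result, so each coin outcome contributes a single number. `GLOBAL_RESULT` and `altruistic_value` are illustrative names.

```python
from fractions import Fraction

P_COIN = Fraction(1, 2)

# Global result of each policy in each world, taken from the text above.
GLOBAL_RESULT = {
    "heads": {"yea": 100, "nay": 700},
    "tails": {"yea": 1000, "nay": 700},
}


def altruistic_value(policy: str) -> Fraction:
    """Expected global result; the chance of being a decider plays no role."""
    return sum((P_COIN * results[policy] for results in GLOBAL_RESULT.values()),
               Fraction(0))


if __name__ == "__main__":
    print("always yea:", altruistic_value("yea"))  # 550
    print("always nay:", altruistic_value("nay"))  # 700
```

Compared with the selfish calculation above, dropping the chance of being a decider from the weights is what flips the preferred policy from "yea" to "nay".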

So this "isomorphic" stuff doesn't look so isomorphic.

Stuart_Armstrong (14y)

In the selfish case, you forgot the 0.5: the payoff is 455 for "always yea", 350 for "always nay".

And you seem to be comparing selfish with selfless, not with average utilitarian.

For an average utilitarian, under "always yea", 100 is given out once in the heads world, and 1000 is given out 9 times in the tails world. These must be shared among 10 people, so the average is 0.5(100 + 9×1000)/10 = 455. For "always nay", 700 is given out once in the heads world and 9 times in the tails world, giving 0.5(700 + 9×700)/10 = 350, the same as for the selfish agent.
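The same arithmetic as a short sketch, assuming the average is taken over all 10 people in the group (deciders and non-deciders alike). `GAME` and `average_utilitarian_value` are hypothetical names.

```python
from fractions import Fraction

POPULATION = 10  # everyone in the group, whether or not they are a decider

# For each policy: (coin probability, number of deciders, payoff per decider).
GAME = {
    "yea": [(Fraction(1, 2), 1, 100), (Fraction(1, 2), 9, 1000)],
    "nay": [(Fraction(1, 2), 1, 700), (Fraction(1, 2), 9, 700)],
}


def average_utilitarian_value(policy: str) -> Fraction:
    """Expected total payoff shared out over the whole population."""
    return sum((p * n * payoff / POPULATION for p, n, payoff in GAME[policy]),
               Fraction(0))


if __name__ == "__main__":
    print("always yea:", average_utilitarian_value("yea"))  # 455
    print("always nay:", average_utilitarian_value("nay"))  # 350
```

These reproduce the 455 and 350 figures quoted for the selfish agent.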

Manfred (14y)

Ah, good point. I made a mistake in translating the problem into selfish terms. In fact, that might actually solve the non-anthropic problem...

EDIT: Nope.

Stuart_Armstrong (14y)

Why nope? ADT (with precommitments) simplifies to a version of UDT in non-anthropic situations.

Manfred (14y)

It doesn't solve the problem because the people who want to donate to charity aren't doing it so that the other people also participating in the game will get utility. That is, they're altruists, but not average utilitarians towards the other players. So the formulation is a little more complicated.

Stuart_Armstrong (14y)

They're selfless, and have coordinated decisions with precommitments; ADT will then recreate the UDT formulation, since there are no anthropic issues to worry about. ADT + selflessness tends to SIA-like behaviour in the Sleeping Beauty problem, which isn't the same as saying ADT says selfless agents should follow SIA.

Manfred (14y)

Well, yes, it recreates the UDT solution (or at least it does if it works correctly - I didn't actually check or anything). But the problem was never about just recreating the UDT solution - it's about understanding why the non-UDT solution doesn't work.

Stuart_Armstrong (14y)

Because standard decision theory doesn't know how to deal properly with identical agents and common policies?

Manfred (14y)

I think I've figured out a good counterexample to your reasoning in the case of the selfish Sleeping Beauty. The problem is indeed this "isomorphic" stuff. Imagine 3 copies in 3 different worlds: one heads world and two tails worlds. This is isomorphic to the average utilitarian case. The probability of being copy 1, 2 or 3 should not change just because we relabel the "worlds", and if all of them bet on tails, they all get the same outcomes as in the average utilitarian case: -x in the heads world and 1-x in each of the two tails worlds. And yet average utilitarian reasoning suggests that in this case they should pay up to about 2/3 for the bet. The procedure for calculating the expected utility differs from before in a way that was not covered when we checked what counted as "isomorphic".

Stuart_Armstrong (14y)

Imagine 3 copies in 3 different worlds, one heads world and two tails worlds

What do you mean by that? That there is a second coin toss after the first one if tails comes up? And changing that label to "world" very much changes what an average utilitarian would care about: if I average utility over the number of people in the world, then who is and isn't in that world is very important.

Manfred (14y)

Well, yeah, it's important. That's the point. It's important but it's not one of the criteria used for isomorphism.

As for how to set the problem in 3 worlds, there are a variety of ways that preserve probabilities (or multiply them by an overall factor, which won't change decisions). For example, you could do that second coin toss. An example set of worlds would be HH - 1 person, HT - 0 people, TH - 1 person, TT - 1 person.
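A sketch of why this construction matters, assuming a ticket that costs x and pays 1 if the first coin lands tails, and an average utilitarian who averages within each world and then weights worlds by their probability (empty worlds contribute nothing). The world tables and function names are hypothetical. Because the expected utility is linear in x, the break-even price can be read off from two evaluations.

```python
from fractions import Fraction

# (world probability, first coin, number of people in the world)
# Second-coin construction from the comment: HH, HT, TH, TT.
FOUR_COIN_WORLDS = [
    (Fraction(1, 4), "H", 1),  # HH
    (Fraction(1, 4), "H", 0),  # HT: nobody exists
    (Fraction(1, 4), "T", 1),  # TH
    (Fraction(1, 4), "T", 1),  # TT
]
# Ordinary Sleeping Beauty: 1 copy if heads, 2 copies in one world if tails.
ORDINARY_WORLDS = [(Fraction(1, 2), "H", 1), (Fraction(1, 2), "T", 2)]


def average_utility(worlds, x):
    """Probability-weighted per-world average utility if everyone buys at x."""
    total = Fraction(0)
    for p, first_coin, people in worlds:
        if people == 0:
            continue  # no one to average over
        # Everyone in a world gets the same outcome, so the world's average
        # equals the individual payoff regardless of head count.
        per_person = (1 if first_coin == "T" else 0) - x
        total += p * per_person
    return total


def break_even_price(worlds):
    """Solve a - b*x = 0 using the linearity of average_utility in x."""
    a = average_utility(worlds, Fraction(0))
    b = a - average_utility(worlds, Fraction(1))
    return a / b


if __name__ == "__main__":
    print(break_even_price(ORDINARY_WORLDS))   # 1/2
    print(break_even_price(FOUR_COIN_WORLDS))  # 2/3
```

Splitting the two tails copies into separate worlds leaves every individual outcome unchanged but moves the break-even price from 1/2 to 2/3.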

Of course, if you don't want to mess with worlds, an equivalent effect would come from changing whether or not the total expected utility is a weighted or unweighted average over worlds, which is also not one of the criteria for isomorphism.

Stuart_Armstrong (14y)

Of course, if you don't want to mess with worlds, an equivalent effect would come from changing whether or not the total expected utility is a weighted or unweighted average over worlds, which is also not one of the criteria for isomorphism.

You seem to be arguing that total utilitarians must reach the same conclusions as average utilitarians, which seems far too strong to be a useful requirement.

Manfred (14y)

You seem to be arguing that total utilitarians must reach the same conclusions as average utilitarians

Rather, I'm arguing the reverse - I'm saying that since that's demonstrably false, the isomorphism axiom is false as stated.

Stuart_Armstrong (14y)

The isomorphism axiom says that selfish and average utilitarian agents should make the same decisions.

The total vs. average utilitarian comparison fails the "same utility outcomes for each possible linked decision" condition. Total utilitarians get more utility than average utilitarians in the tails world (and the same utility in the heads world), so they do not get the same utility outcomes.

Manfred (14y)

There are two different ways of having decision-makers act like total utilitarians. One is for them to add the utilities up and then take the unweighted average. Another is for them to be average utilitarians who use weighted averages instead of unweighted averages. The first is not "isomorphic" to average utilitarianism, but the second one is.

The difference between ordinary average utilitarians and these average/total utilitarians is not (at least not explicitly) in the possible decisions, the probabilities, or the utilities. It is in the algorithm they run, which takes the aforementioned things as inputs and spits out a decision. I'm reminded of this comment of yours.

Stuart_Armstrong (14y)

Another is for them to be average utilitarians who use weighted averages instead of unweighted averages

Not sure I get this; the weighted average of 1 and 1, for all weights, is 1. Their sum is 2. Therefore these weighted-averaging agents cannot be total utilitarians.

The point you made in your first comment in this series is relevant. I've strengthened the conditions of the isomorphism axiom in the post to say "same setup", which basically means same possible worlds with same numbers of people in them.

Manfred (14y)

Not sure I get this; the weighted average of 1 and 1, for all weights, is 1.

So in the heads world, the average utility is -x. In the tails world, the average utility is 1-x. An unweighted average means that the decision maker goes "and so the utility I evaluate for buying at x is (1-2x)/2". A weighted average means the decision maker goes "and so the utility I evaluate for buying at x is (2-3x)/2".
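The two evaluations as functions of x, a sketch assuming probability 1/2 for each coin outcome and writing the formulas exactly as in the comment; the function names are illustrative. Setting each to zero gives break-even prices of 1/2 (unweighted) and 2/3 (weighted).

```python
from fractions import Fraction


def unweighted(x: Fraction) -> Fraction:
    # Each world counted once: heads average -x, tails average 1 - x.
    return (-x + (1 - x)) / 2        # = (1 - 2x) / 2


def weighted(x: Fraction) -> Fraction:
    # Tails world weighted by its two people.
    return (-x + 2 * (1 - x)) / 2    # = (2 - 3x) / 2


if __name__ == "__main__":
    for x in (Fraction(1, 2), Fraction(3, 5), Fraction(2, 3)):
        print(x, unweighted(x), weighted(x))
```

At a price such as 3/5, between the two break-even points, the unweighted agent declines and the weighted agent buys.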

For an example of a decision maker who takes a weighted average, take the selfish agents in my poorly-modified non-anthropic problem. They multiply the payoff and the probability of the "world coming into existence" (the coin landing heads) to get the payoff to a decider in that world, but weight the average by the frequency with which they're a decider.

Stuart_Armstrong (14y)

A good question from Wei Dai:

In the selfish sleeping beauty case, assuming the incubator variant, what does your solution say a Beauty should do if we tell her that she is in Room 1 and ask her what price she would pay for a lottery ticket that pays $1 on Heads?

Selfish agents have problems with precommitments. When you tell them "you are in Room 1", this is not only changing their information, but also changing their utility. To wit:

Before being told where they are, selfish and average utilitarians have identical preferences over outcomes. When they consider the potential "if I was to be told I was in room 1 and offered that lottery ticket for $x, would I take it", they calculate the expected utility of doing so as 1-x in the heads world and -x/2 in the tails world (from the selfish point of view, "in the tails world, there's a 50% chance I won't have to pay, even if I commit to accepting"), hence 1 - 3x/2 in total. So she would precommit to taking the deal if x < 2/3.

However, after being told that she is in room 1, the selfish SB's preferences change: her previous utility in the tails world was the average of the utilities of the SBs in rooms 1 and 2, but now it is entirely the utility of the SB in room 1. Her expected utility is now (1-x) - x = 1 - 2x, so she would only want to take it for x < 1/2.

So, if she sticks to her precommitments, she would accept x<2/3; if she breaks her precommitments (leaving her possibly money-pumpable), she would only accept x<1/2.
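A sketch of the two evaluations in the incubator variant, assuming a ticket that costs x and pays $1 on heads, offered only to the Beauty in Room 1. It includes a factor of 1/2 for each coin outcome, which the comment omits; that rescales the expected utility but does not move the break-even point. Function names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def precommitted_value(x: Fraction) -> Fraction:
    """Evaluation before learning the room: tails utility is the average
    over the Beauties in Rooms 1 and 2, and only Room 1 pays x."""
    heads = 1 - x   # a single Beauty, in Room 1; she buys and wins
    tails = -x / 2  # Room 1 loses x, Room 2 is never offered the ticket
    return HALF * heads + HALF * tails  # = (1 - 3x/2) / 2


def post_update_value(x: Fraction) -> Fraction:
    """Evaluation after 'you are in Room 1': tails utility is now
    entirely that of the Room 1 Beauty."""
    heads = 1 - x
    tails = -x
    return HALF * heads + HALF * tails  # = (1 - 2x) / 2


if __name__ == "__main__":
    x = Fraction(3, 5)  # a price between 1/2 and 2/3
    print(precommitted_value(x) > 0, post_update_value(x) > 0)  # True False
```

Any price strictly between 1/2 and 2/3 is one she would precommit to accepting but would refuse once told her room, which is the gap a money pump exploits.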

Wei Dai (14y)

Suppose we pull 100 people off the street. We throw a coin and tell 99 subjects how the coin actually landed, and tell the remaining subject the opposite. Suppose you hear "heads". How much would you (assuming you're a selfish agent) pay for a ticket that pays $1 on heads?

The intuitive answer is <$0.99, but section 3.3.3 says the answer should be <$0.50. To generalize this example, an agent following section 3.3.3 never "updates" (in the sense of changing their betting odds) on any evidence unless the evidence allows them to rule out being in some possible world, but since we appear to live in an infinite universe (so everything that can happen does happen somewhere), such an agent would never update.

Since section 3.3.3 is both reflectively inconsistent and highly counter-intuitive even in simple straightforward cases, I feel fairly confident that it's not the right solution to "selfish preferences". I think a better direction to look is UDASSA (or AIXI, which treats anthropic reasoning in a similar way), which is more SIA-like. I still don't know how to deal with copying or death (i.e., issues related to quantum immortality) but at least they don't run into so much trouble on these simpler problems.

Stuart_Armstrong (14y)

The intuitive answer is <$0.99, but section 3.3.3 says the answer should be <$0.50

? I don't see this at all.

By section 3.3.3, I assume you mean the isomorphism between selfish and average-utilitarian agents? From an average utilitarian perspective (which is the same as a total utilitarian one for fixed populations), buying that ticket for x after hearing "heads" will lose one person x in the tails world, and gain 99 people 1-x in the heads world. So the expected utility is (1/2)(1/100)(-x + 99(1-x)), which is positive for x < 99/100.
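The same calculation as a sketch, assuming 100 people, a fair coin, and a ticket bought for x by everyone who hears "heads"; `full_average_value` is a hypothetical name.

```python
from fractions import Fraction

N = 100  # people pulled off the street
HALF = Fraction(1, 2)


def full_average_value(x: Fraction) -> Fraction:
    """Average over all N people of 'buy the ticket when told heads'."""
    heads_world = 99 * (1 - x) / N  # 99 people hear "heads", each gains 1 - x
    tails_world = -x / N            # 1 person wrongly hears "heads" and loses x
    return HALF * heads_world + HALF * tails_world


if __name__ == "__main__":
    print(full_average_value(Fraction(99, 100)))  # 0: the break-even price
```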

ADT is supposed to reduce to a simplified version of UDT in non-anthropic situations; I didn't emphasise this aspect, as I know you don't want UDT published.

Wei Dai (14y)

? I don't see this at all.

Section 3.3.3 says that a selfish agent should make the same decisions as an average-utilitarian who averages over just the set of people who may be "me", right? That's why it says that in the incubator experiment, a selfish agent who has been told she is in Room 1 should pay 1/2 for the ticket. An average-utilitarian who averages over everyone who exists in a world would pay 2/3 instead.

So in my example, consider an average utilitarian whose attention is restricted to just the people who have heard "heads". Then buying a ticket loses an average of x in the tails world, and gains an average of 1-x in the heads world, so such a restricted average utilitarian would pay x < 1/2.
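A sketch of the restricted average described here, assuming the average is taken only over the people who heard "heads" in each world (99 of them if heads, 1 if tails); the names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
HEARD_HEADS = {"heads": 99, "tails": 1}  # listeners who were told "heads"


def restricted_average_value(x: Fraction) -> Fraction:
    """Average over only the people who heard "heads", world by world."""
    heads_world = HEARD_HEADS["heads"] * (1 - x) / HEARD_HEADS["heads"]
    tails_world = HEARD_HEADS["tails"] * (-x) / HEARD_HEADS["tails"]
    return HALF * heads_world + HALF * tails_world  # = (1 - 2x) / 2


if __name__ == "__main__":
    print(restricted_average_value(Fraction(1, 2)))  # 0: the break-even price
```

The head counts cancel inside each world, so the break-even price is 1/2 whatever the 99:1 split; weighting each world by how many people heard "heads" instead brings back the 99/100 figure.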

(If this is still not making sense, please contact me on Google Chat where we can probably hash it out much more quickly.)

Stuart_Armstrong (14y)

We'll talk on Google Chat. But my preliminary thought is that if you are indeed restricting to those who have heard heads, then you need to make use of the fact that this is objectively much more likely to happen in the heads world than in the tails world.
