Anthropic Decision Theory IV: Solving Selfish and Average-Utilitarian Sleeping Beauty


The altruistic average utilitarian runs into the same collective-decision-making correction that makes Psy-kosh's non-anthropic problem so hard.

What would the non-anthropic problem look like with selfish agents rather than average utilitarian ones?

First, the coordinator flips a coin, and randomly chooses people to be deciders from a group of 10 people.

If heads -> 1 decider is told to decide.

If tails -> 9 deciders are told to decide.

If there is one decider, their payoffs are (yea -> $100, nay -> $700). If there are 9 deciders, each has *individual* payoffs (yea -> $1000, nay -> $700).

If during this game, you are told that you are a decider but not how the coin landed, how would you (I'm mostly asking Stuart here) go about using UDT to (*note - edited*) tell you the value of a ticket to this game? Is this ticket price the same as or different from the ticket price in the isomorphic average-utilitarian version?

Yes, selfish agents are more problematic than average utilitarians. They have precommitment issues: what they decide before knowing who they are is not what they would decide after - this does, however, leave them open to money pumps, so you could argue that they should go the UDT or precommitment route.

If they don't have access to precommitments, then the discontinuity that happens when they learn who they are means that they no longer behave the same way as average utilitarians would (whose utility function doesn't change when they find out who they are).

Okay. So, going the UDT route, what are the prices people would pay? (also known as the "correct" route - done by choosing the optimal strategy, not just the best available action)

In the selfish non-anthropic problem, we evaluate the payoff of the strategies "always yea" and "always nay." If heads (0.5), and if picked as decider (0.1), yea gives 100 and nay 700. If tails (0.5) and picked as decider (0.9), yea gives 1000 and nay 700. Adding these expected utilities gives an expected payoff of 455 from "always yea" and an expected payoff of 350 from "always nay." (*corrected*)

However, in the "isomorphic" altruistic case, you don't actually care about your personal reward - you simply care about the global result. Thus if heads (0.5), "always yea" always gives 100 and "always nay" always gives 700. And if tails, "always yea" always gives 1000 and "always nay" always gives 700. So in that case the payoffs are "yea" 550 and "nay" 700.
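The arithmetic in the two paragraphs above can be checked with a short Python sketch using exact fractions. The probabilities (1/2 per coin face, 0.1 or 0.9 chance of being a decider) and the payoffs are taken from the problem as stated; the dictionary and function names are illustrative only.

```python
from fractions import Fraction as F

# Payoff per decider in the modified non-anthropic problem.
# heads: 1 decider; tails: 9 deciders, chosen from a group of 10 people.
PAYOFF = {"heads": {"yea": 100, "nay": 700},
          "tails": {"yea": 1000, "nay": 700}}
P_COIN = {"heads": F(1, 2), "tails": F(1, 2)}
P_DECIDER = {"heads": F(1, 10), "tails": F(9, 10)}


def selfish_value(strategy):
    """Expected personal payoff: only worlds in which I am a decider pay me."""
    return sum(P_COIN[w] * P_DECIDER[w] * PAYOFF[w][strategy] for w in P_COIN)


def selfless_value(strategy):
    """Expected global result: the world's payoff, whoever happens to decide."""
    return sum(P_COIN[w] * PAYOFF[w][strategy] for w in P_COIN)


for s in ("yea", "nay"):
    print(s, selfish_value(s), selfless_value(s))
```

The only difference between the two functions is the factor for the chance of being a decider, which is exactly where the two kinds of agent part ways.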

So this "isomorphic" stuff doesn't look so isomorphic.

In the selfish case, you forgot the 0.5: the payoff is 455 for "always yea", 350 for "always nay".

And you seem to be comparing selfish with selfless, not with average utilitarian.

For an average utilitarian, under "always yea", 100 is given out once in the heads world, and 1000 is given out 9 times in the tails world. These must be shared among 10 people, so the average is 0.5(100 + 9×1000)/10 = 455. For "always nay", 700 is given out once in the heads world, and 9 times in the tails world, giving 0.5(700 + 9×700)/10 = 350, same as for the selfish agent.
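A minimal Python sketch of this average-utilitarian calculation, assuming (as in the comment) that everything handed out in a world is averaged over the whole group of 10. The names `GROUP`, `DECIDERS` and `average_utilitarian_value` are hypothetical.

```python
from fractions import Fraction as F

GROUP = 10                            # everyone in the game, deciders or not
DECIDERS = {"heads": 1, "tails": 9}
PAYOFF = {"heads": {"yea": 100, "nay": 700},
          "tails": {"yea": 1000, "nay": 700}}


def average_utilitarian_value(strategy):
    # Each coin face has probability 1/2; every decider in that world receives
    # the world's payoff, and the total is averaged over the whole group.
    return sum(F(1, 2) * DECIDERS[w] * PAYOFF[w][strategy] / GROUP
               for w in DECIDERS)


print(average_utilitarian_value("yea"), average_utilitarian_value("nay"))
```

Both values match the selfish agent's expected payoffs, which is the point of the reply.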

Ah, good point. I made a mistake in translating the problem into selfish terms. In fact, that might actually solve the non-anthropic problem...

EDIT: Nope.

Why nope? ADT (with precommitments) simplifies to a version of UDT in non-anthropic situations.

The reason it doesn't solve the problem is that the people who want to donate to charity aren't doing it so that the other people also participating in the game will get utility - that is, they're altruists, but not average utilitarians towards the other players. So the formulation is a little more complicated.

They're selfless, and have coordinated decisions with precommitments - ADT will then recreate the UDT formulation, since there are no anthropic issues to worry about. ADT + selflessness tends to SIA-like behaviour in the Sleeping Beauty problem, which isn't the same as saying ADT says selfless agents should follow SIA.

Well, yes, it recreates the UDT solution (or at least it does if it works correctly - I didn't actually check or anything). But the problem was never about just recreating the UDT solution - it's about understanding why the non-UDT solution *doesn't* work.

Because standard decision theory doesn't know how to deal properly with identical agents and common policies?

I think I've figured out a good counterexample to your reasoning in the case of the selfish Sleeping Beauty. The problem is indeed this "isomorphic" stuff. Imagine 3 copies in 3 *different* worlds, one heads world and two tails worlds. This is isomorphic to the average utilitarian case: the probability of being copy 1, 2 or 3 should not change just by changing the label on "worlds", and if all choose to bet on tails, they will all get the same outcome as in the average utilitarian case: -x in the heads world and 1-x in each of the two tails worlds. And yet average utilitarian reasoning suggests that in this case, they should pay up to 2/3 (about 0.66) for the bet. The procedure for calculating the expected utility differs from before in a way not covered when we checked what was "isomorphic".

> Imagine 3 copies in 3 different worlds, one heads world and two tails worlds

What do you mean by that? That there is a second coin toss after the first one if tails comes up? And changing that label to "world" very much changes what an average utilitarian would care about - if I average utility over the number of people in the world, then who is and isn't in that world is very important.

Well, yeah, it's *important*. That's the point. It's important but it's not one of the criteria used for isomorphism.

As for how to set the problem in 3 worlds, there are a variety of ways that preserve probabilities (or multiply them by an overall factor, which won't change decisions). For example, you could do that second coin toss. An example set of worlds would be HH - 1 person, HT - 0 people, TH - 1 person, TT - 1 person.
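A Python sketch of the two-coin construction, assuming each of the four outcomes has probability 1/4, that the first toss decides heads versus tails, and that buying at price x gives -x in a heads world and 1-x in a tails world (the values used elsewhere in this thread). `WORLDS` and `expected_average_utility` are hypothetical names.

```python
from fractions import Fraction as F

# Four equiprobable outcomes of two coin tosses, with the number of people in each.
WORLDS = {"HH": 1, "HT": 0, "TH": 1, "TT": 1}


def expected_average_utility(x):
    """Probability-weighted sum of each world's average utility from buying at x."""
    total = F(0)
    for name, population in WORLDS.items():
        if population == 0:
            continue  # nobody exists to receive utility in an empty world
        # Every occupant gets the same payoff, so the world's average equals it.
        per_person = -x if name[0] == "H" else 1 - x
        total += F(1, 4) * per_person
    return total
```

The result is (2-3x)/4, so buying is worthwhile exactly when x < 2/3, rather than the 1/2 obtained with two equally likely worlds.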

Of course, if you don't want to mess with worlds, an equivalent effect would come from changing whether or not the total expected utility is a weighted or unweighted average over worlds, which is also not one of the criteria for isomorphism.

> Of course, if you don't want to mess with worlds, an equivalent effect would come from changing whether or not the total expected utility is a weighted or unweighted average over worlds, which is also not one of the criteria for isomorphism.

You seem to be arguing that total utilitarians must reach the same conclusions as average utilitarians, which seems far too strong to be a useful requirement.

> You seem to be arguing that total utilitarians must reach the same conclusions as average utilitarians

Rather, I'm arguing the reverse - I'm saying that since that's demonstrably false, the isomorphism axiom is false as stated.

The isomorphism axiom says that selfish and average utilitarian agents should make the same decisions.

The total vs average utilitarian comparison fails the "same utility outcomes for each possible linked decision" condition. Total utilitarians get more utility than average utilitarians in the tails world (and the same utility in the heads world), so they do not get the same utility outcomes.

There are two different ways of having decision-makers act like total utilitarians. One is for them to add the utilities up and then take the unweighted average. Another is for them to be average utilitarians who use weighted averages instead of unweighted averages. The first is not "isomorphic" to average utilitarianism, but the second one is.

The difference between ordinary average utilitarians and these average/total utilitarians is not (at least not explicitly) in the possible decisions, the probabilities, or the utilities. It is in the algorithm they run, which takes the aforementioned things as inputs and spits out a decision. I'm reminded of this comment of yours.

> Another is for them to be average utilitarians who use weighted averages instead of unweighted averages

Not sure I get this; the weighted average of 1 and 1, for all weights, is 1. Their sum is 2. Therefore these weighted-averaging agents cannot be total utilitarians.

The point you made in your first comment in this series is relevant. I've strengthened the conditions of the isomorphism axiom in the post to say "same setup", which basically means same possible worlds with same numbers of people in them.

> Not sure I get this; the weighted average of 1 and 1, for all weights, is 1.

So in the heads world, the average utility is -x. In the tails world, the average utility is 1-x. An unweighted average means that the decision maker goes "and so the utility I evaluate for buying at x is (1-2x)/2". A weighted average means the decision maker goes "and so the utility I evaluate for buying at x is (2-3x)/2".
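A Python sketch of the two ways of combining worlds, assuming the incubator setup (one copy under heads, two under tails, each world with probability 1/2). The "weighted" version multiplies each world's average by its population, which reproduces the (2-3x)/2 figure; the function names are hypothetical.

```python
from fractions import Fraction as F

POPULATION = {"heads": 1, "tails": 2}
P_WORLD = {"heads": F(1, 2), "tails": F(1, 2)}


def average_utility(world, x):
    # Average utility per person in each world from buying at price x.
    return -x if world == "heads" else 1 - x


def unweighted(x):
    # Each world's average counts only in proportion to the world's probability.
    return sum(P_WORLD[w] * average_utility(w, x) for w in P_WORLD)


def weighted(x):
    # Each world's average is additionally weighted by how many people it holds.
    return sum(P_WORLD[w] * POPULATION[w] * average_utility(w, x) for w in P_WORLD)
```

The inputs (worlds, probabilities, per-person utilities) are identical for both functions; only the aggregation step differs, and the break-even price moves from 1/2 to 2/3.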

For an example of a decision maker who takes a weighted average, take the selfish agents in my poorly-modified non-anthropic problem. They multiply the payoff and the probability of the "world coming into existence" (the coin landing heads) to get the payoff to a decider in that world, but weight the average by the frequency with which they're a decider.

A good question from Wei Dai:

> In the selfish sleeping beauty case, assuming the incubator variant, what does your solution say a Beauty should do if we tell her that she is in Room 1 and ask her what price she would pay for a lottery ticket that pays $1 on Heads?

Selfish agents have problems with precommitments. When you tell them "you are in Room 1", this is not only changing their information, but also changing their utility. To wit:

Before being told where they are, selfish and average utilitarians have identical preferences over outcomes. When they consider the hypothetical "if I was to be told I was in room 1 and offered that lottery ticket for $x, would I take it", they calculate the expected utility of doing so as 1-x in the heads world and -x/2 in the tails world (from the selfish point of view, "in the tails world, there's a 50% chance I won't have to pay, even if I commit to accepting"), hence 1-(3/2)x in total. So she would precommit to taking the deal if x<2/3.

However, after being told that she is in room 1, the selfish SB's preferences change: her previous utility in the tails world was the average of the utilities of the SBs in rooms 1 and 2, but now it is entirely the utility of the SB in room 1. Her expected utility is now (1-x) - x = 1-2x, so she would only want to take it for x<1/2.

So, if she sticks to her precommitments, she would accept x<2/3; if she breaks her precommitments (leaving her possibly money-pumpable), she would only accept x<1/2.
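A Python sketch of the two thresholds, following the comment in dropping the common factor of 1/2 for each world. The function names are hypothetical; `max_price` simply finds where a decreasing linear value function crosses zero.

```python
from fractions import Fraction as F


def precommitted_value(x):
    """Before learning her room: 'if I am told Room 1, I buy at x'.
    Heads: she is the only copy, in Room 1, and gains 1 - x.
    Tails: half the time she is in Room 2 and pays nothing, so -x/2.
    """
    return (1 - x) + (-x / 2)  # = 1 - 3x/2


def informed_value(x):
    """After being told Room 1: her tails-world utility is Room 1's alone."""
    return (1 - x) + (-x)  # = 1 - 2x


def max_price(value):
    # For a linear value function a + b*x, the zero crossing is -a/b,
    # which equals value(0) / (value(0) - value(1)).
    v0, v1 = value(F(0)), value(F(1))
    return v0 / (v0 - v1)
```

`max_price(precommitted_value)` gives 2/3 and `max_price(informed_value)` gives 1/2, the gap that opens the selfish agent to money-pumping if she breaks her precommitment.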

Suppose we pull 100 people off the street. We throw a coin and tell 99 subjects how the coin actually landed, and tell the remaining subject the opposite. Suppose you hear "heads". How much would you (assuming you're a selfish agent) pay for a ticket that pays $1 on heads?

The intuitive answer is <$0.99, but section 3.3.3 says the answer should be <$0.50. To generalize this example, an agent following section 3.3.3 never "updates" (in the sense of changing their betting odds) on any evidence unless the evidence allows them to rule out being in some possible world, but since we appear to live in an infinite universe (so everything that can happen does happen somewhere), such an agent would never update.

Since section 3.3.3 is both reflectively inconsistent and highly counter-intuitive even in simple straightforward cases, I feel fairly confident that it's not the right solution to "selfish preferences". I think a better direction to look is UDASSA (or AIXI, which treats anthropic reasoning in a similar way), which is more SIA-like. I still don't know how to deal with copying or death (i.e., issues related to quantum immortality) but at least they don't run into so much trouble on these simpler problems.

> The intuitive answer is <$0.99, but section 3.3.3 says the answer should be <$0.50

? I don't see this at all.

By section 3.3.3, I assume you mean the isomorphism between selfish and average-utilitarian? From an average utilitarian perspective (which is the same as a total utilitarian for fixed populations), buying that ticket for x after hearing "heads" will lose one person x in the tails world, and gain 99 people 1-x in the heads world. So the expected utility is (1/2)(1/100)(-x+99(1-x)), which is positive for x< 99/100.
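A Python sketch of this calculation, assuming each coin face has probability 1/2 and utility is averaged over all 100 people. `N` and `expected_utility` are hypothetical names.

```python
from fractions import Fraction as F

N = 100  # people pulled off the street


def expected_utility(x):
    """Average utilitarian buys a ticket (pays $1 on heads) at x after hearing 'heads'.
    Heads world: the 99 who hear 'heads' each gain 1 - x.
    Tails world: the single person told 'heads' loses x.
    """
    heads_world = F(99) * (1 - x) / N
    tails_world = -x / N
    return F(1, 2) * (heads_world + tails_world)
```

The expression equals (1/2)(1/100)(-x + 99(1-x)) and is positive exactly for x < 99/100.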

ADT is supposed to reduce to a simplified version of UDT in non-anthropic situations; I didn't emphasise this aspect, as I know you don't want UDT published.

> ? I don't see this at all.

Section 3.3.3 says that a selfish agent should make the same decisions as an average-utilitarian who averages over just the set of people who may be "me", right? That's why it says that in the incubator experiment, a selfish agent who has been told she is in Room 1 should pay 1/2 for the ticket. An average-utilitarian who averages over everyone who exists in a world would pay 2/3 instead.

So in my example, consider an average-utilitarian whose attention is restricted to just people who have heard "heads". Then buying a ticket loses an average of x in the tails world, and gains an average of 1-x in the heads world, so such a restricted average-utilitarian would pay x<1/2.
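A Python sketch of the restricted average described here, assuming world probabilities of 1/2 and averaging only over the people who heard "heads". The function name is hypothetical.

```python
from fractions import Fraction as F


def restricted_expected_utility(x):
    """Average only over listeners who heard 'heads'.
    Heads world: 99 listeners each gain 1 - x, so the average is 1 - x.
    Tails world: 1 listener loses x, so the average is -x.
    The 99-to-1 split cancels inside each average and never affects the result.
    """
    heads_avg = (99 * (1 - x)) / 99
    tails_avg = (1 * -x) / 1
    return F(1, 2) * (heads_avg + tails_avg)
```

The result is (1-2x)/2, positive only for x < 1/2: dividing by the number of listeners in each world discards the information that hearing "heads" is far more common in the heads world.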

(If this is still not making sense, please contact me on Google Chat where we can probably hash it out much more quickly.)

We'll talk on Google Chat. But my preliminary thought is that if you are indeed restricting to those who have heard heads, then you need to make use of the fact that this is objectively much more likely to happen in the heads world than in the tails world.

A near-final version of my Anthropic Decision Theory paper is available on the arXiv. Since anthropic problems have been discussed quite a bit on this list, I'll be presenting its arguments and results in this, subsequent, and previous posts 1 2 3 4 5 6.

In the previous post, I looked at a decision problem when Sleeping Beauty was selfless or a (copy-)total utilitarian. Her behaviour was reminiscent of someone following SIA-type odds. Here I'll look at situations where her behaviour is SSA-like.

## Altruistic average utilitarian Sleeping Beauty

In the incubator variant, consider the reasoning of an Outside/Total agent who is an average utilitarian (and there are no other agents in the universe apart from the Sleeping Beauties).

"If the various Sleeping Beauties decide to pay £x for the coupon, they will make -£x in the heads world. In the tails world, they will each make £(1-x), so an average of £(1-x). This gives me an expected utility of £0.5(-x+(1-x)) = £(0.5-x), so I would want them to buy the coupon for any price less than £0.5."

And this will then be the behaviour the agents will follow, by consistency. Thus they would be behaving as if they were following SSA odds, and putting equal probability on the heads versus tails world.

For a version of this that makes sense for the classical Sleeping Beauty problem, one could imagine that she is awakened a week after the experiment. Further imagine that she would take her winnings and losses during the experiment in the form of chocolate, consumed immediately. Then, because of the amnesia drug, she would only remember one instance of this in the tails world. Hence if she valued the memory of pleasure, she would want to be average utilitarian towards the pleasures of her different versions, and would follow SSA odds.

Standard SSA has a problem with reference classes. For instance, the larger the reference class becomes, the more the results of SSA in small situations become similar to SIA. The above setup mimics this effect: if there is a very large population of outsider individuals that Sleeping Beauty is altruistic towards, then the gains to two extra copies will tend to add, rather than average. If Ω is large, then 2x/(2+Ω) (the averaged gain when two created agents each gain x) is approximately twice x/(1+Ω) (the averaged gain when one created agent gains x), so she will behave more closely to SIA odds.
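A small Python check of this limit, assuming the same Ω outsiders exist in both worlds (matching the denominators above). The ratio of the two averaged gains is 2(1+Ω)/(2+Ω), which moves from 1 (pure averaging) towards 2 (pure adding) as Ω grows. `averaged_gain` is a hypothetical name and `omega` stands in for Ω.

```python
from fractions import Fraction as F


def averaged_gain(copies, outsiders, x=F(1)):
    """Average gain over the whole population when `copies` Beauties each gain x."""
    return copies * x / (copies + outsiders)


for omega in (0, 10, 1000, 10**6):
    ratio = averaged_gain(2, omega) / averaged_gain(1, omega)
    print(omega, float(ratio))
```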

This issue is not present for copy-altruistic average utilitarian Sleeping Beauties, as they don't care about any outsiders.

## Selfish Sleeping Beauty

In all of the above examples, the goals of one Sleeping Beauty were always in accordance with the goals of her copies or the past and future versions of herself. But what happens when this fails? What happens when the different versions are entirely selfish towards each other? This is very easy to understand in the incubator variant (the different created copies feel no mutual loyalty), but it can also be understood in the standard Sleeping Beauty problem if she is a hedonist with a high discount rate.

Since the different copies do have different goals, the consistency axioms no longer apply. It seems that we cannot decide what the correct decision is in this case. There is, however, a tantalising similarity between this case and the altruistic average utilitarian Sleeping Beauty. The setups (including probabilities) are the same. By 'setup' we mean the different worlds, their probabilities, the number of agents in each world, and the decisions faced by these agents. Similarly, the possible 'linked' decisions are the same. See future posts for a proper definition of linked decisions; here it just means that all copies will have to make the same decision, being identical, so there is one universal 'buy coupon' or 'reject coupon'. And, given this linking, the utilities derived by the agents are the same for either outcome in the two cases.

To see this, consider the selfish situation. Each Sleeping Beauty will make a single decision, whether to buy the coupon at the price offered. Not buying the coupon nets her £0 in all worlds. Buying the coupon at price £x nets her -£x in the heads world, and £(1-x) in the tails world. The linking is present but has no impact on these selfish agents: they don't care what the other copies decide.

This is exactly the same for the altruistic average utilitarian Sleeping Beauties. In the heads world, buying the coupon at price £x nets her -£x worth of utility. In the tails world, it would net the current copy £(1-x) worth of individual utility. Since the copies are identical (linked decision), this would happen twice in the tails world, but since she only cares about the average, this grants both copies only £(1-x) worth of utility in total. The linking is present, and has an impact, but that impact is dissolved by the average utilitarianism of the copies.
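A Python sketch of the comparison in the last two paragraphs, assuming the incubator setup (one copy under heads, two under tails, each world with probability 1/2). It builds each agent's utility for every (world, decision) pair and checks that the two tables coincide; the function names are hypothetical.

```python
from fractions import Fraction as F

P_WORLD = {"heads": F(1, 2), "tails": F(1, 2)}
COPIES = {"heads": 1, "tails": 2}


def selfish_utility(world, buy, x):
    # A selfish copy counts only her own coupon and ignores the other copies.
    if not buy:
        return F(0)
    return -x if world == "heads" else 1 - x


def average_utilitarian_utility(world, buy, x):
    # All copies make the same linked decision; she averages their gains.
    if not buy:
        return F(0)
    gains = [-x if world == "heads" else 1 - x] * COPIES[world]
    return sum(gains) / len(gains)


def expected(utility, buy, x):
    return sum(P_WORLD[w] * utility(w, buy, x) for w in P_WORLD)
```

Because the per-world utilities agree for every price and decision, any procedure that looks only at setup, linked decisions and utilities must treat the two agents alike; here both find buying worthwhile exactly when x < 1/2.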

Thus the two situations have the same setup, the same possible linked decisions and the same utility outcomes for each possible linked decision. It would seem there is nothing relevant to decision theory that distinguishes these two cases. This gives us the last axiom:

This axiom immediately solves the selfish Sleeping Beauty problem, implying that agents there must behave as they do in the altruistic average utilitarian Sleeping Beauty problem, namely paying up to £0.50 for the coupon. In this way, the selfish agents also behave as if they were following SSA probabilities, and believed that heads and tails were equally likely.

## Summary of results

We have broadly four categories of agents, and they follow two different types of decisions (SIA-like and SSA-like). In the Sleeping Beauty problem (and in more general problems), the categories decompose as:

For the standard Sleeping Beauty problem, the first three decisions are derived from consistency. The same result can be established for the incubator variants using the Outside/Total agent axioms. The selfish result, however, needs to make use of the Isomorphic decisions axiom.

EDIT: A good question from Wei Dai illustrates the issue of precommitment for selfish agents.