"Solving" selfishness for UDT

Stuart_Armstrong

Anthropic Decision Theory

27th Oct 2014, 10 min read, 52 comments

Tags: Anthropics, Paradoxes, Sleeping Beauty Paradox, Utility Functions

With many thanks to Beluga and lackofcheese.

When trying to decide between SIA and SSA, two anthropic probability theories, I concluded that the question of anthropic probability is badly posed and that it depends entirely on the values of the agents. When debating the issue of personal identity, I concluded that the question of personal identity is badly posed and depends entirely on the values of the agents. When the issue of selfishness in UDT came up recently, I concluded that the question of selfishness is...

But let's not get ahead of ourselves.

A selfish scenario

Using Anthropic Decision Theory, I demonstrated that selfish agents using UDT should reason in the same way that average utilitarians did - essentially behaving 'as if' SSA were true and going for even odds of heads and tails ("halfer") in the Sleeping Beauty problem.

Then Beluga posted an argument involving gnomes, that seemed to show that selfish UDT agents should reason as total utilitarians did - essentially behaving 'as if' SIA were true and going for 2:1 odds in favour of tails ("thirder") in the Sleeping Beauty problem. After a bit of back and forth, lackofcheese then refined the argument. I noticed the refined argument was solid, and incidentally made the gnomes unnecessary.

How does the argument go? Briefly, a coin is flipped and an incubator machine creates either one person (on heads) or two people (on tails), each in separate rooms.

Without knowing what the coin flip was or how many people there were in the universe, every new person is presented with a coupon that pays £1 if the coin came out tails. The question is - assuming utility is linear in money - what amount £x should the created person(s) pay for this coupon?

The argument from Beluga/lackofcheese can be phrased like this. Let's name the people in the tails world, calling them Jack and Roger (yes, they like dressing like princesses - what of it?). Each of them reasons something like this:

"There are four possible worlds here. In the tails world, I, Jack/Roger, could exist in Room 1 or in Room 2. And in the heads world, it could be either me existing in Room 1, or the other person existing in Room 1 (in which case I don't exist). I'm completely indifferent to what happens in worlds where I don't exist (sue me, I'm selfish). So if I buy the coupon for £x, I expect to make utility: 0.25(0) + 0.25(-x) + 0.5(1-x)=0.5-0.75x. Therefore I will buy the coupon for x<£2/3."

That seems a rather solid argument (at least, if you allow counterfactuals into worlds where you don't exist, which you probably should). So it seems I was wrong and that selfish agents will indeed go for the SIA-like "thirder" position.
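To make the arithmetic in Jack/Roger's reasoning explicit, here is the expected-utility calculation written out step by step. The symbol EU and the world labels under the braces are notation introduced here for illustration; the probabilities and payoffs are the ones in the quoted reasoning, and rejecting the coupon is taken to give utility 0 in every world.

```latex
\begin{align*}
EU(\text{buy at } x)
  &= \underbrace{\tfrac14 \cdot 0}_{\text{heads, I don't exist}}
   + \underbrace{\tfrac14 \cdot (-x)}_{\text{heads, I exist}}
   + \underbrace{\tfrac12 \cdot (1 - x)}_{\text{tails}} \\
  &= \tfrac12 - \tfrac34 x, \\
EU(\text{buy at } x) > 0
  &\iff x < \tfrac23 .
\end{align*}
```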

Not so fast...

Another selfish scenario

The above argument reminded me of one I made a long time ago, when I "proved" that SIA was true. I subsequently discarded that argument, after looking more carefully into the motivations of the agents. So let's do that now.

Above, I was using a subtle intuition pump by using the separate names Jack and Roger. That gave connotations of "I, Jack, don't care about worlds in which I, Jack, don't exist..." But in the original formulation of the Sleeping Beauty/incubator problem, the agents were strictly identical! There are no Jack versus Roger issues - at most, these are labels, like 1 and 2.

It therefore seems possible that the selfish agent could reason:

"There are three possible worlds here. In the tails world, I either exist in Room 1 or Room 2. And in the heads world, either I exist in Room 1, or an identical copy of me exists in Room 1, and is the only copy of me in that world. I fail to see any actual difference between those two scenarios. So if I buy the coupon for £x, I expect to make utility: 0.5(-x) + 0.5(1-x)=0.5-x. Therefore I will buy the coupon for x<£1/2."

The selfish agent seems on rather solid ground here in their heads world reasoning. After all, would we treat someone else differently if we were told "That's not actually your friend; instead it's a perfect copy of your friend, while the original never existed"?

Notice that even if we do allow for the Jack/Roger distinction, it seems reasonable for the agent to say "If I don't exist, I value the person that most closely resembles me." After all, we all change from moment to moment, and we value our future selves. This idea is akin to Nozick's "closest continuer" concept.

Each selfish person is selfish in their own unique way

So what is really going on here? Let's call the first selfish agent a thirder-selfish agent, and the second a halfer-selfish agent. Note that both types of agents have perfectly consistent utility functions defined in all possible actual and counterfactual universes (after giving the thirder-selfish agent some arbitrary constant C, which we may as well set to zero, in worlds where they don't exist). Compare the two versions of Jack's utility:

                        "Jack in Room 1"    "Roger in Room 1"
Heads: buy coupon       -x/-x               0/-x (*)
Heads: reject coupon    0/0                 0/0
Tails: buy coupon       1-x/1-x             1-x/1-x
Tails: reject coupon    0/0                 0/0

The utilities are given as thirder-selfish utility/halfer-selfish utility. The one cell where they diverge (heads, buy coupon, "Roger in Room 1") is marked (*) - that one difference is key to their different decisions.
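As a rough check on the table, the following Python sketch encodes Jack's utilities for both agent types and computes the price at which buying stops being worthwhile. The function and variable names are made up for this illustration, and the 1/4, 1/4, 1/2 world weights are the ones used in the Jack/Roger reasoning earlier in the post.

```python
from fractions import Fraction

# World probabilities from Jack's point of view: the heads world splits into
# "Jack in Room 1" and "Roger in Room 1"; the tails world contains both people.
WORLDS = {
    "heads_jack": Fraction(1, 4),
    "heads_roger": Fraction(1, 4),
    "tails": Fraction(1, 2),
}


def buy_utility(world, x, agent):
    """Jack's utility for buying the coupon at price x in the given world."""
    if world == "tails":
        return 1 - x
    if world == "heads_jack":
        return -x
    # heads_roger: the single cell where the two agent types differ.
    return Fraction(0) if agent == "thirder" else -x


def expected_utility(x, agent):
    """Expected utility of buying; rejecting is worth 0 in every world."""
    return sum(p * buy_utility(w, x, agent) for w, p in WORLDS.items())


def break_even_price(agent):
    """Price at which buying and rejecting are equally good."""
    # Expected utility is linear in x, so two evaluations determine the root.
    u0 = expected_utility(Fraction(0), agent)
    u1 = expected_utility(Fraction(1), agent)
    return u0 / (u0 - u1)


print(break_even_price("thirder"))  # 2/3
print(break_even_price("halfer"))   # 1/2
```

Changing only the `heads_roger` entry moves the break-even price from 2/3 to 1/2, which is the sense in which that one cell drives the different decisions.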

At this point, people could be tempted to argue as to which type of agent is genuinely the selfish agent... But I can finally say:

The question of selfishness is badly posed and depends entirely on the values of the agents.

What do I mean by that? Well, here is a selfish utility function: "I expect all future copies of Stuart Armstrong to form a single continuous line through time, changing only slowly, and I value the happiness (or preference satisfaction) of all these future copies. I don't value future copies of other people."

That seems pretty standard selfishness. But this is not a utility function; it's a partial description of a class of utility functions, defined only in one set of universes (the set where there's a single future timeline for me, without any "weird" copying going on). Both the thirder-selfish utility function and the halfer-selfish one agree in such single timeline universes. They are therefore both extensions of the same partial selfish utility to more general situations.

Arguing which is "correct" is pointless. Both will possess all the features of selfishness we've used in everyday scenarios to define the term. We've enlarged the domain of possible scenarios beyond the usual set, so our concepts, forged in the usual set, can extend in multiple ways.

You could see the halfer-selfish values as a version of the "Psychological Approach" to personal identity: it values the utility of the being closest to itself in any world. A halfer-selfish agent would cheerfully step into a teleporter where they are scanned, copied onto a distant location, then the original is destroyed. The thirder-selfish agent might not. Indeed, the thirder-selfish agent is actually underspecified: the most extreme version would be one that does not value any future copies of themselves. They would indeed "jump off a cliff knowing smugly that a different person would experience the consequence of hitting the ground." Most versions of the thirder-selfish agent that people have in mind are less extreme than that, but defining either agent requires quite a bit of work, not simply a single word: "selfish".

So it's no wonder that UDT has difficulty with selfish agents: the concept is not well defined. "Selfish agent" is like "featherless biped" - a partial definition that purports to be the whole of the truth.

Personal identity and values

Each view of personal identity can be seen as isomorphic to a particular selfish utility function. The isomorphism is obtained simply by caring about the utility of another agent if and only if they share the same personal identity.

For instance, the psychological approach to personal identity posits that "You are that future being that in some sense inherits its mental features—beliefs, memories, preferences, the capacity for rational thought, that sort of thing—from you; and you are that past being whose mental features you have inherited in this way." Thus a psychological selfish utility function would value the preferences of a being that was connected to the agent in this way.

The somatic approach posits that "our identity through time consists in some brute physical relation. You are that past or future being that has your body, or that is the same biological organism as you are, or the like." Again, this can be used to code up a utility function.

Those two approaches (psychological and somatic) are actually broad categories of approaches, all of which would have a slightly different "selfish" utility function. The non-branching view, for instance, posits that if there is only one future copy of you, that is you, but if there are two, there is no you (you're effectively dead if you duplicate). This seems mildly ridiculous, but it still expresses very clear preferences over possible worlds that can be captured in a utility function.

Some variants allow for partial personal identity. For instance, discounting could be represented by a utility function that puts less weight on copies more distant in the future. If you allow "almost identical copies", then these could be represented by a utility function that gives partial credit for similarity along some scale (this would tend to give a decision somewhere in between the thirder and halfer situations presented above).
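One way to see the "somewhere in between" claim is with a small sketch. It assumes, purely for illustration, that partial credit is a linear weight `similarity` between 0 and 1 applied to the other person's utility in the heads world where the agent itself does not exist; neither that parameter nor the linear form comes from the post.

```python
from fractions import Fraction


def break_even_price(similarity):
    """Break-even coupon price for an agent that gives `similarity` credit
    (0 = none, 1 = full) to the other person in the heads world where the
    agent itself does not exist. The linear weighting is an assumption."""
    s = Fraction(similarity)
    # EU(buy) = 1/4*(-x) + 1/4*s*(-x) + 1/2*(1 - x) = 1/2 - x*(3 + s)/4
    return Fraction(1, 2) / ((3 + s) / 4)


for s in (0, Fraction(1, 2), 1):
    print(s, break_even_price(s))
# 0 2/3
# 1/2 4/7
# 1 1/2
```

Under this assumption the price is 2/(3 + s), so it slides from the thirder value 2/3 at s = 0 to the halfer value 1/2 at s = 1.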

Many of the "paradoxes of identity" dissolve entirely when one uses values instead of identity. Consider the intransitivity problem for some versions of psychological identity:

> First, suppose a young student is fined for overdue library books. Later, as a middle-aged lawyer, she remembers paying the fine. Later still, in her dotage, she remembers her law career, but has entirely forgotten not only paying the fine but everything else she did in her youth. [...] the young student is the middle-aged lawyer, the lawyer is the old woman, but the old woman is not the young student.

In terms of values, this problem is non-existent: the young student values herself, the lawyer and the old woman (as does the lawyer) but the old woman only values herself and the lawyer. That value system is inelegant, perhaps, but it's not ridiculous (and "valuing past copies" might be decision-relevant in certain counterfactual situations).

Similarly, consider the question as to whether it is right to punish someone for the law-breaking of a past copy of themselves. Are they the same person? What if, due to an accident or high technology, the present copy has no memory of law-breaking or of being the past person? Using identity gets this hopelessly muddled, but from a consequentialist deterrence perspective, the answer is simple. The past copy presumably valued their future copy staying out of jail. Therefore, from the deterrence perspective, we should punish the current copy to deter such actions. In courts today, we might allow amnesia to be a valid excuse, simply because amnesia is so hard and dangerous to produce deliberately. But this may change in the future: if it becomes easy to rewire your own memory, then deterrent punishment will need to move beyond the classical notions of identity and punish people we would currently consider blameless.

Evolution and identity

Why are we convinced that there is such a thing as selfishness and personal identity? Well, let us note that it is in the interest of evolution that we believe in it. The "interests" of the genes are to be passed on, and so they benefit if the carrier of the gene in the present values the survival of the (same) carrier of the gene in the future. The gene does not "want" the carrier to jump off a cliff, because whatever the issues of personal identity, it'll be the same gene in the body that gets squashed at the end. Similarly, future copies of yourself are the copies that you have the most control over, through your current actions. So genes have exceptionally strong interests in making you value "your" future copies. Even your twin is not as valuable as you: genetically you're equivalent, but your current decisions have less impact over them than over your future self. Thus is selfishness created.

It seems that evolution has resulted in human copies with physical continuity, influence over (future) and memories of (past), and in very strong cross-time caring between copies. These are unique to a single time line of copies, so no wonder people have seen them as "defining" personal identity. And "The person tomorrow is me" is probably more compact than saying that you care about the person tomorrow, and listing the features connecting you. In the future, the first two components may become malleable, leaving only caring (a value) as the remnants of personal identity.

This idea allows us to do something we generally can't, and directly compare the "quality" of value systems - at least from the evolutionary point of view, according to the value system's own criteria.

Here is an example of an inferior selfish decision theory: agents using CDT, and valuing all future versions of themselves, but not any other copies. Why is this inferior? Because if the agent is duplicated, they want those duplicates to cooperate and value each other equally, because that gives the current agent the best possible expected utility. But if each copy has the same utility as the agent started with, then CDT guarantees rivalry, probably to the detriment of every agent. In effect, the agent wants its future self to have different selfish/indexical values from the ones it has, in order to preserve the same overall values.

This problem can be avoided by using UDT, CDT with precommitments, or a selfish utility function that values all copies equally. Those three are more "evolutionarily stable". So is, for instance, a selfish utility function with an exponential discount rate - but not one with any other discount rate. This is an interesting feature of this approach: the set of evolutionarily stable selfish decision theories is smaller than the set of selfish decision theories. Thus there are many circumstances where different selfish utilities will give the same decisions under the same decision theory, or where the different decision theories/utilities will self-modify to make identical decisions.
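The remark about discount rates can be illustrated with the standard preference-reversal example. This is a minimal sketch, not a proof of the general claim: the reward sizes, times, and discount parameters are invented for the illustration, and hyperbolic discounting stands in for "any other discount rate".

```python
def exponential(delay, rate=0.9):
    """Exponential discount factor for a reward `delay` time steps away."""
    return rate ** delay


def hyperbolic(delay, k=1.0):
    """Hyperbolic discount factor, a common non-exponential alternative."""
    return 1.0 / (1.0 + k * delay)


def prefers_later(discount, now, small=(10, 1.0), large=(11, 1.5)):
    """True if, evaluated at time `now`, the agent prefers the larger, later
    reward over the smaller, sooner one. Each reward is (time, amount)."""
    (t_small, a_small), (t_large, a_large) = small, large
    return discount(t_large - now) * a_large > discount(t_small - now) * a_small


for name, discount in (("exponential", exponential), ("hyperbolic", hyperbolic)):
    print(name, prefers_later(discount, now=0), prefers_later(discount, now=10))
# exponential True True
# hyperbolic True False
```

With exponential discounting the agent at time 0 and the agent at time 10 rank the two rewards the same way, so the earlier self has no reason to bind the later one. With the hyperbolic curve the ranking flips as the rewards get closer, so the earlier self would want its future self to have different values from the ones it will actually have.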

One would like to make an argument about Rawlsian veils of ignorance and UDT-like initial pre-commitments leading to general altruism or something... But that's another argument, for another time. Note that this kind of argument cannot be used against the most ridiculous selfish utility function of all: "me at every moment is a different person I don't value at all". Someone with that utility function will quickly die, but, according to its own utility, it doesn't see this as a problem.

To my mind, the interesting thing here is that while there are many "non-indexical" utility functions that are stable under self-modification, this is not the case for most selfish and indexical ones.


52 comments, sorted by top scoring

[-] lackofcheese 10y 70

OK; I agree with you that selfishness is ill-defined, and the way to actually specify a particular kind of selfishness is to specify a utility function over all possible worlds (actual and counterfactual). Moreover, the general procedure for doing this is to assign a "me" or "not me" label to various entities in the possible worlds, and derive utilities for those worlds on the basis of those labels. However, I think there are some issues that still need to be resolved here.

> If I don't exist, I value the person that most closely resembles me.

This appears suspect to me. If there is no person who closely resembles you, I guess in that case you're indifferent, right? However, what if two people are equally close to you, how do you assign utility to them in that case? Also, why should you only value people who closely resemble you if you don't exist? If anything, wouldn't you care about them in worlds where you do exist?

As you've noted, in a simple case where you only have to worry about actual worlds and not counterfactual ones, and there is only a single "me", assigning selfish utility is a relatively straightforward task. Being indifferent about counterfactual worlds where "you" don't exist also makes some sense from a selfish perspective, although it brings you into potential conflict with your own past self. Additionally, the constant "C" may not be quite so arbitrary in the general case---what if your decision influences the probability of your own existence? In such a situation, the value of that constant will actually matter.

However, the bigger issue that you haven't covered is this: if there are multiple entities in the same world to which you do (or potentially could) assign the label "me", how do you assign utility to that world?

For example, in the scenario in your post, if I assume that the person in Room 1 in the heads world can indeed be labeled as "me", how do I assign utilities to a tails world in which I could be either one of the two created copies? It appears to me that there are two different approaches, and I think it makes sense to apply the label "selfish" to both of them. One of them would be to add utility over selves (again a "thirder" position), and another would be to average utility over selves (which is halfer-equivalent). Nor do I think that the "adding" approach is equivalent to your notion of "copy-altruism", because under the "adding" approach you would stop caring about your copies once you figured out which one you were, whereas under copy-altruism you would continue to care.

Under those assumptions, a "halfer" would be very strange indeed, because
1) They are only willing to pay 1/2 for a ticket.
2) They know that they must either be Jack or Roger.
3) They know that upon finding out which one they are, regardless of whether it's Jack or Roger, they would be willing to pay 2/3.

Can a similar argument be made against a selfish thirder?

[-] Stuart_Armstrong 10y 20

> Additionally, the constant "C" may not be quite so arbitrary in the general case---what if your decision influences the probability of your own existence? In such a situation, the value of that constant will actually matter.

Indeed. That's a valid consideration. In the examples above, this doesn't matter, but it makes a difference in the general case.

[-] Stuart_Armstrong 10y 20

> Also, why should you only value people who closely resemble you if you don't exist?

There's no "should" - this is a value set. This is the extension of the classical selfish utility idea. Suppose that future you joins some silly religion and does some stupid stuff and so on (insert some preferences of which you disapprove here). Most humans would still consider that person "them" and would (possibly grudgingly) do things to make them happy. But now imagine that you were duplicated, and the other duplicate went on and did things you approved of more. Many people would conclude that the second duplicate was their "true" self, and redirect all their efforts towards them.

This is very close to Nozick's "closest continuer" approach http://www.iep.utm.edu/nozick/#H4 .

> However, the bigger issue that you haven't covered is this: if there are multiple entities in the same world to which you do (or potentially could) assign the label "me", how do you assign utility to that world?

It seems the simplest extension of classical selfishness is that the utility function assigns preferences to the physical being that it happens to reside in. This allows it to assign preferences immediately, without first having to figure out their location. But see my answer to the next question (the real issue is that our normal intuitions break down in these situations, making any choice somewhat arbitrary).

> Nor do I think that the "adding" approach is equivalent to your notion of "copy-altruism", because under the "adding" approach you would stop caring about your copies once you figured out which one you were

UDT (or CDT with precommitments) forces selfish agents who don't know who they are into behaving the same as copy-altruists. Copy altruism and adding/averaging come apart under naive CDT. (Note that for averaging versus adding, the difference can only be detected by comparing with other universes with different numbers of people.)

The halfer is only being strange because they seem to be using naive CDT. You could construct a similar paradox for a thirder if you assume the ticket pays out only for the other copy, not themselves.

[-] lackofcheese 10y 10

> There's no "should" - this is a value set.

The "should" comes in giving an argument for why a human rather than just a hypothetically constructed agent might actually reason in that way. The "closest continuer" approach makes at least some intuitive sense, though, so I guess that's a fair justification.

> The halfer is only being strange because they seem to be using naive CDT. You could construct a similar paradox for a thirder if you assume the ticket pays out only for the other copy, not themselves.

I think there's more to it than that. Yes, UDT-like reasoning gives a general answer, but under UDT the halfer is still definitely acting strange in a way that the thirder would not be.

If the ticket pays out for the other copy, then UDT-like reasoning would lead you to buy the ticket regardless of whether you know which one you are or not, simply on the basis of having a linked decision. Here's Jack's reasoning:

"Now that I know I'm Jack, I'm still only going to pay at most $0.50, because that's what I precommitted to do when I didn't know who I was. However, I can't help but think that I was somehow stupid when I made that precommitment, because now it really seems I ought to be willing to pay 2/3. Under UDT sometimes this kind of thing makes sense, because sometimes I have to give up utility so that my counterfactual self can make greater gains, but it seems to me that that isn't the case here. In a counterfactual scenario where I turned out to be Roger and not Jack, I would still desire the same linked decision (x=2/3). Why, then, am I stuck refusing tickets at 55 cents?"

It appears to me that something has clearly gone wrong with the self-averaging approach here, and I think it is indicative of a deeper problem with SSA-like reasoning. I'm not saying you can't reasonably come to the halfer conclusion for different reasons (e.g. the "closest continuer" argument), but some or many of the possible reasons can still be wrong. That being said, I think I tend to disagree with pretty much all of the reasons one could be a halfer, including average utilitarianism, the "closest continuer", and selfish averaging.

[-] Stuart_Armstrong 10y 20

> simply on the basis of having a linked decision.

Linked decisions is also what makes the halfer paradox go away.

To get a paradox that hits at the "thirder" position specifically, in the same way as yours did, I think you need only replace the ticket with something mutually beneficial - like putting on an enjoyable movie that both can watch. Then the thirder would double count the benefit of this, before finding out who they were.

[-] lackofcheese 9y 10

> Linked decisions is also what makes the halfer paradox go away.

I don't think linked decisions make the halfer paradox I brought up go away. Any counterintuitive decisions you make under UDT are simply ones that lead to you making a gain in counterfactual possible worlds at the cost of a loss in actual possible worlds. However, in the instance above you're losing both in the real scenario in which you're Jack, and in the counterfactual one in which you turned out to be Roger.

Granted, the "halfer" paradox I raised is an argument against having a specific kind of indexical utility function (selfish utility w/ averaging over subjectively indistinguishable agents) rather than an argument against being a halfer in general. SSA, for example, would tell you to stick to your guns because you would still assign probability 1/2 even after you know whether you're "Jack" or "Roger", and thus doesn't suffer from the same paradox. That said, due to the reference class problem, if you are told whether you're Jack or Roger before being told everything else, SSA would give the wrong answer, so it's not like it's any better...

> To get a paradox that hits at the "thirder" position specifically, in the same way as yours did, I think you need only replace the ticket with something mutually beneficial - like putting on an enjoyable movie that both can watch. Then the thirder would double count the benefit of this, before finding out who they were.

Are you sure? It doesn't seem to me that this would be paradoxical; since the decisions are linked you could argue that "If I hadn't put on an enjoyable movie for Jack/Roger, Jack/Roger wouldn't have put on an enjoyable movie for me, and thus I would be worse off". If, on the other hand, only one agent gets to make that decision, then the agent-parts would have ceased to be subjectively indistinguishable as soon as one of them was offered the decision.

[-] Stuart_Armstrong 9y 20

Did I make a mistake? It's possible - I'm exhausted currently. Let's go through this carefully. Can you spell out exactly why you think that halfers are such that:

> 1) They are only willing to pay 1/2 for a ticket.
> 2) They know that they must either be Jack or Roger.
> 3) They know that upon finding out which one they are, regardless of whether it's Jack or Roger, they would be willing to pay 2/3.

I can see 1) and 2), but, thinking about it, I fail to see 3).

[-] lackofcheese 9y 10

As I mentioned earlier, it's not an argument against halfers in general; it's against halfers with a specific kind of utility function, which sounds like this: "In any possible world I value only my own current and future subjective happiness, averaged over all of the subjectively indistinguishable people who could equally be "me" right now."

In the above scenario, there is a 1/2 chance that both Jack and Roger will be created, a 1/4 chance of only Jack, and a 1/4 chance of only Roger.

Before finding out who you are, averaging would lead to a 1:1 odds ratio, and so (as you've agreed) this would lead to a cutoff of 1/2.

After finding out whether you are, in fact, Jack or Roger, you have only one possible self in the TAILS world, and one possible self in the relevant HEADS+Jack/HEADS+Roger world, which leads to a 2:1 odds ratio and a cutoff of 2/3.

Ultimately, I guess the essence here is that this kind of utility function is equivalent to a failure to properly conditionalise, and thus even though you're not using probabilities you're still "Dutch-bookable" with respect to your own utility function.

I guess it could be argued that this result is somewhat trivial, but the utility function mentioned above is at least intuitively reasonable, so I don't think it's meaningless to show that having that kind of utility function is going to put you in trouble.

[-] Stuart_Armstrong 9y 10

> "In any possible world I value only my own current and future subjective happiness, averaged over all of the subjectively indistinguishable people who could equally be "me" right now."

Oh. I see. The problem is that that utility takes a "halfer" position on combining utility (averaging) and "thirder" position on counterfactual worlds where the agent doesn't exist (removing them from consideration). I'm not even sure it's a valid utility function - it seems to mix utility and probability.

For example, in the heads world, it values "50% Roger vs 50% Jack" at the full utility amount, yet values only one of "Roger" and "Jack" at full utility. The correct way of doing this would be to value "50% Roger vs 50% Jack" at 50% - and then you just have a rescaled version of the thirder utility.

I think I see the idea you're getting at, but I suspect that the real lesson of your example is that that mixed halfer/thirder idea cannot be made coherent in terms of utilities over worlds.

[-] lackofcheese 9y 10

I don't think that's entirely correct; SSA, for example, is a halfer position and it does exclude worlds where you don't exist, as do many other anthropic approaches.

Personally I'm generally skeptical of averaging over agents in any utility function.

[-] Stuart_Armstrong 9y 10

> SSA, for example, is

Which is why I don't use anthropic probability, because it leads to these kinds of absurdities. The halfer position is defined in the top post (as is the thirder), and your setup uses aspects of both approaches. If it's incoherent, then SSA is incoherent, which I have no problem with. SSA != halfer.

[-] Stuart_Armstrong 9y 10

Averaging makes a lot of sense if the number of agents is going to be increased and decreased in non-relevant ways.

E.g.: you are an upload. Soon, you are going to experience eating a chocolate bar, then stubbing your toe, then playing a tough but intriguing game. During this time, you will be simulated on n computers, all running exactly the same program of you experiencing this, without any deviations. But n may vary from moment to moment. Should you be willing to pay to make n higher during pleasant experiences or lower during unpleasant ones, given that you will never detect this change?

[-] lackofcheese 9y 10

I think there are some rather significant assumptions underlying the idea that they are "non-relevant". At the very least, if the agents were distinguishable, I think you should indeed be willing to pay to make n higher. On the other hand, if they're indistinguishable then it's a more difficult question, but the anthropic averaging I suggested in my previous comments leads to absurd results.

What's your proposal here?

[-] Stuart_Armstrong 9y 10

> the anthropic averaging I suggested in my previous comments leads to absurd results.

The anthropic averaging leads to absurd results only because it wasn't a utility function over states of the world. Under heads, it ranked 50%Roger+50%Jack differently from the average utility of those two worlds.

[-] Wei Dai 10y 60

Hi Stuart, you linked to my post on UDT and selfishness, but I'm not sure you actually tried to answer the questions that I brought up there. Was this post intended to do that?

[-] Stuart_Armstrong 10y 40

Not in this post. I'll turn to those questions if this post doesn't seem to have major flaws.

[-] Wei Dai 2y 20

Stuart, did you ever get around to doing that? I can't seem to find the sequel to this post.

[-] Stuart_Armstrong 2y 20

Doing what? Looking at pre-commitments?

[-] torekp 9y 40

Great post. There's one part (two if you include Manfred's critique) that I don't buy, and seems to go beyond what you need for your core points:

> Arguing which is "correct" is pointless. Both will possess all the features of selfishness we've used in everyday scenarios to define the term. We've enlarged the domain of possible scenarios beyond the usual set, so our concepts, forged in the usual set, can extend in multiple ways.

The various approaches to identity agree on the undisputed everyday scenarios, but that doesn't mean they're all equally natural and elegant extensions of our concepts. Compare two early humans who have agreed that sparrows, crows, and pigeons are all "birds". Then they encounter a penguin. It doesn't fly! Argument ensues! That doesn't mean that there isn't a unique best answer to whether this new thing is a bird. Maybe there is, maybe there isn't - agreement on paradigm cases plus disagreement about others doesn't predict much.

Precisely because value questions are getting entangled with semantic questions, the semantic personal identity question is made harder than it needs to be. I think psychological theories of identity are only made plausible by building-in the assumption that whatever identity turns out to be, it must always turn on facts that most people would instantly recognize as valuable. But that's getting off topic.

[-]RomeoStevens10y40

Great post, this makes it easier to think about certain things. For instance, I value future versions of myself unequally leading to investing in trying to create some of them and destroy others. If I am correctly extrapolating, then the future selves I destroy are the ones that would not have wished on reflection to have come into existence. If I am incorrectly extrapolating, well the implications are scary, and this makes good epistemic hygiene feel sharper. That in turn makes something difficult, like taking the outside view on how happiness research applies to you even when your intuitions disagree, somewhat easier.

[-]shminux10y30

Interesting. I did not realize that one's answer to Sleeping Beauty-type of questions can legitimately depend on one's definition of identity... with no single right answer.

[-]Vulture10y30

Excellent post! Copying issues feel significantly clearer in my mind through this framework, although I certainly wouldn't call it a complete solution. Good stuff!

[-]Manfred10y20

I agree with most of this post, but not with most parts mentioning SSA/SIA or the sleeping beauty problem. In general, aside from those two areas I find your written works to be valuable resources. Now that I've said something nice, here's a long comment predictably focusing on the bad bits.

SSA and SIA, as interpreted by you, seem uninformative (treating them as two different black boxes rather than two settings on a transparent box), so I'm not surprised that you decided SSA vs SIA was meaningless. But this does not mean that anthropic probability is meaningless. Certainly you didn't prove that - you tried something else, that's all. It's analogous to how just because UDT solves Psy-Kosh's non-anthropic problem without mentioning classical probability updates, that doesn't mean classical probability updates are "meaningless."

Each of them reasons something like this:

"There are four possible worlds here. In the tails world, I, Jack/Roger, could exist in Room 1 or in Room 2. And in the heads world, it could be either me existing in Room 1, or the other person existing in Room 1 (in which case I don't exist). I'm completely indifferent to what happens in worlds where I don't exist (sue me, I'm selfish). So if I buy the coupon for £x, I expect to make utility: 0.25(0) + 0.25(-x) + 0.5(1-x)=0.5-0.75x. Therefore I will buy the coupon for x<£2/3."

This is the gnome's reasoning with different labels. But that doesn't mean that it has the right labels to be the human's reasoning.

It sounds like the sort of thing that a person who believed that anthropic probabilities were meaningless would write as the person's reasoning.

Let me try and give an analogy for how this sounds to me. It will be grossly unfair to you, and I apologize - pretend the content is a lot better even as the sound remains similar.

Suppose you're sitting in your room, and also in your room is a clock. Now imagine there was a gnome flying by with time dilation of 0.5. The gnome reasons as follows "I see a human and a clock moving past me together. The clock ticks at half a tick per second, and the person thinks at half normal speed, so the human sees the clock tick once per second"

My grossly unfair parody of you would then say: "Physics would be the same if I was moving past with time dilation 0.5. I'd see myself and my clock moving past me together. The clock would tick at half a tick per second, and I'd think at half normal speed, so I see the clock tick once per second."

This is the right conclusion, but it's just copying what the gnome said even when that's not appropriate.

What do I think the right way would look like? Well, it would have anthropic probabilities in it.

[-]lackofcheese10y30

The strongest argument against anthropic probabilities in decision-making comes from problems like the Absent-Minded Driver, in which the probabilities depend upon your decisions.

If anthropic probabilities don't form part of a general-purpose decision theory, and you can get the right answers by simply taking the UDT approach and going straight to optimising outcomes given the strategies you could have, what use are the probabilities?

I won't go so far as to say they're meaningless, but without a general theory of when and how they should be used I definitely think the idea is suspect.

[-]Manfred10y40

Probabilities have a foundation independent of decision theory, as encoding beliefs about events. They're what you really do expect to see when you look outside.

This is an important note about the absent-minded driver problem et al, that gets lost if one gets comfortable in the effectiveness of UDT. The agent's probabilities are still accurate, and still correspond to the frequency with which they see things (truly!) - but they're no longer related to decision-making in quite the same way.

"The use" is then to predict, as accurately as ever, what you'll see when you look outside yourself.

And yes, probabilities can sometimes depend on decisions, not only in some anthropic problems but more generally in Newcomb-like ones. Yes, the idea of having a single unqualified belief, before making a decision, doesn't make much sense in these cases. But Sleeping Beauty is not one of these cases.

[-]lackofcheese10y10

That's a reasonable point, although I still have two major criticisms of it.

1. What is your resolution to the confusion about how anthropic reasoning should be applied, and to the various potential absurdities that seem to come from it? Non-anthropic probabilities do not have this problem, but anthropic probabilities definitely do.
2. How can anthropic probability be the "right way" to solve the Sleeping Beauty problem if it lacks the universality of methods like UDT?

[-]Manfred10y10

1 - I don't have a general solution, there are plenty of things I'm confused about - and certain cases where anthropic probability depends on your action are at the top of the list. There is a sense in which a certain extension of UDT can handle these cases if you "pre-chew" indexical utility functions into world-state utility functions for it (like a more sophisticated version of what's described in this post, actually), but I'm not convinced that this is the last word.

Absurdity and confusion have a long (if slightly spotty) track record of indicating a lack in our understanding, rather than a lack of anything to understand.

2 - Same way that CDT gets the right answer on how much to pay for 50% chance of winning $1, even though CDT isn't correct. The Sleeping Beauty problem is literally so simple that it's within the zone of validity of CDT.

[-]lackofcheese10y10

On 1), I agree that "pre-chewing" anthropic utility functions appears to be something of a hack. My current intuition in that regard is to reject the notion of anthropic utility (although not anthropic probability), but a solid formulation of anthropics could easily convince me otherwise.

On 2), if it's within the zone of validity then I guess that's sufficient to call something "a correct way" of solving the problem, but if there is an equally simple or simpler approach that has a strictly broader domain of validity I don't think you can be justified in calling it "the right way".

[-]Stuart_Armstrong10y20

My full response can be found at:

http://www.fhi.ox.ac.uk/anthropics-why-probability-isnt-enough.pdf

But the gist of it is this: different people can assign different anthropic probabilities to certain problems, yet, due to having different decision theories, will make the same decision in every single case. That caused me to wonder what the meaning of "anthropic probability" was if you could shout "SIA" versus "SSA" but never actually do anything different because of this.

[-]Manfred10y20

Probabilities are a way of encoding your knowledge about events. For example in the original Sleeping Beauty problem, the probability of it being Monday actually does correspond to what the agent with that probability would see if they could get on the internet, or walk outside.

Specifically, probabilities are a function of your information about events.

It seems like you disagree with this. Or maybe you just got fed up with arguments over the Sleeping Beauty problem and decided to declare the whole thing meaningless? Could you expand on that a little?

Consider what this sentence looks like if, like me, you think that probabilities are a function of the agent's information:

different people can assign different anthropic probabilities to certain problems, yet, due to having different decision theories, will make the same decision in every single case

Here's one highly charitable rephrasing. "If I give two people different information, then there exists some set of values such that they'll make the same decision anyhow."

But this has zilch to do with anthropics. If I tell two people two different things about a coin, and ask them how much they'd pay for a candy bar that I only gave them if the coin landed heads, there exists some set of values such that these people will make the same decision.

[-]Stuart_Armstrong10y20

Another reason that anthropic probabilities are different: agents can share all their information, but this will not convince a SIA agent to move to SSA or vice versa.

[-]Manfred10y20

I don't think you understand what I'm saying about SSA and SIA. Hm. Maybe I should rewrite that post where I tried to explain this, since it had a lot of confused bits in it. Clearly you don't remember it, so I'm sure the ideas would look fresh and original.

Sorry, getting off track. I will attempt to recap:

SSA and SIA are identical to certain states of information. From these states of information, probabilities can be gotten using the details of the problem and the maximum entropy principle.

The information state identical to SSA says only that being in different possible worlds are mutually exclusive events, and that you are in some world. In the Sleeping Beauty problem, there are just two worlds, labeled Heads and Tails, and the max ent distribution is just 1/2, 1/2.

The information state identical to SIA says only that the information identical to SSA is true, and also that being in different states relative to the world (including different places or different times) are mutually exclusive and exhaustive events for you. There are then three mutually exclusive and exhaustive events, which all get 1/3 probability.

The reason why SIA doesn't make a distinction between different people and different worlds is because all these things are mutually exclusive and exhaustive - there is no such thing as a "special degree of mutual exclusivity" that might be awarded to the distinction between Heads and Tails, but not between Monday and Tuesday. All mutual exclusivity is the same.

Okay, so now imagine an SSA agent and an SIA agent get to talk to each other.

SSA agent says "me existing in different worlds is mutually exclusive and exhaustive."

"Wow, me too!" replies SIA. "Also, when I look at the world I see exactly one place and time, so those are mutually exclusive and exhaustive for me."

"Nope, that's not true for me at all." says SSA.

"Have you tried looking at the world?"

"Nope, not yet."

"Well, when you do, try and check out if you see exactly one place and time - since I'm talking to you my causal model predicts that you will."

"This conversation is just a rhetorical device, I'm really a pretty abstract entity."

"Okay then. Bye."

"See you later, and possibly also at other times as well."

Nothing particularly strange appears to be happening.

[-]Stuart_Armstrong10y20

I think it's worth sorting the issue out (if you agree), so let's go slowly. Both SSA and SIA depend on priors, so you can't argue for them based on maximal entropy grounds. If the coin is biased, they will have different probabilities (so SSA+biased coin can have the same probabilities as SIA+unbiased coin and vice versa). That's probably obvious to you, but I'm mentioning it in case there's a disagreement.

Your model works, with a few tweaks. SSA starts with a probability distribution over worlds, throws away the ones where "you" don't exist (why? shush, don't ask questions!), and then locates themselves within the worlds by subdividing a somewhat arbitrary reference class. SIA starts with the same, uses the original probabilities to weigh every possible copy of themselves, sees these as separate events, and then renormalises (which is sometimes impossible, see http://lesswrong.com/lw/fg7/sia_fears_expected_infinity/).

I have to disagree with your conversation, however. Both SIA and SSA consider all statements of type "I exist in universe X and am the person in location Y" to be mutually exclusive and exhaustive. It's just that SIA stratifies by location only (and then deduces the probability of a universe by combining different locations in the same universe), while SSA first stratifies by universe and then by location.
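
As a rough illustration of the two procedures just described, here is a minimal Python sketch for the standard incubator setup (one person on heads, two on tails). The function names and the `copies` dictionary are hypothetical, and the sketch assumes a finite number of worlds and a reference class made up of exactly the copies in each world; it is not meant as a general definition of either assumption.

```python
from fractions import Fraction


def ssa_world_probs(prior, copies):
    """SSA-style: keep the prior over worlds, dropping worlds with no copy of you."""
    alive = {w: p for w, p in prior.items() if copies[w] > 0}
    total = sum(alive.values())
    return {w: p / total for w, p in alive.items()}


def sia_world_probs(prior, copies):
    """SIA-style: weigh each world by its number of copies, then renormalise."""
    weights = {w: p * copies[w] for w, p in prior.items()}
    total = sum(weights.values())
    return {w: wt / total for w, wt in weights.items()}


# Incubator setup: one copy on heads, two on tails.
copies = {"heads": 1, "tails": 2}
fair = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
biased = {"heads": Fraction(1, 3), "tails": Fraction(2, 3)}

print(ssa_world_probs(fair, copies))
print(sia_world_probs(fair, copies))
print(ssa_world_probs(biased, copies))
```

With a fair coin the SSA-style function leaves heads at 1/2 while the SIA-style one gives heads 1/3. Feeding the SSA-style function a coin biased to heads 1/3 reproduces the SIA numbers for the fair coin, which is the "SSA+biased coin" equivalence mentioned above.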

But I still think this leads us astray. My point is different. Normally, given someone's utility, it's possible to disentangle whether someone is using a particular decision theory or a particular probability approach by observing their decisions. However, in anthropic (and Psy-Kosh-like) situations, this becomes impossible. In the notation that I used in the paper I referred to, SIA+"divided responsibility" will always give the same decision as SSA+"total responsibility" (to a somewhat more arguable extent, for any fixed responsibility criteria, EDT+SSA gives the same decisions as CDT+SIA).

Since the decision is the same, this means that all the powerful arguments for using probability (which boil down to "if you don't act as if you have consistent probabilities, you'll lose utility pointlessly") don't apply in distinguishing between SIA and SSA. Thus we are not forced to have a theory of anthropic probability - it's a matter of taste whether to do so or not. Nothing hinges on whether the probability of heads is "really" 1/3 or 1/2. The full decision theory is what counts, not just the anthropic probability component.

[-]Manfred10y40

Both SSA and SIA depend on priors, so you can't argue for them based on maximal entropy grounds. If the coin is biased, they will have different probabilities (so SSA+biased coin can have the same probabilities as SIA+unbiased coin and vice versa).

I definitely agree that SSA + belief in a biased coin can have the same probabilities as SIA + belief in an unbiased coin. (I'm just calling them beliefs to reinforce that the thing that affects the probability directly is the belief, not that coin itself). But I think you're making an implied argument here - check if I'm right.

The implied argument would go like "because the biasedness of the coin is a prior, you can't say what the probabilities will be just from the information, because you can always change the prior."

The short answer is that the probabilities I calculated are simply for agents who "assume SSA" and "assume SIA" and have no other information.

The long answer is to explain how this interacts with priors. By the way, have you re-read the first three chapters of Jaynes recently? I have done so several times, and found it helpful.

Prior probabilities still reflect a state of information. Specifically, they reflect one's aptly named prior information. Then you learn something new, and you update, and now your probabilities are posterior probabilities and reflect your posterior information. Agents with different priors have different states of prior information.

Perhaps there was an implied argument that there's some problem with the fact that two states with different information (SSA+unbiased and SIA+biased) are giving the same probabilities for events relevant to the problem? Well, there's no problem. If we conserve information there must be differences somewhere, but they don't have to be in the probabilities used in decision-making.

a few tweaks.

Predictably, I'd prefer descriptions in terms of probability theory to mechanistic descriptions of how to get the results.

I have to disagree with your conversation, however. Both SIA and SSA consider all statements of type "I exist in universe X and am the person in location Y" to be mutually exclusive and exhaustive. It's just that SIA stratifies by location only (and then deduces the probability of a universe by combining different locations in the same universe), while SSA first stratifies by universe and then by location.

Whoops. Good point, I got SSA quite wrong. Hm. That's troubling. I think I made this mistake way back in the ambitious yet confused post I mentioned, and have been lugging it around ever since.

Consider an analogous game where a coin is flipped. If heads I get a white marble. If tails, somehow (so that this 'somehow' has a label, let's call it 'luck') I get either a white marble or a black marble. This is SSA with different labels. How does one get the probabilities from a specification like the one I gave for SIA in the sleeping beauty problem?

I think it's a causal condition, possibly because of something equivalent to "the coin flip does not affect what day it is." And I'm bad at doing this translation.

But I need to think a lot, so I'll get back to you later.

Since the decision is the same, this means that all the powerful arguments for using probability

Just not a fan of Cox's theorem, eh?

[-]Stuart_Armstrong10y10

"assume SSA" and "assume SIA"

And I'm still not seeing what either assumption gives you, if your decision is already determined (by UDT, for instance) in a way that makes the assumption irrelevant.

Just not a fan of Cox's theorem, eh?

Very much a fan. Anything that's probability-like needs to be an actual probability. I'm disputing whether anthropic probabilities are meaningful at all.

[-]Manfred9y20

And I'm still not seeing what either assumption gives you, if your decision is already determined

I'll delay talking about the point of all of this until later.

whether anthropic probabilities are meaningful at all.

Probabilities are a function that represents what we know about events (where "events" is a technical term meaning things we don't control, in the context of Cox's theorem - for different formulations of probability this can take on somewhat different meanings). This is "what they mean."

As I said to lackofcheese:

Probabilities have a foundation independent of decision theory, as encoding beliefs about events. They're what you really do expect to see when you look outside.

This is an important note about the absent-minded driver problem et al, that can get lost if one gets comfortable in the effectiveness of UDT. The agent's probabilities are still accurate, and still correspond to the frequency with which they see things (truly!) - but they're no longer related to decision-making in quite the same way.

"The use" is then to predict, as accurately as ever, what you'll see when you look outside yourself.

If you accept that the events you're trying to predict are meaningful (e.g. "whether it's Monday or Tuesday when you look outside"), and you know Cox's theorem, then P(Monday) is meaningful, because it encodes your information about a meaningful event.

In the Sleeping Beauty problem, the answer still happens to be straightforward in terms of logical probabilities, but step one is definitely agreeing that this is not a meaningless statement.

(side note: If all your information is meaningless, that's no problem - then it's just like not knowing anything and it gets P=0.5)

[-]Stuart_Armstrong9y40

Probabilities are a function that represents what we know about events

As I said to lackofcheese:

If we create 10 identical copies of me and expose 9 of them to one stimulus and 1 to another, what is my subjective anticipation of seeing one stimulus over the other? 10% is one obvious answer, but I might take a view of personal identity that fails to distinguish between identical copies of me, in which case 50% is correct. What if identical copies will be recombined later? Eliezer had a thought experiment where agents were two dimensional, and could get glued or separated from each other, and wondered whether this made any difference. I do too. And I'm also very confused about quantum measure, for similar reasons.

In general, the question "how many copies are there" may not be answerable in certain weird situations (or can be answered only arbitrarily).

EDIT: with copying and merging and similar, you get odd scenarios like "the probability of seeing something is x, the probability of remembering seeing it is y, the probability of remembering remembering it is z, and x y and z are all different." Objectively it's clear what's going on, but in terms of "subjective anticipation", it's not clear at all.

Or put more simply: there are two identical copies of you. They will be merged soon. Do you currently have a 50% chance of dying soon?

[-]Manfred9y30

In general, the question "how many copies are there" may not be answerable in certain weird situations (or can be answered only arbitrarily).

I agree with this. In probability terms, this is saying that P(there are 9 copies of me) is not necessarily meaningful because the event is not necessarily well defined.

My first response is / was that the event "the internet says it's Monday" seems a lot better-defined than "there are 9 of me," and should therefore still have a meaningful probability, even in anthropic situations. But an example may be necessary here.

I think you'd agree that a good example of "certain weird situations" is the divisible brain. Suppose we ran a mind on transistors and wires of macroscopic size. That is, we could make them half as big and they'd still run the same program. Then one can imagine splitting this mind down the middle into two half-sized copies. If this single amount of material counts as two people when split, does it also count as two people when it's together?

Whether it does or doesn't is, to some extent, mere semantics. If we set up a Sleeping Beauty problem except that there's the same amount of total width on both sides, it then becomes semantics whether there is equal anthropic probability on both sides, or unequal. So the "anthropic probabilities are meaningless" argument is looking pretty good. And if it's okay to define amount of personhood based on thickness, why not define it however you like and make probability pointless?

But I don't think it's quite as bad as all that, because of the restriction that your definition of personhood is part of how you view the world, not a free parameter. You don't try to change your mind about the gravitational constant so that you can jump higher. So agents can have this highly arbitrary factor in what they expect to see, but still behave somewhat reasonably. (Of course, any time an agent has some arbitrary-seeming information, I'd like to ask "how do you know what you think you know?" Exploring the possibilities better in this case would be a bit of a rabbit hole, though.)

Then, if I'm pretending to be Stuart Armstrong, I note that there's an equivalence in the aforementioned equal-total-width sleeping beauty problem between e.g. agents who think that anthropic probability is proportional to total width but have the same payoffs in both worlds ("width-selfish agents"), and agents who ignore anthropic probability, but weight the payoffs to agents by their total widths, per total width ("width-average-utilitarian outside perspective [UDT] predictors").

Sure, these two different agents have different information/probabilities and different internal experience, but to the extent that we only care about the actions in this game, they're the same.

Even if an agent starts in multiple identical copies that then diverge into non-identical versions, a selfish agent will want to self-modify to be an average utilitarian between non-identical versions. But this is a bit different from the typical usage of "average utilitarianism" in population ethics. A population-ethics average utilitarian would feed one of their copies to hungry alligators if it paid off for the other copies. But a reflectively-selfish average utilitarian would expect some chance of being the one fed to the alligators, and wouldn't like that plan at all.

Actually, I think the cause of this departure from average utilitarianism over copies is the starting state. When you start already defined as one of multiple copies, like in the divisible brain case, the UDT agent that naive selfish agents want to self-modify to be no longer looks just like an average utilitarian.

So that's one caveat about this equivalence - that it might not apply to all problems, and to get these other problems right, the proper thing to do is to go back and derive the best strategy in terms of selfish preferences.

Which is sort of the general closing thought I have: your arguments make a lot more sense to me than they did before, but as long as you have some preferences that are indexically selfish, there will be cases where you need to do anthropic reasoning just to go from the selfish preferences to the "outside perspective" payoffs that generate the same behavior. And it doesn't particularly matter if you have some contrived state of information that tells you you're one person on Mondays and ten people on Tuesdays.

Man, I haven't had a journey like this since DWFTTW. I was so sure that thing couldn't be going downwind faster than the wind.

P.S. So I have this written down somewhere, the causal buzzword important for an abstract description of the game with the marbles is "factorizable probability distribution." I may check out a causality textbook and try and figure the application of this out with less handwaving, then write a post on it.

[-]IlyaShpitser9y30

Hi, "factorization" is just taking a thing and expressing it as a product of simpler things. For example, a composite integer is a product of powers of primes.

In probability theory, we get a simple factorization via the chain rule of probability. If we have independence, some things drop out, but factorization is basically intellectually content-free. Of course, I also think Bayes rule is an intellectually content-free consequence of the chain rule of probability. And of course this may be hindsight bias operating...

You are welcome to message or email me if you want to talk about it more.

[-]Stuart_Armstrong9y10

then write a post on it.

That would be interesting.

[-]lackofcheese9y10

You definitely don't have a 50% chance of dying in the sense of "experiencing dying". In the sense of "ceasing to exist" I guess you could argue for it, but I think that it's much more reasonable to say that both past selves continue to exist as a single future self.

Regardless, this stuff may be confusing, but it's entirely conceivable that with the correct theory of personal identity we would have a single correct answer to each of these questions.

[-]Stuart_Armstrong9y10

Conceivable. But it doesn't seem to me that such a theory is necessary, as its role seems merely to be able to state probabilities that don't influence actions.

[-]lackofcheese10y10

I think that argument is highly suspect, primarily because I see no reason why a notion of "responsibility" should have any bearing on your decision theory. Decision theory is about achieving your goals, not avoiding blame for failing.

However, even if we assume that we do include some notion of responsibility, I think that your argument is still incorrect. Consider this version of the incubator Sleeping Beauty problem, where two coins are flipped.
HH => Sleeping Beauties created in Room 1, 2, and 3
HT => Sleeping Beauty created in Room 1
TH => Sleeping Beauty created in Room 2
TT => Sleeping Beauty created in Room 3
Moreover, in each room there is a sign. In Room 1 it is equally likely to say either "This is not Room 2" or "This is not Room 3", and so on for each of the three rooms.

Now, each Sleeping Beauty is offered a choice between two coupons; each coupon gives the specified amount to their preferred charity (by assumption, utility is proportional to $ given to charity), but only if each of them chose the same coupon. The payoff looks like this:
A => $12 if HH, $0 otherwise.
B => $6 if HH, $2.40 otherwise.

I'm sure you see where this is going, but I'll do the math anyway.

With SIA+divided responsibility, we have
p(HH) = p(not HH) = 1/2
The responsibility is divided among 3 people in HH-world, and among 1 person otherwise, therefore
EU(A) = (1/2)(1/3)$12 = $2.00
EU(B) = (1/2)(1/3)$6 + (1/2)$2.40 = $2.20

With SSA+total responsibility, we have
p(HH) = 1/3
p(not HH) = 2/3
EU(A) = (1/3)$12 = $4.00
EU(B) = (1/3)$6 + (2/3)$2.40 = $3.60

So SIA+divided responsibility suggests choosing B, but SSA+total responsibility suggests choosing A.
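
The arithmetic above can be checked with a short Python sketch. The `PAYOFFS` table and the `expected_utility` helper are hypothetical names introduced only for this check; "responsibility" is modelled, as an assumption, as a multiplicative share applied to the payoff in each world, and p(HH) = 1/3 for SSA is taken from the comment as given.

```python
from fractions import Fraction

# Payoffs in dollars: (payoff if HH, payoff otherwise).
PAYOFFS = {"A": (Fraction(12), Fraction(0)),
           "B": (Fraction(6), Fraction(12, 5))}


def expected_utility(coupon, p_hh, share_hh=Fraction(1), share_other=Fraction(1)):
    """Probability-weighted payoff, scaled by the agent's share of responsibility."""
    pay_hh, pay_other = PAYOFFS[coupon]
    return p_hh * share_hh * pay_hh + (1 - p_hh) * share_other * pay_other


# SIA + divided responsibility: three agents share the HH outcome.
sia_divided = {c: expected_utility(c, Fraction(1, 2), share_hh=Fraction(1, 3))
               for c in PAYOFFS}
# SSA + total responsibility, using p(HH) = 1/3 as in the comment.
ssa_total = {c: expected_utility(c, Fraction(1, 3)) for c in PAYOFFS}

print({c: float(v) for c, v in sia_divided.items()})
print({c: float(v) for c, v in ssa_total.items()})
```

Using exact fractions avoids rounding noise; the two dictionaries reproduce the $2.00 / $2.20 and $4.00 / $3.60 figures, so the disagreement between the two rules rests entirely on the value used for p(HH).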

[-]Stuart_Armstrong10y20

The SSA probability of HH is 1/4, not 1/3.

Proof: before opening their eyes, the SSA agents divide probability as: 1/12 HH1 (HH and they are in room 1), 1/12 HH2, 1/12 HH3, 1/4 HT, 1/4 TH, 1/4 TT.

Upon seeing a sign saying "this is not room X", they remove one possible agent from the HH world, and one possible world from the remaining three. So this gives odds of HH:¬HH of (1/12+1/12):(1/4+1/4) = 1/6:1/2, or 1:3, which is a probability of 1/4.

This means that SSA+total responsibility says EU(A) is $3, and EU(B) is $3.30 - exactly the same ratios as the first setup, with B as the best choice.
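
For anyone who wants to verify the 1/4 figure, here is a minimal Python sketch of the update in the proof. It assumes the prior split stated there (1/12 for each room in the HH world, 1/4 for each single-person world) and simply discards the hypotheses the sign rules out; `ssa_p_hh` and `SINGLE_WORLDS` are hypothetical names. The expected utilities multiply the full payoffs by the resulting probabilities.

```python
from fractions import Fraction

# Room occupied in each single-person world.
SINGLE_WORLDS = {"HT": 1, "TH": 2, "TT": 3}


def ssa_p_hh(excluded_room):
    """SSA probability of HH after seeing 'This is not Room <excluded_room>'."""
    # Each world gets 1/4; the HH world's share is split over its three rooms.
    weights = {("HH", room): Fraction(1, 12) for room in (1, 2, 3)}
    weights.update({(w, room): Fraction(1, 4) for w, room in SINGLE_WORLDS.items()})
    # Discard every (world, room) hypothesis that the sign rules out.
    kept = {k: v for k, v in weights.items() if k[1] != excluded_room}
    hh = sum(v for (world, _), v in kept.items() if world == "HH")
    return hh / sum(kept.values())


p_hh = ssa_p_hh(excluded_room=2)
eu_a = p_hh * 12
eu_b = p_hh * 6 + (1 - p_hh) * Fraction(12, 5)
print(p_hh, float(eu_a), float(eu_b))
```

The result does not depend on which room the sign names, since the setup is symmetric across the three rooms.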

[-]lackofcheese10y10

That's not true. The SSA agents are only told about the conditions of the experiment after they're created and have already opened their eyes.

Consequently, isn't it equally valid for me to begin the SSA probability calculation with those two agents already excluded from my reference class?

Doesn't this mean that SSA probabilities are not uniquely defined given the same information, because they depend upon the order in which that information is incorporated?

[-]Stuart_Armstrong9y20

Doesn't this mean that SSA probabilities are not uniquely defined given the same information, because they depend upon the order in which that information is incorporated?

Yep. The old reference class problem. Which is why, back when I thought anthropic probabilities were meaningful, I was an SIAer.

But SIA also has some issues with order of information, though it's connected with decisions ( http://lesswrong.com/lw/4fl/dead_men_tell_tales_falling_out_of_love_with_sia/ ).

Anyway, if your reference class consists of people who have seen "this is not room X", then "divided responsibility" is no longer 1/3, and you probably have to go whole UDT.

[-]lackofcheese9y10

But SIA also has some issues with order of information, though it's connected with decisions

Can you illustrate how the order of information matters there? As far as I can tell it doesn't, and hence it's just an issue with failing to consider counterfactual utility, which SIA ignores by default. It's definitely a relevant criticism of using anthropic probabilities in your decisions, because failing to consider counterfactual utility results in dynamic inconsistency, but I don't think it's as strong as the associated criticism of SSA.

Anyway, if your reference class consists of people who have seen "this is not room X", then "divided responsibility" is no longer 1/3, and you probably have to go whole UDT.

If divided responsibility is not 1/3, what do those words even mean? How can you claim that only two agents are responsible for the decision when it's quite clear that the decision is a linked decision shared by three agents?

If you're taking "divided responsibility" to mean "divide by the number of agents used as an input to the SIA-probability of the relevant world", then your argument that SSA+total = SIA+divided boils down to this: "If, in making decisions, you (an SIA agent) arbitrarily choose to divide your utility for a world by the number of subjectively indistinguishable agents in that world in the given state of information, then you end up with the same decisions as an SSA agent!"

That argument is, of course, trivially true because the number of agents you're dividing by will be the ratio between the SIA odds and the SSA odds of that world. If you allow me to choose arbitrary constants to scale the utility of each possible world, then of course your decisions will not be fully specified by the probabilities, no matter what decision theory you happen to use. Besides, you haven't even given me any reason why it makes any sense at all to measure my decisions in terms of "responsibility" rather than simply using my utility function in the first place.
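To make the identity concrete, here is a minimal sketch using the incubator problem from the post (one agent on heads, two on tails, a coupon paying £1 on tails). The names `WORLDS`, `eu_ssa_total` and `eu_sia_divided` are made up for illustration, and the sketch assumes that SSA leaves the coin prior unchanged here and that all agents in a world make the same linked decision to buy at a given price.

```python
from fractions import Fraction

# Hypothetical toy model: world -> (prior probability, number of agents created)
WORLDS = {"heads": (Fraction(1, 2), 1), "tails": (Fraction(1, 2), 2)}


def ssa_probs(worlds):
    # Assumption: with no further evidence, SSA keeps the prior over worlds.
    return {w: prior for w, (prior, _) in worlds.items()}


def sia_probs(worlds):
    # SIA reweights each world by the number of agents in it, then normalises.
    weights = {w: prior * n for w, (prior, n) in worlds.items()}
    total = sum(weights.values())
    return {w: wt / total for w, wt in weights.items()}


def total_utility(world, price, worlds):
    # Every agent in the world buys the coupon; it pays 1 only on tails.
    _, n = worlds[world]
    payout = 1 if world == "tails" else 0
    return n * (payout - price)


def eu_ssa_total(price, worlds=WORLDS):
    # SSA probabilities, each agent counts the whole linked outcome.
    p = ssa_probs(worlds)
    return sum(p[w] * total_utility(w, price, worlds) for w in worlds)


def eu_sia_divided(price, worlds=WORLDS):
    # SIA probabilities, each agent counts 1/n of the linked outcome.
    p = sia_probs(worlds)
    return sum(p[w] * total_utility(w, price, worlds) / worlds[w][1] for w in worlds)


if __name__ == "__main__":
    for k in range(0, 11):
        price = Fraction(k, 10)
        print(price, eu_ssa_total(price), eu_sia_divided(price))
```

In this toy model the two expected utilities differ only by a constant positive factor (the expected number of agents), so they change sign at the same price. That is exactly the sense in which the equivalence is trivial: dividing by the number of agents undoes the SIA reweighting.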

On the other hand, if, for example, you could justify why it would make sense to include a notion of "divided responsibility" in my decision theory, then that argument would tell me that SSA+total responsibility must clearly be conceptually the wrong way to do things because it uses total responsibility instead.

All in all, I do think anthropic probabilities are suspect for use in a decision theory because:

1. They result in reflective inconsistency by failing to consider counterfactuals.
2. It doesn't make sense to use them for decisions when the probabilities could depend upon the decisions (as in the Absent-Minded Driver).

That said, even if you can't use those probabilities in your decision theory there is still a remaining question of "to what degree should I anticipate X, given my state of information". I don't think your argument on "divided responsibility" holds up, but even if it did the question on subjective anticipation remains unanswered.

[-]Stuart_Armstrong9y20

"If, in making decisions, you (an SIA agent) arbitrarily choose to divide your utility for a world by the number of subjectively indistinguishable agents in that world in the given state of information, then you end up with the same decisions as an SSA agent!"

Yes, that's essentially it. However, the idea of divided responsibility has been proposed before (though not in those terms) - it's not just a hack I made up. The basic idea is this: if ten people need to vote unanimously "yes" for a policy that benefits them all, does each of them consider that their vote made the whole difference between the policy and no policy, or that it contributed a tenth of that difference? Divided responsibility actually makes more intuitive sense in many ways, because we could replace the unanimity requirement with "you cause 1/10 of the policy to happen" and it's hard to see what the difference is (assuming that everyone votes identically).
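As a rough illustration of the two accounting rules in the voting example, the sketch below compares what each voter attributes to their own "yes" vote. The function name `attributed_benefit` and the benefit figure of 100 are hypothetical; only the ten voters come from the example.

```python
from fractions import Fraction


def attributed_benefit(total_benefit, n_voters, rule):
    """Benefit a single voter attributes to their own 'yes' vote."""
    if rule == "total":
        # The voter treats their vote as making the whole difference.
        return Fraction(total_benefit)
    if rule == "divided":
        # The voter treats their vote as causing 1/n of the policy.
        return Fraction(total_benefit, n_voters)
    raise ValueError(f"unknown rule: {rule}")


n_voters, benefit = 10, 100  # benefit is an arbitrary illustrative number

sum_total = n_voters * attributed_benefit(benefit, n_voters, "total")
sum_divided = n_voters * attributed_benefit(benefit, n_voters, "divided")

print(sum_total)    # attributions under total responsibility, summed over voters
print(sum_divided)  # attributions under divided responsibility, summed over voters
```

Under divided responsibility the individual attributions add up to the actual benefit of the policy, while under total responsibility they add up to ten times that. When everyone votes identically, both rules still agree on whether voting "yes" is worthwhile.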

But all these approaches (SIA and SSA and whatever concept of responsibility) fall apart when you consider that UDT allows you to reason about agents that will make the same decision as you, even if they're not subjectively indistinguishable from you. Anthropic probability can't deal with these - worse, it can't even consider counterfactual universes where "you" don't exist, and doesn't distinguish well between identical copies of you that have access to distinct information that is not decision-relevant.

the question on subjective anticipation remains unanswered.

Ah, subjective anticipation... That's an interesting question. I often wonder whether it's meaningful. If we create 10 identical copies of me and expose 9 of them to one stimulus and 1 to another, what is my subjective anticipation of seeing one stimulus over the other? 10% is one obvious answer, but I might take a view of personal identity that fails to distinguish between identical copies of me, in which case 50% is correct. What if identical copies will be recombined later? Eliezer had a thought experiment where agents were two-dimensional, and could get glued to or separated from each other, and wondered whether this made any difference. I do too. And I'm also very confused about quantum measure, for similar reasons.

[-]lackofcheese9y10

OK, the "you cause 1/10 of the policy to happen" argument is intuitively reasonable, but under that kind of argument divided responsibility has nothing to do with how many agents are subjectively indistinguishable and instead has to do with the agents who actually participate in the linked decision.

On those grounds, "divided responsibility" would give the right answer in Psy-Kosh's non-anthropic problem. However, this also means your argument that SIA+divided = SSA+total clearly fails, because of the example I just gave before, and because SSA+total gives the wrong answer in Psy-Kosh's non-anthropic problem but SIA+divided does not.

Ah, subjective anticipation... That's an interesting question. I often wonder whether it's meaningful.

As do I. But, as Manfred has said, I don't think that being confused about it is sufficient reason to believe it's meaningless.

[-]Stuart_Armstrong9y10

The divergence between reference class (of identical people) and reference class (of agents with the same decision) is why I advocate for ADT (which is essentially UDT in an anthropic setting).

[-]Stuart_Armstrong10y10

Here's one highly charitable rephrasing. "If I give two people different information, then there exists some set of values such that they'll make the same decision anyhow."

No, they have the same values. And same information. Just different decision theories (approximately CDT vs EDT).

[-]Manfred10y10

As I have previously argued against this "different probabilities but same information" line and you just want to repeat it, I doubt there's much value in going further down this path.
