"Solving" selfishness for UDT


With many thanks to Beluga and lackofcheese.

When trying to decide between SIA and SSA, two anthropic probability theories, I concluded that the question of anthropic probability is badly posed and that it depends entirely on the values of the agents. When debating the issue of personal identity, I concluded that the question of personal identity is badly posed and depends entirely on the values of the agents. When the issue of selfishness in UDT came up recently, I concluded that the question of selfishness is...

But let's not get ahead of ourselves.

A selfish scenario

Using Anthropic Decision Theory, I demonstrated that selfish agents using UDT should reason in the same way that average utilitarians do - essentially behaving 'as if' SSA were true and going for even odds of heads and tails ("halfer") in the Sleeping Beauty problem.

Then Beluga posted an argument involving gnomes that seemed to show that selfish UDT agents should reason as total utilitarians do - essentially behaving 'as if' SIA were true and going for 2:1 odds of tails to heads ("thirder") in the Sleeping Beauty problem. After a bit of back and forth, lackofcheese then refined the argument. I noticed that the refined argument was solid, and incidentally made the gnomes unnecessary.

How does the argument go? Briefly, a coin is flipped and an incubator machine creates either one person (on heads) or two people (on tails), each in separate rooms.

Without knowing what the coin flip was or how many people there were in the universe, every new person is presented with a coupon that pays £1 if the coin came out tails. The question is - assuming utility is linear in money - what amount £x should the created person(s) pay for this coupon?

The argument from Beluga/lackofcheese can be phrased like this. Let's name the people in the tails world, calling them Jack and Roger (yes, they like dressing like princesses - what of it?). Each of them reasons something like this:

"There are four possible worlds here. In the tails world, I, Jack/Roger, could exist in Room 1 or in Room 2. And in the heads world, it could be either me existing in Room 1, or the other person existing in Room 1 (in which case I don't exist). I'm completely indifferent to what happens in worlds where I don't exist (sue me, I'm selfish). So if I buy the coupon for £x, I expect to make utility: 0.25(0) + 0.25(-x) + 0.5(1-x)=0.5-0.75x. Therefore I will buy the coupon for x<£2/3."

That seems a rather solid argument (at least, if you allow counterfactuals into worlds where you don't exist, which you probably should). So it seems I was wrong and that selfish agents will indeed go for the SIA-like "thirder" position.
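To make the arithmetic in Jack/Roger's reasoning concrete, here is a minimal sketch that enumerates the four worlds and solves for the break-even price. The names WORLDS, expected_utility and break_even_price are mine, purely for illustration, and the 1/4 weight on each world is simply the split used in the quoted reasoning.

```python
from fractions import Fraction

# Four equally weighted worlds from Jack's point of view.
# Each entry: (probability, coupon payout, does Jack exist in that world?)
WORLDS = [
    (Fraction(1, 4), 0, False),  # heads, Roger in Room 1: Jack does not exist
    (Fraction(1, 4), 0, True),   # heads, Jack in Room 1: coupon pays nothing
    (Fraction(1, 4), 1, True),   # tails, Jack in Room 1: coupon pays 1
    (Fraction(1, 4), 1, True),   # tails, Jack in Room 2: coupon pays 1
]

def expected_utility(price):
    """Jack's expected utility from buying, counting 0 in worlds where he is absent."""
    total = Fraction(0)
    for probability, payout, jack_exists in WORLDS:
        if jack_exists:
            total += probability * (payout - price)
    return total

def break_even_price():
    """Price at which buying has zero expected utility (utility is linear in price)."""
    intercept = expected_utility(Fraction(0))
    slope = expected_utility(Fraction(1)) - intercept
    return -intercept / slope

print(break_even_price())  # 2/3
```

Any price below the break-even gives positive expected utility, which matches the x<£2/3 conclusion.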

Not so fast...


Another selfish scenario

The above argument reminded me of one I made a long time ago, when I "proved" that SIA was true. I subsequently discarded that argument, after looking more carefully into the motivations of the agents. So let's do that now.

Above, I was using a subtle intuition pump by using the separate names Jack and Roger. That gave connotations of "I, Jack, don't care about worlds in which I, Jack, don't exist..." But in the original formulation of the Sleeping Beauty/incubator problem, the agents were strictly identical! There is no Jack versus Roger issue - at most, these are labels, like 1 and 2.

It therefore seems possible that the selfish agent could reason:

"There are three possible worlds here. In the tails world, I either exist in Room 1 or Room 2. And in the heads world, either I exist in Room 1, or an identical copy of me exists in Room 1, and is the only copy of me in that world. I fail to see any actual difference between those two scenarios. So if I buy the coupon for £x, I expect to make utility: 0.5(-x) + 0.5(1-x)=0.5-x. Therefore I will buy the coupon for x<£1/2."

The selfish agent seems on rather solid ground here in their heads world reasoning. After all, would we treat someone else differently if we were told "That's not actually your friend; instead it's a perfect copy of your friend, while the original never existed"?

Notice that even if we do allow for the Jack/Roger distinction, it seems reasonable for the agent to say "If I don't exist, I value the person that most closely resembles me." After all, we all change from moment to moment, and we value our future selves. This idea is akin to Nozick's "closest continuer" concept.


Each selfish person is selfish in their own unique way

So what is really going on here? Let's call the first selfish agent a thirder-selfish agent, and the second a halfer-selfish agent. Note that both types of agents have perfectly consistent utility functions defined in all possible actual and counterfactual universes (after giving the thirder-selfish agent some arbitrary constant C, which we may as well set to zero, in worlds where they don't exist). Compare the two versions of Jack's utility:

                        "Jack in Room 1"    "Roger in Room 1"
Heads: buy coupon       -x/-x               0/-x  (*)
Heads: reject coupon    0/0                 0/0
Tails: buy coupon       1-x/1-x             1-x/1-x
Tails: reject coupon    0/0                 0/0

The utilities are given as thirder-selfish utility/halfer-selfish utility. The one situation where they diverge is marked with (*) - that one difference is key to their different decisions.
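The table can be turned directly into decisions. The sketch below is only an illustration: the dictionary layout and the names BUY, break_even_price and divergent_cells are my own, and I assume each of the four "buy" cells carries probability 1/4 (fair coin, room labels equally likely). Each utility is stored as a constant plus a coefficient of x.

```python
from fractions import Fraction

# Jack's utility when buying at price x, stored as (constant, coefficient of x).
# Keys are (coin result, table column). Rejecting always gives 0, so it is omitted.
BUY = {
    "thirder-selfish": {
        ("heads", "Jack in Room 1"): (0, -1),
        ("heads", "Roger in Room 1"): (0, 0),    # Jack does not exist here
        ("tails", "Jack in Room 1"): (1, -1),
        ("tails", "Roger in Room 1"): (1, -1),
    },
    "halfer-selfish": {
        ("heads", "Jack in Room 1"): (0, -1),
        ("heads", "Roger in Room 1"): (0, -1),   # Jack counts his copy's loss
        ("tails", "Jack in Room 1"): (1, -1),
        ("tails", "Roger in Room 1"): (1, -1),
    },
}

CELL_PROBABILITY = Fraction(1, 4)  # assumption: all four cells equally likely

def break_even_price(agent):
    """Price at which the agent's expected utility from buying is zero."""
    constant = sum(CELL_PROBABILITY * c for c, _ in BUY[agent].values())
    slope = sum(CELL_PROBABILITY * k for _, k in BUY[agent].values())
    return -constant / slope

def divergent_cells():
    """Cells of the table where the two agents assign different utilities."""
    thirder, halfer = BUY["thirder-selfish"], BUY["halfer-selfish"]
    return [cell for cell in thirder if thirder[cell] != halfer[cell]]

print(divergent_cells())                   # [('heads', 'Roger in Room 1')]
print(break_even_price("thirder-selfish"))  # 2/3
print(break_even_price("halfer-selfish"))   # 1/2
```

A single differing cell is enough to move the break-even price from £2/3 to £1/2.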

At this point, people could be tempted to argue as to which type of agent is genuinely the selfish agent... But I can finally say:


  • The question of selfishness is badly posed and depends entirely on the values of the agents.


What do I mean by that? Well, here is a selfish utility function: "I expect all future copies of Stuart Armstrong to form a single continuous line through time, changing only slowly, and I value the happiness (or preference satisfaction) of all these future copies. I don't value future copies of other people."

That seems pretty standard selfishness. But this is not a utility function; it's a partial description of a class of utility functions, defined only in one set of universes (the set where there's a single future timeline for me, without any "weird" copying going on). Both the thirder-selfish utility function and the halfer-selfish one agree in such single timeline universes. They are therefore both extensions of the same partial selfish utility to more general situations.

Arguing which is "correct" is pointless. Both will possess all the features of selfishness we've used in everyday scenarios to define the term. We've enlarged the domain of possible scenarios beyond the usual set, so our concepts, forged in the usual set, can extend in multiple ways.

You could see the halfer-selfish values as a version of the "Psychological Approach" to personal identity: it values the utility of the being closest to itself in any world. A halfer-selfish agent would cheerfully step into a teleporter where they are scanned, copied to a distant location, and then the original is destroyed. The thirder-selfish agent might not. That is because the thirder-selfish agent is actually underspecified: the most extreme version would be one that does not value any future copies of themselves. They would indeed "jump off a cliff knowing smugly that a different person would experience the consequence of hitting the ground." Most versions of the thirder-selfish agent that people have in mind are less extreme than that, but defining either agent requires quite a bit of work, not simply a single word: "selfish".

So it's no wonder that UDT has difficulty with selfish agents: the concept is not well defined. "Selfish agent" is like "featherless biped" - a partial definition that purports to be the whole of the truth.


Personal identity and values

Each view of personal identity can be seen as isomorphic with a particular selfish utility function. The isomorphism is simple: the agent cares about the utility of another agent if and only if they share the same personal identity.

For instance, the psychological approach to personal identity posits that "You are that future being that in some sense inherits its mental features—beliefs, memories, preferences, the capacity for rational thought, that sort of thing—from you; and you are that past being whose mental features you have inherited in this way." Thus a psychological selfish utility function would value the preferences of a being that was connected to the agent in this way.

The somatic approach posits that "our identity through time consists in some brute physical relation. You are that past or future being that has your body, or that is the same biological organism as you are, or the like." Again, this can be used to code up a utility function.

Those two approaches (psychological and somatic) are actually broad categories of approaches, all of which would have a slightly different "selfish" utility function. The non-branching view, for instance, posits that if there is only one future copy of you, that is you, but if there are two, there is no you (you're effectively dead if you duplicate). This seems mildly ridiculous, but it still expresses very clear preferences over possible worlds that can be captured in a utility function.

Some variants allow for partial personal identity. For instance, discounting could be represented by a utility function that puts less weight on copies more distant in the future. If you allow "almost identical copies", then these could be represented by a utility function that gives partial credit for similarity along some scale (this would tend to give a decision somewhere in between the thirder and halfer situations presented above).
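As a rough illustration of the "partial credit" idea, suppose the agent in the coupon problem gives weight s (between 0 and 1) to the person who exists in the heads world when that person is not strictly "them". The linear weighting and the name break_even_price are assumptions of mine, not something fixed by the argument above; s=0 recovers the thirder-selfish agent and s=1 the halfer-selfish one.

```python
from fractions import Fraction

def break_even_price(similarity):
    """Break-even coupon price for an agent giving `similarity` weight to a near-copy.

    Expected utility of buying at price x (assumed linear partial credit s):
        1/4 * (-x)       heads, I am the one in Room 1
      + 1/4 * s * (-x)   heads, the other person is in Room 1
      + 1/2 * (1 - x)    tails, I exist in one of the two rooms
      = 1/2 - (3/4 + s/4) * x
    """
    s = Fraction(similarity)
    if not 0 <= s <= 1:
        raise ValueError("similarity must lie between 0 and 1")
    return Fraction(1, 2) / (Fraction(3, 4) + s / 4)

for s in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(s, break_even_price(s))
```

Under this assumed weighting the break-even price is 2/(3+s), which slides from £2/3 down to £1/2 as the similarity weight goes from 0 to 1.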

Many of the "paradoxes of identity" dissolve entirely when one uses values instead of identity. Consider the intransitivity problem for some versions of psychological identity:

First, suppose a young student is fined for overdue library books. Later, as a middle-aged lawyer, she remembers paying the fine. Later still, in her dotage, she remembers her law career, but has entirely forgotten not only paying the fine but everything else she did in her youth. [...] the young student is the middle-aged lawyer, the lawyer is the old woman, but the old woman is not the young student.

In terms of values, this problem is non-existent: the young student values herself, the lawyer and the old woman (as does the lawyer) but the old woman only values herself and the lawyer. That value system is inelegant, perhaps, but it's not ridiculous (and "valuing past copies" might be decision-relevant in certain counterfactual situations).
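One way to see why the paradox evaporates: identity has to be an equivalence relation (symmetric and transitive), whereas "who values whom" is just a directed relation with no such obligations. Here is a small sketch, with the relation written as a dictionary from each stage of life to the stages she values; the names VALUES, is_symmetric and is_transitive are purely illustrative.

```python
# Who values whom, taken from the paragraph above.
VALUES = {
    "student": {"student", "lawyer", "old woman"},
    "lawyer": {"student", "lawyer", "old woman"},
    "old woman": {"lawyer", "old woman"},
}

def is_symmetric(relation):
    """True if whenever a values b, b also values a."""
    return all(a in relation[b] for a in relation for b in relation[a])

def is_transitive(relation):
    """True if whenever a values b and b values c, a also values c."""
    return all(
        c in relation[a]
        for a in relation
        for b in relation[a]
        for c in relation[b]
    )

print(is_symmetric(VALUES))   # False: the student values the old woman, not vice versa
print(is_transitive(VALUES))  # False: old woman -> lawyer -> student, but not old woman -> student
```

The relation fails both tests, which would be fatal for a notion of identity but is perfectly acceptable for a set of values.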

Similarly, consider the question as to whether it is right to punish someone for the law-breaking of a past copy of themselves. Are they the same person? What if, due to an accident or high technology, the present copy has no memory of law-breaking or of being the past person? Using identity gets this hopelessly muddled, but from a consequentialist deterrence perspective, the answer is simple. The past copy presumably valued their future copy staying out of jail. Therefore, from the deterrence perspective, we should punish the current copy to deter such actions. In courts today, we might allow amnesia to be a valid excuse, simply because amnesia is so hard and dangerous to produce deliberately. But this may change in the future: if it becomes easy to rewire your own memory, then deterrent punishment will need to move beyond the classical notions of identity and punish people we would currently consider blameless.


Evolution and identity

Why are we convinced that there is such a thing as selfishness and personal identity? Well, let us note that it is in the interest of evolution that we believe in it. The "interests" of the genes are to be passed on, and so they benefit if the carrier of the gene in the present values the survival of the (same) carrier of the gene in the future. The gene does not "want" the carrier to jump off a cliff, because whatever the issues of personal identity, it'll be the same gene in the body that gets squashed at the end. Similarly, future copies of yourself are the copies that you have the most control over, through your current actions. So genes have exceptionally strong interests in making you value "your" future copies. Even your twin is not as valuable as you: genetically you're equivalent, but your current decisions have less impact on them than on your future self. Thus is selfishness created.

It seems that evolution has produced human copies with physical continuity, with influence over future copies and memories of past ones, and with very strong cross-time caring between copies. These are unique to a single time line of copies, so no wonder people have seen them as "defining" personal identity. And "The person tomorrow is me" is probably more compact than saying that you care about the person tomorrow, and listing the features connecting you. In the future, the first two components may become malleable, leaving only caring (a value) as the remnant of personal identity.

This idea allows us to do something we generally can't, and directly compare the "quality" of value systems - at least from the evolutionary point of view, according to the value system's own criteria.

Here is an example of an inferior selfish decision theory: agents using CDT, and valuing all future versions of themselves, but not any other copies. Why is this inferior? Because if the agent is duplicated, they want those duplicates to cooperate and value each other equally, because that gives the current agent the best possible expected utility. But if each copy has the same utility as the agent started with, then CDT guarantees rivalry, probably to the detriment of every agent. In effect, the agent wants its future self to have different selfish/indexical values from the ones it has, in order to preserve the same overall values.

This problem can be avoided by using UDT, CDT with precommitments, or a selfish utility function that values all copies equally. Those three are more "evolutionarily stable". So is, for instance, a selfish utility function with an exponential discount rate - but not one with any other discount rate. This is an interesting feature of this approach: the set of evolutionarily stable selfish decision theories is smaller than the set of selfish decision theories. Thus there are many circumstances where different selfish utilities will give the same decisions under the same decision theory, or where the different decision theories/utilities will self-modify to make identical decisions.
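One standard way to see why exponential discounting is special is dynamic consistency: an exponential discounter's ranking of two dated rewards does not change as the dates approach, so the future self endorses the current self's plan. I am assuming that this is the kind of stability meant here. The sketch below contrasts it with hyperbolic discounting; the reward sizes, the discount parameters and all function names are arbitrary illustrative choices.

```python
from fractions import Fraction

def exponential(delay, delta=Fraction(3, 5)):
    """Exponential discount factor: delta ** delay."""
    return delta ** delay

def hyperbolic(delay, k=1):
    """Hyperbolic discount factor: 1 / (1 + k * delay)."""
    return Fraction(1, 1 + k * delay)

def prefers_later(discount, wait):
    """Seen `wait` steps in advance, is 15 one step later preferred to 10 sooner?"""
    return 15 * discount(wait + 1) > 10 * discount(wait)

def choices(discount, horizon=10):
    """The agent's preference at every advance notice from 0 to `horizon`."""
    return [prefers_later(discount, wait) for wait in range(horizon + 1)]

def is_time_consistent(discount):
    """True if the preference never flips as the rewards draw nearer."""
    return len(set(choices(discount))) == 1

print(is_time_consistent(exponential))  # True
print(is_time_consistent(hyperbolic))   # False
```

With these numbers the hyperbolic agent plans from a distance to wait for the larger reward, then grabs the smaller one when it becomes immediate: its present self would want to bind or modify its future self, which is exactly the instability described above.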

One would like to make an argument about Rawlsian veils of ignorance and UDT-like initial pre-commitments leading to general altruism or something... But that's another argument, for another time. Note that this kind of argument cannot be used against the most ridiculous selfish utility function of all: "me at every moment is a different person I don't value at all". Someone with that utility function will quickly die, but, according to its own utility, it doesn't see this as a problem.

To my mind, the interesting thing here is that while there are many "non-indexical" utility functions that are stable under self-modification, this is not the case for most selfish and indexical ones.