# 13

I

When preferences are selfless, anthropic problems are easily solved by a change of perspective. For example, if we do a Sleeping Beauty experiment for charity, all Sleeping Beauty has to do is follow the strategy that, from the charity's perspective, gets them the most money. This turns out to be an easy problem to solve, because the answer doesn't depend on Sleeping Beauty's subjective perception.

But selfish preferences - like being at a comfortable temperature, eating a candy bar, or going skydiving - are trickier, because they do rely on the agent's subjective experience. This trickiness really shines through when there are actions that can change the number of copies. For recent posts about these sorts of situations, see Pallas' sim game and Jan_Ryzmkowski's tropical paradise. I'm going to propose a model that makes answering these sorts of questions almost as easy as playing for charity.

To quote Jan's problem:

It's a cold cold winter. Radiators are hardly working, but it's not why you're sitting so anxiously in your chair. The real reason is that tomorrow is your assigned upload, and you just can't wait to leave your corporality behind. "Oh, I'm so sick of having a body, especially now. I'm freezing!" you think to yourself, "I wish I were already uploaded and could just pop myself off to a tropical island."

And now it strikes you. It's a weird solution, but it feels so appealing. You make a solemn oath (you'd say one in million chance you'd break it), that soon after upload you will simulate this exact scene a thousand times simultaneously and when the clock strikes 11 AM, you're gonna be transposed to a Hawaiian beach, with a fancy drink in your hand.

It's 10:59 on the clock. What's the probability that you'd be in a tropical paradise in one minute?

So question one is the probability question: what's your probability that you go to the tropical paradise? And question two is the decision problem: is this actually a good idea?

The probability question is straightforward, and is indeed about a 1000/1001 chance of tropical paradise. If this does not make sense, feel free to ask about it, or go check out these two rambling complementary posts: "Deriving probabilities from causal diagrams" and "More marbles and Sleeping Beauty".
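As a minimal sketch of the counting behind that number: assume one original plus a thousand simulations of the 10:59 scene, all indistinguishable and weighted equally. The function name and the way the one-in-a-million chance of breaking the oath is folded in are illustrative assumptions, not part of Jan's problem statement.

```python
from fractions import Fraction

def paradise_probability(n_copies: int, p_oath_kept: Fraction = Fraction(1)) -> Fraction:
    """Chance that a given instance of the 10:59 scene is one of the simulations.

    Assumes one original plus n_copies simulations, all weighted equally.
    As a simplification, the result is scaled by the chance the oath is kept,
    since no simulations exist at all if the oath is broken.
    """
    return p_oath_kept * Fraction(n_copies, n_copies + 1)

print(paradise_probability(1000))  # 1000/1001
print(float(paradise_probability(1000, Fraction(999_999, 1_000_000))))
```

Under these assumptions, the small chance of breaking the oath barely moves the answer.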

One might then make an argument about the decision question that goes like this: Before I swore this oath, my probability of going to a tropical island was very low. After, it was very high. Since I really like tropical islands, this is a great idea. In a nutshell, I have increased my expected utility by making this oath.

The counterargument is also simple, though: Making copies of myself has no causal effect on me. Swearing this oath does not move my body to a tropical paradise. What really happens is that I just sit there in the cold just the same, but then later I make some simulations where I lie to myself. This is not a higher-utility universe than the one where I don't swear the oath.

Hopefully you can see how this is confusing.

II

So, my proposal, in short form: You are a person. I mean this not in the abstract, non-causal, sense, where if I make a copy of you and then shoot you, "you live on." I mean that the isolated causal agent reading this is a person capable of selfish desires, where if you are one of two copies and I give the other copy a candy bar, your selfish desires for eating candy are unfulfilled (see Footnote 1). Choose as if you were controlling the output of your decision algorithm, so that you maximize your expected utility, including selfish desires (if you have them), conditioned on the fact that you exist (I'll come back to what this last bit means in part III).

This is at its heart porting TDT to anthropic problems. When there is a decision your original body can make that creates a bunch of copies, and the copies are also faced with this decision, your decision lets you control whether you are the original or a copy. If you don't want to be a copy, as in Pallas' sim game, you have to take that into account. If you do want to be a copy, you take that into account too.

This leads to biting the bullet in Jan's tropical paradise. It is actually a good idea to take an action that, if you're the original body, creates a bunch of high-selfish-expected-utility copies that also undergo the decision you're making right now, because this decision controls whether you're one of those copies.

There is an important caveat: this only holds if you truly would like to be one of those copies. To repeat the short form, this decision algorithm assumes that you are a person trying to increase their own expected utility. These copies can't just be made and disposed of to manipulate your subjective experience - something which is possible, but is a bad idea. The copies have to be people who you would actually like to be, who go on to live long, fulfilling lives. This is not about gaming the system. It's just an extension of TDT to anthropic situations.

Interestingly, this usually gives the same results as playing for charity. Thus there's a sort of locality of money, where you make similar tradeoffs between selfish spending and charitable spending no matter how many copies of you there are.

To deliberately construct an intuitive case, imagine that you are already uploaded, and you're led into a room (a simulated room, of course) where Omega is waiting for you. Omega says hello, and asks you whether you think you're the original or a copy. "Huh? Have I been copied?" you say. Excellent, Omega says. It then presents two boxes to you, box A and box B. Box A always has some okay candy in it for you to eat (eating candy is my go-to selfish reward). If you pick box B and are the original, it is empty, but you will be copied a million times from a snapshot when you entered the room, and offered the same choice - and if you are a copy, box B contains very delicious candy to eat (and then the copies go on to live similar lives to the original). Again there's the odd property that the output of your decision algorithm controls whether you are likely a copy or not. If you would prefer to be a copy, then you should pick box B.
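Here is a rough sketch of the comparison in Omega's game. The utility numbers (1 for okay candy, 10 for very delicious candy, 0 for an empty box) are made up for illustration, and the function name is hypothetical. The key move is that the probability of being the original depends on what your decision algorithm outputs.

```python
def expected_selfish_utility(choice: str, n_copies: int = 1_000_000,
                             u_okay: float = 1.0, u_delicious: float = 10.0,
                             u_empty: float = 0.0) -> float:
    """Expected selfish utility of the person in the room, given the algorithm's output."""
    if choice == "A":
        # If the algorithm outputs A, nobody is ever copied, so the person
        # in the room is the original and eats the okay candy.
        return u_okay
    if choice == "B":
        # If the algorithm outputs B, there is one original and n_copies copies
        # in the same situation, so you are the original with chance 1/(n+1).
        p_original = 1 / (n_copies + 1)
        return p_original * u_empty + (1 - p_original) * u_delicious
    raise ValueError("choice must be 'A' or 'B'")

best = max(["A", "B"], key=expected_selfish_utility)
print(best)  # B
```

If being a copy were worse than eating the okay candy (say `u_delicious=0.5`), the same calculation would favor box A.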

There's a precommitment problem here. Suppose I value my future selves by a sum of their utilities (given some zero point). Then even if being a copy was not so great (like in Pallas' sim game), I'd precommit to making as many copies as possible. But once the game starts, by my definition of selfish preferences I don't care much about whether the other copies get a selfish reward, and so I might try to fight that precommitment to raise my expected utility.

In fact, these precommitment problems crop up whenever I calculate expected value in any other way than by averaging utility among future copies. This is a statement about a small piece of population ethics, and as such, should be highly suspect - the fact that my preferred model of selfish preferences says anything about even this small subset of population ethics makes me significantly less confident that I'm right. Even though the thing it's saying seems sensible.

Footnote 1: The reader who has been following my posts may note how this identification of who has the preferences via causality makes selfish preferences well-defined no matter how many times I define the pattern "I" to map to my brain. This is good because it makes the process well-defined, but also somewhat difficult because it eliminates the last dependence on a lower level where we can think of anthropic probabilities as determined a priori, rather than depending on a definition of self grounded in decision-making as well as experiencing. On the other hand, with that level conflict gone, maybe there's nothing stopping us from thinking of anthropic probabilities on this more contingent level as "obvious" or "a priori."

III

It's worth bringing up Eliezer's anthropic trilemma (further thought by Katja Grace here). The idea is to subjectively experience winning the lottery by entering a lottery, replicating yourself a trillion times if you win, waking up to have the experience, and then merging back together. Thus, the argument goes, as long as probability flows along causal channels, by waking up a trillion times I have captured the subjective experience, and will go on to subjectively experience winning the lottery.

Again we can ask the two questions: What are the probabilities? And is this actually a good idea?

This is the part where I come back to explain that earlier terminology - why is it important that I specified that you condition on your own existence? When you condition on the fact that you exist, you get an anthropic probability. In the story about Omega I told above, your probability that you're the original before you enter the room is 1. But after you enter the room, if your decision algorithm chooses box B, your probability that you're the original should go down to one in a million. This update is possible because you're updating on new information about where you are in the game - you're conditioning on your own existence.

Note that I did not just say "use anthropic probabilities." When calculating expected utility, you condition on your own existence, but you most certainly do not condition on future selves' existence. After all, you might get hit by a meteor and die, so you don't actually know that you'll be around tomorrow, and you shouldn't condition on things you don't know. Thus the player at Russian roulette who says "It's okay, I'll subjectively experience winning!" is making a decision by conditioning on information they do not have.

Katja Grace talks about two principles acting in the Anthropic Trilemma: Follow The Crowd, which sends your subjective experience into the branch with more people, and Blatantly Obvious Principle, which says that your subjective experience should follow causal paths. Katja points out that they do not just cause problems when merging, they also conflict when splitting - so Eliezer is being selective in applying these principles, and there's a deeper problem here. If you recall me mentioning my two-fluid model of anthropics, I partially resolved this by tracking two measures, one that obeyed FTC (subjective probability), and one that obeyed BOP (magic reality fluid).

But the model I'm presenting here dissolves those fluids (or would it be 'dilutes'?) - the thing that follows the crowd is who you think you are, and the blatantly obvious thing is your expectation for the future. There's no subjective experience fluid that it's possible to push around without changing the physical state of the universe. There's just people.

To give the probabilities in the Anthropic Trilemma, it is important to track what information you're conditioning on. If I condition on my existence just after I buy my ticket, my probability that I picked the winning numbers is small: no matter what anthropic hijinks might happen if I win, I still expect to see those hijinks happen with low probability (see Footnote 2). If I condition on the fact that I wake up after possibly being copied, my probability that I picked the winning numbers is large, as is my probability that I will have picked the winning numbers in the future, even if I get copied or merged or what have you. Then I learn the result, and no longer have a single state of information which would give me a probability distribution. Compare this to the second horn of the trilemma; it's easy to get mixed up when giving probabilities if there's more than one set of probabilities to give.
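To make the two states of information concrete, here is a small sketch. It assumes (my assumption for illustration) that after waking, each waking instance is weighted by the probability of its world times the number of instances in that world, in the same spirit as the 1000/1001 count in part I. The one-in-a-hundred-million odds are a made-up figure.

```python
from fractions import Fraction

def p_won_given_awake(p_win: Fraction, n_copies_if_win: int) -> Fraction:
    """Probability of having won, conditioning on waking up after possible copying."""
    winners = p_win * n_copies_if_win  # weight of waking instances in the winning world
    losers = (1 - p_win) * 1           # a single uncopied instance in the losing world
    return winners / (winners + losers)

p_win = Fraction(1, 10**8)  # hypothetical lottery odds

# Just after buying the ticket: no copying has happened yet, so the chance stays small.
print(float(p_win))
# After waking up, possibly as one of a trillion copies: the chance is large.
print(float(p_won_given_awake(p_win, 10**12)))
```

With no copying at all (`n_copies_if_win=1`) the second number collapses back to the first, which is the sense in which the difference is entirely about which information you condition on.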

Okay, so that's the probabilities - but is this actually a good idea? Suppose I'm just in it for the money. So I'm standing there considering whether to buy a ticket, and I condition on my own existence, and the chances of winning still look small, and so I don't buy the ticket. That's it. This is especially clear if I donate my winnings to charity - the only winning move is not to play the lottery.

Suppose then instead that I have a selfish desire to experience winning the lottery, independent of the money - does copying myself if I win help fulfill this desire? Or to put this another way, in calculating expected utility we weight the selfish utility of the many winning copies less because winning is unlikely, but do we weight it more because there are more of them?

This question is resolved by (possible warning sign) the almost-population-ethics result above, which says that as an attractor of self-modification we should average copies' utilities rather than summing them, and so copying does not increase expected utility. Again, I find this incompletely convincing, but it does seem to be the extension of TDT here. So this procedure does not bite the bullet in the anthropic trilemma. But remember the behavior in Jan's tropical paradise game? It is in fact possible to design a procedure that lets you satisfy your desire to win the lottery - just have the copies created when you win start from a snapshot of yourself before you bought the lottery ticket.
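A quick sketch of why averaging makes copying-after-winning neutral, evaluated from the ticket-buyer's state of information. The utilities (10 for experiencing a win, -1 for losing the ticket price) and the odds are placeholders I made up, and `aggregate` is a hypothetical switch between the two rules.

```python
def expected_utility(p_win: float, n_copies_if_win: int,
                     u_win: float, u_lose: float, aggregate: str) -> float:
    """Expected selfish utility of buying a ticket, before any copying happens."""
    if aggregate == "average":
        # Every winning copy has the same utility, so their average is just u_win.
        win_value = u_win
    elif aggregate == "sum":
        # Summing counts every winning copy separately.
        win_value = n_copies_if_win * u_win
    else:
        raise ValueError("aggregate must be 'average' or 'sum'")
    return p_win * win_value + (1 - p_win) * u_lose

for n in (1, 10**6, 10**12):
    print(n,
          expected_utility(1e-8, n, 10.0, -1.0, "average"),
          expected_utility(1e-8, n, 10.0, -1.0, "sum"))
```

Under averaging the number of copies drops out entirely, while under summing enough copies can make any lottery look attractive.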

This is a weird bullet to bite. It's like, how come it's a good idea to create copies that go through the decision to create copies, but only a neutral idea to create copies that don't? After all, winning and then creating simulations has the same low chance no matter what. The difference is entirely anthropic - only when the copies also make the decision does the decision control whether you're a copy.

Footnote 2: One might complain that if you know what you'll expect in the future, you should update to believing that in the present. But if I'm going to be copied tomorrow, I don't expect to be a copy today.

IV

The problem of the Anthropic Trilemma is not actually gone, because if I'm indifferent to merging with my copies, there is some procedure that better fulfills my selfish desire to experience winning the lottery just by shuffling copies of me around: if I win, make a bunch of copies that start from a snapshot in the past, then merge the copies together.

So let's talk about the merging. This is going to be the section with the unsolved problem.

Here's what Eliezer's post says about merging:

Just as computer programs or brains can split, they ought to be able to merge.  If we imagine a version of the Ebborian species that computes digitally, so that the brains remain synchronized so long as they go on getting the same sensory inputs, then we ought to be able to put two brains back together along the thickness, after dividing them.  In the case of computer programs, we should be able to perform an operation where we compare each two bits in the program, and if they are the same, copy them, and if they are different, delete the whole program.  (This seems to establish an equal causal dependency of the final program on the two original programs that went into it.  E.g., if you test the causal dependency via counterfactuals, then disturbing any bit of the two originals, results in the final program being completely different (namely deleted).)

In general, merging copies is some process where many identical copies go in, and only one comes out. If you know they're almost certainly identical, why bother checking them, then? Why not just delete all but one? It's the same pattern, after all.

Well, imagine that we performed a causal intervention on one of these identical copies - gave them candy or something. Now if we deleted all but one, the effect of our intervention is erased with high probability. In short, if you delete all but one, the person who comes out is not actually the causal descendant of the copies who go in - it's just one of the copies.

Just like how "selfish preferences" means that if I give another of your copies candy, that doesn't fulfill your selfish desire for candy, if another of your copies is the one who gets out of the murder-chamber, that doesn't fulfill your selfish desire to not get murdered. This is why Eliezer talks about going through the process of comparing each copy bit by bit and only merging them if they're identical, so that the person who comes out is the causal descendant of all the people who go in.
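The contrast between the two procedures can be sketched in a few lines. This is only an illustration: copies are modeled as byte strings, the function names are mine, and whole-string equality stands in for the bit-by-bit comparison in Eliezer's description.

```python
from typing import Optional, Sequence

def delete_all_but_one(copies: Sequence[bytes]) -> bytes:
    """Keep an arbitrary survivor (here the first) and ignore everyone else."""
    return copies[0]

def bitwise_merge(copies: Sequence[bytes]) -> Optional[bytes]:
    """Return the shared program if all copies are identical, else delete everything.

    Comparing the byte strings for equality is equivalent to comparing them bit by bit.
    """
    first = copies[0]
    if all(c == first for c in copies):
        return first
    return None  # any differing bit deletes the whole program

identical = [b"me", b"me", b"me"]
intervened = [b"me", b"me+candy", b"me"]  # a causal intervention on one copy

print(bitwise_merge(identical))        # b'me'
print(delete_all_but_one(intervened))  # b'me' (the candy intervention left no trace)
print(bitwise_merge(intervened))       # None
```

In the first procedure the output depends on only one input; in the second, disturbing any input changes the output, which is the sense in which the survivor is a causal descendant of every copy.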

On the other hand, Eliezer's process is radically different from how things normally go. If I'm one of several copies, and a causal intervention gives me candy, and no merging shenanigans occur, then my causal descendant is me who's had some candy. If I'm one of several copies, and a causal intervention gives me candy, and then we're merged by Eliezer's method, then my causal descendant is utterly annihilated.

If we allow the character of causal arrows to matter, and not merely their existence, then it's possible that merging is not so neutral after all. But this seems like a preference issue independent of the definition of selfish preferences - although I would have said that about how to weight preferences of multiple copies, too, and I would likely have been wrong.

Does the strange behavior permitted by the neutrality of merging serve as a reductio of that neutrality, or of this extension of selfish preferences to anthropic information, or neither? In the immortal words of Socrates, "... I drank what?"

EDIT:

A Problem:

This decision theory has precommitment issues. In the case of Jan's tropical paradise, I want to precommit to creating satisfied copies from a snapshot of my recent self. But once I'm my future self, I don't want to do it because I know I'm not a copy.

Metaproblems:

This decision theory doesn't have very many knobs to turn - it boils down to "choose the decision-algorithm output that causes maximum expected utility for you, conditioning on both the action and the information you possess." This is somewhat good news, because we don't much want free variables in a decision theory. But it's a metaproblem because it means that there's no obvious knob to turn to eliminate the problem above - creativity is required.

One approach that has worked in the past is to figure out what global variable we want to maximize, and just do UDT to this problem. But this doesn't work for this decision theory - as we expected, because it doesn't seem to work for selfish preferences in general. The selves at two different times in the tropical paradise problem just want to act selfishly - so are they allowed to be in conflict?

Solution Brainstorming (if one is needed at all):

One specific argument might run that when you precommit to creating copies, you decrease your amount of indexical information, and that this is just a form of lying to yourself and is therefore bad. I don't think this works at all, but it may be worth keeping in mind.

A more promising line might be to examine the analogy to evidential decision theory. Evidential decision theory fails when there's a difference between conditioning on the action and conditioning on a causal do(Action). What does the analogue look like for anthropic situations?

EDIT 2:

For somewhat of a resolution, see Selfish preferences and self-modification.

[-][anonymous]50

The probability question is straightforward, and is indeed about a 1000/1001 chance of tropical paradise. If this does not make sense, feel free to ask about it,

To me, this seems to neglect the prospect of someone else simulating the exact scene a bunch more times, somewhere out in time and space. To me, once you've cut yourself loose of Occam's Razor/Kolmogorov Complexity and started assigning probabilities as frequencies throughout a space-time continuum in which identical subjective agent-moments occur multiply, you have long since left behind Cox's Theorem and the use of probability to reason over limited information.

this seems to neglect the prospect of someone else simulating the exact scene a bunch more times, somewhere out in time and space

This is true - and I do think the probability of this is negligible. Additional simulations of our universe wouldn't change the probabilities - you'd need the simulator to interfere in a very specific way that seems unlikely to me.

once you've cut yourself loose of Occam's Razor/Kolmogorov Complexity and started assigning probabilities as frequencies throughout a space-time continuum in which identical subjective agent-moments occur multiply

Why do those conflict at all? I feel like you may be talking about a nonstandard use of occam's razor.

long since left behind [...] the use of probability

What probability do you give the simulation hypothesis?

[-][anonymous]30

What probability do you give the simulation hypothesis?

Some extremely low prior based on its necessary complexity.

This is true - and I do think the probability of this is negligible.

No, you have no information about that probability. You can assign a complexity prior to it and nothing more.

Why do those conflict at all? I feel like you may be talking about a nonstandard use of occam's razor.

They conflict because you have two perspectives, and therefore two different sets of information, and therefore two very different distributions. Assume the scenario happens: the person running the simulation from outside has information about the simulation. They have the evidence necessary to defeat the low prior on "everything So and So experiences is a simulation". So and So himself... does not have that information. His limited information, from sensory data that exactly matches the real, physical, lawful world rather than the mutable simulated environment, rationally justifies a distribution in which, "This is all physically real and I am in fact not going to a tropical paradise in the next minute because I'm not a computer simulation" is the Maximum a Posteriori hypothesis, taking up the vast majority of the probability mass.

So, the standard Bayesian analogue of Solomonoff induction is to put a complexity prior over computable predictions about future sensory inputs. If the shortest program outputting your predictions looks like a specification of a physical world, and then an identification of your sensory inputs within that world, and the physical world in your model has both a meatspace copy of you and a simulated copy of you, the only difference in this Solomonoff-analogous prior between being a meat-person and a chip-person is the complexity of identifying your sensory inputs. I think it is unfounded substrate chauvinism to think that your sensory inputs are less complicated to specify than those of an uploaded copy of yourself.

[-][anonymous]10

If the shortest program outputting your predictions looks like a specification of a physical world, and then an identification of your sensory inputs within that world, and the physical world in your model has both a meatspace copy of you and a simulated copy of you, the only difference in this Solomonoff-analogous prior between being a meat-person and a chip-person is the complexity of identifying your sensory inputs.

Firstly, this isn't a Solomonoff-analogous prior. It is the Solomonoff prior. Solomonoff Induction is Bayesian.

Secondly, my objection is that in all circumstances, if right-now-me does not possess actual information about uploaded or simulated copies of myself, then the simplest explanation for physically-explicable sensory inputs (ie: sensory inputs that don't vary between physical and simulated copies), the explanation with the lowest Kolmogorov complexity, is that I am physical and also the only copy of myself in existence at the present time.

This means that the 1000 simulated copies must arrive at an incorrect conclusion for rational reasons: the scenario you invented deliberately, maliciously strips them of any means to distinguish themselves from the original, physical me. A rational agent cannot be expected to necessarily win in adversarially-constructed situations.

I feel like you may be talking about a nonstandard use of occam's razor.

It's the basis for a common use. However this seems pretty clearly wrong or incomplete.

I think the grandparent's argument really had more to do with "reason(ing) over limited information" vs frequencies in a possibly infinite space-time continuum. That still seems like a weak objection, given that anthropics look related to the topic of fixing Solomonoff induction.

Why average utility of my descendants/copies, instead of total utility? Total utility seems to give better answers. Total utility implies that if copies have better-than-nothing lives, more is better. But that seems right, for roughly the same reason that I don't want to die in my sleep tonight: it deprives me of good future days. Suppose I learn that I will soon lose long-term (>24 hr) episodic memory, so that every future day will be disconnected from every other, but my life will otherwise be good. Do I still prefer a long life over a one-more-day life? I think yes. But now my days might as well, for all practical and ethical purposes, be lived parallel instead of serially.

With total utility, there is only a very ordinary precommitment problem in Tropical Paradise, provided one important feature. The important feature is that uploaded-me should not be overburdened. Suppose uploaded-me can only afford to make simultaneously-running copies on special occasions, and is reluctant to waste that on this project. That seems reasonable. If uploaded me has to sacrifice 1000 minutes of warm fuzzy feelings to give me one minute of hope now, that's not worth it. On the other hand, if he only has to do this once - giving me a 50/50 hope right now - that may well be worth it.

Let's make up some numbers. My present wintry blast with no hope of immediate relief, let's give a utility of zero per minute. Wintry blast with 50/50 hope, 6 per minute. Wintry blast with 999/1000 hope, 8 per minute. Tropical paradise, 10 per minute. Summing over all the me and future-me minutes gives the best result with only a single reliving of Winter.
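One way to check that arithmetic, as a sketch under my own reading of the setup: each reliving replaces one minute that upload-me would otherwise spend in paradise, "50/50 hope" corresponds to one reliving, and "999/1000 hope" corresponds to 999 relivings. The bookkeeping and names below are assumptions, not the commenter's own calculation.

```python
# Per-minute utility of the wintry scene, keyed by the number of relivings,
# using the made-up numbers above (0 relivings means no hope of relief).
WINTER_UTILITY = {0: 0.0, 1: 6.0, 999: 8.0}
PARADISE_UTILITY = 10.0

def net_gain(n_relivings: int) -> float:
    """Change in total utility relative to making no simulations at all."""
    hopeful = WINTER_UTILITY[n_relivings]
    # The original's minute improves from hopeless to hopeful.
    gain_for_original = hopeful - WINTER_UTILITY[0]
    # Each reliving swaps a paradise minute for a hopeful winter minute.
    cost_for_upload = n_relivings * (PARADISE_UTILITY - hopeful)
    return gain_for_original - cost_for_upload

for n in (0, 1, 999):
    print(n, net_gain(n))
```

On these numbers a single reliving comes out slightly ahead of doing nothing, and 999 relivings come out far behind.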

Upload-me makes the sacrifice of 1 minute for basically the same reason Parfit's hitch-hiker pays his rescuer.

Suppose I learn that I will soon lose long-term (>24 hr) episodic memory, so that every future day will be disconnected from every other, but my life will otherwise be good. Do I still prefer a long life over a one-more-day life?

Under the model of selfish preferences I use in this post, this is an interesting situation. Suppose that you go to sleep in the same room every night, and every morning you wake up with only your long-term memories (Or your brain is overwritten with the same brain-state every morning or something). Suppose you could give up some tasty candy now for tasty candy every day of your illness. If you eat the candy now, are you robbing yourself of a bunch of future candy, and making a terrible mistake? And yet, every morning a new causal branch of you will wake up, and from their perspective they merely ate their candy a little earlier.

One could even defend not letting yourself get killed off after one day as an altruistic preference rather than a selfish one.

But really this all derived from resolving one single conflict - if there are multiple different conflicts there are multiple solutions. So I'm not really sure - as I hope I sufficiently emphasized, I do not trust this population ethics result.

If you eat the candy now, are you robbing yourself of a bunch of future candy, and making a terrible mistake? And yet, every morning a new causal branch of you will wake up, and from their perspective they merely ate their candy a little earlier.

Cool, this leads me to a new point/question. You've defined "selfish" preference in terms of causal flows. I'd like to point out that those flows are not identity-relation-like. Each future branch of me wakes up and sees a one-to-one tradeoff: he doesn't get candy now, but he got it earlier, so it's a wash. But those time-slices aren't the decider, this current one is. And from my perspective now, it's a many-to-one tradeoff; those future days are all connected to me-now. This is possible because "A is causally connected to B" is intransitive. Isn't this the correct implication of your view? If not, then what?

Well, the issue is in how one calculates expected utility from a description of the future state of the world. If my current self branches into many causal descendants, and each descendant gets one cookie, there does not appear to be a law of physics that requires me to assign that outcome the expected utility of one cookie rather than many cookies, or vice versa.

It's absolutely a many-to-one tradeoff; that just isn't sufficient to determine how to value it.

However, if one requires that the ancestor and the descendants agree (up to time discounting and selection effects - which are where you value a cookie in 100 years less if you expect to die before then) about the value of a cookie, then that sets a constraint on how to calculate expected utility.

Fair enough. Of course, there's no law of physics ruling out Future Tuesday Indifference, either. We go by plausibility and elegance. Admittedly, "average the branches" looks about equally plausible and elegant to "sum the branches", but I think the former becomes implausible when we look at cases where some of the branches are very short-lived.

Requiring that the ancestor and descendants agree is contrary to the spirit of allowing selfish preferences, I think, in the sense of "selfish" that you've defined. If Methuselah is selfish, Methuselah(1000AD) values the experience of Methuselah(900AD), who values the experience of Methuselah(800AD), but M1000 doesn't value the experience of M800.

I think the former becomes implausible when we look at cases where some of the branches are very short-lived.

As the caveat goes, "The copies have to be people who you would actually like to be." Dying quickly seems like it would really put a damper on the expected utility of being a copy. (Mathematically, the relevant utility here is a time-integral)

I don't see why your claims about Methuselah follow, but I do agree that under this model, agents don't care about their past self - they just do what causes them to have high expected utility. Strictly, this is possible independent of whether descendants and ancestors agree or disagree. But if self-modification is possible, such conflicting selfish preferences would get modified away into nonconflicting selfless preferences.

Dying quickly seems like it would really put a damper on the expected utility of being a copy.

Not if the copy doesn't anticipate dying. Perhaps all the copies go thru a brief dim-witted phase of warm happiness (and the original expects this), in which all they can think is "yup warm and happy, just like I expected", followed by some copies dying and others recovering full intellect and living. Any of those copies is someone I'd "like to be" in the better-than-nothing sense. Is the caveat "like to be" a stronger sense?

I'm confused - if agents don't value their past self, in what sense do they agree or disagree with what the past-self was valuing? In any case, please reverse the order of the Methuselah valuing of time-slices.

Edit: Let me elaborate a story to motivate my some-copies-dying posit. I want to show that I'm not just "gaming the system," which is what your caveat was meant to rule out.

I'm in one spaceship of a fleet of fast unarmed robotic spaceships. As I feared but planned for, an enemy fleet shows up. This spaceship will be destroyed, but I can make copies of myself in one to all of the many other ships. Each copy will spend 10 warm-and-fuzzy dim-witted minutes reviving from their construction. The space battle will last 5 minutes. The spaceship at the farthest remove from the enemy has about a 10% chance of survival. The next-farthest has a 9 point something percent chance - and so on. The enemy uses an indeterministic algorithm to chase/target ships, so these probabilities are almost independent. If I copy to all the ships in the fleet, I have a very high probability of survival. But the maximum average expected utility is gotten by copying to just one ship.
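The tension in this example can be sketched numerically. Everything concrete here is a placeholder: I assume a fleet of 50 ships, survival chances that start at 10% and shrink by 1% of their value per ship, full independence, and utility 1 for a copy that survives and 0 otherwise.

```python
from math import prod

def survival_stats(k: int, p_best: float = 0.10, decay: float = 0.99) -> tuple[float, float]:
    """Copy to the k safest ships; return (average utility per copy, P(at least one survives))."""
    probs = [p_best * decay**i for i in range(k)]
    average = sum(probs) / k
    at_least_one = 1 - prod(1 - p for p in probs)
    return average, at_least_one

for k in (1, 10, 50):
    avg, any_survivor = survival_stats(k)
    print(k, round(avg, 4), round(any_survivor, 4))
```

Under these assumptions the average utility per copy is highest when copying to a single ship, while the chance that some copy survives keeps rising with every additional ship.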

I'm tapping out, sorry.

One might then make an argument about the decision question that goes like this: Before I swore this oath, my probability of going to a tropical island was very low. After, it was very high. Since I really like tropical islands, this is a great idea. In a nutshell, I have increased my expected utility by making this oath.

If it is indeed in your power to swear and execute such an oath, then "I will make an oath to simulate this event and make such-and-such changes" is a legitimate event that would impact any probability calculation. Before swearing the oath, there was still the probability of you swearing it in the future and executing it.

The probability of going to a tropical island given that the oath was made is likely higher than it was before the oath was made, but the only way it would be significantly higher is if there was a very low probability of the oath being made in the first place.

This is identical to the problem with causal decision theory which goes "If determinism is true, I'm already certain to make my decision, so how can I worry about its causal impacts?"

The answer is that you swear the oath because you calculated what would happen if (by causal surgery) your decision procedure output something else. This calculation gets done regardless of determinism - it's just how this decision procedure goes.