Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Today I realized that UDT doesn't have to clash with "objective" views of anthropic probability, like SSA or SIA: instead it can, in some sense, learn which of these views is true!

The argument goes like this. First I'll describe a game where SSA and SIA lead to different decisions. Unlike Sleeping Beauty, my game doesn't involve memory loss, so someone can play it repeatedly and keep memories of previous plays. Then we'll figure out what UDT would do in that game, if it valued each copy's welfare using the average of SSA and SIA. It turns out that different instances of UDT existing at once will act differently: the instances whose memories are more likely under SSA will make decisions recommended by SSA, and likewise with SIA. So from the perspective of either view, it will look like UDT is learning that view, though from an agnostic perspective we can't tell what's being learned.

First let's describe the game. Imagine there's a small chance that tomorrow you will be copied many times - more than enough to outweigh the smallness of the chance. More precisely, let's say some event happens at 1:N odds and leads to N² copies of you existing. Otherwise (at N:1 odds) nothing happens and you stay as one copy. You're offered a choice: would you rather have each copy receive a dollar if the event happens, or receive a dollar if the event doesn't happen? The former corresponds to SIA, which says you have N:1 odds of ending up in the world with lots of copies. The latter corresponds to SSA, which says you have N:1 odds of ending up in a world with no copying.
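
As a quick check on those odds (a worked calculation, writing the event's prior probability as $\frac{1}{N+1}$): SIA weights each possible observer by the prior probability of its world, so being one of the copies gets total weight $N^2 \cdot \frac{1}{N+1}$ against the single uncopied observer's $1 \cdot \frac{N}{N+1}$, which is $N^2 : N = N : 1$ in favor of the copy-world. SSA instead gives each world only its prior weight and splits that weight among the copies inside it, so the two worlds keep their weights $\frac{1}{N+1}$ and $\frac{N}{N+1}$, which is $N : 1$ in favor of no copying.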

We can repeat this game for many rounds, allowing the players to keep their memories. (Everyone existing at the end of a round gets to play the next one, leading to a whole lot of people in the end.) Consider for a moment what happens when you're in the middle of it. Imagine you started out very sure of SSA, but round after round, you kept finding that you were copied. It would be like seeing a coin come up heads again and again. At some point you'll be tempted to say "what the hell, let's do a Bayesian update in favor of SIA". This post is basically trying to give a UDT justification for that intuition.

Our anthropic repeated game is quite complicated, but if SSA and SIA were equally likely, it would be equivalent to this non-anthropic game:

1) Flip a coin and make a note of the result, but don't show it to the player. This step happens only once, and determines whether the whole game will take place in "SSA world" or "SIA world".

2) Simulate the game for a fixed number of rounds, using a random number generator to choose which copy the player "becomes" next. Use either SSA (each world weighted by its prior probability alone, no matter how many copies it contains) or SIA (each world weighted by its prior probability times its number of copies), depending on the coin from step 1.
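
Here's a minimal sketch of that procedure in Python (my own illustration, not from the post; the parameter values are arbitrary). On the reading above, the only thing the hidden coin changes is the per-round chance that the simulated player's path passes through the "copied" branch: with the event at 1:N prior odds, that chance is $\frac{1}{N+1}$ in the SSA world and $\frac{N}{N+1}$ in the SIA world (prior weight times $N^2$ copies, renormalized).

```python
import random

N = 10        # the event happens at 1:N odds and creates N^2 copies
ROUNDS = 20   # "a fixed number of rounds"

def play_game(rng: random.Random):
    """Steps 1 and 2: pick the hidden world once, then simulate the
    player's observation stream ('got copied' / 'didn't get copied')."""
    world = rng.choice(["SSA", "SIA"])                 # step 1: hidden fair coin
    p_copied = 1 / (N + 1) if world == "SSA" else N / (N + 1)
    observations = [rng.random() < p_copied for _ in range(ROUNDS)]  # step 2
    return world, observations

world, obs = play_game(random.Random(0))
print(world, "world:", sum(obs), "copy-rounds out of", ROUNDS)
```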

This game is non-anthropic, so UDT's solution agrees with classical probability: the player should update their beliefs about the coin after each round, and bet accordingly. Since each round leads to an N:1 update in one direction or the other, the player will simply bet according to the majority of their observations so far. For example, if they "got copied" 5 times and "didn't get copied" 3 times, they should bet that they'll "get copied" next time. That's the money-maximizing strategy.
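
To spell out the arithmetic behind the majority rule: each "got copied" observation has likelihood ratio $N:1$ for the SIA-style world and each "didn't get copied" observation has ratio $1:N$, so after $c$ copy-rounds out of $k$ the posterior odds of the SIA-style world are

$$N^{c} : N^{\,k-c} \;=\; N^{\,2c-k} : 1,$$

which favor that world exactly when $c > k - c$. In the example above, five copy-rounds against three give odds $N^2 : 1$ in favor of the SIA-style world, hence the bet on getting copied.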

Now let's go back to the anthropic repeated game and see how UDT deals with it. More precisely, let's use the version of UDT described in this post, and make it value each copy's welfare at the end of the game as the average of that copy's SSA and SIA probabilities. (That gets rid of all randomness and gives us only one static tree of copies, weighted in a certain way.) Then UDT's strategy will be the same as in the non-anthropic game: the instances whose memories say they got copied more than half the time will "bet on getting copied" in the next round as well, and vice versa. That's surprising, because naively we expect UDT to never deviate from its starting policy of 50% SSA + 50% SIA, no matter what it sees.

Here's a simple way to understand what's happening. Yes, each instance of UDT will value its descendants using 50% SSA + 50% SIA. But if some instance's memories agree more with SIA for example, then it knows that its next decision will mostly affect descendants with high SIA weight but low SSA weight. It's pretty much the same as UDT's handling of ordinary Bayesian evidence.
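
Here is one way to do that bookkeeping explicitly (a sketch under assumptions the post leaves implicit: the event in each round is shared by every copy alive at that round, so a possible world is just a sequence of $K$ outcomes; every dollar a copy wins is inherited by all of its eventual descendants and counted in their end-of-game welfare; and the SSA reference class is the set of copies existing at the end of the game). Take an instance whose memory $h$ has length $k$ and contains $c$ copy-events, and let $P(h)$ be the prior probability of that sequence. Its current bet only pays out to final copies lying beyond one of its two possible continuations, and the total weights of those copies are

$$W^{\mathrm{SSA}}_{\mathrm{copied}} = \frac{P(h)}{N+1}, \qquad W^{\mathrm{SSA}}_{\mathrm{not}} = \frac{N\,P(h)}{N+1}, \qquad W^{\mathrm{SIA}}_{\mathrm{copied}} = \frac{P(h)}{N+1}\,N^{\,2c-k+1}, \qquad W^{\mathrm{SIA}}_{\mathrm{not}} = \frac{P(h)}{N+1}\,N^{\,2c-k},$$

where the SIA weights come from multiplying the branch's prior weight by the number of copies accumulated so far ($N^{2c+2}$ or $N^{2c}$), by the expected growth factor $N$ for each remaining round, and then dividing by the normalizer $N^{K}$ (the expected total number of final copies). The 50/50 mixture prefers betting on "copied" exactly when $N^{\,2c-k+1} + 1 > N^{\,2c-k} + N$, i.e. when $(N-1)\,(N^{\,2c-k} - 1) > 0$, i.e. when $c > k/2$: bet with the majority of your memories, just as in the non-anthropic game.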

This result makes me happy. Making different decisions based on anthropic evidence in favor of SSA or SIA isn't just something a human would do, it's also rational according to UDT with the same prior odds on the two views. Moreover, it's possible that our current evidence already strongly favors SSA or SIA, undermining the "no unique answer" view which is popular on LW.

The idea also sheds some light on the moral status of copies and the nature of selfishness. If we define a UDT agent in the above way, it will have the curious property that most of its instances as weighted by SSA will be "SSA-selfish", and likewise for SIA. So we can define a robustly selfish agent by giving it a prior over different kinds of selfishness, and then letting it learn.

10 comments

I go back and forth as to whether this is a deep result or not.

It's clear that different ways of aggregating lead to different "effective anthropic probabilities". If you want to be correct in the most worlds, follow SSA; if you want most of your copies to be correct, follow SIA.

You've described a situation in which people update on their experience in the way you describe, and, because this update is used to weight their gains and losses, the previous copies behave as if they were using the updated values as probabilities.

It seems like you could use any "updating" process - even a non-Bayesian one, one that violated conservation of expected evidence - to similar effect.

Yeah, I'm not sure which invariants hold in anthropic situations either. Can you try to come up with a toy example of such a process?

Have a series of ten coin flips that create copies of the agent (tails) or not (heads). The flips are known to all agents soon after they are created.

If you take the full sequences of ten coin flips, and put any utility you like on your copies having dollars in those universes, you can get something that looks like any sort of updating.
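
One way to make this concrete (my notation, not from the comment): write $u(h)$ for the utility weight the agent puts on its copies getting a dollar in the universes whose flip history starts with $h$. Then at history $h$, a bet on the next flip is settled by comparing $u(h\mathrm{T})$ with $u(h\mathrm{H})$, so the agent acts as if $P(\mathrm{tails} \mid h) = \frac{u(h\mathrm{T})}{u(h\mathrm{T}) + u(h\mathrm{H})}$, and since nothing constrains how $u$ varies across histories, these effective probabilities can follow essentially any updating rule.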

If you value when your copies get money - for instance, you value them getting it after HTH, but not (or not as much) after either HTHH... or HTHT... - then this looks like a non-Bayesian updating process.

Can I actually do this experiment, and thereby empirically determine (for myself but nobody else) which of SIA and SSA is true?

Yes (once you have uploaded your brain into a computer so that it can be copied). If lots of people do this, then in the end most agents will believe that SIA is true, but most descendants of most of the original agents will believe that SSA is true.

With current technology, I think the best you can do is enter a Sleeping Beauty experiment with a 10:1 coin and 100:1 ratio of waking days, guaranteeing you a 10:1 update in one direction or the other, as long as you stay in the experiment. You can even bring someone else along and they will update with you. You'll lose it when the experiment is over, though.

Like Dacyn says, the best way would be to use mind copying, but we don't have it yet.

I'm having trouble following how this works. What's wrong with the following argument?

If you're being asked this question before the event, you know that you won't be a new clone. So assuming N>2, you should always choose the world where the event happens.

I mean, I suppose you could get this situation working if you assumed a notion of psychological identity instead of something tied to your physical body, but if you are using this assumption, it should at least be acknowledged.

The beauty of the approach is that it's agnostic about such things. It can support your view (let's call it physical continuity) the same way it supports SSA and SIA. More precisely, you can give UDT a fixed measure of care about descendants that is 33% SSA, 33% SIA and 33% physical continuity. Then the post shows that UDT will make decisions as though it was adjusting these weights after each observation, so from the perspective of each view it will look like UDT is learning that view with high probability. That was the goal of the post - to give a crisp model of updating in favor of this or that anthropic assumption.

Thanks for posting this, it's an interesting idea.

I'm curious about your second-to-last paragraph: if our current evidence already favored SSA or SIA (for instance, if we knew that an event occurred in the past that had a small chance of creating a huge number of copies of each human, but we also know that we are not copies), wouldn't that already have been enough to update our credence in SSA or SIA? Or did you mean that there's some other category of possible observations, which is not obviously evidence one way or the other, but under this UDT framework we could still use it to make an update?

Thank you! No, I didn't have any new kind of evidence in mind. Just a vague hope that we could use evidence to settle the question one way or the other, instead of saying it's arbitrary. Since many past events have affected the Earth's population, it seems like we should be able to find something. But I'm still very confused about this.