UDT can learn anthropic probabilities

by cousin_it, 24th Jun 2018

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Today I realized that UDT doesn't have to clash with "objective" views of anthropic probability, like SSA or SIA: instead it can, in some sense, learn which of these views is true!

The argument goes like this. First I'll describe a game where SSA and SIA lead to different decisions. Unlike Sleeping Beauty, my game doesn't involve memory loss, so someone can play it repeatedly and keep memories of previous plays. Then we'll figure out what UDT would do in that game, if it valued each copy's welfare using the average of SSA and SIA. It turns out that different instances of UDT existing at once will act differently: the instances whose memories are more likely under SSA will make decisions recommended by SSA, and likewise with SIA. So from the perspective of either view, it will look like UDT is learning that view, though from an agnostic perspective we can't tell what's being learned.

First let's describe the game. Imagine there's a small chance that tomorrow you will be copied many times - more than enough to outweigh the smallness of the chance. More precisely, let's say some event happens at 1:N odds and leads to N² copies of you existing. Otherwise (at N:1 odds) nothing happens and you stay as one copy. You're offered a choice: would you rather have each copy receive a dollar if the event happens, or receive a dollar yourself if the event doesn't happen? The former is the bet SIA recommends, since SIA says you have N:1 odds of ending up in the world with lots of copies. The latter is the bet SSA recommends, since SSA says you have N:1 odds of ending up in the world with no copying.
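To make the odds in a single round concrete, here is a minimal Python sketch. The function name `one_round_odds` and the choice N = 10 in the usage line are illustrative assumptions, not anything from the post. The sketch just applies the two weighting rules described above, using exact fractions.

```python
from fractions import Fraction

def one_round_odds(n):
    """Return (SSA odds, SIA odds) that the copying event happened, as Fractions."""
    p_event = Fraction(1, n + 1)      # the event happens at 1:N odds
    p_no_event = Fraction(n, n + 1)
    copies_if_event = n * n           # the event leads to N^2 copies
    copies_if_no_event = 1
    # SSA: weight each world by its prior probability only.
    ssa_odds = p_event / p_no_event
    # SIA: weight each world by prior probability times number of copies.
    sia_odds = (p_event * copies_if_event) / (p_no_event * copies_if_no_event)
    return ssa_odds, sia_odds

ssa, sia = one_round_odds(10)
print(ssa, sia)  # 1/10 10
```

So with N = 10, SSA puts 1:10 odds on the copying world while SIA puts 10:1 odds on it, which is why the two views recommend opposite bets.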

We can repeat this game for many rounds, allowing the players to keep their memories. (Everyone existing at the end of a round gets to play the next one, leading to a whole lot of people in the end.) Consider for a moment what happens when you're in the middle of it. Imagine you started out very sure of SSA, but round after round, you kept finding that you were copied. It would be like seeing a coin come up heads again and again. At some point you'll be tempted to say "what the hell, let's do a Bayesian update in favor of SIA". This post is basically trying to give a UDT justification for that intuition.

Our anthropic repeated game is quite complicated, but if SSA and SIA were equally likely, it would be equivalent to this non-anthropic game:

1) Flip a coin and make a note of the result, but don't show it to the player. This step happens only once, and determines whether the whole game will take place in "SSA world" or "SIA world".

2) Simulate the game for a fixed number of rounds, using a random number generator to choose which copy the player "becomes" next. Use either SSA (each world weighted by its probability alone, regardless of how many copies it contains) or SIA (each world weighted by its probability times its number of copies), depending on the coin from step 1.
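The two steps above can be sketched as a small simulation. This is a hypothetical illustration, not code from the post. `play_nonanthropic` is an invented name, and instead of tracking individual copies the sketch samples each round's outcome directly. Under SSA the player gets copied with probability 1/(N+1). Under SIA the copying branch is reweighted by its N² copies, which gives (N²/(N+1)) / (N²/(N+1) + N/(N+1)) = N/(N+1).

```python
import random

def play_nonanthropic(n, rounds, rng):
    """Simulate one run of the non-anthropic game.

    Returns the hidden world ("SSA" or "SIA") and a list of observations,
    where True means the player 'got copied' in that round.
    """
    world = rng.choice(["SSA", "SIA"])  # step 1: a single hidden coin flip
    # Per-round chance of 'getting copied' in each world (assumed as derived above).
    p_copied = 1 / (n + 1) if world == "SSA" else n / (n + 1)
    # Step 2: the same rule is applied in every round of this run.
    observations = [rng.random() < p_copied for _ in range(rounds)]
    return world, observations

rng = random.Random(0)
world, obs = play_nonanthropic(n=10, rounds=8, rng=rng)
print(world, sum(obs), "copied out of", len(obs))
```

Passing in a seeded `random.Random` keeps runs reproducible, so you can replay the same hidden coin and the same sequence of observations.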

This game is non-anthropic, so UDT's solution agrees with classical probability: the player should update their beliefs about the coin after each round, and bet accordingly. Since each round leads to an N:1 update in one direction or the other, the player will simply bet according to the majority of their observations so far. For example, if they "got copied" 5 times and "didn't get copied" 3 times, they should bet that they'll "get copied" next time. That's the money-maximizing strategy.
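Here is a minimal sketch of the player's Bayesian reasoning. It assumes a 50/50 prior on the coin and per-round probabilities of getting copied of 1/(N+1) in the SSA world and N/(N+1) in the SIA world. All function names are hypothetical. Ties are reported as indifferent, a case the post doesn't discuss.

```python
from fractions import Fraction

def posterior_sia(n, copied, not_copied, prior_sia=Fraction(1, 2)):
    """Posterior probability of the SIA world after the given observations."""
    # Each 'copied' observation is N times likelier under SIA than SSA,
    # and each 'not copied' observation is N times likelier under SSA.
    likelihood_ratio = Fraction(n) ** (copied - not_copied)
    prior_odds = prior_sia / (1 - prior_sia)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

def prob_copied_next(n, copied, not_copied):
    """Predictive probability of getting copied in the next round."""
    r = posterior_sia(n, copied, not_copied)
    return r * Fraction(n, n + 1) + (1 - r) * Fraction(1, n + 1)

def bet(n, copied, not_copied):
    """Money-maximizing bet for the next round."""
    p = prob_copied_next(n, copied, not_copied)
    if p > Fraction(1, 2):
        return "copied"
    if p < Fraction(1, 2):
        return "not copied"
    return "indifferent"

print(bet(10, 5, 3))  # copied
```

With N = 10, five copyings against three non-copyings give posterior odds of 10² = 100 in favor of the SIA world. Since the predictive probability exceeds 1/2 exactly when the posterior on SIA does, the bet always follows the majority of observations.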

Now let's go back to the anthropic repeated game and see how UDT deals with it. More precisely, let's use the version of UDT described in this post, and make it value each copy's welfare at the end of the game as the average of that copy's SSA and SIA probabilities. (That gets rid of all randomness and gives us only one static tree of copies, weighted in a certain way.) Then UDT's strategy will be the same as in the non-anthropic game: the instances whose memories say they got copied more than half the time will "bet on getting copied" in the next round as well, and vice versa. That's surprising, because naively we'd expect UDT never to deviate from its starting policy of 50% SSA + 50% SIA, no matter what it sees.
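The claim that UDT ends up betting with the majority can be checked by brute force on a small static tree. This sketch makes several assumptions of its own. The function names are hypothetical, and a history is a tuple of booleans (True = got copied). A lineage's SSA weight is its objective probability. Its SIA weight is that probability times its copy count (N² per copying), normalized over all lineages. Each end-of-game copy gets the average of the two weights. An instance at a given history bets on whichever outcome has more total end weight among its descendants.

```python
from fractions import Fraction
from itertools import product

def end_weights(n, rounds):
    """Averaged SSA/SIA weight of every end-of-game history (True = copied)."""
    p = Fraction(1, n + 1)            # chance of the copying event each round
    q = Fraction(n, n + 1)
    weights = {}
    for history in product([True, False], repeat=rounds):
        c = sum(history)
        m = rounds - c
        prob = p ** c * q ** m        # objective probability of this lineage
        copies = (n * n) ** c         # assumed copy count along this lineage
        ssa = prob                    # SSA: probability only
        # SIA: probability times copies; the total over all lineages is N^rounds.
        sia = prob * copies / Fraction(n) ** rounds
        weights[history] = (ssa + sia) / 2
    return weights

def udt_bet(weights, prefix):
    """Compare end weight of descendants of prefix+copied vs prefix+not copied."""
    t = len(prefix)
    w_copied = sum(w for h, w in weights.items() if h[:t] == prefix and h[t])
    w_not = sum(w for h, w in weights.items() if h[:t] == prefix and not h[t])
    if w_copied > w_not:
        return "copied"
    if w_copied < w_not:
        return "not copied"
    return "indifferent"

weights = end_weights(10, 6)
print(udt_bet(weights, (True, True, False)))   # copied
print(udt_bet(weights, (False, True, False)))  # not copied
print(udt_bet(weights, (True, False)))         # indifferent
```

Exact fractions make the tie case come out as a true tie rather than a floating-point accident.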

Here's a simple way to understand what's happening. Yes, each instance of UDT will value its descendants using 50% SSA + 50% SIA. But if, for example, some instance's memories agree more with SIA, then it knows that its next decision will mostly affect descendants with high SIA weight but low SSA weight. It's pretty much the same as UDT's handling of ordinary Bayesian evidence.
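The same point can be seen by splitting the total end weight of an instance's descendants into its SSA and SIA parts. In this sketch the function name is hypothetical, and the common factor of 1/2 is dropped because it cancels. A history with c copyings and m non-copyings has SSA part p^c q^m and SIA part q^c p^m, where p = 1/(N+1) and q = N/(N+1). The SIA share, N^k / (N^k + 1) with k = c - m, is exactly the Bayesian posterior on the SIA world under a 50/50 prior.

```python
from fractions import Fraction

def weight_split(n, copied, not_copied):
    """Return (SSA part, SIA part) of the total end weight of a history's descendants."""
    p = Fraction(1, n + 1)
    q = Fraction(n, n + 1)
    ssa_part = p ** copied * q ** not_copied
    # SIA reweighting gives each 'copied' step probability N/(N+1) (assumed as derived from N^2 copies).
    sia_part = q ** copied * p ** not_copied
    return ssa_part, sia_part

ssa_part, sia_part = weight_split(10, 5, 3)
print(sia_part / (ssa_part + sia_part))  # 100/101
```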

This result makes me happy. Making different decisions based on anthropic evidence in favor of SSA or SIA isn't just something a human would do; it's also rational according to UDT, with the same odds. Moreover, it's possible that our current evidence already strongly favors SSA or SIA, undermining the "no unique answer" view which is popular on LW.

The idea also sheds some light on the moral status of copies and the nature of selfishness. If we define a UDT agent in the above way, it will have the curious property that most of its instances according to SSA will be "SSA-selfish", and the same for SIA. So we can define a robustly selfish agent by giving it a prior over different kinds of selfishness, and then letting it learn.
