This is the followup to “Against the Linear Utility Hypothesis and the Leverage Prior” that I had promised in the comments on that post. Apologies that it took me so long that the previous post is probably no longer fresh in your mind, even if you read it.
Leverage prior and convergence of expected utility
In my previous post, I offered the following counterargument to the leverage prior:
Now consider the following hypothesis: “There are 3↑↑↑3 copies of you, and a Matrix Lord will approach one of them while disguised as an ordinary human, inform that copy about his powers and intentions without offering any solid evidence to support his claims, and then kill the rest of the copies iff this copy declines to pay him $5. None of the other copies will experience or hallucinate anything like this.” Of course, this hypothesis is extremely unlikely, but there is no assumption that some randomly selected copy coincidentally happens to be the one that the Matrix Lord approaches, and thus no way for a leverage penalty to force the probability of the hypothesis below 1/3↑↑↑3.
People objected that this argument doesn't work, since the hypothesis I suggested should be considered alongside other hypotheses like “there are 3↑↑↑3 copies of you, and no Matrix Lords, but copies of you get approached by random people pretending to be Matrix Lords at whatever rate you would expect by default”. This hypothesis is quite a lot more likely than the previous one, and if you pay Pascal's Mugger, then in this world, something like (3↑↑↑3)*10^-10 copies of you lose $5, whereas in the other, vastly less likely world, 3↑↑↑3 copies of you would die. An expected utility calculation should show that paying $5 is not worth it. In general, any hypothesis in which you alone have astronomically large influence can be compared to a hypothesis in which there is a similarly-sized universe filled with copies of you without astronomical influence, some of which encounter evidence that they do have such influence. This suggests using a leverage prior.
That's a good point, but on the other hand, hypotheses like “there are 3↑↑↑3 copies of you, and no Matrix Lords, but copies of you get approached by random people pretending to be Matrix Lords at whatever rate you would expect by default” have to be compared with other hypotheses like “there are (3↑↑↑3)^2 copies of you, 3↑↑↑3 of which will be approached by a Matrix Lord, each of which controls the fates of 3↑↑↑3 others, and the rest of the copies will never get approached by anyone pretending to be a Matrix Lord.” Once this possibility is taken into account as well, the linear utility hypothesis clearly implies that you should pay Pascal's mugger. The leverage prior advocate will object, “but that hypothesis must be compared with other possibilities, such as that there are (3↑↑↑3)^2 copies of you, and no Matrix Lords, but there are occasional random people pretending to be Matrix Lords. Once this possibility is taken into account, it is clear that you should not pay Pascal's mugger.” To this, I reply, “but what if there are (3↑↑↑3)^3 copies of you, (3↑↑↑3)^2 of which get approached by Matrix Lords, each of which controls the fates of 3↑↑↑3 others, and the rest of the copies will never get approached by anyone pretending to be a Matrix Lord?” And so on.
When we get to the end of all this, does the linear utility hypothesis suggest that we should pay Pascal's mugger or not? This is like asking whether 1-1+1-1+1-1+1-1+1-1+... converges to 0 or 1. You can group the terms together to make it look like it converges to 0 if you want, but it doesn't actually converge at all, and the same is true of your expected utility under the linear utility hypothesis. The argument for the leverage prior gives no reason for expected population size to be finite, and if utility is linear with respect to population size, then expected utility shouldn't converge either (and making things still more complicated, there should be nonzero probability that the population is infinite).
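As a small illustration of the series analogy (not of the expected utility calculation itself), here is a minimal Python sketch of the partial sums of 1-1+1-1+... The function name `partial_sums` is made up for this example. The partial sums never settle, and which value you "see" depends only on where you stop, just as the verdict on paying depends on which hypothesis gets considered last.

```python
from itertools import accumulate, cycle, islice

def partial_sums(n):
    """Return the first n partial sums of 1 - 1 + 1 - 1 + ..."""
    terms = islice(cycle([1, -1]), n)
    return list(accumulate(terms))

sums = partial_sums(8)
print(sums)        # [1, 0, 1, 0, 1, 0, 1, 0]

# Stopping after an odd number of terms always gives 1;
# stopping after an even number of terms always gives 0.
print(sums[0::2])  # [1, 1, 1, 1]
print(sums[1::2])  # [0, 0, 0, 0]
```

Grouping the terms as (1-1)+(1-1)+... amounts to only ever looking at the even-length partial sums, which is why it appears to converge to 0.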
Incidentally, though I've been framing this in terms of choosing a policy that all copies of you follow, you could also frame it in terms of choosing an action that just one copy of you takes, and think about the probabilities that various consequences will follow from that particular copy taking certain actions. This is the framing that the leverage prior is phrased in. What is the probability that Pascal's mugger is telling the truth? Under the self-indication assumption, this is the expected number of people approached by honest Pascal's muggers divided by the expected number of people approached by possibly-dishonest Pascal's muggers. But since expected population size is infinite, the self-indication assumption fails to deliver well-defined probabilities at all. Under the self-sampling assumption, the probability that Pascal's mugger is telling the truth is on the order of 1/3↑↑↑3 if you choose a reference class appropriately, but under the linear utility hypothesis, it is the self-indication assumption, not the self-sampling assumption, that agrees with the updateless framing that you should be using instead. For instance, if a priori, there's a 1/2 chance that there is only 1 person and a 1/2 chance that there are 2 people, and in either case, everyone gets the chance to bet on which possible world they're in, then by the linear utility hypothesis, you should bet that you're in the 2-person world at odds up to 2:1, exactly the odds that the self-indication assumption would suggest. If expected population size is infinite, then the expected utilities given by the linear utility hypothesis don't converge, and the probabilities given by the self-indication assumption don't either.
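The two-world betting example can be checked with a short sketch. The names `PRIOR`, `total_expected_gain`, and `sia_probability` are hypothetical, and the bet is modeled (as an assumption) as every person staking `stake` to win `payout` if they are in the 2-person world, with utility summed linearly over people.

```python
from fractions import Fraction

# Prior over worlds from the example: population size -> probability.
PRIOR = {1: Fraction(1, 2), 2: Fraction(1, 2)}

def total_expected_gain(stake, payout, prior=PRIOR, bet_on=2):
    """Expected summed winnings if every person bets on world `bet_on`."""
    total = Fraction(0)
    for population, prob in prior.items():
        per_person = payout if population == bet_on else -stake
        # Linear utility: each person's gain or loss counts equally.
        total += prob * population * per_person
    return total

def sia_probability(world, prior=PRIOR):
    """Self-indication assumption: weight each world by its population."""
    weights = {n: p * n for n, p in prior.items()}
    return weights[world] / sum(weights.values())

print(total_expected_gain(stake=2, payout=1))  # 0, break-even at 2:1
print(total_expected_gain(stake=3, payout=1))  # -1/2, worse odds lose
print(sia_probability(2))                      # 2/3, i.e. odds of 2:1
```

The break-even stake under the summed-utility policy matches the odds implied by the population-weighted probability, which is the agreement described above. With a prior whose expected population is infinite, the normalizing sum in `sia_probability` would diverge.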
Just pay the mugger?
Some people reacted to my previous post by suggesting that perhaps people actually should pay Pascal's mugger, since they accept the linear utility hypothesis, and take seriously the idea that this might mean they should pay Pascal's mugger. I don't think this is a good takeaway, because expected utilities don't converge under the linear utility hypothesis, so it's hard to say that one action has higher expected utility than some other action. In my previous post, I tried to ignore nonconvergence issues, and get preferences out of the linear utility hypothesis by restricting attention to a finite set of salient possible outcomes. I think this was a mistake. The problem with trying to get a utility function to express preferences about probability distributions on which its expectation does not converge, by restricting attention to a finite set of possible outcomes, is that the result can be highly dependent on which finite set of possible outcomes you restrict attention to. For instance, we saw in the previous section that given the linear utility hypothesis, whether or not you should pay Pascal's mugger depends on which of “but what if there are the same number of people but no Matrix Lords?” or “but what if there are more Matrix Lords?” gets the last word. The linear utility hypothesis can't say that you should pay Pascal's mugger because it isn't actually compatible with any coherent preferences at all.
Arguments for the Linear Utility Hypothesis
In my previous post, I noted that I wasn't aware of any coherent arguments for the linear utility hypothesis, so I had no arguments to refute. Thanks to responses both in the comments and elsewhere, I am now aware of arguments for the linear utility hypothesis, which I will address.
People often bring up the claim that anything happening far away that won't interact with you ever again in the future can't have any bearing on what is good or bad to happen locally. This is more like a restatement of the linear utility hypothesis that makes it sound more intuitively appealing than an argument for it. It seems to me that this claim is true of most but not all decisions that people make. As an example of a way that this claim could be false, people who are far away might have preferences about the state of the rest of the universe, and you might care about their preferences getting satisfied. If so, it makes sense to adjust your actions to respect far-away people's preferences. People sometimes act this way with respect to people who are distant in time, acting on beliefs about what dead friends or family members, or their past selves, wanted. I don't think it's fair to call this behavior irrational. Thought experiments involving technology that does not yet exist allow for more dramatic examples. Suppose there are two people in danger (call them Alice and Bob), and you have the ability to save only one. If you learned that a copy of Alice had just been created, and that one of her was the one in danger while the other was on her way to some distant planet to live out the rest of her life, I would see this as a point in favor of saving Bob, since it is a matter of life or death for him, but merely a matter of one life or two for Alice.
The previously mentioned claim has been defended with the claim that when people care about the experiences they're having, they mostly care about them for intrinsic reasons relating to that specific instantiation of the experience, with external events not being relevant to the valence of the experience, and altruistic external agents should care about those experiences for those same intrinsic reasons. But it seems to me that if there are multiple copies of a person, then no copy has any way of expressing a preference specifically about itself rather than the other copies. Because if copy A and copy B are thinking the exact same thoughts, making the exact same claims about their preferences, taking the exact same actions, and so on, then it seems like they must have the same preferences. But if you were to interpret copy A as having preferences specifically regarding copy A and not caring about copy B, while interpreting copy B as having preferences specifically regarding copy B and not caring about copy A, then you are interpreting copies A and B as having different preferences. So I don't think it makes sense to say that the experiences of each instantiation of a person matter independently in a way unrelated to other instantiations. Even if the copies aren't exactly identical, similar reasoning can apply if the differences between them aren't relevant to their preferences. For instance, if minds A and B have different memories of the past hour but are otherwise the same, and neither of them cares much about maintaining their memories of the last hour compared to their other preferences, then their preferences probably still shouldn't be interpreted as being specific to an individual copy. Also, an altruist might not care about someone's experiences in exactly the same way that the person having the experience cares about it in that moment. 
A familiar example of this is that killing someone or saving someone's life is seen as morally a much bigger deal than preventing someone from being brought into existence or causing someone to be brought into existence; that is, the existence of a mind is generally taken to imply that it is more important for that mind to continue to exist than it is for a different mind having experiences similar in valence to come into existence.
Another criticism I heard is that values should be determined by abstract philosophical reasoning rather than by describing our gut impulses, so arguments that the linear utility hypothesis is not compatible with common instinctive value judgments should not be seen as good arguments against the linear utility hypothesis. While I see a large role for philosophical reasoning to help us decide how to resolve the contradictions we get when trying to interpret our gut impulses as values, ignoring our gut impulses completely in favor of purely abstract philosophical reasoning sounds to me like ignoring what we actually want in favor of what we think we ought to want, which, from the perspective of what we actually want, is dangerous. I also see the purely abstract arguments for the linear utility hypothesis as weaker than the argument that utility functions should be bounded so that expected utility converges, or at the very least that utility should scale slowly enough that expected utility converges on any probability distribution you could possibly encounter in practice, and the linear utility hypothesis fails this condition.
One interesting possibility that I had been ignoring is that there could be a probability measure on the universe. This could either be objective (i.e. there could be a finite amount of magical reality fluid to go around, and the measure of an experience is the amount of magical reality fluid that goes into instantiating the experience), or it could just be about your preferences (i.e. you could act like the importance of an experience depends on its measure according to some finite measure that you have in mind, which does not have any objective basis in reality). Given such a probability measure, the linear utility hypothesis is much more viable, where the linear utility hypothesis is now interpreted as saying that utility should be the integral of some local utility function with respect to the measure. Since utility would then be bounded (assuming the local utility function is bounded), expected utilities would always converge. I ultimately still don't buy the linear utility hypothesis even given such a probability measure on the universe, for some of the other arguments I've given against it, but I think nonconvergence of expected utility is the strongest argument against the linear utility hypothesis in general, and it does not apply in this setting.
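To spell out the boundedness step, here is a short derivation under notation introduced only for this illustration: $X$ is the universe, $\mu$ is the measure on it with $\mu(X) = 1$, $u$ is the local utility function, and $M$ is an assumed bound on $|u|$.

```latex
U \;=\; \int_X u \, d\mu ,
\qquad
|U| \;\le\; \int_X |u| \, d\mu \;\le\; M \, \mu(X) \;=\; M .
```

Since $|U| \le M$ in every possible world, the expectation of $U$ over any probability distribution on worlds is also bounded by $M$ in absolute value, so it converges. The same bound goes through with any finite $\mu(X)$ in place of 1.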
A major issue with having preferences that are linear with respect to an objective, physically real probability measure on the universe is that there might not be such a measure. It doesn't seem to me like we have especially strong reasons to believe such a measure to exist, and we certainly shouldn't believe that there is such a measure with probability 1. So you still have to decide what your preferences are in the absence of an objective probability measure on the universe. And maximizing the integral of a local utility function with respect to a finite measure that has no objective basis in reality strikes me as wrong. Like, why would I care so much whether experiences are instantiated in this piece of universe over here or that piece of universe over there, if there's no real sense in which there is more of the experience if it is instantiated in one place than if it is instantiated in the other?
An agent with preferences satisfying the linear utility hypothesis with respect to a finite measure could have an interesting way of looking at problems like Pascal's mugger. The agent probably expects that its measure is much more than 1/3↑↑↑3. So if Pascal's mugger comes up and says that it controls the fates of minds with combined measure 3↑↑↑3 times the agent's measure, in order for Pascal's mugger to be telling the truth, the agent would have to be morally smaller than it thought it was, in some sense. If the agent thinks its measure is 10^-100, and assigns less than 10^-100 probability to Pascal's mugger's claim that the agent's measure is actually 1/3↑↑↑3 and the mugger controls the fates of minds with measure close to 1, then it will not pay Pascal's mugger because the mugger is only claiming to control the fates of minds with measure 10^100 times what the agent thinks its own measure is, and there's no possibility that Pascal's mugger could have even more influence than that.
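The arithmetic in this argument can be sketched as follows. The names `TOTAL_MEASURE`, `max_leverage`, and `expected_leverage` are made up for the example, and the credence `p` is a hypothetical value chosen only to be below 10^-100. The sketch tracks measure alone; it leaves out how the local utility of $5 compares with the local utility of a life, which the argument above does not specify.

```python
from fractions import Fraction

# Assumption: a probability measure, so all minds together have measure 1.
TOTAL_MEASURE = Fraction(1)

def max_leverage(agent_measure):
    """Largest possible ratio of controlled measure to the agent's own measure."""
    return TOTAL_MEASURE / agent_measure

def expected_leverage(prob_claim_true, believed_measure):
    """Upper bound on expected controlled measure, in units of the agent's believed measure."""
    return prob_claim_true * max_leverage(believed_measure)

believed = Fraction(1, 10**100)  # what the agent thinks its measure is
p = Fraction(1, 10**101)         # hypothetical credence in the mugger, below 10^-100

print(max_leverage(believed) == 10**100)  # True: no claim can exceed this ratio
print(expected_leverage(p, believed))     # 1/10
```

However large a number the mugger names, the controlled measure cannot exceed 1, so the claimed stakes stop growing once they reach 10^100 times the agent's believed measure, and a credence below 10^-100 keeps the expected ratio below 1.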