Against the Linear Utility Hypothesis and the Leverage Penalty

by AlexMennen, 14th Dec 2017


[Roughly the second half of this is a reply to: Pascal's Muggle]

There's an assumption that people often make when thinking about decision theory, which is that utility should be linear with respect to amount of stuff going on. To be clear, I don't mean linear with respect to amount of money/cookies/etc that you own; most people know better than that. The assumption I'm talking about is that the state of the rest of the universe (or multiverse) does not affect the marginal utility of there also being someone having certain experiences at some location in the uni-/multi-verse. For instance, if 1 util is the difference in utility between nothing existing, and there being a planet that has some humans and other animals living on it for a while before going extinct, then the difference in utility between nothing existing and there being n copies of that planet should be n utils. I'll call this the Linear Utility Hypothesis. It seems to me that, despite its popularity, the Linear Utility Hypothesis is poorly motivated, and a very poor fit to actual human preferences.

The Linear Utility Hypothesis gets implicitly assumed a lot in discussions of Pascal's mugging. For instance, in Pascal's Muggle, Eliezer Yudkowsky says he “[doesn't] see any way around” the conclusion that he must be assigning a probability at most on the order of 1/3↑↑↑3 to the proposition that Pascal's mugger is telling the truth, given that the mugger claims to be influencing 3↑↑↑3 lives and that he would refuse the mugger's demands. This implies that he doesn't see any way that influencing 3↑↑↑3 lives could not have on the order of 3↑↑↑3 times as much utility as influencing one life, which sounds like an invocation of the Linear Utility Hypothesis.

One argument for something kind of like the Linear Utility Hypothesis is that there may be a vast multiverse that you can influence only a small part of, and unless your utility function is weirdly nondifferentiable and you have very precise information about the state of the rest of the multiverse (or if your utility function depends primarily on things you personally control), then your utility function should be locally very close to linear. That is, if your utility function is a smooth function of how many people are experiencing what conditions, then the utility from influencing 1 life should be 1/n times the utility of having the same influence on n lives, because n is inevitably going to be small enough that a linear approximation to your utility function will be reasonably accurate, and even if your utility function isn't smooth, you don't know what the rest of the universe looks like, so you can't predict how the small changes you can make will interact with discontinuities in your utility function. This is a scaled-up version of a common argument that you should be willing to pay 10 times as much to save 20,000 birds as you would be willing to pay to save 2,000 birds. I am sympathetic to this argument, though not convinced of the premise that you can only influence a tiny portion of what is actually valuable to you. More importantly, this argument does not even attempt to establish that utility is globally linear, and counterintuitive consequences of the Linear Utility Hypothesis, such as Pascal's mugging, often involve situations that seem especially likely to violate the assumption that all choices you make have tiny consequences.

I have never seen anyone provide a defense of the Linear Utility Hypothesis itself (actually, I think I've been pointed to the VNM theorem for this, but I don't count that because it's a non sequitur; the VNM theorem is just a reason to use a utility function in the first place, and does not place any constraints on what that utility function might look like), so I don't know of any arguments for it available for me to refute, and I'll just go ahead and argue that it can't be right because actual human preferences violate it too dramatically. For instance, suppose you're given a choice between the following two options: 1: Humanity grows into a vast civilization of 10^100 people living long and happy lives, or 2: a 10% chance that humanity grows into a vast civilization of 10^102 people living long and happy lives, and a 90% chance of going extinct right now. I think almost everyone would pick option 1, and would think it crazy to take a reckless gamble like option 2. But the Linear Utility Hypothesis says that option 2 is much better. Most of the ways people respond to Pascal's mugger don't apply to this situation, since the probabilities and ratios of utilities involved here are not at all extreme.
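As a rough check on the arithmetic in this example, here is a minimal Python sketch that compares the two options. It evaluates them once under the Linear Utility Hypothesis and once under a bounded utility function. The bounded function `bounded_utility` and its `scale` parameter are assumptions chosen only for illustration; nothing in the argument depends on that particular shape.

```python
from fractions import Fraction

# Each lottery is a list of (probability, number_of_happy_lives) pairs.
OPTION_1 = [(Fraction(1), 10**100)]
OPTION_2 = [(Fraction(1, 10), 10**102), (Fraction(9, 10), 0)]

def linear_utility(lives):
    # Linear Utility Hypothesis: utility is proportional to the number of lives.
    return Fraction(lives)

def bounded_utility(lives, scale=10**100):
    # Hypothetical bounded alternative (an assumption, not from the post):
    # rises from 0 toward 1 and equals 1/2 when lives == scale.
    return Fraction(lives, lives + scale)

def expected_utility(lottery, utility):
    # Probability-weighted sum of utilities over the lottery's outcomes.
    return sum(p * utility(lives) for p, lives in lottery)

linear_1 = expected_utility(OPTION_1, linear_utility)
linear_2 = expected_utility(OPTION_2, linear_utility)
bounded_1 = expected_utility(OPTION_1, bounded_utility)
bounded_2 = expected_utility(OPTION_2, bounded_utility)

print(linear_2 / linear_1)    # ratio of option 2 to option 1 under linearity
print(bounded_1 > bounded_2)  # whether the bounded agent prefers option 1
```

Under linearity, option 2 comes out ten times better in expectation, while the illustrative bounded agent prefers option 1, matching the intuition described above.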

There are smaller-scale counterexamples to the Linear Utility Hypothesis as well. Suppose you're offered the choice between: 1: continue to live a normal life, which lasts for n more years, or 2: live the next year of a normal life, but then instead of living a normal life after that, have all your memories from the past year removed, and experience that year again n more times (your memories getting reset each time). I expect pretty much everyone to take option 1, even if they expect the next year of their life to be better than the average of all future years of their life. If utility is just a naive sum of local utility, then there must be some year that has at least as much utility in it as the average year, and just repeating that year every year would thus increase total utility. But humans care about the relationship that their experiences have with each other at different times, as well as what those experiences are.

Here's another thought experiment that seems like a reasonable empirical test of the Linear Utility Hypothesis: take some event that is familiar enough that we understand its expected utility reasonably well (for instance, the amount of money in your pocket changing by $5), and some ludicrously unlikely event (for instance, the event in which some random person is actually telling the truth when they claim, without evidence, to have magic powers allowing them to control the fates of arbitrarily large universes, and saying, without giving a reason, that the way they use this power is dependent on some seemingly unrelated action you can take), and see if you become willing to sacrifice the well-understood amount of utility in exchange for the tiny chance of a large impact when the large impact becomes big enough that the tiny chance of it would be more important if the Linear Utility Hypothesis were true. This thought experiment should sound very familiar. The result of this experiment is that basically everyone agrees that they shouldn't pay the mugger, not only at much higher stakes than the Linear Utility Hypothesis predicts should be sufficient, but even at arbitrarily large stakes. This result has even stronger consequences than that the Linear Utility Hypothesis is false, namely that utility is bounded. People have come up with all sorts of absurd explanations for why they wouldn't pay Pascal's mugger even though the Linear Utility Hypothesis is true about their preferences (I will address the least absurd of these explanations in a bit), but there is no better test for whether an agent's utility function is bounded than how it responds to Pascal's mugger. 
If you take the claim “My utility function is unbounded”, and taboo “utility function” and “unbounded”, it becomes “Given outcomes A and B such that I prefer A over B, for any probability p>0, there is an outcome C such that I would take B rather than A if it lets me control whether C happens instead with probability p.” If you claim that one of these claims is true and the other is false, then you're just contradicting yourself, because that's what “utility function” means. The tabooed claim can be roughly translated into English as “I would do the equivalent of paying the mugger in Pascal's mugging-like situations”. So in Pascal's mugging-like situations, agents with unbounded utility functions don't look for clever reasons not to do the equivalent of paying the mugger; they just pay up. The fact that this behavior is so counterintuitive is an indication that agents with unbounded utility functions are so alien that you have no idea how to empathize with them.
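To make the bounded-versus-unbounded distinction concrete, here is a small Python sketch of how the two kinds of agent respond as the mugger raises the stakes. All of the constants (`P_TRUTH`, `COST`, `CAP`) and the shape of `bounded_utility` are hypothetical numbers picked for illustration, and powers of ten stand in for stakes like 3↑↑↑3, which cannot be represented directly.

```python
from fractions import Fraction

P_TRUTH = Fraction(1, 10**50)   # hypothetical credence that the mugger is honest
COST = Fraction(1, 1000)        # hypothetical utility of $5, in units of one life
CAP = 10**20                    # hypothetical upper bound on utility

def linear_utility(lives):
    # Unbounded: utility grows without limit as the stakes grow.
    return Fraction(lives)

def bounded_utility(lives):
    # Stays strictly below CAP however many lives are at stake.
    return Fraction(CAP * lives, lives + CAP)

def pays(utility, lives):
    # Pay exactly when the expected gain exceeds the sure cost.
    return P_TRUTH * utility(lives) > COST

stakes = [10**k for k in range(0, 301, 10)]
linear_pays_at = [n for n in stakes if pays(linear_utility, n)]
bounded_pays_at = [n for n in stakes if pays(bounded_utility, n)]

print(len(linear_pays_at) > 0)  # the linear agent pays once the stakes are large enough
print(bounded_pays_at)          # stakes at which the bounded agent pays
```

With these numbers, `P_TRUTH * CAP` is below `COST`, so no claimed stake can move the bounded agent, whereas the linear agent starts paying as soon as the claimed number of lives is large enough.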

The “least absurd explanation” I referred to for why an agent satisfying the Linear Utility Hypothesis would reject Pascal's mugger, is, of course, the leverage penalty that Eliezer discusses in Pascal's Muggle. The argument is that any hypothesis in which there are n people, one of whom has a unique opportunity to affect all the others, must imply that a randomly selected one of those n people has only a 1/n chance of being the one who has influence. So if a hypothesis implies that you have a unique opportunity to affect n people's lives, then this fact is evidence against this hypothesis by a factor of 1:n. In particular, if Pascal's mugger tells you that you are in a unique position to affect 3↑↑↑3 lives, the fact that you are the one in this position is 1 : 3↑↑↑3 evidence against the hypothesis that Pascal's mugger is telling the truth. I have two criticisms of the leverage penalty: first, that it is not the actual reason that people reject Pascal's mugger, and second, that it is not a correct reason for an ideal rational agent to reject Pascal's mugger.
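The way the leverage penalty cancels the stakes can be sketched in a few lines of Python. The value `base` is a hypothetical credence in the mugger's story before the penalty is applied; the point of the sketch is only that the expected number of lives affected no longer depends on n once the 1/n factor is included.

```python
from fractions import Fraction

def penalized_probability(base_credence, n):
    # Leverage penalty: being the one person in n with the influence
    # counts as evidence against the hypothesis by a factor of n.
    return base_credence / n

def expected_lives(base_credence, n):
    # Expected number of lives affected after applying the penalty.
    return penalized_probability(base_credence, n) * n

base = Fraction(1, 10**6)  # hypothetical credence before the penalty
results = {n: expected_lives(base, n) for n in (10**3, 10**30, 10**300)}

print(all(value == base for value in results.values()))
```

However large the mugger makes n, the penalized expectation stays at `base`, which is why an agent with a leverage prior and linear utility is unmoved by larger claims.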

The leverage penalty can't be the actual reason people reject Pascal's mugger because people don't actually assign probability as low as 1/3↑↑↑3 to the proposition that Pascal's mugger is telling the truth. This can be demonstrated with thought experiments. Consider what happens when someone encounters overwhelming evidence that Pascal's mugger actually is telling the truth. The probability of the evidence being faked can't possibly be less than 1 in 10^10^26 or so (this bound was suggested by Eliezer in Pascal's Muggle), so an agent with a leverage prior will still be absolutely convinced that Pascal's mugger is lying. Eliezer suggests two reasons that an agent might pay Pascal's mugger anyway, given a sufficient amount of evidence: first, that once you update to a probability of something like 10^100 / 3↑↑↑3, and multiply by the stakes of 3↑↑↑3 lives, you get an expected utility of something like 10^100 lives, which is worth a lot more than $5, and second, that the agent might just give up on the idea of a leverage penalty and admit that there is a non-infinitesimal chance that Pascal's mugger may actually be telling the truth. Eliezer concludes, and I agree, that the first of these explanations is not a good one. I can actually demonstrate this with a thought experiment. Suppose that after showing you overwhelming evidence that they're telling the truth, Pascal's mugger says “Oh, and by the way, if I was telling the truth about the 3↑↑↑3 lives in your hands, then X is also true,” where X is some (a priori fairly unlikely) proposition that you later have the opportunity to bet on with a third party. 
Now, I'm sure you'd be appropriately cautious in light of the fact that you would be very confused about what's going on, so you wouldn't bet recklessly, but you probably would consider yourself to have some special information about X, and if offered good enough odds, you might see a good opportunity for profit with an acceptable risk, which would not have looked appealing before being told X by Pascal's mugger. If you were really as confident that Pascal's mugger was lying as the leverage prior would imply, then you wouldn't assume X was any more likely than you thought before for any purposes not involving astronomical stakes, since your reason for believing X is predicated on you having control over astronomical stakes, which is astronomically unlikely.

So after seeing the overwhelming evidence, you shouldn't have a leverage prior. And despite Eliezer's protests to the contrary, this does straightforwardly imply that you never had a leverage prior in the first place. Eliezer's excuse for using a leverage prior before but not after seeing observations that a leverage prior predicts are extremely unlikely is computational limitations. He compares this to the situation in which there is a theorem X that you aren't yet aware you can prove, and a lemma Y that you can see is true and you can see implies X. If you're asked how likely X is to be true, you might say something like 50%, since you haven't thought of Y, and then when asked how likely X&Y is to be true, you see why X is probably true, and say something like 90%. This is not at all analogous to a “superupdate” in which you change priors because of unlikely observations, because in the case of assigning probabilities to mathematical claims, you only need to think about Y, whereas Eliezer is trying to claim that a superupdate can only happen when you actually observe that evidence, and just thinking hypothetically about such evidence isn't enough. 
A better analogy to the situation with the theorem and lemma would be when you initially say that there's a 1 in 3↑↑↑3 chance that Pascal's mugger was telling the truth, and then someone asks what you would think if Pascal's mugger tore a hole in the sky, showing another copy of the mugger next to a button, and repeating the claim that pushing the button would influence 3↑↑↑3 lives, and then you think “oh in that case I'd think it's possible the mugger's telling the truth; I'd still be pretty skeptical, so maybe I'd think there was about a 1 in 1000 chance that the mugger is telling the truth, and come to think of it, I guess the chance of me observing that evidence is around 10^-12, so I'm updating right now to a 10^-15 chance that the mugger is telling the truth.” Incidentally, if that did happen, then this agent would be very poorly calibrated, since if you assign a probability of 1 in 3↑↑↑3 to a proposition, you should assign a probability of at most 10^15 / 3↑↑↑3 to ever justifiably updating that probability to 10^-15. If you want a well-calibrated probability for an absurdly unlikely event, you should already be thinking about less unlikely ways that your model of the world could be wrong, instead of waiting for strong evidence that your model of the world actually is wrong, and plugging your ears and shouting “LA LA LA I CAN'T HEAR YOU!!!” when someone describes a thought experiment that suggests that the overwhelmingly most likely way the event could occur is for your model to be incorrect. But Eliezer perplexingly suggests ignoring the results of these thought experiments unless they actually occur in real life, and doesn't give a reason for this other than “computational limitations”, but, uh, if you've thought of a thought experiment and reasoned through its implications, then your computational limitations apparently aren't strict enough to prevent you from doing that. 
Eliezer suggests that the fact that probabilities must sum to 1 might force you to assign near-infinitesimal probabilities to certain easy-to-state propositions, but this is clearly false. Complexity priors sum to 1. Those aren't computable, but as long as we're talking about computational limitations, by Eliezer's own estimate, there are far fewer than 10^10^26 mutually disjoint hypotheses a human is physically capable of even considering, so the fact that probabilities sum to 1 cannot force you to assign a probability less than 1 in 10^10^26 to any of them (and you probably shouldn't; I suggest a “strong Cromwell's rule” that empirical hypotheses shouldn't be given probabilities less than 10^-10^26 or so). And for the sorts of hypotheses that are easy enough to describe that we actually do so in thought experiments, we're not going to get upper bounds anywhere near that tiny.
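The calibration point above, that a prior of 1 in 3↑↑↑3 leaves at most a 10^15 / 3↑↑↑3 chance of ever justifiably updating to 10^-15, follows from the fact that your prior must equal your expected posterior. Here is a minimal Python sketch of that bound, with 10^-100 standing in for 1/3↑↑↑3. The function names and the toy single-observation model are assumptions made for illustration.

```python
from fractions import Fraction

def max_chance_of_reaching(prior, target):
    # Conservation of expected evidence: the prior is the average posterior,
    # so P(posterior >= target) <= prior / target (Markov's inequality).
    return min(Fraction(1), prior / target)

def toy_update(prior, p_evidence_if_false):
    # Toy model: one possible observation that is certain if the claim is true
    # and has probability p_evidence_if_false otherwise.
    p_evidence = prior + (1 - prior) * p_evidence_if_false
    posterior = prior / p_evidence
    return p_evidence, posterior

prior = Fraction(1, 10**100)  # stand-in for 1 / 3↑↑↑3
p_evidence, posterior = toy_update(prior, Fraction(1, 10**85))
bound = max_chance_of_reaching(prior, posterior)

# The agent in the thought experiment expects to see the evidence with
# probability 10^-12 and would then hold a 1-in-1000 credence, which
# already commits it to a prior of at least 10^-15.
implied_prior = Fraction(1, 10**12) * Fraction(1, 1000)

print(p_evidence <= bound)
print(implied_prior == Fraction(1, 10**15))
```

In the toy model the chance of seeing the evidence times the resulting posterior recovers the prior exactly, so a large update can only be expected with correspondingly tiny probability.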

And if you do assign a probability of 1/3↑↑↑3 to some proposition, what is the empirical content of this claim? One possible answer is that this means that the odds at which you would be indifferent to betting on the proposition are 1 : 3↑↑↑3, if the bet is settled with some currency that your utility function is close to linear with respect to across such scales. But the existence of such a currency is under dispute, and the empirical content to the claim that such a currency exists is that you would make certain bets with it involving arbitrarily extreme odds, so this is a very circular way to empirically ground the claim that you assign a probability of 1/3↑↑↑3 to some proposition. So a good empirical grounding for this claim is going to have to be in terms of preferences between more familiar outcomes. And in terms of payoffs at familiar scales, I don't see anything else that the claim that you assign a probability of 1/3↑↑↑3 to a proposition could mean other than that you expect to continue to act as if the probability of the proposition is 0, even conditional on any observations that don't give you a likelihood ratio on the order of 1/3↑↑↑3. If you claim that you would superupdate long before then, it's not clear to me what you could mean when you say that your current probability for the proposition is 1/3↑↑↑3.

There's another way to see that bounded utility functions, not leverage priors, are Eliezer's (and also pretty much everyone's) true rejection to paying Pascal's mugger, and that is the following quote from Pascal's Muggle: “I still feel a bit nervous about the idea that Pascal's Muggee, after the sky splits open, is handing over five dollars while claiming to assign probability on the order of 10^9/3↑↑↑3 that it's doing any good.” This is an admission that Eliezer's utility function is bounded (even though Eliezer does not admit that he is admitting this) because the rational agents whose utility functions are bounded are exactly (and tautologically) characterized by those for which there exists a probability p>0 such that the agent would not spend [fixed amount of utility] for probability p of doing any good, no matter what the good is. An agent satisfying the Linear Utility Hypothesis would spend $5 for a 10^9/3↑↑↑3 chance of saving 3↑↑↑3 lives. Admitting that it would do the wrong thing if it was in that situation, but claiming that that's okay because you have an elaborate argument that the agent can't be in that situation even though it can be in situations in which the probability is lower and can also be in situations in which the probability is higher, strikes me as an exceptionally flimsy argument that the Linear Utility Hypothesis is compatible with human values.

I also promised a reason that the leverage penalty argument is not a correct reason for rational agents (regardless of computational constraints) satisfying the Linear Utility Hypothesis to not pay Pascal's mugger. This is that in weird situations like this, you should be using updateless decision theory: figuring out which policy has the best a priori expected utility and implementing that policy, instead of trying to make sense of weird anthropic arguments before updatefully coming up with a strategy. Now consider the following hypothesis: “There are 3↑↑↑3 copies of you, and a Matrix Lord will approach one of them while disguised as an ordinary human, inform that copy about his powers and intentions without offering any solid evidence to support his claims, and then kill the rest of the copies iff this copy declines to pay him $5. None of the other copies will experience or hallucinate anything like this.” Of course, this hypothesis is extremely unlikely, but there is no assumption that some randomly selected copy coincidentally happens to be the one that the Matrix Lord approaches, and thus no way for a leverage penalty to force the probability of the hypothesis below 1/3↑↑↑3. This hypothesis and the Linear Utility Hypothesis suggest that having a policy of paying Pascal's mugger would have consequences 3↑↑↑3 times as important as not dying, which is worth well over $5 in expectation, since the probability of the hypothesis couldn't be as low as 1/3↑↑↑3. The fact that actually being approached by Pascal's mugger can be seen as overwhelming evidence against this hypothesis does nothing to change that.
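The policy comparison in this argument can be sketched in Python as follows. The constants are hypothetical: `COPIES` is a stand-in for 3↑↑↑3, `LIFE_VALUE` is an assumed dollar-equivalent utility for one copy surviving, and the paying policy is simply charged the full $5 with certainty, which overstates its cost.

```python
from fractions import Fraction

COPIES = 10**1000      # stand-in for 3↑↑↑3, which is far too large to represent
LIFE_VALUE = 10**6     # hypothetical dollar-equivalent utility of one copy surviving
COST = 5               # the mugger's demand, in dollars

def policy_value(pay, h):
    # A priori expected utility of a policy, relative to "nobody dies, nothing
    # is paid", where h is the probability of the Matrix Lord hypothesis.
    # Simplification (an assumption): the paying policy always loses $5.
    if pay:
        return Fraction(-COST)
    # Refusing kills every other copy if the hypothesis is true.
    return -h * (COPIES - 1) * LIFE_VALUE

# Probability of the hypothesis at which the two policies are equally good.
break_even = Fraction(COST, (COPIES - 1) * LIFE_VALUE)

# The smallest probability considered here: 1 / COPIES.
h = Fraction(1, COPIES)

print(break_even < h)
print(policy_value(True, h) > policy_value(False, h))
```

With these assumed numbers the break-even probability sits below 1/`COPIES`, so under the Linear Utility Hypothesis the paying policy wins for any probability of the hypothesis at or above that level.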

Edit: I have written a follow-up to this.