Anthropic decision theory for selfish agents

Beluga


21st Oct 2014


Consider Nick Bostrom's Incubator thought experiment, phrased as a decision problem. In my view, this provides the purest and simplest example of a non-trivial anthropic decision problem. In an otherwise empty world, the Incubator flips a coin. If the coin comes up heads, it creates one human, while if the coin comes up tails, it creates two humans. Each created human is put into one of two indistinguishable cells, and there's no way for created humans to tell whether another human has been created or not. Each created human is offered the possibility to buy a lottery ticket which pays 1$ if the coin has shown tails. What is the maximal price that you would pay for such a lottery ticket? (Utility is proportional to dollars.) The two traditional answers are 1/2$ and 2/3$.

We can try to answer this question for agents with different utility functions: total utilitarians, average utilitarians, and selfish agents. UDT's answer is that total utilitarians should pay up to 2/3$, while average utilitarians should pay up to 1/2$; see Stuart Armstrong's paper and Wei Dai's comment. There are some heuristic ways to arrive at UDT prescriptions, such as asking "What would I have precommitted to?" or arguing based on reflective consistency. For example, a CDT agent that expects to face Counterfactual Mugging-like situations in the future (with predictions also made in the future) will self-modify to become a UDT agent, i.e., one that pays the counterfactual mugger.

Now, these kinds of heuristics are not applicable to the Incubator case. It is meaningless to ask "What maximal price should I have precommitted to?" or "At what odds should I bet on coin flips of this kind in the future?", since the very point of the thought experiment is that the agent's existence is contingent upon the outcome of the coin flip. Can we come up with a different heuristic that leads to the correct answer? Imagine that the Incubator's subroutine that is responsible for creating the humans is completely benevolent towards them (let's call this the "Benevolent Creator"). (We assume here that the humans' goals are identical, such that the notion of benevolence towards all humans is completely unproblematic.) The Benevolent Creator has the power to program into the humans the maximal price they will pay for the lottery tickets. A moment's thought shows that this indeed leads to UDT's answers for average and total utilitarians. For example, consider the case of total utilitarians. If the humans pay x$ for the lottery tickets, the expected utility is 1/2*(-x) + 1/2*2*(1-x). So indeed, the break-even price is reached for x=2/3.
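The Benevolent Creator's calculation can be checked with a short sketch. It assumes that the average utilitarian's utility is the mean net payoff of the humans who exist, so the tails payoff counts once for the average utilitarian and twice for the total utilitarian; the function names are illustrative only.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def creator_eu(x, tails_weight):
    """Expected utility of programming the maximal price x into every human.

    Heads (p = 1/2): one human pays x and the ticket pays nothing.
    Tails (p = 1/2): each human pays x and wins $1; tails_weight is 2 for a
    total utilitarian (two payoffs summed) and 1 for an average utilitarian.
    """
    return HALF * (-x) + HALF * tails_weight * (1 - x)

def break_even(eu):
    """Price at which an expected utility that is linear in x crosses zero."""
    intercept = eu(Fraction(0))
    slope = eu(Fraction(1)) - intercept
    return -intercept / slope

total_price = break_even(lambda x: creator_eu(x, 2))
average_price = break_even(lambda x: creator_eu(x, 1))
print(total_price, average_price)  # 2/3 1/2
```

Exact fractions are used so that the break-even prices come out as 2/3 and 1/2 rather than as rounded floats.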

But what about selfish agents? For them, the Benevolent Creator heuristic is no longer applicable. Since the humans' goals do not align, the Creator cannot share them. As Wei Dai writes, the notion of selfish values does not fit well with UDT. In Anthropic decision theory, Stuart Armstrong argues that selfish agents should pay up to 1/2$ (Sec. 3.3.3). His argument is based on an alleged isomorphism between the average utilitarian and the selfish case. (For instance, donating 1$ to each human increases utility by 1 for both average utilitarian and selfish agents, while it increases utility by 2 for total utilitarians in the tails world.) Here, I want to argue that this is incorrect and that selfish agents should pay up to 2/3$ for the lottery tickets.

(Needless to say, all the bold statements I'm about to make are based on an "inside view". An "outside view" tells me that Stuart Armstrong has thought much more carefully about these issues than I have, and has discussed them with a lot of smart people, which I haven't, so chances are my arguments are flawed somehow.)

In order to make my argument, I want to introduce yet another heuristic, which I call the Submissive Gnome. Suppose each cell contains a gnome which is already present before the coin is flipped. As soon as it sees a human in its cell, it instantly adopts the human's goal. From the gnome's perspective, SIA odds are clearly correct: since a human is twice as likely to appear in the gnome's cell if the coin shows tails, Bayes' Theorem implies that the probability of tails is 2/3 from the gnome's perspective once it has seen a human. Therefore, the gnome would advise the selfish human to pay up to 2/3$ for a lottery ticket that pays 1$ in the tails world. I don't see any reason why the selfish agent shouldn't follow the gnome's advice. From the gnome's perspective, the problem is not even "anthropic" in any sense; there is just straightforward Bayesian updating.
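The gnome's update is plain Bayes' theorem. A minimal sketch, assuming two cells and that, from the gnome's point of view, the single heads human is equally likely to land in either cell:

```python
from fractions import Fraction

prior_tails = Fraction(1, 2)
# Likelihood that a human shows up in this particular gnome's cell.
p_human_if_tails = Fraction(1)      # tails fills both cells
p_human_if_heads = Fraction(1, 2)   # heads fills only one of the two cells

# Bayes' theorem: P(tails | human) = P(human | tails) P(tails) / P(human)
p_human = prior_tails * p_human_if_tails + (1 - prior_tails) * p_human_if_heads
posterior_tails = prior_tails * p_human_if_tails / p_human
print(posterior_tails)  # 2/3
```

Because the ticket pays $1 exactly in the tails world, a selfish human's break-even price equals this posterior.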

Suppose we want to use the Submissive Gnome heuristic to solve the problem for utilitarian agents. (ETA: Total/average utilitarianism includes the well-being and population of humans only, not of gnomes.) The gnome reasons as follows: "With probability 2/3, the coin has shown tails. For an average utilitarian, the expected utility after paying x$ for a ticket is 1/3*(-x)+2/3*(1-x), while for a total utilitarian the expected utility is 1/3*(-x)+2/3*2*(1-x). Average and total utilitarians should thus pay up to 2/3$ and 4/5$, respectively." The gnome's advice disagrees with UDT and the solution based on the Benevolent Creator. Something has gone terribly wrong here, but what? The mistake in the gnome's reasoning here is in fact perfectly isomorphic to the mistake in the reasoning leading to the "yea" answer in Psy-Kosh's non-anthropic problem.

Things become clear if we look at the problem from the gnome's perspective before the coin is flipped. Assume, for simplicity, that there are only two cells and gnomes, 1 and 2. If the coin shows heads, the single human is placed in cell 1 and cell 2 is left empty. Since the humans don't know in which cell they are, neither should the gnomes know. So from each gnome's perspective, there are four equiprobable "worlds": it can be in cell 1 or 2 and the coin flip can result in heads or tails. We assume, of course, that the two gnomes are, like the humans, sufficiently similar such that their decisions are "linked".

We can assume that the gnomes already know what utility functions the humans are going to have. If the humans will be (total/average) utilitarians, we can then even assume that the gnomes already are so, too, since the well-being of each human is as important as that of any other. Crucially, then, for both utilitarian utility functions, the question whether the gnome is in cell 1 or 2 is irrelevant. There is just one "gnome advice" that is given identically to all (one or two) humans. Whether this advice is given by one gnome or the other or both of them is irrelevant from both gnomes' perspective. The alignment of the humans' goals leads to alignment of the gnomes' goals. The expected utility of some advice can simply be calculated by taking probability 1/2 for both heads and tails, and introducing a factor of 2 in the total utilitarian case, leading to the answers 1/2 and 2/3, in accordance with UDT and the Benevolent Creator.

The situation looks different if the humans are selfish. We can no longer assume that the gnomes already have a utility function. The gnome cannot yet care about the human in its cell, since with probability 1/4 (if the gnome is in cell 2 and the coin shows heads) there will not be a human to care for. (By contrast, it is already possible to care about the average utility of all humans there will be, which is where the alleged isomorphism between the two cases breaks down.) It is still true that there is just one "gnome advice" that is given identically to all (one or two) humans, but the method for calculating the optimal advice now differs. In three of the four equiprobable "worlds" the gnome can live in, a human will appear in its cell after the coin flip. Two out of these three are tails worlds, so the gnome decides to advise paying up to 2/3$ for the lottery ticket if a human appears in its cell.
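The counting argument can be made explicit by enumerating the gnome's four equiprobable pre-flip worlds. This sketch uses the convention stated earlier that heads puts the single human in cell 1; the function name is illustrative.

```python
from fractions import Fraction
from itertools import product

# The gnome's four equiprobable worlds: (its cell, coin outcome).
worlds = list(product((1, 2), ("heads", "tails")))

def human_in_cell(cell, coin):
    # Tails fills both cells; heads puts the single human in cell 1.
    return coin == "tails" or cell == 1

occupied = [w for w in worlds if human_in_cell(*w)]
tails_share = Fraction(sum(coin == "tails" for _, coin in occupied), len(occupied))
print(len(occupied), tails_share)  # 3 2/3
```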

There is a way to restore the equivalence between the average utilitarian and the selfish case. If the humans will be selfish, we can say that the gnome cares about the average well-being of the three humans which will appear in its cell with equal likelihood: the human created after heads, the first human created after tails, and the second human created after tails. The gnome expects to adopt each of these three humans' selfish utility function with probability 1/4. It thus makes sense to say that the gnome cares about the average well-being of these three humans. This is the correct correspondence between selfish and average utilitarian values, and it leads, again, to the conclusion that the correct advice is to pay up to 2/3$ for the lottery ticket.
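A quick check of this correspondence, as a sketch: weighting each of the three possible humans by 1/4 (and the empty-cell world by 0) gives an expected utility that is exactly 3/4 of the plain average over the three humans, so both vanish at the same price. The function names are illustrative.

```python
from fractions import Fraction

Q = Fraction(1, 4)

def average_of_three(x):
    # Net payoffs of the heads human and of the two tails humans.
    return (-x + (1 - x) + (1 - x)) / 3

def selfish_gnome_eu(x):
    # Each of the three humans appears in the cell with probability 1/4;
    # in the remaining quarter the cell stays empty and contributes 0.
    return Q * (-x) + Q * (1 - x) + Q * (1 - x) + Q * 0

prices = [Fraction(k, 12) for k in range(13)]
assert all(selfish_gnome_eu(x) == Fraction(3, 4) * average_of_three(x) for x in prices)
print(average_of_three(Fraction(2, 3)), selfish_gnome_eu(Fraction(2, 3)))  # 0 0
```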

In Anthropic Bias, Nick Bostrom argues that each human should assign probability 1/2 to the coin having shown tails ("SSA odds"). He also introduces the possible answer 2/3 ("SSA+SIA", nowadays usually simply called "SIA") and refutes it. SIA odds have been defended by Olum. The main argument against SIA is the Presumptuous Philosopher. Main arguments for SIA and against SSA odds are that SIA avoids the Doomsday Argument¹, which most people feel has to be wrong, that SSA odds depend on whom you consider to be part of your "reference class", and furthermore, as pointed out by Bostrom himself, that SSA odds allow for acausal superpowers.

The consensus view on LW seems to be that much of the SSA vs. SIA debate is confused and due to discussing probabilities detached from decision problems of agents with specific utility functions. (ETA: At least this was the impression I got. Two commenters have expressed scepticism about whether this is really the consensus view.) I think that "What are the odds at which a selfish agent should bet on tails?" is the most sensible translation of "What is the probability that the coin has shown tails?" into a decision problem. Since I've argued that selfish agents should take bets following SIA odds, one can employ the Presumptuous Philosopher argument against my conclusion: it seems to imply that selfish agents, like total but unlike average utilitarians, should bet at extreme odds on living in an extremely large universe, even if there's no empirical evidence in favor of this. I don't think this counterargument is very strong. However, since this post is already quite lengthy, I'll elaborate more on this if I get encouraging feedback for this post.

¹ At least its standard version. SIA comes with its own Doomsday conclusions, cf. Katja Grace's thesis Anthropic Reasoning in the Great Filter.


Comments

lackofcheese

I think I can resolve the confusion here, but as a quick summary, I'm quite sure Beluga's argument holds up. The first step is to give a clear statement of what the difference is between the indexical and non-indexical versions of the utility functions. This is important because the UDT approach translates to "What is the optimal setting for decision variable X, in order to maximise the expected utility over all a priori possible worlds that are influenced by decision variable X?" On the basis of UDT or UDT-like principles such as an assumption of linked decisions, it thus follows that two utility functions are equivalent for this purpose if and only if they are equivalent over all possible worlds in which the outcomes are dependent upon X.

Now, as the first step in resolving these issues, I think it's best to go over all of the relevant utility functions for this problem. First, let's begin with the three core non-indexical cases (or "indexicality-independent" cases, although I'm not sure of the term):
Indifference (0): I don't care at all about anything (i.e. a constant function).
Total utilitarian (T): I care linearly about the sum total of dollars owned by humans in all possible worlds.
Average utilitarian (A): I care linearly about the average dollars owned by humans in all possible worlds.
There's also one essential operator we can apply to these functions:
Negation (-): -F = my preferences are the exact inverse of F.
e.g. -T would mean that you want humans to lose as many total dollars as possible.

Now for indexical considerations, the basic utility function is
Selfish (S): I care linearly about the amount of dollars that I own.
Notably, as applied to worlds where you don't exist, selfishness is equivalent to indifference. With this in mind, it's useful to introduce two indexical operators; first there's
Indexicalization (I): IF(w) = F(w) if you exist in world w, and 0 if you do not exist in world w.
Of course, it's pretty clear that IS = S, since S was already indifferent to worlds where you don't exist. Similarly, we can also introduce
Anti-indexicalization (J): JF(w) = 0 if you exist in world w, and F(w) if you do not exist in world w.

It's important to note that if you can influence the probability of your own existence, the value of the constant function becomes important, so these indexical operators are actually ill-conditioned in the general case. In this case, though, you don't affect the probability of your own existence, and so we may as well pick the constant to be zero. Also, since our utility functions are all denominated in dollars, we can reasonably talk about making linear combinations of them, and so we can add, subtract, and multiply by constants. In general this wouldn't make sense, but it's a useful trick here. With this in mind, we also have the identity IF + JF = F.

Now we already have all we need to define the other utility functions discussed here. Indexical total utilitarianism is simply IT, which translates into English as "I care about the total dollars owned by humans, but only if I exist; otherwise I'm indifferent."

As for "hatred", it's important to note that there are several different kinds. First of all, there is "anti-selflessness", which I represent via Z = S - T; this translates to "I don't care about myself, but I want people who aren't me to lose as many dollars as possible, whether or not I exist". Then there's the kind of hatred proposed below, where you still care about your own money as well; that one still comes in two different kinds. There is plain "selfish hatred" H = 2S - T, and then there's its indexical version IH = I(2S - T) = 2S - IT, which translates to "In worlds in which I exist, I want to get as much money as possible and for other people to have as little money as possible". The latter is probably best referred to as "jealousy" rather than hatred. From these definitions, two identities of selfishness as mixes of total utilitarianism and hatred follow pretty clearly, as S = 0.5(H+T) = 0.5(IH+IT).
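The operator identities can be spot-checked numerically. This is a sketch under assumptions that are not in the comment: a world is encoded as a dict from cell to that human's dollars (None for an empty cell), "you" are the human in cell B, and the dollar amounts correspond to an arbitrary ticket price of $1/3.

```python
from fractions import Fraction

# Hypothetical encoding: cell -> dollars of the human there, None if empty.
# "You" are whoever is in cell B.
worlds = [
    {"B": Fraction(2, 3), "C": Fraction(2, 3)},    # tails: both cells filled
    {"B": Fraction(-1, 3), "C": None},             # heads: only B filled
    {"B": None, "C": Fraction(-1, 3)},             # heads: only C filled
]

def exists(w):
    return w["B"] is not None

def T(w):  # total dollars owned by humans
    return sum(d for d in w.values() if d is not None)

def S(w):  # selfish: my own dollars, and indifference (0) if I don't exist
    return w["B"] if exists(w) else 0

def I(F):  # indexicalization: keep F where I exist, 0 elsewhere
    return lambda w: F(w) if exists(w) else 0

def J(F):  # anti-indexicalization: 0 where I exist, F elsewhere
    return lambda w: 0 if exists(w) else F(w)

def H(w):  # selfish hatred, 2S - T
    return 2 * S(w) - T(w)

for w in worlds:
    assert I(T)(w) + J(T)(w) == T(w)              # IF + JF = F
    assert I(S)(w) == S(w)                        # IS = S
    assert S(w) == (H(w) + T(w)) / 2              # S = 0.5(H + T)
    assert S(w) == (I(H)(w) + I(T)(w)) / 2        # S = 0.5(IH + IT)
print("all identities hold")
```

A numerical check over three worlds is of course not a proof, but here each identity follows directly from the definitions.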

Next comment: submissive gnomes, and the correct answers.

EDIT: Apparently the definitions of "hater" used in the other comments assume that haters still care about their own money, so I've updated my definitions.

lackofcheese

Having established the nature of the different utility functions, it's pretty simple to show how the gnomes relate to these. The first key point to make, though, is that there are actually two distinct types of submissive gnomes and it's important not to confuse the two. This is part of the reason for the confusion over Beluga's post.
Submissive gnome: I adopt the utility function of any human in my cell, but am completely indifferent otherwise.
Pre-emptively submissive gnome: I adopt the utility function of any human in my cell; if there is no human in my cell I adopt the utility function they would have had if they were here.

The two are different precisely in the key case that Stuart mentioned---the case where there is no human at all in the gnome's cell. Fortunately, the utility function of the human who will be in the gnome's cell (which we'll call "cell B") is entirely well-defined, because any existing human in the same cell will always end up with the same utility function. The "would have had" case for the pre-emptively submissive gnomes is a little stranger, but it still makes sense---the gnome's utility would correspond to the anti-indexical component JU of the human's utility function U (which, for selfish humans, is just zero). Thus we can actually remove all of the dangling references in the gnome's utility function, as per the discussion between Stuart and Beluga. If U is the utility function the human in cell B has (or would have), then the submissive gnome's utility function is IU (note the indexicalisation!) whereas the pre-emptively submissive gnome's utility function is simply U.

Following Beluga's post here, we can use these ideas to translate all of the various utility functions to make them completely objective and observer-independent, although some of them reference cell B specifically. If we refer to the second cell as "cell C", swapping between the two gnomes is equivalent to swapping B and C. For further simplification, we use $B to refer to the number of dollars in cell B, and o(B) as an indicator function for whether the cell has a human in it. The simplified utility functions are thus
T = $B + $C
A = ($B + $C) / (o(B) + o(C))
S = IS = $B
IT = o(B) ($B + $C)
IA = o(B) ($B + $C) / (o(B) + o(C))
Z = - $C
H = $B - $C
IH = o(B) ($B - $C)
Note that T and A are the only functions that are invariant under swapping B and C.
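The swap-invariance claim can be checked for a few of these functions. A sketch, assuming empty cells hold $0, an illustrative ticket price of $1/3, and the three worlds "tails", "heads with the human in B" and "heads with the human in C"; it checks only the functions defined in the code.

```python
from fractions import Fraction

# A world is (dollars_B, dollars_C, o_B, o_C); an empty cell holds $0.
def T(b, c, ob, oc):
    return b + c

def A(b, c, ob, oc):
    return (b + c) / (ob + oc)

def S(b, c, ob, oc):
    return b

def IT(b, c, ob, oc):
    return ob * (b + c)

def H(b, c, ob, oc):
    return b - c

def swap(world):
    # Exchanging the two gnomes amounts to exchanging cells B and C.
    b, c, ob, oc = world
    return (c, b, oc, ob)

x = Fraction(1, 3)  # arbitrary illustrative ticket price
zero = Fraction(0)
worlds = [(1 - x, 1 - x, 1, 1), (-x, zero, 1, 0), (zero, -x, 0, 1)]

invariant = {
    f.__name__: all(f(*w) == f(*swap(w)) for w in worlds)
    for f in (T, A, S, IT, H)
}
print(invariant)  # {'T': True, 'A': True, 'S': False, 'IT': False, 'H': False}
```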

This invariance means that, for both cases involving utilitarian humans and pre-emptively submissive gnomes, all of the gnomes (including the one in an empty cell) and all of the humans have the same utility function over all possible worlds. Moreover, all of the decisions are obviously linked, and so there is effectively only one decision. Consequently, it's quite trivial to solve with UDT. Total utilitarianism gives
E[T] = 0.5(-x) + 2*0.5(1-x) = 1-1.5x
with breakeven at x = 2/3, and average utilitarianism gives
E[A] = 0.5(-x) + 0.5(1-x) = 0.5-x
with breakeven at x = 1/2.

In the selfish case, the gnome ends up with the same utility function whether it's pre-emptive or not, because IS = S. Also, there is no need to worry about decision linkage, and hence the decision problem is a trivial one. From the gnome's point of view, 1/4 of the time there will be no human in the cell, 1/2 of the time there will be a human in the cell and the coin will have come up tails, and 1/4 of the time there will be a human in the cell and the coin will have come up heads. Thus
E[S] = 0.25(0) + 0.25(-x) + 0.5(1-x) = 0.5-0.75x
and the breakeven point is x = 2/3, as with the total utilitarian case.
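The three expected utilities can be reproduced by summing over the gnome's four equally likely worlds. In this sketch the gnome sits in cell B, every human buys at price x (the decisions are linked), an empty cell holds $0, and the pre-emptively submissive gnome is represented by applying T, A or S to every world, including the one where its own cell is empty.

```python
from fractions import Fraction

Q = Fraction(1, 4)  # probability of each (cell, coin) world for a gnome

def worlds(x):
    # (dollars_B, dollars_C, humans_alive); the gnome's own cell is B.
    return [
        (Fraction(0), -x, 1),  # heads, gnome's cell empty
        (-x, Fraction(0), 1),  # heads, human in gnome's cell
        (1 - x, 1 - x, 2),     # tails (gnome in the first cell)
        (1 - x, 1 - x, 2),     # tails (gnome in the second cell)
    ]

def T(b, c, n):
    return b + c

def A(b, c, n):
    return (b + c) / n

def S(b, c, n):
    return b

def expected(utility, x):
    return sum(Q * utility(*w) for w in worlds(x))

def break_even(utility):
    a = expected(utility, Fraction(0))
    return -a / (expected(utility, Fraction(1)) - a)

prices = {f.__name__: break_even(f) for f in (T, A, S)}
for name, price in prices.items():
    print(name, price)  # T 2/3, then A 1/2, then S 2/3
```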

In all of these cases so far, I think the humans quite clearly should follow the advice of the gnomes, because
1) Their utility functions coincide exactly over all a priori possible worlds.
2) The humans do not have any extra information that the gnomes do not.

Now, finally, let's go over the reasoning that leads to the so-called "incorrect" answers of 4/5 and 2/3 for total and average utilitarianism. We assume, as before, that the decisions are linked. As per Beluga's post, the argument goes like this:

With probability 2/3, the coin has shown tails. For an average utilitarian, the expected utility after paying x$ for a ticket is 1/3*(-x)+2/3*(1-x), while for a total utilitarian the expected utility is 1/3*(-x)+2/3*2*(1-x). Average and total utilitarians should thus pay up to 2/3$ and 4/5$, respectively.

So, what's the problem with this argument? In actual fact, for a submissive gnome, that advice is correct, but the human should not follow it. The problem is that a submissive gnome's utility function doesn't coincide with the utility function of the human over all possible worlds, because IT != T and IA != A. The key difference between the two cases is the gnome in the empty cell. If it's a submissive gnome, then it's completely indifferent to the plight of the humans; if it's a pre-emptively submissive gnome then it still cares.

If we were to do the full calculations for the submissive gnome, the gnome's utility function is IT for total utilitarian humans and IA for average utilitarian humans; since IIT = IT and IIA = IA, the calculations are the same if the humans have indexical utility functions. For IT we get
E[IT] = 0.25(0) + 0.25(-x) + 2*0.5(1-x) = 1-1.25x
with breakeven at x = 4/5, and for IA we get
E[IA] = 0.25(0) + 0.25(-x) + 0.5(1-x) = 0.5-0.75x
with breakeven at x = 2/3. Thus the submissive gnome's 2/3 and 4/5 numbers are correct for the gnome, and indeed if the human's total/average utilitarianism is indexical they should just follow the advice, because their utility function would then be identical to the gnome's.
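To see where the 4/5 comes from, one can compare the pre-emptively submissive gnome's T with the submissive gnome's IT. In this sketch the gnome's cell is B, every human buys at price x, and an empty cell holds $0; the two expected utilities then differ only in the heads world with the gnome's cell empty, by exactly x/4, which is what moves the break-even from 2/3 to 4/5.

```python
from fractions import Fraction

Q = Fraction(1, 4)

def worlds(x):
    # (human_in_gnome_cell, dollars_B, dollars_C); the gnome's own cell is B.
    return [
        (False, Fraction(0), -x),  # heads, gnome's cell empty
        (True, -x, Fraction(0)),   # heads, human in gnome's cell
        (True, 1 - x, 1 - x),      # tails
        (True, 1 - x, 1 - x),      # tails
    ]

def T(here, b, c):
    return b + c

def IT(here, b, c):
    # Indexicalized total: indifferent (0) when no human is in the cell.
    return b + c if here else 0

def expected(utility, x):
    return sum(Q * utility(*w) for w in worlds(x))

for x in (Fraction(k, 10) for k in range(11)):
    assert expected(T, x) - expected(IT, x) == -Q * x

print(expected(T, Fraction(2, 3)), expected(IT, Fraction(4, 5)))  # 0 0
```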

So, if this advice is correct for the submissive gnome, why should the pre-emptively submissive gnome's advice be different? After all, after conditioning on the presence of a human in the cell the two utility functions are the same. This particular issue is indeed exactly analogous to the mistaken "yea" answer in Psy-Kosh's non-anthropic problem. Although I side with UDT and/or the precommitment-based reasoning, I think that question warrants further discussion, so I'll leave that for a third comment.

lackofcheese

OK, time for further detail on the problem with pre-emptively submissive gnomes. Let's focus on the case of total utilitarianism, and begin by looking at the decision in unlinked form, i.e. we assume that the gnome's advice affects only one human if there is one in the room, and zero humans otherwise. Conditional on there being a human in cell B, the expected utility of the human in cell B buying a ticket for $x is, indeed, (1/3)(-x) + (2/3)(1-x) = 2/3 - x, so the breakeven is obviously at x = 2/3. However, if we also assume that the gnome in the other cell will give the same advice, we get (1/3)(-x) + 2(2/3)(1-x) = 4/3 - (5/3)x, with breakeven at x=4/5. In actual fact, the gnome's reasoning, and the 4/5 answer, is correct. If tickets were being offered at a price of, say, 75 cents, then the overall outcome (conditional on there being a human in cell B) is indeed better if the humans buy at 75 cents than if they refuse to buy at 75 cents, because 3/4 is less than 4/5.
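The 75-cent example can be checked directly. A sketch, conditioning on a human in cell B (so P(tails) = 2/3) and using total dollars as the utility:

```python
from fractions import Fraction

P_TAILS = Fraction(2, 3)  # conditional on a human in cell B

def unlinked(x):
    # Only the human in cell B buys; in tails the other human's dollars are unchanged.
    return (1 - P_TAILS) * (-x) + P_TAILS * (1 - x)

def linked(x):
    # The other gnome gives the same advice, so in tails both humans buy.
    return (1 - P_TAILS) * (-x) + P_TAILS * 2 * (1 - x)

price = Fraction(3, 4)
print(unlinked(price), linked(price))  # -1/12 1/12
```

At 75 cents the unlinked purchase loses in expectation while the linked one gains, which is the conditional (post-observation) argument for 4/5 that the rest of this comment goes on to question.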

As I mentioned previously, in the case where the gnome only cares about total $ if there is a human in its cell, then 4/5 is correct before conditioning on the presence of a human, and it's also correct after conditioning on the presence of a human; the number is 4/5 regardless. However, the situation we're examining here is different, because the gnome cares about total $ even if no human is present. Thus we have a dilemma, because it appears that UDT is correct in advising the gnome to precommit to 2/3, but the above argument also suggests that after seeing a human in its cell it is correct for the gnome to advise 4/5.

The key distinction, analogously to mwenger's answer to Psy-Kosh's non-anthropic problem, has to do with the possibility of a gnome in an empty cell. For a total utilitarian gnome in an empty cell, any money at all spent in the other cell translates directly into negative utility. That gnome would prefer the human in the other cell not to buy a ticket at any positive price, but of course there is no way to make this happen, since the other gnome has no way of knowing that this is the case.

The resolution to this problem is that, for linked decisions, you must (as UDT does) necessarily consider the effects of that decision over all a priori possible worlds affected by that decision. As it happens, this is the same thing as what you would do if you had the opportunity to precommit in advance.

It's a bit trickier to justify why this should be the case, but the best argument I can come up with is to apply that same "linked decision" reasoning at one meta-level up, the level of "linked decision theories". In short, by adopting a decision theory that ignores linked decisions in a priori possible worlds that are excluded by your observations, you are licensing yourself and other agents to do the same thing in future decisions, which you don't want. If other agents follow this reasoning, they will give the "yea" answer in Psy-Kosh's non-anthropic problem, but you don't want them to do that.

Note that decisions in worlds excluded by your observations are usually not "linked". This is because exclusion by observation usually implies that you would receive a different observation in the other possible world, which allows you to condition your decision on that observation and thereby unlinks the decisions. However, some rare problems like the Counterfactual Mugging and Psy-Kosh's non-anthropic problem violate this tendency, and should therefore be treated differently.

Overall, then, the "linked decision theory" argument supports adopting UDT, and it means that you should consider all linked decisions in all a priori possible worlds.

Beluga

Thanks a lot for your comments; they were very insightful for me. Let me play devil's advocate here and argue from the perspective of a selfish agent against your reasoning (and thus also against my own, less refined version of it).

"I object to the identification 'S = $B'. I do not care about the money owned by the person in cell B, I only do so if that person is me. I do not know whether the coin has come up heads or tails, but I do not care about how much money the other person that may have been in cell B had the coin come up differently would have paid or won. I only care about the money owned by the person in cell B in "this world", where that person is me. I reject identifying myself with the other person that may have been in cell B had the coin come up differently, solely because that person would exist in the same cell as I do. My utility function thus cannot be expressed as a linear combination of $B and $C.

I would pay a counterfactual mugger. In that case, there is a transfer, as it were, between two possible selves of mine that increases "our" total fortune. We are both possible descendants of the same past self, to which each of us is connected identically. The situation is quite different in the incubator case. There is no connection over a mutual past self between me and the other person that may have existed in cell B after a different outcome of the coin flip. This connection between past and future selves of mine is exactly what specifies my selfish goals. Actually, I don't feel like the person that may have existed in cell B after a different outcome of the coin flip is "me" any more than the person in cell C is "me" (if that person exists). Since I will pay and win as much as the person in cell C (if they exist), I cannot win any money from them, and I don't care about whether they exist at all, I think I should decide as an average utilitarian would. I will not pay more than $0.50."

Is the egoist arguing this way mistaken? Or is our everyday notion of selfishness just not uniquely defined when it comes to the possibility of subjectively indistinguishable agents living in different "worlds", since it rests on the dubious concept of personal identity? Can one understand selfishness both as caring about everyone living in subjectively identical circumstances as oneself (and their future selves), and as caring about everyone to whom one is directly connected only? Do these two possibilities correspond to SIA-egoists and SSA-egoists, respectively, which are both coherent possibilities?

lackofcheese

First of all, I think your argument from connection of past/future selves is just a specific case of the more general argument for reflective consistency, and thus does not imply any kind of "selfishness" in and of itself. More detail is needed to specify a notion of selfishness.

I understand your argument against identifying yourself with another person who might counterfactually have been in the same cell, but the problem here is that if you don't know how the coin actually came up you still have to assign amounts of "care" to the possible selves that you could actually be.

Let's say that, as in my reasoning above, there are two cells, B and C; when the coin comes up tails humans are created in both cell B and cell C, but when the coin comes up heads a human is created in either cell B or cell C, with equal probability. Thus there are 3 "possible worlds":
1) p=1/2 human in both cells
2) p=1/4 human in cell B, cell C empty
3) p=1/4 human in cell C, cell B empty

If you're a selfish human and you know you're in cell B, then you don't care about world (3) at all, because there is no "you" in it. However, you still don't know whether you're in world (1) or (2), so you still have to "care" about both worlds. Moreover, in either world the "you" you care about is clearly the person in cell B, and so I think the only utility function that makes sense is S = $B. If you want to think about it in terms of either SSA-like or SIA-like assumptions, you get the same answer because both in world (1) and world (2) there is only a single observer who could be identified as "you".

Now, what if you didn't know whether you were in cell B or cell C? That's where things are a little different. In that case, there are two observers in world (1), either of whom could be "you". There are basically two different ways of assigning utility over the two different "yous" in world (1)---adding them together, like a total utilitarian, and averaging them, like an average utilitarian; the resulting values are x=2/3 and x=1/2 respectively. Moreover, the first approach is equivalent to SIA, and the second is equivalent to SSA.

However, the SSA answer has a property that none of the others do. If the gnome were to tell the human "you're in cell B", an SSA-using human would change their cutoff point from 1/2 to 2/3. This seems to be rather strange indeed, because whether the human is in cell B or in cell C is not in any way relevant to the payoff. No human with any of the other utility functions we've considered would change his/her answer upon being told that they are in cell B.
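This shift can be reproduced with a sketch over the three possible worlds listed in this comment. It assumes that every candidate "you" buys at the same price, and it averages with unnormalized weights (normalizing would not move the break-even); "total" and "average" are the two ways of aggregating over the possible "yous" described above.

```python
from fractions import Fraction

# (probability, coin shows tails, occupied cells) for the three possible worlds.
WORLDS = [
    (Fraction(1, 2), True, {"B", "C"}),
    (Fraction(1, 4), False, {"B"}),
    (Fraction(1, 4), False, {"C"}),
]

def cutoff(aggregate, known_cell=None):
    """Break-even price for a ticket that pays $1 if the coin shows tails."""
    def eu(x):
        value = Fraction(0)
        for p, tails, cells in WORLDS:
            # The observers who could be "me" given what I know about my cell.
            yous = cells if known_cell is None else cells & {known_cell}
            payoff = (1 - x) if tails else -x   # each candidate buys at price x
            if aggregate == "total":            # add the "yous" (SIA-like)
                value += p * len(yous) * payoff
            elif yous:                          # average the "yous" (SSA-like)
                value += p * payoff
        return value
    a = eu(Fraction(0))
    return -a / (eu(Fraction(1)) - a)

for aggregate in ("total", "average"):
    print(aggregate, cutoff(aggregate), cutoff(aggregate, "B"))
# total 2/3 2/3
# average 1/2 2/3
```

Only the averaging (SSA-like) cutoff moves when the human learns the cell label.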

Lumifer

time for further detail on the problem with pre-emptively submissive gnomes.

One of the aspects of what makes LW what it is -- people with serious expressions on their faces discuss the problems with pre-emptively submissive gnomes and nobody blinks an eye X-D

lackofcheese

I guess your comment means that you must have blinked an eye, so your comment can't be completely true. That said, as discussions of pre-emptively submissive gnomes go, I would generally expect the amount of eye-blinking on LW to be well below average ^_~

Lumifer

I guess your comment means that you must have blinked an eye

I arched my eyebrow :-P

Stuart_Armstrong

I like your analysis. Interestingly, the gnomes advise in the T and A cases for completely different reasons than in the S case.

But let me modify the case slightly: now the gnomes adopt the utility function of the closest human. This makes no difference to the T and A cases. But now in the S case, the gnomes have a linked decision, and

E[S] = 0.25(-x) + 0.25(-x) + 0.5(1-x) = 0.5-x

This also seems to satisfy "1) Their utility functions coincide exactly over all a priori possible worlds. 2) The humans do not have any extra information that the gnomes do not." Also, the gnomes are now deciding the T, A and S cases for the same reasons (linked decisions).
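The effect of the "closest human" modification can be isolated in a sketch. It assumes the gnome's four worlds are equally likely and that all humans buy at price x; the only change from the plain submissive gnome is that a gnome in an empty cell now counts the other human's loss in the heads world.

```python
from fractions import Fraction

Q = Fraction(1, 4)

def selfish_eu(x, empty_cell_gnome_counts):
    # Heads with the gnome's cell empty: under "closest human" the gnome adopts
    # the other human's selfish utility, and that human pays x; a submissive
    # gnome in an empty cell is indifferent instead.
    heads_empty = -x if empty_cell_gnome_counts else 0
    return Q * heads_empty + Q * (-x) + 2 * Q * (1 - x)

def break_even(eu):
    a = eu(Fraction(0))
    return -a / (eu(Fraction(1)) - a)

closest = break_even(lambda x: selfish_eu(x, True))
submissive = break_even(lambda x: selfish_eu(x, False))
print(closest, submissive)  # 1/2 2/3
```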

lackofcheese

I don't think that works, because 1) isn't actually satisfied. The selfish human in cell B is indifferent over worlds where that same human doesn't exist, but the gnome is not indifferent.

Consequently, I think that as one of the humans in your "closest human" case you shouldn't follow the gnome's advice, because the gnome's recommendation is being influenced by a priori possible worlds that you don't care about at all. This is the same reason a human with utility function T shouldn't follow the gnome recommendation of 4/5 from a gnome with utility function IT. Even though these recommendations are correct for the gnomes, they aren't correct for the humans.

As for the "same reasons" comment, I think that doesn't hold up either. The decisions in all of the cases are linked decisions, even in the simple case of U = S above. The difference in the S case is simply that the linked nature of the decision turns out to be irrelevant, because the other gnome's decision has no effect on the first gnome's utility. I would argue that the gnomes in all of the cases we've put forth have always had the "same reasons" in the sense that they've always been using the same decision algorithm, albeit with different utility functions.

Stuart_Armstrong (10y)

Let's ditch the gnomes, they are contributing little to this argument.

My average ut=selfish argument was based on the fact that if you changed the utility of everyone who existed from one system to the other, then people's utilities would be the same, given that they existed.

The argument here is that if you changed the utility of everyone from one system to the other, then this would affect their counterfactual utility in the worlds where they don't exist.

That seems... interesting. I'll reflect further.

lackofcheese (10y)

Yep, I think that's a good summary. UDT-like reasoning depends on the utility values of counterfactual worlds, not just real ones.

Stuart_Armstrong (10y)

I'm starting to think this is another version of the problem of personal identity... But I want to be thorough before posting anything more.

Stuart_Armstrong (10y)

I think I'm starting to see the argument...

Stuart_Armstrong (10y)

To give a summary of my thoughts:

One cannot add agents to an anthropic situation and expect the situation to be necessarily unchanged. This includes agents that are not valued by anyone.
The submissive gnome problem initially gives x=$2/3 for selfish/average ut and x=$4/5 for total ut. This is wrong, but still has selfish and average ut at the same value.
A patch is added to change average ut and total ut but not selfish. This patch is essentially a pre-commitment. This patch is argued to not be available for selfish agents. This argument may be valid, but the reason the patch is not available needs to be made clear.
UDT attempts to reason without needing pre-commitment patches. Therefore even if the argument above is valid and patches can't be applied to gnomes of selfish agents, this does not preclude UDT from reaching a different answer from the above. UDT is compatible with pre-commitments, but that doesn't mean that UDT needs to be different if pre-commitments become impossible.
When discussing indexical vs non-indexical total utilitarianism, it seems to be argued that the first cannot be pre-commitment patched (stays at x=$4/5) while the second can (moves to x=$2/3).
It is not clear at all why this is the case, since the two utility functions are equal in every possible world, and, unlike the average ut=selfish situation, in impossible worlds as well (in the impossible worlds where identical agents reach different decisions). I see no valid reason that arguments available to one type of utility would not be available to the other.
In terms of worlds, these two utilities are just different ways of defining the same thing.
Without that difference between the indexical and non-indexical, the selfish=50%total ut+50%hater is valid.

Thus I don't think the argument works in its current form.

lackofcheese (10y)

There's some confusion here that needs to be resolved, and you've correctly pinpointed that the issue is with the indexical versions of the utility functions, or, equivalently, the gnomes who don't see a human at all.

I think I have a comprehensive answer to these issues, so I'm going to type it up now.

Manfred (10y)

I feel like you are searching for disconfirmation and then stopping.

E.g.

One cannot add agents to an anthropic situation and expect the situation to be necessarily unchanged.

It's true, one can't always add agents. But there are some circumstances under which one can add agents, and it is important to continue on and identify how this can work. It turns out that you have to add gnomes and then add information that lets the humans know that they're humans and the gnomes know that they're gnomes. This works because even in anthropic situations, the only events that matter to your probability assignment are ones that are consistent with your information.

Stuart_Armstrong (10y)

One cannot add agents to an anthropic situation and expect the situation to be necessarily unchanged.

The point of that is to allow me to analyse the problem without assuming the gnome example must be true. The real objections are in the subsequent points. Even if the gnomes argue something (still a big question what they argue), we still don't have evidence that the humans should follow them.

Manfred (10y)

still a big question what they argue

To be blunt, this is a question you can solve. Since it's a non-anthropic problem, though there is some danger in Beluga's analysis, vanilla UDT is all that's needed.

we still don't have evidence that the humans should follow them

The evidence goes as follows: The gnomes are in the same situation as the humans, with the same options and the same payoffs. Although they started with different information than the humans (especially since the humans didn't exist), at the time when they have to make the decision they have the same probabilities for payoffs given actions (although there's a deeper point here that could bear elaboration). Therefore the right decision for the gnome is also the right decision for the human.

This sounds an awful lot like an isomorphism argument to me... What sort of standard of evidence would you say is appropriate for an isomorphism argument?

Stuart_Armstrong (10y)

I'm convinced that this issue goes much deeper than it first seemed... I'm putting stuff together, and I'll publish a post on it soon.

lackofcheese (10y)

The deeper point is important, and I think you're mistaken about the necessary and sufficient conditions for an isomorphism here.

If a human appears in a gnome's cell, then that excludes the counterfactual world in which the human did not appear in the gnome's cell. However, on UDT, the gnome's decision does depend on the payoffs in that counterfactual world.

Thus, for the isomorphism argument to hold, the preferences of the human and gnome must align over counterfactual worlds as well as factual ones. It is not sufficient to have the same probabilities for payoffs given linked actions when you have to make a decision; you also have to have the same probabilities for payoffs given linked actions when you don't have to make a decision.

Manfred (10y)

Could you give a worked example of the correct action for the gnome with a human in their cell depending on the payoffs for the gnome without a human in their cell? (Assuming they know whether there's a human in their cell, and know the three different possible sets of payoffs for the available actions - if these constraints were relaxed I think it would be clearly doable. As it is I'm doubtful.)

lackofcheese (10y)

I already have a more detailed version here; see the different calculations for E[T] vs E[IT]. However, I'll give you a short version. From the gnome's perspective, the two different types of total utilitarian utility functions are:
T = total $ over both cells
IT = total $ over both cells if there's a human in my cell, 0 otherwise.
and the possible outcomes are
p=1/4 for heads + no human in my cell
p=1/4 for heads + human in my cell
p=1/2 for tails + human in my cell.

As you can see, these two utility functions only differ when there is no human in the gnome's cell. Moreover, by the assumptions of the problem, the utility functions of the gnomes are symmetric, and their decisions are also. UDT proper doesn't apply to gnomes whose utility function is IT, because the function IT is different for each of the different gnomes, but the more general principle of linked decisions still applies due to the obvious symmetry between the gnomes' situations, despite the differences in utility functions. Thus we assume a linked decision where either gnome recommends buying a ticket for $x.

The utility calculations are therefore
E[T] = (1/4)(-x) + (1/4)(-x) + (1/2)2(1-x) = 1-(3/2)x (breakeven at 2/3)
E[IT] = (1/4)(0) + (1/4)(-x) + (1/2)2(1-x) = 1-(5/4)x (breakeven at 4/5)

Thus gnomes who are indifferent when no human is present (U = IT) should precommit to a value of x=4/5, while gnomes who still care about the total $ when no human is present (U = T) should precommit to a value of x=2/3.

Note also that this is invariant under the choice of which constant value we use to represent indifference. For some constant C, the correct calculation would actually be
E[IT | buy at $x] = (1/4)(C) + (1/4)(-x) + (1/2)2(1-x) = (1/4)C + 1-(5/4)x
E[IT | don't buy] = (1/4)(C) + (1/4)(0) + (1/2)(0) = (1/4)C
and so the breakeven point remains at x = 4/5
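A Python sketch that reproduces the E[T] and E[IT] calculations with exact fractions and checks the invariance under the constant C. The function names and the sample values of C are illustrative assumptions; the probabilities and payoffs are the ones listed in this comment.

```python
from fractions import Fraction

Q, H = Fraction(1, 4), Fraction(1, 2)  # weights of the three outcomes

def eu_T(x):
    """E[T]: total $ over both cells; not buying is worth 0."""
    return Q * (-x) + Q * (-x) + H * 2 * (1 - x)  # heads/no human, heads/human, tails

def eu_IT_buy(x, C):
    """E[IT | buy at $x]: IT is the constant C when no human is in my cell."""
    return Q * C + Q * (-x) + H * 2 * (1 - x)

def eu_IT_dont_buy(C):
    """E[IT | don't buy]."""
    return Q * C

def breakeven(gain):
    """Root of a gain function that is linear in the price x."""
    a = gain(Fraction(0))
    b = gain(Fraction(1)) - a
    return -a / b

def breakeven_IT(C):
    return breakeven(lambda x: eu_IT_buy(x, C) - eu_IT_dont_buy(C))

print(breakeven(eu_T))  # 2/3
for C in (Fraction(0), Fraction(-7), Fraction(100)):
    print(C, breakeven_IT(C))  # 4/5 for every C
```

Because C enters both branches with the same weight 1/4, it drops out of the difference, which is why the breakeven does not move.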

Manfred (10y)

Thanks for giving this great example. This works because in the total utilitarian case (and average utilitarian, and other more general possibilities) the payoff of one gnome depends on the action of the other, so they have to coordinate for maximum payoff. This effect doesn't exist in any selfish case, which is what I was thinking about at the time. But this definitely shows that isomorphism can be more complicated than what I said.

Stuart_Armstrong (10y)

Ok, I don't like gnomes making current decisions based on their future values. Let's make it simpler: the gnomes have a utility function linear in the money owned by person X. Person X will be the person who appears in their (the gnome's) room, or, if no-one appeared, some other entity irrelevant to the experiment.

So now the gnomes have subjectively indistinguishable utility functions, and know they will reach the same decision upon seeing "their" human. What should this decision be?

If they advise "buy the ticket for price $x", then they expect to lose $x with probability 1/4 (heads world, they see a human), lose/gain nothing with probability 1/4 (heads world, they don't see a human), and gain $1-x with probability 1/2 (tails world). So this gives an expected gain of 1/2-(3/4)x, which is zero for x=$2/3.
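A minimal Python check of this calculation, assuming the three outcomes and probabilities just listed and that not buying leaves person X's money unchanged (worth 0). The names are hypothetical.

```python
from fractions import Fraction

def expected_gain(x):
    """Expected gain for a gnome whose utility is linear in the money of person X."""
    return (
        Fraction(1, 4) * (-x)        # heads, a human appears: they pay x
        + Fraction(1, 4) * 0         # heads, no human: person X is unaffected
        + Fraction(1, 2) * (1 - x)   # tails: a human appears, pays x and wins $1
    )

def breakeven(eu):
    a = eu(Fraction(0))
    b = eu(Fraction(1)) - a
    return -a / b

# expected_gain(x) simplifies to 1/2 - (3/4) x
assert all(expected_gain(x) == Fraction(1, 2) - Fraction(3, 4) * x
           for x in (Fraction(0), Fraction(1, 3), Fraction(1)))
print(breakeven(expected_gain))  # 2/3
```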

So this seems to confirm your point.

"Not so fast!" shouts a voice in the back of my head. That second head-world gnome, the one who never sees a human, is a strange one. If this model is vulnerable, it's there.

So let's do without gnomes for a second. The incubator always creates two people, but in the heads world, the second person can never gain (nor lose) anything, no matter what they agree to: any deal is nullified. This seems like a gnome setup without the gnomes. If everyone is an average utilitarian, then they will behave exactly as the total utilitarians would (since population is equal anyway) and buy the ticket for x<$2/3. So this setup has changed the outcome for average utilitarians. If it's the same as the gnome setup (and it seems to be) then the gnome setup is interfering with the decisions in cases we know about. The fact that the number of gnomes is fixed is the likely cause.
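A quick check of this two-person variant, as a Python sketch under two assumptions: the humans make a linked decision evaluated from the prior (as in the UDT answers in the post), and a nullified deal leaves the second person's $ at 0. It only reproduces the claim that average utilitarians end up at the same x<$2/3 as total utilitarians; the names are hypothetical.

```python
from fractions import Fraction

def total_ut(x):
    # heads: the first person pays x, the second person's deal is nullified (0)
    # tails: both people pay x and win $1
    return Fraction(1, 2) * (-x + 0) + Fraction(1, 2) * ((1 - x) + (1 - x))

def average_ut(x):
    # the population is 2 in both worlds, so the average is the total divided by 2
    return Fraction(1, 2) * (-x + 0) / 2 + Fraction(1, 2) * ((1 - x) + (1 - x)) / 2

def breakeven(eu):
    a = eu(Fraction(0))
    b = eu(Fraction(1)) - a
    return -a / b

print(breakeven(total_ut), breakeven(average_ut))  # 2/3 2/3
```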

I'll think more about it, and post tomorrow. Incidentally, one reason for the selfish=average utilitarian is that I sometimes model selfish as the average between total utilitarian incubator and anti-incubator (where the two copies hate each other in the tail world). 50%-50% on total utilitarian vs hatred seems to be a good model of selfishness, and gives the x<$1/2 answer.

Beluga (10y)

Thanks for your reply.

Ok, I don't like gnomes making current decisions based on their future values.

For the selfish case, we can easily get around this by defining the gnome's utility function to be the amount of $ in the cell. If we stipulate that this can only change through humans buying lottery tickets (and winning lotteries) and that humans cannot leave the cells, the gnome's utility function coincides with the human's. Similarly, we can define the gnome's utility function to be the amount of $ in all cells (the average amount of $ in those cells inhabited by humans) in the total (average) utilitarian case.

This seems to be a much neater way of using the gnome heuristic than the one I used in the original post, since the gnome's utility function is now unchanging and unconditional. The only issue seems to be that before the humans are created, the gnome's utility function is undefined in the average utilitarian case ("0/0"). However, this is more a problem of average utilitarianism than of the heuristic per se. We can get around it by defining the utility to be 0 if there aren't any humans around yet.

The incubator always creates two people, but in the heads world, the second person can never gain (nor lose) anything, no matter what they agree to: any deal is nullified. This seems like a gnome setup without the gnomes. If everyone is an average utilitarian, then they will behave exactly as the total utilitarians would (since population is equal anyway) and buy the ticket for x<$2/3. So this setup has changed the outcome for average utilitarians. If it's the same as the gnome setup (and it seems to be) then the gnome setup is interfering with the decisions in cases we know about. The fact that the number of gnomes is fixed is the likely cause.

I don't follow. As I should have written in the original post, total/average utilitarianism of course includes the wellbeing and population of humans only, not of gnomes. Otherwise, it's trivial that the presence of gnomes affects the conclusions. That the presence of an additional human affects the conclusion for average utilitarians is not surprising, since in contrast to the presence of gnomes, an additional human changes the relevant population.

Incidentally, one reason for the selfish=average utilitarian is that I sometimes model selfish as the average between total utilitarian incubator and anti-incubator (where the two copies hate each other in the tail world). 50%-50% on total utilitarian vs hatred seems to be a good model of selfishness, and gives the x<$1/2 answer.

Hm, so basically one could argue as follows against my conclusion that both selfish and total utilitarians pay up to $2/3: A hater wouldn't pay anything for a ticket that pays $1 in the tails world. Since selfishness is a mixture of total utilitarianism and hating, a selfish person certainly cannot have the same maximal price as a total utilitarian.

However, I feel like "caring about the other person in the tail world in a total utilitarian sense" and "hating the other person in the tail world" are not exactly mirror images of each other. The difference is that total utilitarianism is lexicality-independent, while "hating the other person" isn't. My claim is: However you formalize "hating the person in the other room in the tail world" and "being a total utilitarian", the statements "a total utilitarian pays up to $2/3" and "selfishness is a mixture of total utilitarianism and hating" and "a hater would not pay more than $0 for the ticket" are never simultaneously true.

Imagine that the human formally writes down their utility function in order to apply the "if there were a gnome in my room, what maximal price to pay would it advise me after asking itself what advice it would have precommitted to?" heuristic. We introduce the variables 'vh' and 'vo' for "$-value in this/the other room". These are 0 if there's no human, -x after buying a ticket after heads, and 1-x after buying a ticket after tails. We also introduce a variable 't' which is 1 after tails and 0 after heads.

We can then write down the following utility functions with their respective expectation values (from the point of view of the gnome before the coin flip):

egoist: vh => 1/4 * (-x+0+(1-x)+(1-x))

total ut.: vh + t vo => 1/4 (-x+0+2 (1-x)+2 (1-x))

hate: vh - t vo => 1/4 (-x+0+0+0)

Here, we see that egoism is indeed a mixture of total utilitarianism and hating, the egoist pays up to 2/3, and the hater pays nothing. However, according to this definition of total utilitarianism, a t.u. should pay up to 4/5. Its utility function is lexicality-dependent (the variable t enters only the utility coming from the other person), in contrast to true total utilitarianism.
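The expectation values above can be checked mechanically. This Python sketch encodes the four equally likely cases (from the gnome's point of view before the coin flip) as (vh, vo, t) triples and evaluates the three utility functions. The helper names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def scenarios(x):
    """(vh, vo, t) in the four equally likely cases seen by the gnome before the flip."""
    return [
        (-x, 0, 0),         # heads, the single human is in this room
        (0, -x, 0),         # heads, the single human is in the other room
        (1 - x, 1 - x, 1),  # tails (probability 1/2, written as two
        (1 - x, 1 - x, 1),  #        entries of 1/4 to match the formulas)
    ]

UTILITIES = {
    "egoist": lambda vh, vo, t: vh,
    "total ut. (indexical)": lambda vh, vo, t: vh + t * vo,
    "hate": lambda vh, vo, t: vh - t * vo,
}

def expected(name, x):
    u = UTILITIES[name]
    return sum(Fraction(1, 4) * u(*s) for s in scenarios(x))

def breakeven(name):
    a = expected(name, Fraction(0))
    b = expected(name, Fraction(1)) - a
    return -a / b

for name in UTILITIES:
    print(name, breakeven(name))  # egoist 2/3, then total ut. (indexical) 4/5, then hate 0

# Egoism is the 50/50 mixture of the other two functions, at every price.
x = Fraction(3, 7)
assert expected("egoist", x) == (expected("total ut. (indexical)", x) + expected("hate", x)) / 2
```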

In order to write down a lexicality-independent utility function, we introduce new variables 'nh' and 'no', the number of people here and in the other room (0 or 1). Then, we could make the following definitions:

egoist: nh vh
total ut.: nh vh + no vo
hate: nh vh - no * vo

(The 'nh' and 'no' factors are actually redundant, since 'vh' is defined to be zero if 'nh' is.)

With these definitions, both an egoist and a t.u. pay up to 2/3 and egoism is a mixture of t.u. and hating. However, the expected utility of a hater is now 0 independent of x, such that there is no longer a contradiction. The reason is that we now count the winnings of the single head-human one time positively (if ze is in our room) and one time negatively (if ze is in the other room). This isn't what we meant by hating, so we could modify the utility function of the hater as follows:

hate: nh (vh - no vo)

This reproduces again what we mean by hating (it is equivalent to the old definition 'vh - t * vo'), but now egoism is no longer a combination of hating and t.u.
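The same kind of sketch works for the definitions with 'nh' and 'no', again with hypothetical helper names and the same four equally likely cases. A breakeven of None marks a utility function whose expectation does not depend on x at all, which is what happens to the first definition of hating.

```python
from fractions import Fraction

def scenarios(x):
    """(nh, vh, no, vo) in the four equally likely cases seen by the gnome before the flip."""
    return [
        (1, -x, 0, 0),         # heads, the single human is here
        (0, 0, 1, -x),         # heads, the single human is in the other room
        (1, 1 - x, 1, 1 - x),  # tails
        (1, 1 - x, 1, 1 - x),  # tails
    ]

UTILITIES = {
    "egoist": lambda nh, vh, no, vo: nh * vh,
    "total ut.": lambda nh, vh, no, vo: nh * vh + no * vo,
    "hate (nh vh - no vo)": lambda nh, vh, no, vo: nh * vh - no * vo,
    "hate (nh (vh - no vo))": lambda nh, vh, no, vo: nh * (vh - no * vo),
}

def expected(name, x):
    u = UTILITIES[name]
    return sum(Fraction(1, 4) * u(*s) for s in scenarios(x))

def breakeven(name):
    a = expected(name, Fraction(0))
    b = expected(name, Fraction(1)) - a
    return None if b == 0 else -a / b  # None: expected utility does not depend on x

for name in UTILITIES:
    print(name, breakeven(name))  # 2/3, 2/3, None, 0 (one per line)

x = Fraction(1, 2)
mix_first = (expected("total ut.", x) + expected("hate (nh vh - no vo)", x)) / 2
mix_second = (expected("total ut.", x) + expected("hate (nh (vh - no vo))", x)) / 2
print(expected("egoist", x) == mix_first, expected("egoist", x) == mix_second)  # True False
```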

In conclusion, it doesn't seem to be possible to derive a contradiction between "a hater wouldn't pay anything for a lottery ticket" and "both egoists and total utilitarians would pay up to $2/3".

Stuart_Armstrong (10y)

The broader question is "does bringing in gnomes in this way leave the initial situation invariant"? And I don't think it does. The gnomes follow their own anthropic setup (though not their own preferences), and their advice seems to reflect this fact (consider what happens when the heads world has 1, 2 or 50 gnomes, while the tails world has 2).

I also don't see your indexical objection. The sleeping beauty could perfectly well have an indexical version of total utilitarianism ("I value my personal utility, plus that of the sleeping beauty in the other room, if they exist"). If you want to proceed further, you seem to have to argue that indexical total utilitarianism gives different decisions than standard total utilitarianism.

This is odd, as it seems a total utilitarian would not object to having their utility replaced with the indexical version, and vice-versa.

Beluga (10y)

The broader question is "does bringing in gnomes in this way leave the initial situation invariant"? And I don't think it does. The gnomes follow their own anthropic setup (though not their own preferences), and their advice seems to reflect this fact (consider what happens when the heads world has 1, 2 or 50 gnomes, while the tails world has 2).

As I wrote (after your comment) here, I think it is prima facie very plausible for a selfish agent to follow the gnome's advice if a) conditional on the agent existing, the gnome's utility function agrees with the agent's and b) conditional on the agent not existing, the gnome's utility function is a constant. (I didn't have condition b) explicitly in mind, but your example showed that it's necessary.) Having the number of gnomes depend upon the coin flip invalidates their purpose. The very point of the gnomes is that from their perspective, the problem is not "anthropic", but a decision problem that can be solved using UDT.

I also don't see your indexical objection. The sleeping beauty could perfectly well have an indexical version of total utilitarianism ("I value my personal utility, plus that of the sleeping beauty in the other room, if they exist"). If you want to proceed further, you seem to have to argue that indexical total utilitarianism gives different decisions than standard total utilitarianism.

That's what I tried in the parent comment. To be clear, I did not mean "indexical total utilitarianism" to be a meaningful concept, but rather a wrong way of thinking, a trap one can fall into. Very roughly, it corresponds to thinking of total utilitarianism as "I care for myself plus any other people that might exist" instead of "I care for all people that exist". What's the difference, you ask? A minimal non-anthropic example that illustrates the difference would be very much like the incubator, but without people being created. Imagine 1000 total utilitarians with identical decision algorithms waiting in separate rooms. After the coin flip, either one (after heads) or two (after tails) of them are offered the chance to buy a ticket that pays $1 after tails. When being asked, the agents can correctly perform a non-anthropic Bayesian update to conclude that the probability of tails is 2/3. An indexical total utilitarian reasons: "If the coin has shown tails, another agent will pay the same amount $x that I pay and win the same $1, while if the coin has shown heads, I'm the only one who pays $x. The expected utility of paying $x is thus 1/3 (-x) + 2/3 * 2 (1-x)." This leads to the incorrect conclusion that one should pay up to $4/5. The correct (UDT-) way to think about the problem is that after tails, one's decision algorithm is called twice. There's only one factor of 2, not two of them. This is all very similar to this post.
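A Python sketch of this non-anthropic example, contrasting the indexical reasoning with a UDT-style evaluation from the prior. N = 1000 and the fair coin come from the comment; the function names, and the third variant that keeps only the factor of 2 coming from the Bayesian update, are my own illustration.

```python
from fractions import Fraction

N = 1000                 # agents waiting in separate rooms
prior = Fraction(1, 2)   # fair coin

# Chance that a given agent is offered a ticket: one of N after heads, two of N after tails.
p_offer_heads, p_offer_tails = Fraction(1, N), Fraction(2, N)
p_tails = prior * p_offer_tails / (prior * p_offer_heads + prior * p_offer_tails)

def indexical(x):
    """Fallacious: updates to P(tails) = 2/3 AND counts the other agent's ticket."""
    return (1 - p_tails) * (-x) + p_tails * 2 * (1 - x)

def udt(x):
    """Policy evaluated from the prior: after tails the algorithm is called twice."""
    return prior * (-x) + prior * 2 * (1 - x)

def updated_own_ticket(x):
    """Updated probabilities, counting only the agent's own ticket."""
    return (1 - p_tails) * (-x) + p_tails * (1 - x)

def breakeven(eu):
    a = eu(Fraction(0))
    b = eu(Fraction(1)) - a
    return -a / b

print(p_tails)  # 2/3
print(breakeven(indexical), breakeven(udt), breakeven(updated_own_ticket))  # 4/5 2/3 2/3
```

The indexical version multiplies the tails branch by 2 twice (once through the posterior, once through the linked purchase), which is where the $4/5 comes from.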

To put this again into context: You argued that selfishness is a 50/50 mixture of hating the other person, if another person exists, and total utilitarianism. My reply was that this is only true if one understands total utilitarianism in the incorrect, indexical way. I formalized this as follows: Let the utility function of a hater be vh - h vo (here, vh is the agent's own utility, vo the other person's utility, and h is 1 if the other person exists and 0 otherwise). Selfishness would be a 50/50 mixture of hating and total utilitarianism if the utility function of a total utilitarian were vh + h vo. However, this is exactly the wrong way of formalizing total utilitarianism. It leads, again, to the conclusion that a total utilitarian should pay up to $4/5.

Stuart_Armstrong (10y)

A minimal non-anthropic example that illustrates the difference

The decision you describe is not stable under pre-commitments. Ahead of time, all agents would pre-commit to the $2/3. Yet they seem to change their mind when presented with the decision. You seem to be double counting, using the Bayesian updating once and the fact that their own decision is responsible for the other agent's decision as well.

In the terminology of the paper http://www.fhi.ox.ac.uk/anthropics-why-probability-isnt-enough.pdf , your agents are altruists using linked decisions with total responsibility and no precommitments, which is a foolish thing to do. If they were altruists using linked decisions with divided responsibility (or if they used precommitments), everything would be fine (I don't like or use that old terminology - UDT does it better - but it seems relevant here).

But that's detracting from the main point: I still don't see any difference between indexical and non-indexical total utilitarianism. I don't see why a non-indexical total utilitarian can't follow the wrong reasoning you used in your example just as well as an indexical one, if either of them can - and similarly for the right reasoning.

Beluga (10y)

The decision you describe is not stable under pre-commitments. Ahead of time, all agents would pre-commit to the $2/3. Yet they seem to change their mind when presented with the decision. You seem to be double counting, using the Bayesian updating once and the fact that their own decision is responsible for the other agent's decision as well.

Yes, this is exactly the point I was trying to make -- I was pointing out a fallacy. I never intended "lexicality-dependent utilitarianism" to be a meaningful concept, it's only a name for thinking in this fallacious way.

[This comment is no longer endorsed by its author]

Manfred (10y)

Needless to say that all the bold statements I'm about to make are based on an "inside view". [...]

Spare us :P Not only are Stuart's advantages not really that big, but it's worthwhile to discuss things here. Something something title of this subreddit.

The consensus view on LW seems to be that much of the SSA vs. SIA debate is confused and due to discussing probabilities detached from decision problems of agents with specific utility functions.

Hm, this makes me sad, because it means I've been unsuccessful. I've been trying to hammer on the fact that an agent's probability assignments are determined by the information it has. Since SSA and SIA describe pieces of information ("being in different worlds are mutually exclusive and exhaustive events" and "being different people are mutually exclusive and exhaustive events"), quite naturally they lead to assigning different probabilities. If you specify what information your agent is supposed to have, this will answer the question of what probability distribution to use.

Stuart_Armstrong (10y)

Not only are Stuart's advantages not really that big

My advantages might be bigger than you think... oops, I've just been informed that this is not actually a penis-measuring competition, but an attempt to get at a truth. ^_^ Please continue.

Stuart_Armstrong (10y)

Thanks for engaging with my paper ^_^ I will think about your post and construct a more detailed answer.

The consensus view on LW seems to be that much of the SSA vs. SIA debate is confused and due to discussing probabilities detached from decision problems of agents with specific utility functions.

Really? That's my view, but I didn't know it had spread!

Stuart_Armstrong (10y)

Right now, let's modify the setup a bit, targeting that one vulnerable gnome who sees no human in the heads world.

First scenario: there is no such gnome. The number of gnomes is also determined by the coin flip, so every gnome will see a human. Then if we apply the reasoning from http://lesswrong.com/r/discussion/lw/l58/anthropic_decision_theory_for_selfish_agents/bhj7 , this will result in a gnome with a selfish human agreeing to x<$1/2.

Instead, let's now make the gnome in the heads world hate the other human, if they don't have one themselves. The result of this is that they will agree to any x<$1, as they are (initially) indifferent to what happens in the heads world (potential gains, if they are the gnome with a human, are cancelled out by the potential loss, if they are the gnome without the human).
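A sketch of this variant in Python, assuming the hating gnome's utility is minus the $ of the other human and that both humans follow the same (linked) advice to buy at $x. The names are hypothetical.

```python
from fractions import Fraction

def expected_gain(x):
    heads_my_human = Fraction(1, 4) * (-x)     # my human buys and loses x
    heads_no_human = Fraction(1, 4) * (-(-x))  # I hate the other human, who loses x
    tails = Fraction(1, 2) * (1 - x)           # my human pays x and wins $1
    return heads_my_human + heads_no_human + tails

def breakeven(eu):
    a = eu(Fraction(0))
    b = eu(Fraction(1)) - a
    return -a / b

# The two heads-world terms cancel, leaving (1 - x)/2, positive for every x < 1.
for x in (Fraction(0), Fraction(2, 3), Fraction(99, 100)):
    assert expected_gain(x) == (1 - x) / 2
    assert expected_gain(x) > 0
print(breakeven(expected_gain))  # 1
```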

So it seems to me that the situation is most likely an artefact of the number and particular motivations of the gnomes (notice I never changed the motivations of gnomes who would encounter a human, only the "unimportant extra" one).

Beluga (10y)

First scenario: there is no such gnome. The number of gnomes is also determined by the coin flip, so every gnome will see a human. Then if we apply the reasoning from http://lesswrong.com/r/discussion/lw/l58/anthropic_decision_theory_for_selfish_agents/bhj7 , this will result in a gnome with a selfish human agreeing to x<$1/2.

If the gnomes are created after the coin flip only, they are in exactly the same situation as the humans and we cannot learn anything by considering them that we cannot learn from considering the humans alone.

Instead, let's now make the gnome in the heads world hate the other human, if they don't have one themselves. The result of this is that they will agree to any x<$1, as they are (initially) indifferent to what happens in the heads world (potential gains, if they are the gnome with a human, are cancelled out by the potential loss, if they are the gnome without the human).

What this shows is that "Conditional on me existing, the gnome's utility function coincides with mine" is not a sufficient condition for "I should follow the advice that the gnome would have precommitted to give".

What I propose is instead: "If conditional on me existing the gnome's utility function coincides with mine, and conditional on me not existing the gnome's utility function is a constant, then I should follow the advice that the gnome would have precommitted to."

ETA: Speaking of indexicality-dependent utility functions here. For lexicality-independent utility functions, such as total or average utilitarianism, the principle simplifies to: "If the gnome's utility function coincides with mine, then I should follow the advice that the gnome would have precommitted to."

Stuart_Armstrong (10y)

I'm still not clear why lexicality-independent utility functions are different from their equivalent indexical versions.

Beluga (10y)

I elaborated on this difference here. However, I don't think this difference is relevant for my parent comment. With indexical utility functions I simply mean selfishness or "selfishness plus hating the other person if another person exists", while with lexicality-independent utility functions I meant total and average utilitarianism.
