# 14

Personal Blog

Imagine it's the future, and everything has gone according to plan. Humanity has worked out its own utility function, f0, and has worked out a strategy S0 to optimize it.

Humanity has also run a large number of simulations of how alien worlds evolve. It has determined that of those civilizations which reach the same level of advancement - that know their own utility function and have a strategy for optimizing it - there is an equal probability that they will end up with each of 10 possible utility functions. Call these f0...f9.

(Of course, these simulations are coarse-grained enough to satisfy the nonperson predicate).

Humanity has also worked out the optimal strategies S0...S9, one for each utility function. But each strategy happens to score poorly under all of the other utility functions:

fi(Si) = 10
fi(Sj) = 1 for i != j

In addition, there is a compromise strategy C:

fi(C) = 3 for all i.

The utility functions, f0 through f9, satisfy certain properties:

They are altruistic, in the sense that they care just as much about far-away aliens that they can't even see as they do about members of their own species.

They are additive: if one planet implements Sj and another implements Sk, then:
fi(Sj on one planet and Sk on the other) = fi(Sj) + fi(Sk).

(This is just to make things easier - the problem I'm describing will still apply in cases where this rule doesn't hold).

They are non-negotiable. They won't "change" if that civilization encounters aliens with a different utility function. So if two of these civilizations were to meet, we would expect it to be like the humans and the babyeaters: the stronger would attempt to conquer the weaker and impose their own values.

In addition, humanity has worked out that it's very likely that a lot of alien worlds exist, i.e. aliens are really really real. They are just too far away to see or exist in other Everett branches.

So given these not entirely ridiculous assumptions, it seems that we have a multiplayer prisoner's dilemma even though none of the players has any causal influence on any other. If the universe contains 10 worlds, and each chooses its own best strategy, then each expects to score 19. If they all choose the compromise strategy then each expects to score 30.
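The arithmetic behind these scores can be checked with a minimal sketch. It assumes, purely for illustration, that the 10 worlds hold exactly one utility function each; the names `f`, `total`, and the scenario labels are hypothetical and not part of the original setup.

```python
# Toy payoff model from the post: f_i(S_i) = 10, f_i(S_j) = 1 for i != j, f_i(C) = 3.
# Assumption (illustration only): exactly one world per utility function.
N = 10
C = "C"  # label for the compromise strategy

def f(i, strategy):
    """Utility that f_i assigns to a single planet running `strategy`."""
    if strategy == C:
        return 3
    return 10 if strategy == i else 1

def total(i, strategies):
    """Additive utility of f_i summed over every planet's strategy."""
    return sum(f(i, s) for s in strategies)

all_selfish = list(range(N))               # world j plays S_j
all_compromise = [C] * N                   # every world plays C
lone_defector = [0] + [C] * (N - 1)        # world 0 plays S_0, the rest play C
lone_cooperator = [C] + list(range(1, N))  # world 0 plays C, the rest play S_j

# All scores are measured from world 0's point of view (utility function f_0).
scores = {
    "all_selfish": total(0, all_selfish),
    "all_compromise": total(0, all_compromise),
    "lone_defector": total(0, lone_defector),
    "lone_cooperator": total(0, lone_cooperator),
}
print(scores)
```

Under these assumptions the four outcomes come out as 19, 30, 37 and 12. Switching from C to S0 gains world 0 exactly 7 whatever the others do, yet universal compromise (30) beats universal selfishness (19), which is the prisoner's dilemma ordering.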

Anyone else worried by this result, or have I made a mistake?

[-][anonymous]9y 5

I'm not worried by the result because there are two very implausible constraints: the number of possible utility functions and the utility of the compromise strategy. Given that there are, in fact, many possible utility functions, it seems really really unlikely that there is a strategy that has 3/10 the utility of the optimal strategy for every possible utility function. Additionally, some pairs of utility functions won't be conducive to high-utility compromise strategies. For example: what if one civilization has paperclip maximization as a value, and an... (read more)

2Giles9yI'm still not sure this is right. You have to consider not just fi(Si) but all the fi(Sj)'s as well, i.e. how well each strategy scores under other planets' utility functions. So I think the relevant cutoff here is 1.9 - a compromise strategy that does better than that under everyone's utility function would be a win-win-win. The number of possible utility functions isn't important, just their relative probabilities. You're right that it's far from obvious that such a compromise strategy would exist in real life. It's worth considering that the utility functions might not be completely arbitrary, as we might expect some of them to be a result of systematizing evolved social norms. We can exclude UFAI disasters from our reference class - we can choose who we want to play PD with, as long as we expect them to choose the same way.
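One way to read the 1.9 cutoff is as the expected per-planet utility when every planet simply plays its own optimum. The sketch below works that out under the post's payoffs; the function names and the `skewed` distribution are hypothetical, added only to illustrate the remark that relative probabilities are what matter.

```python
def selfish_baseline(probs, own=10, other=1):
    """Expected per-planet utility under each f_i when every planet plays its own optimum.

    probs[i] is the assumed probability that a random planet holds utility function f_i.
    """
    return [p * own + (1 - p) * other for p in probs]

def compromise_is_win_win(probs, compromise=3):
    """True if the compromise payoff beats the selfish baseline under every f_i."""
    return all(compromise > b for b in selfish_baseline(probs))

uniform = [0.1] * 10            # the post's assumption: ten equally likely functions
skewed = [0.5] + [0.5 / 9] * 9  # hypothetical: f_0 is held by half of all planets

print(selfish_baseline(uniform)[0])
print(compromise_is_win_win(uniform), compromise_is_win_win(skewed))
```

With the uniform distribution the baseline is about 1.9 for every utility function, so a compromise worth 3 is a win for everyone. In the skewed case the baseline for f_0 rises to 5.5, and a compromise worth 3 no longer beats it from that civilization's point of view.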
0amcknight9yIt's a toy example but doesn't it still apply if you have an estimate of the expected distribution of instances that will actually be implemented within mind space? The space of possible minds is vast, but the vast majority of those minds will not be implemented (or extremely less often). The math would be much more difficult but couldn't you still estimate it in principle? I don't think your criticism actually applies.

Edit: This comment is retracted. My comment is wrong, primarily because it misses the point of the post, which simply presents a usual game theory-style payoff matrix problem statement. Thanks to Tyrrell McAllister for pointing out the error, apologies to the readers. See this comment for details. (One more data point against going on a perceptual judgement at 4AM, and not double-checking own understanding before commenting on a perceived flaw in an argument. A bit of motivated procrastination also delayed reviewing Tyrrell's response.)

Humanity has also

3Tyrrell_McAllister9y[...] The post calls the functions f_i "utility functions", not "expected utility functions". So, I take Giles to be pursuing your "alternative" approach. However, I don't think that f_i(S_j) denotes the total utility of a state of the universe. It is just one of the terms used to compute such a total utility. From the comments about additivity, I take f_i(S_j) to be the amount by which the utility of a universe to species i would increase if a planet following strategy j were added to it (while the strategies of all other planets remained unchanged), regardless of how or by whom that planet is added. Giles's question, as I understand it, is, how should these "utility" terms be incorporated into an expected utility calculation? For example, what should the probability weights say is the probability that species i will produce a planet following the compromise strategy, given that we do?
3Vladimir_Nesov9yYou are right, I retract my comment [http://lesswrong.com/lw/8fz/babyeaters_dilemma/59ml]. (As an aside, some terminological confusion can result from there being a "utility relation" that compares lotteries, that can be represented by a "utility function" that takes lotteries as inputs, and separately expected utility representation of utility relation (or of "utility function") that breaks it down into a probability distribution and a "utility function" in a different sense, that takes pure outcomes as inputs.) Right. Or, more usefully (since we can't actually add planets), the utility function of aliens #k that takes a collection S of strategies for each of the planets under consideration (i.e. a state of the world) is F_k (S) = sum_p f_k(S_p) Then, the decision problem is to maximize expected value of F_0(S) by controlling S_0, a standard game theory setting. It's underdetermined only to the extent PD is underdetermined, in that you should still defect against CooperationBots or DefectBots, etc.
1wedrifid9yI'm a little less confident with the period for 'expected' but that is a whole different philosophical issue to the one important here!
0Vladimir_Nesov9yI believe the issue discussed in the post doesn't exist, and only appears to be present because of the confusion described in my comment. [Edit: I believe this no longer, see the edit to the original comment [http://lesswrong.com/lw/8fz/babyeaters_dilemma/59ml].] (I'm actually not sure what you refer to by "that philosophical issue", "one issue discussed here" and what you are less confident about.)
1wedrifid9yIt is not absolutely determined that multiplying the probability of a universe-state by its value is what must be done, period. Another relationship between probabilities, values for states of the universe and behavior could actually be legitimate. I noted that this is an obscure philosophical question that is not intended to detract from your point.
0Vladimir_Nesov9yRight; since probabilities (and expected utility axioms) break in some circumstances (for decision-theoretic purposes), expected utility of the usual kind isn't fundamental, but its role seems to be. (I did anticipate this objection/clarification, see the parenthetical about utility failing to factor as expectation of a utility function...)

The altruistic assumption given here seems implausible for a utility function ultimately derived from evolution, so while it's an interesting exercise I'm not sure there's anything to be worried about in practice.

I think this result means that you understand the true prisoner's dilemma and acausal trade.

4SilasBarta9yI think acausal trade is just a special case of TDT-like decision theories, which consider "acausal consequences" of your decisions. That is, you reason in the following form, "If I were to output X in condition Y, so would all other sufficiently similar instantiations of me (including simulations). Therefore, in gauging the relative impact of my actions, I must also include the effect of all those instantiations outputting X." "Sufficiently similar" includes "different but symmetric" conditions like those described here, i.e., where you have different utility functions, but are in the same position with respect to each other. In this case, the "acausal trade" argument is that, since everyone would behave symmetrically to you, and you would prefer that everyone do the 3-utility option, you should do it yourself, because it would entail everyone else doing so -- even though your influence on the others is not causal.
1amcknight9yThanks! Is anything similar to acausal trade discussed anywhere outside of LessWrong? Coming up with the simplest case where acausal trade may be required seems like a thought experiment that (at least) philosophers should be aware of.
4SilasBarta9yThat I don't know, and I hope someone else (lukeprog?) fills it in with a literature review. I do, however, want to add a clarification: TDT-like decision theories are the justification for engaging in "acausal trade", while acausal trade itself refers to the actions you take (e.g. the 3-utility option) based on such justifications. (I blurred it a little by calling acausal trade a decision theory.) Glad to have clarified the issue for and saved time for those who were wondering the same thing.
5[anonymous]9yI've read all the literature on TDT that I can find, but I still find that I disagree with the people in this thread who claim that the compromise strategy is recommended by TDT in this problem. Here is Yudkowsky's brief summary of TDT: In the TDT pdf document, he also says: This refers to the idea that in a Pearlian causal graph, knowing the accurate initial physical state of two causally isolated but physically identical calculators, which are both poised to calculate 678x987, doesn’t (or shouldn’t) allow us to screen them off from each other and render them probabilistically independent. Knowing their physical state doesn’t imply that we know the answer to the calculation 678x987 – and if we press the “equals” button on one calculator and receive the answer 669186, this leads us to believe that this will be the answer displayed when we press the equals button on the other, causally isolated calculator. Since knowing their initial physical state entirely does in fact cause us to screen off the two calculators in the causal graph, as such a graph would normally be drawn, we are led to conclude that the standard way of drawing a causal graph to represent this scenario is simply wrong. Therefore Yudkowsky includes another “latent” node with arcs to each of the calculator outputs, which represents the “platonic output” of the computation 678x987 (about which we are logically uncertain despite our physical knowledge of the calculators). The latent node “AndyPlatonic” referred to by Yudkowsky in that quote is similar to the latent node representing the output of the platonic computation 678x987, except that in this case the computation is the computation implemented in an agent's brain that determines whether he takes one or two boxes, and the causal graph is the one used by a TDT-agent in Newcomb’s problem.
So on the one hand we have an abstract or platonic computation “678x987” which is very explicit and simple, then later on page 85 of the TDT document we are
2endoself9yThe links from http://wiki.lesswrong.com/wiki/Decision_theory [http://wiki.lesswrong.com/wiki/Decision_theory] should cover most of the main ideas. There are both more basic and more advanced ones, so you can read as many as appropriate to your current state of knowledge. It's not all relevant, but most of what is relevant is at least touched on there.
3[anonymous]9yOr rather: to understand this result means that you understand acausal trade. To agree with this result requires that you agree with the idea of acausal trade, as well.
1endoself9yYes, that is what I meant. Were you confused by my less rigorous style, are you trying to point out that one can understand acausal trade without agreeing with it, or are you asking for clarification for some other reason?
0[anonymous]9yThe second.
0endoself9yI apologize for any implications of condescension in my comment. I think you are wrong, but I encourage you to present your ideas, if you want to.
2[anonymous]9yYou... think it is impossible to understand acausal trade without agreeing with it?
0endoself9yI think that acausal trade is a valid way of causing things to happen (I could have phrased that differently, but it is causation in the Pearlian sense). I think that this is somewhat value-dependent, so a general agent in reflective equilibrium need not care about acausal effects of its actions, but I think that, if it makes any sense to speak of a unique or near-unique reflective equilibrium for humans, it is very likely that almost all humans would agree with acausal trade in their reflective equilibria.
0endoself9ySomeone downvoted all my comments in this thread. This is the first time this has happened to me. I am not sure what exactly they meant to discourage. What is the proper procedure in this case?
0endoself9yThank you for this. Even if this is not why someone had a negative reaction toward me, I appreciate such feedback. I am definitely not trying to be cryptic. There are a lot of posts about decision theory on LW going back a few years, which resulted in the (continuing) development of updateless decision theory. It is a fascinating subject and it is about, among other things, exactly the same topic that this post covered. I expect lesswrongers discussing decision theory to be aware of what has already been done on this website. By your metric, I fear this may sound as dismissive as the rest of what I wrote. Does it?
0Viliam_Bur9yThis is why Eliezer [http://lesswrong.com/user/Eliezer_Yudkowsky/] always [http://lesswrong.com/user/Eliezer_Yudkowsky/] uses [http://lesswrong.com/user/Eliezer_Yudkowsky/] hyperlinks [http://lesswrong.com/user/Eliezer_Yudkowsky/], even [http://lesswrong.com/user/Eliezer_Yudkowsky/] when [http://lesswrong.com/user/Eliezer_Yudkowsky/] sometimes [http://lesswrong.com/user/Eliezer_Yudkowsky/] it [http://lesswrong.com/user/Eliezer_Yudkowsky/] seems [http://lesswrong.com/user/Eliezer_Yudkowsky/] strange [http://lesswrong.com/user/Eliezer_Yudkowsky/]. :D The LessWrong site is too big, and many people are not here from the beginning. With so many articles even people who seriously try to read the Sequences can miss a few ideas. No it doesn't. I feel I understand this comment completely. Thanks for not being angry for my comment, because by standard metric it was impolite. Somehow I felt the information is more important... and I am happy you took it this way.
0endoself9yThank you for this advice. I will definitely try to hyperlink a lot more in the future. There's a good chance I went back and edited a few things after writing this sentence. :) I think this type of feedback should be the norm here. It might just be me [http://lesswrong.com/lw/dr/generalizing_from_one_example/], but I think the number of LWers who would appreciate this type of constructive criticism is greater than the number who would be offended, especially after weighting based on commenting frequency.
1Viliam_Bur9yThis type of feedback [http://wiki.lesswrong.com/wiki/Crocker%27s_rules] can be invited explicitly in a comment. It was suggested [http://lesswrong.com/lw/5by/official_less_wrong_redesign_call_for_suggestions/3zkf] that LW users should be able to invite it permanently through a user profile, but this suggestion was not implemented yet.
2XiXiDu9yIf, upon reflection, you have no clue why you have been downvoted, then I suggest ignoring the information as noise and continuing to express your point (maybe more thoroughly in future, in case someone just misunderstood you). I would recommend doing this until someone explains why they think that you are wrong (at least if you don't value your karma score more than additional information on why you might be mistaken).
0endoself9yI think the ideas that I was expressing were rather representative of the LWers who think a lot about decision theory, so I don't expect to encounter someone who opposes them this strongly very often. I have a few theories that might explain why I was downvoted, but none are particularly probable and none give me reason to change my mind about decision theory.

I'm not sure if this qualifies as a mistake per se, but it seems very implausible to me that the only advanced civilization-enabling utility functions are altruistic towards aliens. Is there evidence in favor of that hypothesis?

2antigonus9yHmm, on second thought, I'm not sure this is a big deal. Even if the vast majority of civilization-enabling utility functions are xenophobic, we can still play PD with those that aren't. And if Everett is correct, there are presumably still lots of altruistic, isolated civilizations.
0Giles9ySorry, yes - this is what I meant. I should have made that clearer.

I don't expect that humans, on meeting aliens, would try to impose our ethical standards on them. We generally wouldn't see their minds as enough like ours to see their pain as real pain. The reason I think this is that very few people think we should protect all antelopes from lions, or all dolphins from sharks. So the babyeater dilemma seems unrealistic to me.

-1TimS9yA person who decides not to save a deer from a wolf has committed no moral failing. But a person does commit an immoral choice by deciding not to save a human from the wolf. Both deer and human feel pain, so I think a better understanding is that only individual creatures that can (or potentially could) think recursively are entitled to moral weight. If aliens can think recursively, then that principle states that a human would make an immoral choice not to save an alien from the wolf. If we ran into an alien species that disagreed with that principle (e.g. the Babyeaters), wouldn't we consider them immoral?
2summerstay9yMaybe the antelope was a bad example because they aren't intelligent enough or conscious in the right way to deserve our protection. So let's limit the discussion to dolphins. There are people who believe that humans killing dolphins is murder, that dolphins are as intelligent as people, just in a different way. Whether or not you agree with them, my point is that even these people don't advocate changing how the dolphins live their lives, only that we as humans shouldn't harm them. I imagine our position with aliens would be similar: for humans to do them harm is morally wrong for humans, but they have their own way of being and we should leave them to find their own way.
1wedrifid9yThinking recursively sounds like the wrong word for a concept that you are trying to name. My computer programs can think recursively. It wouldn't surprise me if certain animals could too, with a sufficiently intelligent researcher to come up with tests.
1smk9y-Harry Potter and the Methods of Rationality -Harry Potter and the Methods of Rationality I suppose these two quotes might just be referring to a confused idea that Eliezer only put in his story for fun... but then again maybe not?
-2TimS9yI'm trying to label the capacity of humans to create proofs like Godel's incompleteness proofs [http://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems] or the halting problem [http://en.wikipedia.org/wiki/Halting_problem]. Cats and cows cannot create proofs like these, and it doesn't seem to be a shortfall in intelligence. Is there a better label you would suggest?
0[anonymous]9yWhat makes those proofs any different from proofs of other mathematical theorems? I imagine that the halting problem, in particular, would not be beyond the capability of some existing automated theorem prover, assuming you could encode the statement; its proof isn't too involved. If your argument is that humans understand these proofs because of some magical out-of-the-box-thinking ability, then I am skeptical.
0summerstay9yDolphins do in fact engage in infanticide, among other behaviors we would consider evil if done by a human. But no one suggests we should be policing them to keep this from happening.

An interesting idea, but I'm afraid the idea is little more than interesting. Given all your premises, it does follow that compromise would be the optimal strategy, but I find some of them unlikely:

• That there is a small, easily computable number of potential utility functions, like 10 as opposed to 10^(2^100)
• I have qualms with the assumption that these computed utility functions are added. I would more readily accept them being mutually exclusive (e.g. one potential utility function is "absorb all other worlds" or "defect in all inter-spe

You are worried that, given your assumptions, civilizations might not be willing to pay an extremely high price to do things that aliens would like if they knew about them, which they don't.

But one of your assumptions is that every civilization has a moral system that advocates attacking and enslaving everyone they meet who thinks differently from them.

It would be worrying if a slightly bad assumption led to a very bad conclusion, but a very bad assumption leading to a slightly bad conclusion doesn't strike me as particularly problematic.

[-][anonymous]9y 2

If some far away world implements utility function f2 and strategy S2, I intuitively feel like I ought to care more about f2(S2) than about f0(S2), even if my own utility function is f0. Provided, of course, that S2 doesn't involve destroying my part of the world.

2Giles9yOut of interest, have you read the Three Worlds Collide [http://lesswrong.com/lw/y4/three_worlds_collide_08/] story?
4[anonymous]9yI have, and my intuitive feeling of the "right thing to do" is similar there as well: I have no problem with leaving the baby-eating aliens alone, with some qualifications to the effect of assuming they are not somehow mistaken about their utility function.

To make this more interesting, interpret it as a true prisoner's dilemma, i.e. the aliens care about something stupid like maximizing paperclips.

6Giles9yI consider this to be a true prisoner's dilemma already (basically, any prisoner's dilemma is true if it's written out in numbers and you believe that the numbers really capture everything). You can make it more paperclippy by substituting fi(Sj) = 0.
[-][anonymous]9y 1

Anyone else worried by this result, or have I made a mistake?

To update my reply to Silas Barta, after a little reflection I would say this:

The various species are supposed to possess common knowledge of each other's utility functions, and of each other's epistemic beliefs about how these utility functions can be satisfied.

Since the various species' preferences are described by utility functions, we must assume that each species has self-modified collectively (or so the humans believe) such that they collectively obey the von Neumann-Morgenstern axioms ... (read more)

Anyone else worried by this result, or have I made a mistake?

This seems correct.

An interesting question. Some thoughts here:

1. Does this type of reasoning mean it is a good idea to simulate lots of alien civilizations (across lots of different worlds), to see what utility functions emerge, and how frequently each type emerges?

2. It seems like detailed simulation is quite a sensible strategy anyway, if we're utility trading (detailed enough to create conscious beings). We could plausibly assume that each utility function f(i) assigns positive utility to the aliens of type (i) existing in a world, as long as their welfare in that world ex

If you knew something about the expected development process of potential alien civilizations and could use that information to estimate a probability of them defecting in a case like this, then which utility functions would you include in your set of aliens to cooperate with? Roughly, should you cooperate with each civilization in proportion to your expectation of that civilization cooperating? Also, should you cooperate in proportion to the number of expected civilizations implementing each utility function?

This seems unavoidable to me, as long as you first ... (read more)

0Vladimir_Nesov9yIf aliens cooperate in a way independent of your decision, you should defect. Only if they cooperate conditional on your cooperation might it make sense to cooperate. That is, who cooperates unconditionally is irrelevant. (Which of these do you mean? I can't tell; taken literally, you seem to be talking about unconditional cooperation.)
0amcknight9yWhat I should have been saying is that the ones that cooperate conditionally are the ones that would matter. (I wasn't even thinking about conditional and unconditional cooperation, at the time.)
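The distinction between conditional and unconditional cooperation can be sketched with the post's payoffs. This is a toy model under stated assumptions: one world per utility function, and three hypothetical policies for the other nine worlds, where `mirror` simply copies world 0's move as a stand-in for "cooperates only if you do". It says nothing about how such a correlation could arise without causal contact.

```python
N = 10

def f(i, strategy):
    """Post's payoffs: f_i(C) = 3, f_i(S_i) = 10, f_i(S_j) = 1 for i != j."""
    if strategy == "C":
        return 3
    return 10 if strategy == i else 1

def score_world0(my_move, others_policy):
    """Total f_0 when world 0 plays `my_move`: 'C' (compromise) or 'D' (its own optimum S_0)."""
    mine = "C" if my_move == "C" else 0
    total = f(0, mine)
    for j in range(1, N):
        their_move = others_policy(my_move)
        # A defecting world j implements its own optimum S_j.
        total += f(0, "C" if their_move == "C" else j)
    return total

# Hypothetical policies for the other worlds, as functions of world 0's move.
policies = {
    "always_cooperate": lambda my_move: "C",
    "always_defect": lambda my_move: "D",
    "mirror": lambda my_move: my_move,
}

best = {
    name: max("CD", key=lambda move: score_world0(move, policy))
    for name, policy in policies.items()
}
print(best)
```

Against either unconditional policy the best reply for world 0 is to defect (37 versus 30, or 19 versus 12). Only against the mirroring policy does compromise come out ahead (30 versus 19), which matches the point that only conditional cooperators matter for the decision.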

none of the players has any causal influence on any other.

An aggregate score without any causal influence is meaningless. Without influence on each other, each should pursue its own best interest, not some meaningless compromise solution.

2wedrifid9yHow many boxes do you take in Newcomb's problem?
0billswift9yOne. Either he can predict my actions and is honest and I get a million dollars, or he can't and I don't lose anything, but don't get a thousand dollars, and get to laugh at him. (Note that I think one of the reasons the problem is bogus is you are restricted by the conditions of the problem from considering any gain (ie, get to laugh at him) except the cash, which is too unrealistic to matter.) (Also note that this is from memory and I think Newcomb's and similar problems are bogus enough that I haven't read any posts, or anything else, about them in well over a year.)
0wedrifid9ySo your objection (correct me if I am wrong) is that it makes no sense to value what the other aliens do because what they do doesn't affect you in any way. You don't have a problem with acting as if your own behavior determines what the aliens do, given that they decide their actions based on a reliable prediction of what you will do. You just don't care. You reject the premise about your own utility function depending on them as a silly utility function.
0billswift9yYour first sentence is a fair description. As your second sentence admits, What they do is their decision. Letting their decision influence me in the way you seem to support is no different than giving in to "emotional blackmail". Your final sentence makes no sense, I cannot figure out what you mean by it.
1lessdazed9yYou reject premise 1), where premise 1) is: your utility function depends on them, i.e."They are altruistic". You reject it because you think any utility function with premise 1) is silly.
1wedrifid9yI didn't advocate one way or the other. Utility functions simply happened to be provided for us in the post so for me the question becomes one of abstract reasoning. But I certainly don't object to you mentioning an aversion to utility functions of the type given. It is a relevant contribution and a legitimate perspective. I read it, it makes sense. See above for more explanation. Or simple trade. They have preferences about the future state of their local environment and preferences about the future state of your local environment (for whatever unspecified reason). You are in a symmetrical situation. You discover that you can get an overall higher utility by doing some slightly worse things in your local environment in exchange for them doing some more preferred things where they live. This doesn't exclude them outright synthesizing humans! Or trade.
[-][anonymous]9y 0

Humanity has also run a large number of simulations of how alien worlds evolve. It has determined that of those civilizations which reach the same level of advancement - that know their own utility function and have a strategy for optimizing it - there is an equal probability that they will end up with each of 10 possible utility functions.

No, humanity isn't going to do that. We'd be exposing ourselves to blackmail from any simulated world whose utility function had certain properties -- it'd be summoning a basilisk. For humanity's utility function in particular, there is an asymmetry such that the potential losses from acausal trade dramatically outweigh the potential gains.

[This comment is no longer endorsed by its author]
[-][anonymous]9y 0

Assuming that TDT doesn't apply, the fact that we would be in a prisoner's dilemma is irrelevant. The only rational option for humanity would be to defect by maximising its local utility - whether humans defect or cooperate in the dilemma has no effect on what the aliens choose to do.

So really the problem is only interesting from a timeless decision theory perspective (I feel that you might have made this more explicit in your post).

According to my sketchy understanding of TDT, if in a prisoner's dilemma both parties can see the other's source code, or oth... (read more)

[This comment is no longer endorsed by its author]
[-][anonymous]9y 0

They are altruistic, in the sense that they care just as much about far-away aliens that they can't even see as they do about members of their own species.

[This comment is no longer endorsed by its author]

Anyone else worried by this result, or have I made a mistake?

Surely only utilitarians would be concerned by it. Others will just reject the "altruistic" assumption as being terribly unrealistic.

6[anonymous]9yAnyone can reject the assumption as unrealistic. I don't see what utilitarianism has to do with that.
7timtyler9yMany utilitarians claim that they aspire to be altruistic - in the sense that they would claim to "care just as much about far-away aliens that they can't even see as they do about members of their own species". Possibly there are others that say they aspire to this too - but it is a pretty odd thing to want. Such selflessness makes little biological sense. It looks like either an attempt at a "niceness" superstimulus (though one that is rather hampered by a lack of plausibility) - or the result of memetic manipulation, probably for the benefit of others. Those are currently my two best guesses for explaining the existence of utilitarianism.
1[anonymous]9yAh, I think my mistake was assuming utilitarianism meant something reasonable along the lines of consequentialism (as opposed to belief in a specific and somewhat simple utility function). I thought I already knew what it meant, you see, so I didn't see the need to click on your link.

My thought is that there's no reason to believe the humans are right over any of the other groups. One person was born with one mind, another with another. There's no reason to pick one mind. As such, I'd pick the compromise, even if we additionally worked out that the other aliens wouldn't try the compromise.