Coherent decisions imply consistent utilities

[-]johnswentworth5y1322Review for 2019 Review

Things To Take Away From The Essay

First and foremost: Yudkowsky makes absolutely no mention whatsoever of the VNM utility theorem. This is neither an oversight nor a simplification. The VNM utility theorem is not the primary coherence theorem. It's debatable whether it should be considered a coherence theorem at all.

Far and away the most common mistake when arguing about coherence (at least among a technically-educated audience) is for people who've only heard of VNM to think they know what the debate is about. Looking at the top-voted comments on this essay:

the first links to a post which argues against VNM on the basis that it assumes probabilities and preferences are already in the model
the second argues that two of the VNM axioms are unrealistic

I expect that if these two commenters read the full essay, and think carefully about how the theorems Yudkowsky is discussing differ from VNM, then their objections will look very different.

So what are the primary coherence theorems, and how do they differ from VNM? Yudkowsky mentions the complete class theorem in the post, Savage's theorem comes up in the comments, and there are variations on these two and probably others as well. Rough... (read more)

[-]Rohin Shah5y190

the first links to a post which argues against VNM on the basis that it assumes probabilities and preferences are already in the model

I assume this is my comment + post; I'm not entirely sure what you mean here. Perhaps you mean that I'm not modeling the world as having "external" probabilities that the agent has to handle; I agree that is true, but that is because in the use case I'm imagining (looking at the behavior of an AI system and determining what it is optimizing) you don't get these "external" probabilities.

I expect that if these two commenters read the full essay, and think carefully about how the theorems Yudkowsky is discussing differ from VNM, then their objections will look very different.

I assure you I read this full post (well, the Arbital version of it) and thought carefully about it before making my post; my objections remain. I discussed VNM specifically because that's the best-understood coherence theorem and the one that I see misused in AI alignment most often. (That being said, I don't know the formal statements of other coherence theorems, though I predict with ~98% confidence that any specific theorem you point me to would not change my objection.)

Yes, if ... (read more)

[-]johnswentworth5y260

I assume this is my comment + post

I was referring mainly to Richard's post here. You do seem to understand the issue of assuming (rather than deriving) probabilities.

I discussed VNM specifically because that's the best-understood coherence theorem and the one that I see misused in AI alignment most often.

This I certainly agree with.

I don't know the formal statements of other coherence theorems, though I predict with ~98% confidence that any specific theorem you point me to would not change my objection.

Exactly which objection are you talking about here?

If it's something like "coherence theorems do not say that tool AI is not a thing", that seems true. Even today humans have plenty of useful tools with some amount of information processing in them which are probably not usefully model-able as expected utility maximizers.

But then you also make claims like "all behavior can be rationalized as EU maximization", which is wildly misleading. Given a system, the coherence theorems map a notion of resources/efficiency/outcomes to a notion of EU maximization. Sure, we can model any system as an EU maximizer this way, but only if we use a trivial/uninteresting notion of resources/efficiency/o... (read more)

[-]Rohin Shah5y170

Exactly which objection are you talking about here?
If it's something like "coherence theorems do not say that tool AI is not a thing", that seems true.

Yes, I think that is basically the main thing I'm claiming.

But then you also make claims like "all behavior can be rationalized as EU maximization", which is wildly misleading.

I tried to be clear that my argument was "you need more assumptions beyond just coherence arguments on universe-histories; if you have literally no other assumptions then all behavior can be rationalized as EU maximization". I think the phrase "all behavior can be rationalized as EU maximization" or something like it was basically necessary to get across the argument that I was making. I agree that taken in isolation it is misleading; I don't really see what I could have done differently to prevent there from being something that in isolation was misleading, while still being able to point out the-thing-that-I-believe-is-fallacious. Nuance is hard.
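The "no other assumptions" case can be illustrated with a toy sketch. It assumes a deterministic world with a three-step horizon, so expected utility reduces to plain utility; the names `twitchy_policy`, `rationalizing_utility`, and `best_histories` are hypothetical and not taken from either post. Given any behavior, it builds a utility function over whole histories that the behavior maximizes.

```python
from itertools import product

ACTIONS = ("left", "right")
HORIZON = 3  # assumed number of time steps in the toy world

def twitchy_policy(history):
    """Hypothetical arbitrary behavior: alternate actions, starting with 'left'."""
    return ACTIONS[len(history) % 2]

def rollout(policy):
    """The action-history the policy actually produces."""
    history = ()
    for _ in range(HORIZON):
        history += (policy(history),)
    return history

def rationalizing_utility(policy):
    """Utility over whole histories: 1 if the history is what `policy` does, else 0."""
    def utility(history):
        return 1 if all(policy(history[:t]) == a for t, a in enumerate(history)) else 0
    return utility

def best_histories(utility):
    """All histories that attain the maximum utility."""
    histories = list(product(ACTIONS, repeat=HORIZON))
    top = max(utility(h) for h in histories)
    return [h for h in histories if utility(h) == top]

print(rollout(twitchy_policy))
print(best_histories(rationalizing_utility(twitchy_policy)))
```

The constructed utility function says nothing about resources or efficiency, which is the sense in which such a rationalization is trivial.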

(Also, it should be noted that you are not in the intended audience for that post; I expect that to you the point feels obvious enough so as not to be worth stating, and so overall it feels like I'm just being mislead... (read more)

[-]johnswentworth5y120

I somewhat expect your response will be "why would anyone be applying coherence arguments in such a ridiculously abstract way rather than studying a concrete system", to which I would say that you are not in the intended audience.

Ok, this is a fair answer. I think you and I, at least, are basically aligned here.

I do think a lot of people took away from your post something like "all behavior can be rationalized as EU maximization", and in particular I think a lot of people walked away with the impression that usefully applying coherence arguments to systems in our particular universe is much more rare/difficult than it actually is. But I can't fault you much for some of your readers not paying sufficiently close attention, especially when my review at the top of this thread is largely me complaining about how people missed nuances in this post.

2Ben Pace5y

(Once again, great use of that link)

4ESRogs5y

Assuming that one accepts the arguments against coherence arguments being important for alignment (as I tentatively do), I don't see why that means this shouldn't be included in the Alignment section. The motivation for this post was its relevance to alignment. People think about it in the context of alignment. If subsequent arguments indicate that it's misguided, I don't see why that means it shouldn't be considered (from a historical perspective) to have been in the alignment stream of work (along with the arguments against it). (Though, I suppose if there's another category that seems like a more exact match, that seems like a fine reason to put it in that section rather than the Alignment section.) Does that make sense? Is your concern that people will see this in the Alignment section, and not see the arguments against the connection, and continue to be misled?

[-]johnswentworth5y151

I actually think it shouldn't be in the alignment section, though for different reasons than Rohin. There's lots of things which can be applied to AI, but are a lot more general, and I think it's usually better to separate the "here's the general idea" presentation from the "here's how it applies to AI" presentation. That way, people working on other interesting things can come along and notice the idea and try to apply it in their own area rather than getting scared off by the label.

For instance, I think there's probably gains to be had from applying coherence theorems to biological systems. I would love it if some rationalist biologist came along, read Yudkowsky's post, and said "wait a minute, cells need to make efficient use of energy/limited molecules/etc, can I apply that?". That sort of thing becomes less likely if this sort of post is hiding in "the alignment section".

Zooming out further... today, alignment is the only technical research area with a lot of discussion on LW, and I think it would be a near-Pareto improvement if more such fields were drawn in. Taking things which are alignment-relevant-but-not-just-alignment and lumping them all under the alignment heading makes that less likely.

2ESRogs5y

That makes a lot of sense to me. Good points!

4Rohin Shah5y

It seems weird to include a post in the book if we believe that it is misguided, just because people historically believed it. If I were making this book, I would not include such posts; I'd want an "LW Review" to focus on things that are true and useful, rather than historically interesting. That being said, I haven't thought much about the goals of the book, and if we want to include posts for the sake of history, then sure, include the post. That was just not my impression about the goal. I would have this concern, yes, but I'm happy to defer (in the sense of "not pushing", rather than the sense of "adopting their beliefs as my own") to the opinions of the people who have thought way more than me about the purpose of this review and the book, and have caused it to happen. If they are interested in including historically important essays that we now think are misguided, I wouldn't object. I predict that they would prefer not to include such essays but of course I could be wrong about that.

4Vaniver5y

I like this comment, but I feel sort of confused about it as a review instead of an elaboration. Yes, coherence theorems are very important, but did people get it from this post? To the extent that comments are evidence, they look like no, the post didn't quite make it clear to them what exactly is going on here.

2Ben Pace5y

No need to think about editing at this point, we'll sort out all editing issues after the review. (And for this specific issue, all hyperlinks in the books have been turned into readable footnotes, which works out just fine in the vast majority of cases.)

[-]Rohin Shah7y35-2

Obligatory: Coherence arguments do not imply goal-directed behavior

Also Coherent behaviour in the real world is an incoherent concept

[-]Said Achmiz7y340

(Note: This comment mostly concerns the material in the first three sections of the post. I have not yet read, but only skimmed, the section titled “Probabilities and expected utility”. It seems to cover material I am familiar with, but I will read it in detail when I have more time.)

Eliezer, you speak here of reasons why an agent ought to behave as if its preferences satisfy the transitivity axiom, specifically; you discuss circular preferences, and their unfortunate effects, as the consequences of transitivity violations. You also discuss the independence axiom in the latter half of the post. You have discussed reasons to accept these two axioms before, in the Sequences.

However, the Von Neumann–Morgenstern utility theorem (the most commonly used, as far as I am aware, formalization of decision-theoretic utility) has four axioms; and an agent’s preferences must satisfy all four in order for a utility function to be constructable from them.

It so happens that the two axioms you do not discuss are precisely the two axioms that I (and many economists; see below) find most suspect. The case for transitivity is obvious; the case for independence is not obvious but nonetheless reasonably

... (read more)
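The "unfortunate effects" of circular preferences mentioned above are usually illustrated with a money pump. A minimal sketch, assuming a hypothetical agent that prefers A to B, B to C, and C to A, and that will pay a small fee for any swap it prefers (the fee and the item names are made up for illustration):

```python
# Hypothetical circular preferences: each key is preferred to its value.
PREFERRED_OVER = {"A": "B", "B": "C", "C": "A"}
FEE = 1  # assumed price, in cents, the agent pays for any swap it prefers

def run_money_pump(start_item, offers):
    """Offer items in turn; the agent pays FEE whenever it prefers the offer."""
    item, paid = start_item, 0
    for offer in offers:
        if PREFERRED_OVER[offer] == item:
            item, paid = offer, paid + FEE
    return item, paid

# Ten trips around the cycle: the agent ends up holding what it started with.
print(run_money_pump("A", ["C", "B", "A"] * 10))  # -> ('A', 30)
```

The agent finishes with the same item and 30 cents less, which is the sense in which the strategy is dominated.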

9Said Achmiz7y

Incidentally, there are also reasons for hesitation to accept the independence axiom.

2Adele Lopez7y

For continuity, it's reasonable to assume this because all computable functions are continuous. See theorem 4.4 of https://eccc.weizmann.ac.il/resources/pdf/ica.pdf Edit: I realized that the continuity assumption is different (though related) from assuming the utility function is continuous. My guess is that computability is still a good justification for this, but I'd have to check that that actually follows.

4Chris_Leong7y

"Because all computable functions are continuous" - how does this make any sense? Why can't I just pick a value x=1 and, if its left limit and right limit are p, set the function to p+1 at x=1?

[-]Richard_Kennaway7y120

Because equality of (computable) real numbers is uncomputable. So is calculating the limit of an infinite sequence of them.

In more detail: a computable real number must be represented by a Turing machine (or your favorite Turing-equivalent model) that generates some representation of it as a possibly infinite string. Equality of the infinite output of two Turing machines is uncomputable.

In fact, picking a representation for "computable" real numbers, and implementing basic arithmetic on them is non-trivial. The usual decimal or binary strings of digits won't work.

2Chris_Leong7y

Hmm, I'm still not following. Limits are uncomputable in general, but I just need one computational function where I know the limits at one point and then I can set it to p+1 instead. Why wouldn't this function still be computable? Maybe "computable function" is being defined differently than I would expect.

3Richard_Kennaway7y

To compute that function for an unknown argument x, you would have to determine whether x is equal to 1. But if real numbers are encoded as infinite strings, there is no way to tell whether x=1 in finite time. If x happens to be 1, then however long an initial segment of that representation you examined, you could never be sure that x was not very slightly different from 1. In the usual decimal representation, suppose you see 1.00000...: if the number is greater than 1, you will eventually know that, but if the zeroes go on forever, you can never know that. Similarly if you see 0.99999.... I'm not sure how relevant this is to the original context; see other replies to Adele Lopez's ancestor comment.
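A minimal sketch of this point, assuming a number of the form 1.d1d2d3... is available only as an endless stream of its fractional digits (all function names here are hypothetical). A program can confirm that x differs from 1 after finitely many digits, but it can never confirm that x equals 1; the best it can do is give up after some budget.

```python
from itertools import chain, islice, repeat

def exactly_one():
    """Fractional digits of 1.000...: zeros forever."""
    return repeat(0)

def slightly_more_than_one(k):
    """Fractional digits of 1 + 10**-k: a single 1 at position k, zeros elsewhere."""
    return chain(repeat(0, k - 1), [1], repeat(0))

def differs_from_one(fractional_digits, budget):
    """Read at most `budget` digits of a number written as 1.d1d2d3...

    Returns True if a nonzero digit proves the number is not 1,
    or None if the budget ran out without an answer.
    It never returns "equal": that would require all infinitely many digits.
    """
    for digit in islice(fractional_digits, budget):
        if digit != 0:
            return True
    return None

print(differs_from_one(slightly_more_than_one(5), budget=10))   # -> True
print(differs_from_one(slightly_more_than_one(50), budget=10))  # -> None
print(differs_from_one(exactly_one(), budget=1000))             # -> None
```

A function that returns p+1 exactly at x=1 would need the last call to terminate with a definite "equal", which no finite budget provides.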

6Chris_Leong7y

Okay, so there is an additional assumption that these strings are all encoded as infinite sequences. Instead, they could be encoded with a system that starts by listing the number of digits, or -1 if the sequence is infinite, and then provides those digits. That's a pretty key property to not mention (then again, I can't criticise too much as I was too lazy to read the PDF). Thanks for the explanation!

3Said Achmiz7y

This seems to be a non sequitur. Suppose it were true that preferences that violate the continuity axiom imply a utility function that is uncomputable. (This hardly seems worse or less convenient than the case—which is, in fact, the actual state of affairs—where continuity violations imply that no utility function can represent one’s preferences, computable or otherwise… but let’s set that aside, for now.) How would this constitute a reason to have one’s preferences conform to the continuity axiom…?

1Adele Lopez7y

Presumably, any agent which we manage to build will be computable. So to the extent our agent is using utility functions, they will be continuous. If an agent is only capable of computable observations, but has a discontinuous utility function, then if the universe is in a state where the utility function is discontinuous, the agent will need to spend an infinite amount of time (or as long as the universe state remains at such a point) determining the utility of the current state. I think it might be possible to use this to create a more concrete exploit.

1Said Achmiz7y

There are several objections one could make to this line of reasoning. Here are two. First: do you believe that we humans are uncomputable? If we are uncomputable, then it is clearly possible to construct an uncomputable agent. If, conversely, we are computable, then whatever reasoning you apply to an agent we build can be applied to us as well. Do you think it does apply to us? Second: supposing your reasoning holds, why should it not be a reason for our constructed agent not to use utility functions, rather than a reason for said agent to have continuous preferences? (This is a good time to mention, again, that this entire tangent is moot, as violating the continuity axiom—or any of the axioms—means that no utility function, computable or not, can be constructed from your preferences. But even if that weren’t the case, the above objections apply.)

1MikkW6y

As for completeness, I struggle to see any practical difference between being unwilling to choose between two outcomes, and finding them equally acceptable (which is allowed by completeness). Or, one can imagine someone relentlessly flopping between two highly (un)desirable outcomes because they are unwilling to settle, and I think it's obvious what the problem there is.

2Said Achmiz6y

Have you read the papers I linked (or the more directly relevant papers cited by those)? What do you think about Aumann’s commentary on this question, for instance?

0Sherrinford7y

While I don't find completeness so problematic, I got quite confused by Eliezer's post. Firstly, it would make much more sense to first explain what "utility" is, in the sense that it is used here. Secondly, the justification of transitivity is common, but using a word like "dominated strategy" there does not make much sense, because you can only evaluate strategies if you know the utility functions (and it also mixes up words). Thirdly, it's necessary to discuss all axioms and their implications. For example, in standard preferences theory under certainty, it's possible to have preferences that are complete and transitive but you cannot get a utility function from. Fourthly, I am still confused whether this talk about expected utility is only normative or also a positive description of humans, or kinda both.

4Said Achmiz7y

He is referring to decision-theoretic utility, in the sense in which the term is used in economics and game theory. Such (“lexicographic”) preferences violate the continuity axiom. Eliezer is definitely speaking normatively; none of the VNM axioms reliably apply to humans in a descriptive sense. Eliezer is concerned with the design of artificial agents, for which task it is necessary to determine what axioms their preferences ought to conform to (among other things).
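A small sketch of why lexicographic preferences clash with continuity, taking the axiom in the form "if A is preferred to B and B to C, then some mixture of A and C is exactly as good as B". The two-attribute outcomes and the rule "compare expected first attribute, then expected second attribute" are assumptions chosen for illustration.

```python
from fractions import Fraction

# Hypothetical outcomes with two attributes; the first always takes priority.
A, B, C = (1, 0), (0, 1), (0, 0)

def expected_attributes(lottery):
    """lottery: list of (probability, outcome) pairs -> tuple of expected attributes."""
    size = len(lottery[0][1])
    return tuple(sum(p * outcome[i] for p, outcome in lottery) for i in range(size))

def strictly_prefers(first, second):
    """Lexicographic rule: Python compares tuples attribute by attribute."""
    return expected_attributes(first) > expected_attributes(second)

def mix_A_with_C(p):
    """Lottery giving A with probability p and C otherwise."""
    return [(p, A), (1 - p, C)]

sure_B = [(Fraction(1), B)]

# Any positive chance of A, however tiny, beats B for certain...
print(all(strictly_prefers(mix_A_with_C(Fraction(1, 10**k)), sure_B) for k in range(1, 30)))
# ...while zero chance of A (that is, C for certain) loses to B.
print(strictly_prefers(sure_B, mix_A_with_C(Fraction(0))))
```

The preference jumps from "worse than B" to "better than B" with no probability at which the mixture is exactly as good as B, so no single real-valued expected utility can reproduce this ordering.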

[-]Richard_Kennaway7y160

Meta: I'm unclear about the context in which this post is to be read, and its purpose. Googling phrases from below the fold tells me that it appears both here and on Arbital, although there is no indication here or there that this is a cross-post, and Arbital posts are not routinely posted here. It includes a link to a blog post of 2018, so appears to be of recent composition, but reads like a posting from the Sequences now long in the past, and I am not sure it contains any ideas not present there. It begins in medias res ("So, we're talking..."), yet does not refer back to any of the implied predecessors.

I notice that I am confused.

[-]Said Achmiz7y150

Huh, you’re right: this is just a re-post of an Arbital article.

I must say I feel rather cheated. When I saw this, I was under the impression that Eliezer had composed this post for Less Wrong, and had posted it to Less Wrong; I assumed that there was therefore some chance he might respond to comments. But that seems not to be the case. (Is it even Eliezer who posted it? Or someone else using his account, as happened, IIRC, with Inadequate Equilibria?)

I, too, would like to know what the purpose of this post is.

[-]Rob Bensinger7y150

I asked Eliezer if it made sense to cross-post this from Arbital, and did the cross-posting when he approved. I'm sorry it wasn't clear that this was a cross-post! I intended to make this clearer, but my idea was bad (putting the information on the sequence page) and I also implemented it wrong (the sequence didn't previously display on the top of this post).

This post was originally written as a nontechnical introduction to expected utility theory and coherence arguments. Although it begins in medias res stylistically, it doesn't have any prereqs or context beyond "this is part of a collection of introductory resources covering a wide variety of technical and semitechnical topics."

Per the first sentence, the main purpose is for this to be a linkable resource for conversations/inquiry about human rationality and conversations/inquiry about AGI:

So we're talking about how to make good decisions, or the idea of 'bounded rationality', or what sufficiently advanced Artificial Intelligences might be like; and somebody starts dragging up the concepts of 'expected utility' or 'utility functions'. And before we even ask what tho

... (read more)

[-]Said Achmiz7y120

I see, thanks. That does explain things.

Some questions occur to me, which I don’t expect you necessarily to answer at once, but hope you (and/or whoever is responsible for the Arbital content or the decisions to post it to LW) will consider:

1. In your opinion, does this post (still? ever?) work well as a “linkable resource for conversations about human rationality and … AGI”?
2. Are there plans (by Eliezer, or by anyone else) to revise this content? Or is it meant to stand unchanged, as a matter of “historical interest” only, so to speak?
3. Relatedly to #2, is it productive to engage with this post, by commenting, discussing, critiquing? (In any sense other than “it’s fun and/or personally edifying to do so”?) That is: is there anyone “on the other end”, so to speak, who might read (and possibly even participate in) such discussions, and take action (such as writing an updated version of this material, to pick a simple example) as a result?
4. For whom is this post intended, and by whom? Whose purposes does it serve, whom is it meant to benefit, and who may reasonably judge whether it is serving its purpose?

[-]Benquo7y110

Presumably to keep morale up by making it look like the rightful Caliph is still alive and producing output.

[-]Raemon7y110

I believe the intention was for this post to appear as part of a sequence that more clearly situated it as part of a series of re-posts from Arbital, but there were some mix-ups that made the sequence title not show up by default. I agree the current implementation is confusing.

[-]habryka6y150

Promoted to curated: This is a pretty key post that makes an argument that I think has been implicit in a lot of things on LessWrong for a long time, but hasn't actually been made this explicitly.

I do actually think that in the act of making it explicit, I've started to agree with some of the commenters that there is something missing in this argument (in particular as Said pointed out the treatment of the completeness axiom). It's not necessarily the case that I disagree with the conclusion, but I still think covering those arguments is something I would want someone to spend serious time on.

However, overall I still think this post does an exceptionally good job at introducing utility functions as a core abstraction in rationality, and expect it to be something I reference for a long time to come.

[-]johnswentworth5y130Nomination for 2019 Review

I don't particularly like dragging out the old coherence discussions, but the annual review is partly about building common knowledge, so it's the right time to bring it up.

This currently seems to be the canonical reference post on the subject. On the one hand, I think there are major problems/missing pieces with it. On the other hand, looking at the top "objection"-style comment (i.e. Said's), it's clear that the commenter didn't even finish reading the post and doesn't understand the pieces involved. I think this is pretty typical among people who object to coherence results: most of them have only dealt with the VNM theorem, and correctly complain about the assumptions of that theorem being too strong, but don't know about the existence of all the other coherence theorems (including the complete class theorem mentioned in the post, and Savage's theorem mentioned in the comments). The "real" coherence theorems do have problems with them, but they're not the problems which a lot of people point to in VNM.

I'll leave a more detailed review later. The point of this nomination is to build common knowledge: I'd like to get to the point where the objections to coherence theorems are the right objections, rather than objections based in ignorance, and this post (and reviews of it) seem like a good place for that.

[-]Sniffnoy6y110

So this post is basically just collecting together a bunch of things you previously wrote in the Sequences, but I guess it's useful to have them collected together.

I must, however, take objection to one part. The proper non-circular foundation you want for probability and utility is not the complete class theorem, but rather Savage's theorem, which I previously wrote about on this website. It's not short, but I don't think it's too inaccessible.

Note, in particular, that Savage's theorem does not start with any assumption baked in that R is the correct system of numbers to use for probabilities[0], instead deriving that as a conclusion. The complete class theorem, by contrast, has real numbers in the assumptions.

In fact -- and it's possible I'm misunderstanding -- it's not even clear to me that the complete class theorem does what you claim it does, at all. It seems to assume probability at the outset, and therefore cannot provide a grounding for probability, unlike Savage's theorem, which does. Again, it's possible I'm misunderstanding, but that sure seems to be the case.

Now this has come up here before (I'm basically in this comment just restating things I've previously

... (read more)

[-]Zvi5y90Review for 2019 Review

The problem with evaluating a post like this is that long post is long and slow and methodical, and making points that I (and I'm guessing most others who are doing the review process) already knew even at the time it was written in 2017. So it's hard to know whether the post 'works' at doing the thing it is trying to do, and also hard to know whether it is an efficient means of transmitting that information.

Why can't the post be much shorter and still get its point across? Would it perhaps even get the point across better if it was much shorter, bec... (read more)

[-]Said Achmiz7y90

Meta: (some of?) the linked Arbital pages do not seem to work. For example, https://arbital.com/p/probability_theory/ shows me a blank page:

[Image: Arbital blank page, titled “Error”]

(There was also some sort of red box with something about a “pipeline error” or something, but it disappeared.)

I am using Chrome 74.0.3729.131 (the latest as of this writing) on a Mac.

[-]Rob Bensinger7y110

Arbital has been getting increasingly slow and unresponsive. The LW team is looking for fixes or work-arounds, but they aren't familiar with the Arbital codebase. In the meantime, I've been helping cross-post some content from Arbital to LW so it's available at all.

[-]Said Achmiz7y100

Is it possible to create, and make available, a dump of the Arbital content? I’ve no doubt that there are people who’d be willing to host the entire thing, or convert it en masse into another format, etc.

Edit: Actually, if you could just post a complete list of Arbital page names, I could extract the content myself, as the API to request page content seems sufficiently straightforward.

5Rob Bensinger7y

We'd talked about getting a dump out as well, and your plan sounds great to me! The LW team should get back to you with a list at some point (unless they think of a better idea).

6jimrandomh7y

While we have a long-term plan of importing Arbital's content into LessWrong (after LessWrong acquires some wiki-like features to make it make sense), we have not taken responsibility for the maintenance of Arbital itself.

5Rob Bensinger7y

I assume you mean 'no one has this responsibility for Arbital anymore', and not that there's someone else who has this responsibility.

2Ruby7y

A week or so ago Arbital was working but had load times of several minutes.

[-]fdrocha7y80

I find it confusing that the only thing that matters to a rational agent is the expectation of utility, i.e., that the details of the probability distribution of utilities do not matter.

I understand that the VNM theorem proves that from what seem to be reasonable axioms, but on the other hand it seems to me that there is nothing irrational about having different risk preferences. Consider the following two scenarios:

A: you gain utility 1 with probability 1
B: you gain utility 0 with probability 1/2 or utility 2 with probability 1/2

According to expected utility, it is... (read more)

7TheMajor7y

This is part of the meaning of 'utility'. In real life we often have risk-averse strategies where, for example, 100% chance at 100 dollars is preferred to 50% chance of losing 100 dollars and 50% chance of gaining 350 dollars. But, under the assumption that our risk-averse tendencies satisfy the coherence properties from the post, this simply means that our utility is not linear in dollars. As far as I know this captures most of the situations where risk-aversion comes into play: often you simply cannot tolerate extremely negative outliers, meaning that your expected utility is mostly dominated by some large negative terms, and the best possible action is to minimize the probability that these outcomes occur. Also there is the following: consider the case where you are repeatedly offered bets of the example you give (B versus C). You know this in advance, and are allowed to redesign your decision theory from scratch (but you cannot change the definition of 'utility' or the bets being offered). What criteria would you use to determine if B is preferable to C? The law of large numbers(/central limit theorem) states that in the long run with probability 1 the option with higher expected value will give you more utilons, and in fact that this number is the only number you need to figure out which option is the better pick in the long run. The tricky bit is the question whether this also applies to one-shot problems or not. Maybe there are rational strategies that use, say, the aggregate median instead of the expected value, which has the same limit behaviour. My intuition is that this clashes with what we mean with 'probability' - even if this particular problem is a one-off, at least our strategy should generalise to all situations where we talk about probability 1/2, and then the law of large numbers applies again. I also suspect that any agent that uses more information to make this decision than the expected value to decide (in particular, occasionally deliberatel
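The dollar example in this comment can be checked with a short sketch. The starting wealth of 150 dollars and the logarithmic utility are assumptions picked only to give a concave utility; any other concave choice might give different numbers.

```python
import math

WEALTH = 150  # assumed starting wealth in dollars (hypothetical)

def expected(lottery, f=lambda x: x):
    """lottery: list of (probability, dollar_change) pairs."""
    return sum(p * f(x) for p, x in lottery)

def log_utility(dollar_change):
    """Assumed concave utility: log of final wealth."""
    return math.log(WEALTH + dollar_change)

sure_thing = [(1.0, 100)]
gamble = [(0.5, -100), (0.5, 350)]

# Expected dollars favor the gamble (125 versus 100)...
print(expected(sure_thing), expected(gamble))
# ...but expected log-utility favors the sure thing.
print(expected(sure_thing, log_utility), expected(gamble, log_utility))
```

Under these assumptions the agent is risk-averse in dollars while still maximizing expected utility, which is the sense in which the utility is "not linear in dollars".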

1fdrocha7y

This is the crux. It seems to me that the expected utility framework means that if you prefer A to B in a one-time choice, then you must also prefer n repetitions of A to n repetitions of B, because the fact that you have larger variance for n=1 does not matter. This seems intuitively wrong to me.

0Pattern7y

I'd hold that it's the reverse that seems more questionable. If n is a large number then the Law of Large Numbers may be applicable ("the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.").
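A quick simulation sketch of the Law of Large Numbers for option B from the scenarios earlier in this thread (utility 0 or 2, each with probability 1/2). The seed and the sample sizes are arbitrary choices for illustration.

```python
import random

def average_payoff_of_B(n, rng):
    """Average utility over n independent plays of B (0 or 2, each with probability 1/2)."""
    return sum(rng.choice((0, 2)) for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (1, 10, 1000, 100000):
    print(n, average_payoff_of_B(n, rng))
```

For a single play the average is either 0 or 2; as n grows it settles near the expected value 1, which is what option A pays on every play.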

4dxu7y

You may be interested in reading this series of posts.

3Said Achmiz7y

Robyn Dawes makes a more detailed version of precisely this argument in Rational Choice in an Uncertain World. I summarize his argument in an old comment of mine. (The axiom you must reject, incidentally, if you find this sort of reasoning convincing, is the independence axiom.)

1fdrocha7y

Thanks, I looked at the discussion you linked with interest. I think I understand my confusion a little better, but I am still confused. I can walk through the proof of the VNM theorem and see where the independence axiom comes in and how it leads to u(A)=u(B) in my example. The axiom of independence itself feels unassailable to me and I am not quite sure this is a strong enough argument against it. Maybe having a more direct argument from the axiom of independence to an unintuitive result would be more convincing. Maybe the answer is to read Dawes' book, thanks for the reference.

3Said Achmiz7y

Well, the axiom of independence is just that: an axiom. It doesn’t need to be assailed; we can take it as axiomatic, or not. If we do take it as axiomatic, certain interesting analyses become possible (depending on what other axioms we adopt). If we refuse to do so, then bad things happen—or so it’s claimed. In any case, Dawes’ argument (and related ones) about the independence axiom fundamentally concerns the question of what properties of an outcome distribution we should concern ourselves with. (Here “outcome distribution” can refer to a probability distribution, or to some set of outcomes, distributed across time, space, individuals, etc., that is generated by some policy, which we may perhaps view as the output of a generator with some probability distribution.) A VNM-compliant agent behaves as if it is maximizing the expectation of the utility of its outcome distribution. It is not concerned at all with other properties of that distribution, such as dispersion (i.e., standard deviation or some related measure) or skewness. (Or, to put it another way, a VNM-compliant agent is unconcerned with the form of the outcome distribution.) What Dawes is saying is simply that, contra the assumptions of VNM-rationality, there seems to be ample reason to concern ourselves with, for instance, the skewness of the outcome distribution, and not just its expectation. But if we do prefer one outcome distribution to another, where the dis-preferred distribution has a higher expectation (but a “better” skewness), then we violate the independence axiom.

3fdrocha7y

I get what you are saying. You have convinced me that the following two statements are contradictory:

* Axiom of Independence: preferring A to B implies preferring ApC to BpC for any p and C.
* The variance and higher moments of utility matter, not just the expected value.

My confusion is that intuitively it seems both must be true for a rational agent, but I guess my intuition is just wrong. Thanks for your comments, they were very illuminating.
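The conflict between the two statements can be checked numerically. The following is a minimal sketch, not anything from the thread: it assumes a hypothetical agent that scores a lottery as mean minus a penalty times variance (the `score` function and the `risk_penalty` value of 2.0 are illustrative choices), and it treats outcomes as plain payoffs rather than utilities. ApC is read as "A with probability p, otherwise C".

```python
def mix(p, first, second):
    """Compound lottery: `first` with probability p, `second` with probability 1 - p."""
    out = {}
    for lottery, weight in ((first, p), (second, 1 - p)):
        for payoff, prob in lottery.items():
            out[payoff] = out.get(payoff, 0.0) + weight * prob
    return out


def mean(lottery):
    return sum(payoff * prob for payoff, prob in lottery.items())


def variance(lottery):
    m = mean(lottery)
    return sum(prob * (payoff - m) ** 2 for payoff, prob in lottery.items())


def score(lottery, risk_penalty=2.0):
    # Hypothetical variance-sensitive preference: higher score is preferred.
    return mean(lottery) - risk_penalty * variance(lottery)


# Lotteries map payoff -> probability. All three are sure things.
A = {1.0: 1.0}
B = {0.9: 1.0}
C = {0.0: 1.0}

prefers_A_to_B = score(A) > score(B)
prefers_ApC_to_BpC = score(mix(0.5, A, C)) > score(mix(0.5, B, C))

print(prefers_A_to_B)       # True: 1.0 beats 0.9
print(prefers_ApC_to_BpC)   # False: the mixture with A is penalised for its larger variance
```

Under these assumptions the agent prefers A to B but prefers BpC to ApC at p = 0.5, which is exactly a violation of independence. A different penalty or different lotteries may not produce a reversal; the point is only that caring about variance can.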

2Slider7y

I think you are not allowed to refer explicitly to utility in the options. That is, an option of "I do not choose this option" is self-defeating and ill-formed. In another post I posited a risk-averse utility function that references the amount of paperclips. Maximising the utility function doesn't maximise the expected amount of paperclips. Even if the physical objects of interest are paperclips and we value them linearly, a paperclip is not synonymous with a utilon. It's not a thing you can give out in an option.

2fdrocha7y

I was going to answer that I can easily reword my example to not explicitly mention any utility values, but when I tried to do that it very quickly led to something where it is obvious that u(A) = u(C). I guess my rewording was basically going through the steps of the proof of the VNM theorem. I am still not sure I am convinced by your objection, as I don't think there's anything self-referential in my example, but that did give me some pause.

1Slider7y

In a case where you are going to pick lower variance and lower expected value over higher variance and higher expected value, that option needs to have a bigger "utility number". In order to get that you need to mess with how utility is calculated. Then it becomes ambiguous whether the "utility-fruits" are redefined in the same go as we redefine how we compare options. If we name them "paperclips" it's clear that they are not touched by such redefining. It triggered a "type-unsafety" alarm, but the operation overall might be safe as it doesn't actualise the danger. For example, having an option of "plum + 2 utility" could give one agent "plum + apple" if it valued apples and "plum + pear" if it valued pears. I guess if you consistently replace all physical items with their utility values it doesn't happen. In the case of "gain 1 utility with probability 1", if your agent is risk-seeking it might give this option "actual" utility less than 1. In general, if we lose the distribution independence we might need to retain the information about our suboutcomes rather than collapsing it to a single number. For if an agent is risk-seeking it's clear that it would prefer A=(5% 0, 90% 1, 5% 2) to B=(100% 1). But the same risk-seeking in combined lotteries would make it prefer C=(5% , 90% A, 5% A+A) over A. When comparing C and A it's not sufficient to know that their expected utilities are 1.
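As a small illustration of why the expectation alone cannot separate A from B, the sketch below computes the mean and variance of the two lotteries exactly. The function names are illustrative assumptions; only the probabilities and payoffs of A and B are taken from the comment. Lottery C is left out because its first outcome is not given.

```python
from fractions import Fraction


def mean(lottery):
    """Expected payoff of a lottery given as {payoff: probability}."""
    return sum(payoff * prob for payoff, prob in lottery.items())


def variance(lottery):
    """Spread of the payoff around its expectation."""
    m = mean(lottery)
    return sum(prob * (payoff - m) ** 2 for payoff, prob in lottery.items())


# A = (5% 0, 90% 1, 5% 2) and B = (100% 1), with exact rational probabilities.
A = {0: Fraction(5, 100), 1: Fraction(90, 100), 2: Fraction(5, 100)}
B = {1: Fraction(1)}

print(mean(A), mean(B))          # 1 1
print(variance(A), variance(B))  # 1/10 0
```

Both lotteries collapse to the same single number, 1, yet they differ in variance. An agent that is risk-seeking in the sense described would need the whole distribution, not just the collapsed value, to rank them.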

[-]TAG7y6-1

The food preference example is rather self-defeating. Most people don't mechanically and predictably choose X over Y and Z when all are available...they also have preferences for variety, trying new things, impressing people they are with, and so on. People whose preferences are both predictable and incoherent can be gamed... but that doesn't mean everyone has coherent preferences, because coherent preferences need to be defined against a limited framework (without randomness or meta-preferences)... and because having messy, unpredictable preferences pr

... (read more)

[-]Slider7y60

The "damage" from shooting your own foot is defined in the terms of the utility-number.

Say I pick a dominated strategy that nets me 2 apples and the dominating strategy nets me 3 apples. If on another level of modelling I can know that the first apples are clean and the 2 apples in the dominating arrangement have worms I might be happy to be dominated. Apple-level damage is okay (while nutritional level damage might not be). All deductive results are tautologies but "if you can't model the agent as trying to achieve goal X then it's inefficient at achieving X" seems very far from "incoherent agents are stupid".

5Said Achmiz7y

If some of the apples are clean and others have worms, then that is modeled in your preference ordering: you prefer clean apples to wormy ones, perhaps at some exchange rate, etc. We then stipulate that all the apples are clean (or all are wormy, or all have an equal chance of being clean vs. wormy, etc.), and the analysis proceeds as before. That said, your general point is worth exploring. If we suppose, as Eliezer says, that … and if we further suppose that her preferences are intransitive, then we conclude that Alice’s strategy is strictly dominated by some other. That is—Alice’s strategy is strictly dominated in terms of apples (or fruit in general). It can’t be dominated in utility, of course, because we cannot construct a utility function from Alice’s preferences (on account of their intransitivity)! Well, and so what? Is this bad according to Alice’s own preferences? Can we show this? How would we do that? By asking Alice whether she prefers the outcome (5 apples and 1 orange) to the initial state (8 apples and 1 orange)? But what good is that? If Alice’s preferences are circular, then it’s entirely possible (in fact, it’s true) that the outcome (5 apples and 1 orange) both dominates, and is dominated by, the initial state (8 apples and 1 orange). (More accurately, that’s true if we’re permitted to say that if strategy X dominates Y, and Y dominates Z, then X dominates Z. It’s not possible for an agent to prefer X to Y and, simultaneously, Y to X, however intransitive their preferences are, if they still obey the completeness axiom. Of course, if an agent’s preferences are intransitive and incomplete, then it can prefer X to Y, and also Y to X.) The point is this: it’s not so easy to show that an agent’s strategy is sub-optimal according to its own preferences if those preferences violate the axioms. We can gesture at some intuitive considerations like “well, that’s obviously stupid”, but these amount to little more than the fact that we find the viola

6Slider7y

I was thinking of another agent judging my strategies and making a backed argument for why I am wrong. If someone said "you were suboptimal on the fruit front, I fixed that mistake for you" and I arrived at a table with 2 wormy apples, I would be annoyed/pissed. I am assuming that the other agent can't evaluate their cleanness - it's all fruit to them. Moreover, it might be that wormy apples are rare, and from observing my trade activity it might be inductively well supported that I seem to value "fruit-maximisation" a great deal (nutrition maximisation with clean fruit is just fruit maximisation). And it might be important to understand that he didn't mean to cause wormy apples (he isn't even capable of meaning that), but his actions might in fact have caused it. In the case that wormy apples are frequent, the hypothesis that I am a fruit-maximiser is violated clearly enough that he knows he is on shaky ground in modelling me as a fruit-maximiser. Some very unskilled traders might confuse one type of fruit with another and be inconsistent because they can't get their fruit categories straight. At some mid-skill level "fruit-maximisation" peaks, and those that don't understand things beyond that point will confuse those that are yet to get to fruit-maximisation with those that are past it. Expecting super-intelligent things to be consistent kind of assumes that if a metric ever becomes a good goal, higher levels will never be weaker on that metric: that maximisation strictly grows and never decreases with ability for all submetrics.

[-]Vaniver5y40

Incidentally, a handful of things have crossed my path at the same time, such that I think I have a better explanation for the psychology underlying the Allais Paradox. [I'm not sure this will seem new, but something about the standard presentation seems to be not giving it the emphasis it deserves, or speaking generally instead of particularly.]

The traditional explanation is that you're paying for certainty, which has some value (typically hugely overestimated). But I think 'certainty' should really be read as something more like "not being blameworthy." ... (read more)

6Unnamed5y

Sounds like the thing that is typically called "regret aversion".

[-]Zack_M_Davis5y40Nomination for 2019 Review

This is the second nomination in order to get this in the official Review pool, in order for John S. Wentworth's future "more detailed review" to be in the official Review pool.

[-]Chris_Leong7y40

My understanding of the arguments against using a utility maximiser is that proponents accept that this will lead to sub-optimal or dominated outcomes, but they are happy to accept this because they believe that these AIs will be easier to align. This seems like a completely reasonable trade-off to me. For example, imagine that choosing option A is worth 1 utility. Option B is worth 1.1 utility if 100 mathematical statements are all correct, but -1000 otherwise (we are ignoring the costs of reading through and thinking about all 100 mathematical statements... (read more)

[-]Richard_Ngo5y30Review for 2019 Review

It seems to me that there has been enough unanswered criticism of the implications of coherence theorems for making predictions about AGI that it would be quite misleading to include this post in the 2019 review.

In an earlier review, johnswentworth argues:

I think instrumental convergence provides a strong argument that...we can use trade-offs with those resources in order to work out implied preferences over everything else, at least for the sorts of "agents" we actually care about (i.e. agents which have significant impact on the world).

I think this... (read more)

4ESRogs5y

If the post is the best articulation of a line of reasoning that has been influential in people's thinking about alignment, then even if there are strong arguments against it, I don't see why that means the post is not significant, at least from a historical perspective. By analogy, I think Searle's Chinese Room argument is wrong and misleading, but I wouldn't argue that it shouldn't be included in a list of important works on philosophy of mind. Would you (assuming you disagreed with it)? If not, what's the difference here? (Put another way, I wouldn't think of the review as a collection of "correct" posts, but rather as a collection of posts that were important contributions to our thinking. To me this certainly qualifies as that.)

4Richard_Ngo5y

Your argument is plausible. On the other hand, this review is for 2019, not 2017 (when this post was written) nor 2013 (when this series of ideas was originally laid out). So it seems like it should reflect our current-ish thinking. I note that the page for the review doesn't have anything about voting criteria. This seems like something of an oversight?

3TAG5y

Context is important. If you publish something without comment or counterpoint, you're hinting that it's to be taken as true.

2DanielFilan5y

It occurs to me that one plausible answer here is that cognition requires computational resources, and therefore effective cognition will generically involve trading off these resources in a way that does not reliably lose them. But my more relevant response is that in that section I don't see Eliezer saying that coherence theorems are the justification for his claim about the anti-naturalness of deference.

2Richard_Ngo5y

If coherence theorems are consistent with deference being "natural", then I'm not sure what argument Eliezer is trying to make in this post, because then couldn't they also be consistent with other deontological cognition being natural, and therefore likely to arise in AGIs? In principle, maybe. In practice, if we'd been trying to predict how monkeys will evolve, what does this claim imply about human-monkey differences?

[-]Tetraspace5y30Nomination for 2019 Review

I have used this post quite a few times as a citation when I want to motivate the use of expected utility theory as an ideal for making decisions, because it explains how it's not just an elegant decision-making procedure from nowhere but a mathematical inevitability of the requirements to not leave money on the table or to accept guaranteed losses. I find the concept of coherence theorems a better foundation than the normal way this is explained, by pointing at the von Neumann-Morgenstern axioms and saying "they look true".

[-]Dan Tobias7y30

The hypothetical person with circular preferences in where to be reminds me of the hero of The Phantom Tollbooth, Milo, whose own location preferences are described this way: "When he was in school he longed to be out, and when he was out he longed to be in. On the way he thought about coming home, and coming home he thought about going. Wherever he was he wished he were somewhere else, and when he got there he wondered why he'd bothered."

[-]orthonormal7y30

Formatting request: can the footnote numbers be augmented with links that jump to the footnote text? (I presume this worked in Arbital but broke when it was moved here.)

[-]Tyrrell_McAllister6y20

Typo: "And that's why the thingies you multiply probabilities by—the thingies that you use to weight uncertain outcomes in your imagination,"

Here, "probabilities" should be "utilities".

[-]romeostevensit7y20

I have to trade off the cost of following high complexity decision theory against the risk of being dominated * the badness of being dominated.

[-]Chris_Leong7y20

"Is a fleeting emotional sense of certainty over 1 minute, worth automatically discarding the potential $5-million outcome?" - I know it's mostly outside of what is being modelled here, but suspect that someone who takes the 90% bet and wins nothing might experience much more than just a fleeting sense of disappointment, much more than someone who takes the 45% chance and doesn't win.

[-]Samuel Hapák7y20

There is one other explanation for the results of those experiments.

In the real world, it's quite uncommon that somebody tells you exact probabilities; no, you need to infer them from the situation around you. And we people pretty much suck at assigning numeric values to probabilities. When I say 99%, it probably means something like 90%. When I say 90%, I'd guess 70% corresponds to that.

But that doesn't mean that people behave irrationally. If you view the proposed scenarios through the described lens, it's more like:

a) Certainty ... (read more)

3Adele Lopez7y

I think you're right that this is part of where the intuition comes from. But it's still irrational in a context where you actually know the probabilities accurately enough.

1Samuel Hapák7y

True, but that’s usually very artificial context. Often when someone claims they know the probabilities accurately enough, they are mistaken or lying.

[-]Kerrigan8mo10

Since humans are not EU maximizers and are exploitable, can someone give an example of how they are exploitable?

[-]Kerrigan8mo10

Although there would be pressure for an AI to not be exploitable, wouldn't there also be pressure for adaptability and dynamism? The ability to alter preferences and goals given new environments?

[-]Pimgd4y11

So the fact that Alice can't be viewed as having any coherent relative value for apples and oranges, corresponds to her ending up with qualitatively less of some category of fruit (without any corresponding gains elsewhere).

It's possible that the fruit has negative value, and that the behavior aims to reduce the total negative value.

The situations:

8a1o, 0a3o, 2a2o, 5a1o.

If apples are minus two and oranges are minus seven then all trades are rational. 8a1o is valued at -23, 0a3o is valued at -21, 2a2o is valued at -18, 5a1o is valued at -17.
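The arithmetic above can be verified with a short sketch. The per-fruit values (-2 per apple, -7 per orange) and the four holdings come from the comment; the names `bundle_value` and `holdings` are illustrative.

```python
APPLE_VALUE = -2   # value per apple, as proposed in the comment
ORANGE_VALUE = -7  # value per orange, as proposed in the comment

# (apples, oranges) at each stage: 8a1o, 0a3o, 2a2o, 5a1o
holdings = [(8, 1), (0, 3), (2, 2), (5, 1)]


def bundle_value(apples, oranges):
    """Total value of a bundle under a linear valuation of each fruit."""
    return apples * APPLE_VALUE + oranges * ORANGE_VALUE


values = [bundle_value(a, o) for a, o in holdings]
print(values)  # [-23, -21, -18, -17]

# Every trade is an improvement if each value is strictly higher than the last.
every_trade_improves = all(later > earlier for earlier, later in zip(values, values[1:]))
print(every_trade_improves)  # True
```

Under this linear valuation each trade strictly raises the total, so the sequence is consistent with a single fixed exchange rate between apples and oranges.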
