# 2

Previously: Starting Up

So, you want to be rational, huh? You want to be Less Wrong than you were before, hrmmm? First you must pass through the posting titles of a thousand groans. Muhahahahaha!

Let's start with the idea of preference rankings.  If you prefer A to B, well, given the choice between A and B, you'd choose A.

For example, if you face a choice between a random child being tortured to death vs them leading a happy and healthy life, all else being equal and the choice costing you nothing, which do you choose?

This isn't a trick question. If you're a perfectly ordinary human, you presumably prefer the latter to the former.

Therefore you choose it. That's what it means to prefer something. That if you prefer A over B, you'd give up situation B to gain situation A. You want situation A more than you want situation B.

Now, if there're many possibilities, you may ask... "But, what if I prefer B to A, C to B, and A to C?"

The answer, of course, is that you're a bit confused about what you actually prefer. I mean, all that ranking would do is just keep you switching between those, looping around.

And, thinking in terms of resources: the universe or an opponent or whatever could, for a small price, sell each of those to you in sequence, draining you of the resource (time, money, whatever) as you go around the vortex of confused desires.

This, of course, translates more precisely into a sequence of states, Ai, Bi, Ci, and preferences of the form A0 < B1 < C2 < A3 < B4 ... where each one of those is the same as the original name except you also have a drop less of the relevant resource than you did before (i.e., indicating a willingness to pay the price). If the sequence keeps going all the way, then you'll be drained, and that's a rather inefficient way of going about it if you just want to give the relevant resource up, no? ;)
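To see the drain in action, here's a minimal Python sketch. Everything in it is an illustrative assumption rather than part of the argument: the function name `money_pump`, the flat `fee` per trade, and the starting `budget` of 10 are all made up, and the agent is assumed to accept any trade to a state it strictly prefers.

```python
def money_pump(prefers, start, budget, fee):
    """Trade along strict preferences, paying `fee` per trade.

    `prefers` maps each state to the state the agent strictly prefers to it.
    Trading stops when the agent can't afford the fee or holds a state
    with nothing ranked above it.
    """
    state, trades = start, 0
    while budget >= fee and state in prefers:
        state = prefers[state]  # swap up to the preferred state...
        budget -= fee           # ...and pay the small price for it
        trades += 1
    return state, budget, trades


# The loop from the text: B over A, C over B, A over C.
cyclic = {"A": "B", "B": "C", "C": "A"}
print(money_pump(cyclic, "A", budget=10, fee=1))   # ('B', 0, 10)

# A ranking with no loop: A < B < C.
acyclic = {"A": "B", "B": "C"}
print(money_pump(acyclic, "A", budget=10, fee=1))  # ('C', 8, 2)
```

With the loop, the agent keeps trading until the budget hits zero. Without it, the agent stops at its favorite state and keeps whatever is left.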

Still, a strict loop, A > B, B > C, C > A really is an indication that you just don't know what you want. I'll just dismiss that at this point as "not really what I'd call preferences" as such.
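If you want to check a set of stated preferences for such a loop, one way (a sketch, not the only way) is to treat each "X is strictly preferred to Y" as a directed edge and look for a cycle with a depth-first search. The function name `has_strict_cycle` and the pair-based input format are assumptions made for illustration.

```python
def has_strict_cycle(strictly_prefers):
    """Return True if the strict preferences contain a loop.

    `strictly_prefers` is an iterable of (better, worse) pairs.
    """
    graph = {}
    for better, worse in strictly_prefers:
        graph.setdefault(better, set()).add(worse)
        graph.setdefault(worse, set())

    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    status = {node: UNSEEN for node in graph}

    def visit(node):
        status[node] = IN_PROGRESS
        for nxt in graph[node]:
            if status[nxt] == IN_PROGRESS:
                return True  # walked back into the current path: a loop
            if status[nxt] == UNSEEN and visit(nxt):
                return True
        status[node] = DONE
        return False

    return any(status[n] == UNSEEN and visit(n) for n in graph)


loop = [("A", "B"), ("B", "C"), ("C", "A")]   # A > B, B > C, C > A
chain = [("A", "B"), ("B", "C"), ("A", "C")]  # A > B > C
print(has_strict_cycle(loop), has_strict_cycle(chain))  # True False
```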

Note, however, that it's perfectly okay to have some states of reality, histories of the entire universe, whatever, such that A, B, and C are all ranked equally in your preferences.

If you, however, say something like "I don't prefer A less than B, nor more than B, nor equally to B", I'm just going to give you a very stern look until you realize you're rather confused. (Note: ranking two things equally doesn't mean you are incapable of distinguishing them. Also, what you want may be a function of multiple variables that may end up translating to something like "in this instance I want X, though in that other instance I would have wanted Y." This is perfectly acceptable as long as the overall ranking properties (and other rules) are being followed. That is, as long as you're Not Being Stupid.)

Let's suppose there're two states A and B that for you fall under this relative preference nonpreference zone. Let's further suppose that somehow the universe ends up presenting you with a situation in which you have to choose between them.

What do you do when it actually comes down to it, so that your options are "choose A, choose B, or something else does the deciding"? (Either a coin flip, or someone else who's willing to choose between them, or basically something other than you.)

If you can say "if pressed, I'd have to choose... A", then in the end, you have ranked one above the other. If you choose option 3, then basically you're saying "I know it's going to be one or the other, but I don't want to be the one making that choice." That could be interpreted as either indifference, or at least as being _sufficiently_ indifferent that the (emotional or whatever) cost to you of directly making that choice yourself is much greater than whatever you'd gain from picking one over the other.

At that point, if you say to me "nope, I still neither prefer A to B, prefer B to A, nor am indifferent to the choice. It's simply not meaningful for my preferences to state any relative ranking, even equal", well, I would be at that point rather confused as to what it is that you even meant by that statement. If in the above situation you would actually choose one of A or B, then clearly you have a relative ranking for them. If you went by the third option, and state that you're not indifferent to them, but prefer neither to the other, well, I honestly don't know what you would mean then. It at least seems to me at this point that such thought would be more a confusion than anything else. Or, at least, that at that point it isn't even what I think I or most other people mean by "preferences." So I'm just going to declare this as the "Hey everyone, look, here's the weakest point I think I can find so far, even though it doesn't seem like all that weak a weak point to me."

So, for now, I'm going to move on and assume that preferences will be of the form A < B < C < D, E, F, G < H, I < J < K (assuming all states are comparable, "Don't Be Stupid" does actually seem to imply rejection of cycles.)

For convenience, let's introduce a notion of numerically representing these rankings. The rule simply is this: If you rank two things the same, assign them the same real number. If you rank something B higher than A, then assign B a higher number than A. (Why real numbers? Well, we've got an ordering here. Complex numbers aren't going to be helping at all, so real numbers are perhaps the most general useful way of doing this.)
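As a concrete sketch of that rule, take the example ranking above, A < B < C < D, E, F, G < H, I < J < K, written as a list of tiers from worst to best. The `encode` helper and the choice of 0, 1, 2, ... as the numbers are arbitrary assumptions; any numbers that respect the order would do just as well.

```python
def encode(tiers):
    """Assign a real number to every state, given tiers listed worst to best.

    States in the same tier are ranked equally, so they share a number.
    """
    return {state: float(rank)
            for rank, tier in enumerate(tiers)
            for state in tier}


tiers = [["A"], ["B"], ["C"], ["D", "E", "F", "G"], ["H", "I"], ["J"], ["K"]]
value = encode(tiers)

print(value["D"] == value["G"])  # True: ranked equally, same number
print(value["C"] < value["D"])   # True: D is ranked above C
```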

For any particular preference ranking, there are obviously many valid ways of numerically representing it given the above rules. Further, one can always use a strictly increasing function to translate between any of those. And there will be an inverse, so you can translate back to your preferred encoding.

(A strictly increasing function is, well, exactly what it sounds like. If x > y, f(x) > f(y). Try to visualize this. It never changes direction, never doubles back on itself. So there's always an inverse, for every output, there's always a unique input. So later on, when I start focusing on indexings of the preferences that have specific mathematical properties, no generality is lost. One can always translate into another numerical coding for the preferences, and then back again.)
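Here's a small Python sketch of that translation, under made-up assumptions: the encoding `u`, the helper `same_ranking`, and the choice of `math.exp` as the strictly increasing function are all just for illustration. Note that the inverse (`math.log` here) is only needed on the numbers the function actually outputs, which is all a translation back ever uses.

```python
import math


def same_ranking(u, v):
    """True if encodings u and v order every pair of states identically."""
    return all(
        (u[a] < u[b]) == (v[a] < v[b]) and (u[a] == u[b]) == (v[a] == v[b])
        for a in u for b in u
    )


u = {"A": 0.0, "B": 1.0, "C": 1.0, "D": 2.5}     # B and C ranked equally
v = {s: math.exp(x) for s, x in u.items()}       # exp is strictly increasing
back = {s: math.log(y) for s, y in v.items()}    # log undoes exp on its outputs
flipped = {s: -x for s, x in u.items()}          # strictly decreasing instead

print(same_ranking(u, v))        # True: same preferences, different numbers
print(same_ranking(u, flipped))  # False: the order got reversed
```

`back` recovers `u` up to floating-point rounding, which is the "and then back again" part.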

A few words of warning though: While this preference ranking thing is the ideal, any simple rule for generating the ranking is not going to reproduce your preferences, your morality, your desires. Your preferences are complex. Best to instead figure out what you want in specific cases. In conflicting decisions, query yourself, see which deeper principles "seem right", and extrapolate from there. But any simple rule for generating your own One True Preference Ranking is simply going to be wrong. (Don't worry about what a "utility function" is exactly yet. I'll get to that later. For now, all you need to know is that it's one of those numerical encodings of preferences that has certain useful mathematical properties.)

(EDIT: added in the example of how lack of having a single ranking for all preferences can lead to Being Stupid)

(EDIT2: (4/29/2009) okay, so I was wrong in thinking that I'd shown "don't be stupid" (in the sense used in this sequence) prohibits incomparable states. (That is, preference functions that can, when input two states, output "invalid pair" rather than ">" "<" or "=". I've removed that argument and replaced it with a discussion that I think gets more to the heart of that matter.))



I don't think you've shown very convincingly that it's always wrong to have two states that you're simply unable to compare with one another. The notion isn't inherently incoherent (as you can see, e.g., from the fact that there are mathematical structures that work that way, such as Conway-style combinatorial games) and it needn't lead to horrors like your two separate and incompatible series. In any case, your argument about those series is itself confused; if you know that you can do A1 -> B2 and then B2 -> A3, then you know that you can do A1 -> A3, and you will certainly do that. The fact that the transition happens via B2 is just an implementation detail, and there's no point pretending that you can't see past it. If you don't know about B2 -> A3 when you have to choose whether to do A1 -> B2, though, your problem is just ignorance, and there's nothing irrational about sometimes reaching suboptimal decisions when you don't have all the relevant information.

Actually, the confused-with relation between Conway games basically means "there are C,D such that A+C>B+C but A+D<B+D", which rather suggests that some structure along those lines might be appropriate for modelling preferences over incomplete states. Realistically, of course, all our preferences are over incomplete states; we have very limited information and very limited minds. Which is one reason why it seems excessive to me to make claims about our preferences that implicitly assume that we're working with complete states-of-the-universe all the time.

And, speaking of Conway games, the closely related Conway numbers ("surreal numbers", as they are usually called) show that even if your preferences are totally ordered it's not obvious that they can be embedded into the real numbers. Of course, if you take advantage of the finiteness of your brain to point out that you only have finitely many possible preferences then all is well -- but then you lose in a different way, because if you take that into account then you also have to bid farewell to all hope of a total ordering over all states.

The point of all this quibbling is simply this: if you are going to claim that a particular way of thinking is rationally mandatory then you need to either deal with all the little holes or acknowledge them. For my part, I don't think we yet have a firm enough theoretical foundation to claim that a rational agent's preferences (or, anticipating a bit, credences) must be representable by real numbers.

(Also, typo alert: there's a missing word somewhere after "the universe or an opponent or whatever could"; and you have "rankled" for "ranked" a few paragraphs after that.)

Replacing A with 'coffee' and B with 'tea' may be useful, here. It seems reasonable to me to not know offhand whether you prefer coffee or tea - I suspect most people have never thought about that directly - but most people would still know that they'd prefer (for example) an espresso from Starbucks (A1) to a cup of Earl Grey (B1), and either of those to a cup of coffee from the local diner where the coffee always tastes like soap (A2).

> And, speaking of Conway games, the closely related Conway numbers ("surreal numbers", as they are usually called) show that even if your preferences are totally ordered it's not obvious that they can be embedded into the real numbers.

If you have lexically ordered 'orders' of utility, only the highest order will ever affect your actions in non-toy situations, and you might as well use reals.

I think that's debatable. For instance, consider Eliezer's "torture versus dust specks" question from way back on OB. (In case you weren't reading OB then or have forgotten: let N be a vastly unimaginably huge number (Eliezer chose 3^^^3 in Knuth arrow notation), and ask: which is worse, for one person to be tortured horribly for 50 years or for N people each to get a small speck of dust in their eye, just enough to be marginally annoying, but to suffer no longer-term consequences from it?) I claim that having separate "orders" of utility is at most as irrational as choosing SPECKS rather than TORTURE, and that it's at least arguable that SPECKS is a defensible answer.

> I claim that having separate "orders" of utility is at most as irrational...

The point isn't about the (ir)rationality of separate "orders" of utility. It's a "without loss of generality" argument. Preferences not found at the highest order are effectively irrelevant, so you don't lose any expressive power by restricting yourself to the reals.

Er, sorry, I was unclear. (I wrote unclearly because I wasn't thinking clearly enough. It's annoying how that happens.) So, the point I was trying to make but didn't actually get around to writing down because I forgot about it while writing down what I did :-) is that those people for whom dust specks and torture are incommensurable -- which I think they have to be, to prefer 3^^^3.SPECK to 1.TORTURE -- don't, so far as I can tell, generally spend their entire lives estimating how many people are going to get tortured-or-worse on account of their actions, neither do they entirely ignore minor inconveniences; so it doesn't seem to be the case that having that sort of utility function implies ignoring everything but the highest order.

[EDITED above, about a day after posting, to fix a formatting glitch that I hadn't noticed before.]

Arguably it would do if those people were perfectly consistent -- one of the more convincing arguments for preferring TORTURE to SPECKS consists of exhibiting a series of steps between SPECK and TORTURE of length, say, at most 100 in which no step appears to involve a worse than, say, 100:1 difference in badness, so maybe preferring TORTURE to SPECKS almost always involves intransitivity or something like that. And maybe some similar charge could be brought against anyone who has separate "orders" but still gives any consideration to the lower ones. Hence my remark that the one doesn't seem more irrational than the other.

> For my part, I don't think we yet have a firm enough theoretical foundation to claim that a rational agent's preferences (or, anticipating a bit, credences) must be representable by real numbers.

I think this is correct. On the other hand, I'm happy to take real numbers as a point of departure just to see what we can get.

> Actually, imagine for instance you have a set of preferences A1 < A2 < A3 < A4 ... and B1 < B2 < B3 < B4 ... such that your opinion with regards to any A compared with any B is like the above confusion

If this is meant to be a kind of introductory piece for decision theory, I don't think it'll work for most people. I'm a programmer (well, I know how to and used to do it for money but not currently), and my eyes start to roll into the back of my head when I read a sentence like the one above and I am not convinced it's important. It seems to me most of the comments are from people who have already thought about preference rankings and are using this to refine their ideas/check yours. I doubt people who don't already know this stuff (and therefore why it's important) will take the time to understand sentences like the one above. To work, it needs more qualitative generalizing statements (less A>B>C>D which is a pain to look at) and more examples (the first example of saving a kid is so obvious it makes one tend to think, "preference ranking is easy" which doesn't motivate someone not otherwise motivated to do the hard work of getting through this.)

> You would neither exchange any A for any B, nor vice versa. Then, let's say you knew there was a situation that would allow you to give up A1 to get B2, and you knew that if you did that, you could give up B2 to get A3. Then the refusal to sort your preferences together makes you lose out on objectively climbing up your preferences which are sorted!

I don't understand this argument. The availability of the option of exchanging B2 for A3 changes things. For example, A1 = "get ten dollars", B2 = "horrible torture", A3 = "get twenty dollars". I'd just do the two exchanges, winning twenty dollars without torture. Does this mean torture is better than ten dollars?

So please provide a better argument why every pair of events must be commensurable.

I agree, this point is confused. What are the items that are being compared? Psy-Kosh: try to come up with a specific example, without As and Bs.

Actually this becomes very problematic when you include the cost of determining the ranking of your preferences into the states of the universe on which you are estimating the preferences. This makes the general problem uncomputable.

So clearly we're talking about a degraded version of this preference ranking. But as always we need to be careful when we prove properties on an abstract unrealizable model and to validate them on the degraded potentially viable version.

I suggest instead that any deeply viable theory of values requires explicitly taking into account the limits of computation and in a related sense, communication.

On the other hand, your goal seems to me to be a first approximation theory of values (of which many could be constructed). If that is true, I encourage you to be clear and consistent about the limits of your theory. After all, applying a solution in the wrong situation is also a way to "Be Stupid".

Well, yeah. This sequence is more to establish what the actual ideal is that should be approximated, rather than "what's the best approximation given computational limits" (which is a real question. I don't deny that.)

As I mentioned in another comment, I originally was considering naming this sequence "How Not to be Stupid (given unbounded computational resources)". And actually, the name was more due to the nature of the primary argument that the math will be built on (i.e., "don't automatically lose").

I probably should have stated that explicitly somewhere. If nothing else, think of it also as separation of levels. This is describing what an agent should do, rather than what it should think about when it's doing the doing. It's more "whatever computations it does, the net outcome should be this behavior"

I'm not sure what you mean by "a first approximation theory of values". My goal is basically to give a step by step construction of Bayesian decision theory and epistemic probabilities, i.e., an argument for "why this math is the Right Way", not "these are the things that it is moral to value".

Well, uncomputable is a whole lot worse than approximating due to computational limits. You can't even generally solve the problem with unbounded resources, the best you can hope for is in some narrow domains of application.

If you make it much broader, you are guaranteed halting problem type situations where you won't be able to consistently assign preferences.

"Theory of values" (i.e. a model of how people do or should value or decide things) is pretty much interchangeable with whatever you'd call it. "First approximation" meaning that I know you're keeping the math simple, so I don't expect you to delve too deeply into exactly how you should value or prefer things in order to achieve goals more effectively, including details like human cognition limits, faulty information, communication, faulty goals, and the list goes on.

I just feel that talking about Bayesian decision theory as the right way is reasonable, but it's more the beginning of the story, rather than the end.

I have to say I think this post would be better if it were turned into an annotated bibliography for rationality and, I guess, considering the post's focus, decision theory.

> If you, however, say something like "I don't prefer A less than B, nor more than B, nor equally to B", I'm just going to give you a very stern look until you realize you're rather confused.

The simplest way to show the confusion here is just to ask: if those are your only two options (or both obviously dominate all other options – say, Omega will blow up the world if you don't pick one, though it might be best to avoid invoking Omega when you can avoid it), how do you choose?

What is invalid with answering "By performing further computation and evidence gathering"?

And if Omega doesn't give that option, then that significantly changes the state of the world, and hence your priority function - including the priority of assigning the relative priority between A and B.

As I said on the top level post, you can't treat this priority assignment as non-self-referential.

Edited to add: You should not call people confused because they don't want to cache thoughts for which they do not as of yet know the answer.

Yes, the question only applies to final, stable preferences – but AFAIK not everyone agrees that final, stable preferences should be totally ordered.

How do you conclude that a preference is final and stable?

That seems an extremely strong statement to be making about the inner workings of your own mind.

I don't believe any of my own preferences are final and stable. The intent is to characterize the structure that (I believe) an idealized agent would have / that the output of my morality has / that I aim at (without necessarily ever reaching).

> for a small price, [sell you] each of those

Typo.

Whoops, thanks.

Minor quibble:

> A strictly increasing function is, well, exactly what it sounds like. If x > y, f(x) > f(y). Try to visualize this. It never changes direction, never doubles back on itself. So there's always an inverse, for every output, there's always a unique input.

Strictly increasing guarantees the function is one-to-one. However, it also needs to be onto to guarantee an inverse exists. Of course, you can restrict the codomain to the image of the function, but anyhow...

Fair enough.

And yeah, given that the functions in question are intended to be "translators" from one way of numerically encoding one's preferences to another, restricting the codomain to the range would be implicit in that, I guess.

But yeah, strictly speaking, you're definitely right.

> That's what it means to prefer something. That if you prefer A over B, you'd give up situation B to gain situation A. You want situation A more than you want situation B.

I don't want this to devolve into an argument about precisely how to talk about preferences, but I think this is a more substantive assumption than you are regarding it as. If I prefer going to the Italian restaurant to going to the Mexican restaurant, I might still choose the Mexican restaurant over the Italian restaurant, because of the preferences of others.

It seems like you are also glossing over the importance of the possible difference between what I prefer when choosing and what I would have preferred had I chosen differently.

"go to an Italian restaurant with friends" > "go to a Mexican restaurant with friends" > "ditch my friends and go to an Italian restaurant alone"

I agree; however, the definition of preferring A to B that he gave was choosing A over B (and if we don't specify that A and B must be total world-states, then it would turn out that I prefer Mexican to Italian because I chose Mexican over Italian). Psy-Kosh's comment above explains why that isn't what he meant.

Well, I was talking about total states. I guess that was at least one thing that I wasn't being clear on. But the preferences would basically be "universe in which I choose Mexican and my friends want Mexican" vs "universe in which I choose Italian and my friends want Mexican", etc...

Or did I misunderstand your objection?

I was going to edit-to-add the following to my rather terse comment above:

These sorts of preference orderings aren't restricted to "pure" preferences such as "ceteris paribus I prefer Italian food to Mexican" -- they apply also to preferences over multivariate states like in the example above, where the variables are {food type, company at dinner}.

But you've pre-empted it with this comment. Still, I liked it enough that I wanted an excuse to post it.

So you preferred posting it to not posting it, and you preferred having an excuse to do so to not having an excuse to do so? ;)

Exactly!

That takes care of the first concern, but not necessarily the second one.

Sorry, I misread. I thought that was just a restating of the original concern. Mind rephrasing it? Thanks.

(note, however, that I'm talking about what "ideal rational agents that don't want to be stupid" do. As I indicated in the warning, trying to actually fully and completely translate a human's entire preferences to this is a highly nontrivial task)

I am thinking more like this: I am a scaredy-cat about roller coasters. So I prefer the Tea Cups to Big Thunder Mountain Rail Road. And I maintain that preference after choosing the Tea Cups (I don't regret my decision). However, had I ridden Big Thunder Mountain Rail Road, I would have been able to appreciate that it is awesome, and would have preferred Big Thunder Mountain Rail Road to the Tea Cups.

Since this case seems pretty possible, if the sorts of lessons you are going to draw only apply to hyper-idealized agents who know all their preferences perfectly and whose preferences are stable over time, that is a good thing to note, since the lessons may not apply to those of us with dynamic preference sets.

I dunno, this looks like it's relatively easily resolved, to me. The confusion is that there are three possible outcome-states, not two. If you go on the roller coaster, you may or may not receive an update that lets you appreciate roller coaster rides. If you do receive it, it'll allow you to enjoy that ride and all future ones, but there's no guarantee that you will.

Your most logical course of action would depend on how much you valued that update, and how likely it was that riding BTMRR would provide it.

There are really two cases here. In the first case, you predict prior to going on either ride that your preferences are stable, but you're wrong -- having been coerced to ride BTMRR, you discover that you prefer it. I don't believe this case poses any problems for the normative theory that will follow -- preference orderings can change with new information as long as those changes aren't known in advance.

In the second case, you know that whichever choice you make, you will ex post facto be glad that you made that choice and not the other. Can humans be in this state? Maybe. I'm not sure what to think about this.

Well, a couple things. You can in part interpret that as being an underlying preference to do so, but you seem to have akrasia stopping you from actually choosing what you know you actually want.

Or perhaps you actually would prefer not to go on coasters, and consider the "after the fact" to be the same as "after the fact of taking some addictive drug, you might like it, so you wouldn't want to in the first place"

As far as changing preferences goes, you may think of your true preferences as encoded by the underlying algorithm your brain is effectively implementing, the thing that controls how the preferences more visible to you change in response to new information, arguments, etc etc etc...

Those underlying preferences are the things that you wouldn't want to change. You wouldn't want to take a pill that makes you into the type of person that enjoys committing genocide or whatever, right? But you can predict in advance that if such a pill existed and you took it, then after it rewrote your preferences, you would retroactively prefer genociding. But since you (I assume) don't want genocides to happen, you wouldn't want to become the type of person that would want them to happen and would try to make them happen.

(skipping one or two minor caveats in this comment, but you get the idea, right?)

But also, humans tend to be slightly (minor understatement here) irrational. I mean, isn't the whole project of LW and OB and so on based on the notion of "the way we are is not the way we wish to be. Let us become more rational"? So if something isn't matching the way people normally behave, well... the problem may be "the way people normally behave"... I believe the usual phrasing is "this is a normative, rather than descriptive theory"

Or did I misunderstand?

For the most part I think that starts to address it. At the same time, on your last point, there is an important difference between "this is how fully idealized rational agents of a certain sort behave" and "this is how you, a non-fully idealized, partially rational agent should behave, to improve your rationality".

Someone in perfect physical condition (not just for humans, but for idealized physical beings) has a different optimal workout plan from me, and we should plan differently for various physical activities, even if this person is the ideal towards which I am aiming.

So if we idealize our Bayesian models too much, we open up the question: "How does this idealized agent's behavior relate to how I should behave?" It might be that, were we to design rational agents, it makes sense to use these idealized reasoners as models, but if the goal is personal improvement, we need some way to explain what one might call the Kantian inference from "I am an imperfectly rational being" to "I ought to behave the way such-and-such a perfectly rational being would".

> First you must pass through the posting titles of a thousand groans. Muhahahahaha!

I read this, scrolled down the page to vote, and then resumed reading.

I upvoted.

Hee hee, thanks. Though when finished reading, lemme know what you think of the actual content and so on. Thanks. :) (Oh, you may want to refresh. I made a slight edit (added a small thing) to the posting since the time you posted this comment.)