
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is related to wireheading, utility functions, taste, and rationality. It is a series of puzzles meant to draw attention to certain tensions in notions of rationality and utility functions for embedded agents. I often skip the particulars of embedded situations. I often show multiple sides of an argument without staking out a clear position myself. Even though I often write a utility function as if it were unique, this is shorthand---it is of course unique only up to positive affine transformation.

Imagine you are in a situation with three outcomes: A, B, and C. You prefer A to C and C to B, and your preference ordering is transitive. Furthermore, assume that your preferences can be represented by a utility function.

I now give you a choice. You can either choose C, or you can choose to make a second decision between A and B. Obviously, if this were the only thing going on, you would choose to choose between A and B, and then you would choose A. However, there is something else going on. You know that if you choose to reject C and make the second choice, then an Angel will reorder your preferences so that your new preference ordering is $B \succ C \succ A$. Thus, at the second choice, you would presumably choose B. Remember, this was the worst outcome in your original preference ordering.

Now, my question is whether a rational agent should pick C, or whether it should choose to make the second choice between A and B.
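The two-stage decision can be sketched in code. This is a minimal illustration, not anything from the original puzzle: the numeric utilities are invented, and it builds in the assumption that the Angel's reordering makes B best and A worst.

```python
# Original preferences A > C > B, encoded as an invented utility
# function (higher = better).
original_utility = {"A": 2, "C": 1, "B": 0}
# Assumed preferences after the Angel's reordering: B > C > A.
angel_utility = {"B": 2, "C": 1, "A": 0}

def outcome(first_choice):
    """Return the outcome reached by each first-stage choice."""
    if first_choice == "take C":
        return "C"
    # Choosing to choose triggers the Angel, so the second choice
    # is made with the *new* preferences.
    return max(["A", "B"], key=lambda o: angel_utility[o])

print(outcome("take C"))             # C
print(outcome("choose to choose"))   # B
# Scored by the ORIGINAL utility function, taking C wins:
print(original_utility[outcome("take C")])            # 1
print(original_utility[outcome("choose to choose")])  # 0
```

The tension in the puzzle is visible in the last two lines: which of the two scorings is the "right" one to use is exactly what the two positions below disagree about.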

Position 1: You should pick C.

An agent is rational if it makes choices that maximize its expected utility. This utility is relative to a utility function (which is grounded in preferences). Thus, what is rational for one agent may be irrational for another if they have different utility functions.

When you are in the decision problem as described above, you are trying to maximize your utility as generated by your preferences. If you choose C, you will definitely get C. If you choose to choose, then you will definitely get B, which is the worst outcome relative to your utility function. Thus, being rational, you should choose not to choose, and take C.

Consider the following scenario. I give you two options: I can either murder everyone you love and give you both a pill that makes you no longer care about them and $100, or I can give you $10. Assume your utility is linear with money. Then, if you are willing to choose to choose in the earlier case, you have no problem with choosing to have your utility function change, and you should be fine with having everyone you love murdered. You would never actually do this. Thus, you should choose C.

Position 2: You should choose to choose.

As a rational agent, you should maximize your utility. Even if your utility function changes, it is still yours. If you choose C you get a middle ranked option, whereas if you choose to choose then you will choose B, which at that time will be the best outcome for you.

Consider this in the case of choosing a snack. Say there are three options: a piece of chocolate, an apple, and a banana. Say you prefer apples to chocolate, and chocolate to bananas. Now, I give you a choice: you can either have a chocolate, or I can let you take a pill which changes your preferences to preferring bananas to chocolate, and chocolate to apples. Then I let you choose between an apple and a banana.

Naturally, you would prefer to take the pill and then choose the banana. This is the same as the abstract case above; thus, you should choose to choose. QED.

My own intuition from the example still tells me to pick C. I wonder if the people who would choose to choose are confusing levels of utility functions.

This is apparent when considering the example given by the person holding position 2. They use the example to try to illustrate how one's utility function changes. I take a pill which changes my preferences. I'd reply by saying that you didn't change my utility function, you just changed my taste function (so to speak).

How would this work? Well, presumably I prefer (in a simple case) one snack to another because of the flavour of the snacks. My taste buds respond in a certain way to the chemical compounds in the snacks, and then secrete such and such chemicals into my brain. It is really these chemicals (or whatever subsequent process they trigger or of which they are a part) that I desire. Thus, the pill doesn't change my utility function, but rather my taste function.

How do we represent this? The arguments to my utility function could be outputs from other functions. Imagine that all I care about is how good the food I am eating tastes, and how good the book I am reading is. Let my taste function be $t(f)$, where $f$ is the food I am eating, and my book function be $b(k)$, where $k$ is the book I am reading. Then, my utility function has two inputs, $x$ and $y$ --- $U(x, y)$ --- where $x = b(k)$ and $y = t(f)$. If I care equally about books and food and there is some kind of comparability of the utilities, then it might be the case that $U(x, y) = x + y$.

Now, if $t$ changes, has my utility function changed?

I don't think so. My utility function is still $U(x, y)$, even if what $y$ ends up being in different cases is different.
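The decomposition above can be sketched as follows. This is a minimal illustration under the assumption that $U$ is fixed while the taste function is what the pill swaps out; the particular functions and numeric scores are invented.

```python
def utility(x, y):
    """U(x, y) = x + y, held fixed across the pill."""
    return x + y

def taste_before(food):
    """Taste function t before the pill: apples > chocolate > bananas."""
    return {"apple": 3, "chocolate": 2, "banana": 1}[food]

def taste_after(food):
    """Taste function t after the pill: bananas > chocolate > apples."""
    return {"banana": 3, "chocolate": 2, "apple": 1}[food]

def book_quality(book):
    """Book function b(k), unchanged throughout."""
    return {"good novel": 2}[book]

# Same U both times; only the taste function plugged into it differs.
before = utility(book_quality("good novel"), taste_before("apple"))
after = utility(book_quality("good novel"), taste_after("banana"))
print(before, after)  # 5 5
```

On this picture the pill changes which snack produces the highest $y$, but the function `utility` itself is never touched.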

I can see this being a problem, though. What is to stop us from doing the following? Let $U$ be my true utility function and let $P$ be my so-called *practical* utility function. Furthermore, let $y = P(x)$ so that $U(y) = U(P(x))$. If we agree that changing the taste function doesn't alter the utility function, then changing $P$ shouldn't alter my utility function --- but $P$ is all it is based on!
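The worry above can be made concrete with a sketch. Everything here is illustrative: the "true" utility is a thin wrapper around an invented practical function, and the numbers are made up, but the trick is the one described --- swap out the inner function and the outer one is "unchanged" in name only.

```python
def true_utility(y):
    """U(y) = y: the 'true' function just passes through its input."""
    return y

def practical_before(outcome):
    """Practical utility P before any tampering."""
    return {"loved ones alive": 10, "loved ones murdered": -100}[outcome]

def practical_after(outcome):
    """P after the pill: the murders no longer register as bad."""
    return {"loved ones alive": 10, "loved ones murdered": 10}[outcome]

# U is the "same function" in both cases, yet it now approves of the
# outcome it previously condemned:
print(true_utility(practical_before("loved ones murdered")))  # -100
print(true_utility(practical_after("loved ones murdered")))   # 10
```

This is why the taste-function move threatens to prove too much: if changing an inner function never counts as changing $U$, then nothing does.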

Is what matters that it is my utility function? If there is some other agent in the world, Atticus, and he has a utility function, I don't directly care about maximizing Atticus' utility function (I might indirectly care about trying to maximize his utility function if he is my friend or something, and we could incorporate that into my practical utility function).

Atticus cares, ultimately, only about his utility function, and I about mine. If someone were to actually change my utility function in a strong way (say, an angel), would I still be I? Or would Whispermute qua rational agent have gone out of existence?

This seems to be getting into difficult questions about identity and whatnot, which is an area away from which I had hoped to stay. Alack, I must venture forth.

If choosing to choose places me in a position in which I know I will have my utility function messed around with, and I think that having my utility function changed ends me qua rational agent, then in some sense it seems that I would cease being I were I to have my utility function changed. If this were the case, I think I would almost certainly be irrational were I to choose to choose.

Maybe for some clarity let us use a thought experiment. Suppose Maltrion is an assassin, and he serves his Master. Maltrion's utility function takes as direct input his Master's utility function: $U_m = U_M$, where $U_m$ is Maltrion's utility function and $U_M$ is his Master's.

However, Maltrion is clever, and he has found a certain spell in his Master's cabinet that he can use to change his own utility function. He contemplates changing it to be maximized by his immediate death, which he could easily fulfill due to his technical assassin skills. But he knows that this would greatly displease his master.

Given that $U_m = U_M$, what should he do?

Position 1: Maltrion should not use the spell. This is easily seen --- if $U_M$ is lowered by his action, then so is $U_m$, which is what is guiding Maltrion's actions. Thus, Maltrion should quite obviously not use the spell, as this would decrease his utility.

Position 2: Maltrion should use the spell. Maltrion wants to increase his own utility, and there is nothing essential about the fact that $U_m = U_M$. He is in a position to change it so that the new function $U_m'$ is maximized by his immediate death, taking the value $n$ there, where $n$ is the highest $U_m$ could ever be *in any possible form of $U_m$* (suppose $n = 100$). Maltrion, being a rational agent, is perfectly able to reason this through. Thus, since he wants to maximize his utility, he should use the spell.
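The two positions can be put side by side in a sketch. The assumptions are the ones above --- $U_m = U_M$ before the spell, a post-spell function maximized by immediate death at value $n = 100$ --- and the Master's numbers are invented for the example.

```python
N = 100  # assumed: the highest value any possible form of U_m could take

def u_master(world):
    """Invented Master's utility U_M over two world-states."""
    return {"maltrion serves": 50, "maltrion dead": 0}[world]

def u_maltrion_before(world):
    """Pre-spell: U_m = U_M."""
    return u_master(world)

def u_maltrion_after(world):
    """Post-spell: maximized by Maltrion's immediate death."""
    return N if world == "maltrion dead" else 0

# Position 1 evaluates casting the spell with the CURRENT function:
print(u_maltrion_before("maltrion dead"))   # 0 -- worse than serving (50)
# Position 2 evaluates it with the function he would have AFTERWARDS:
print(u_maltrion_after("maltrion dead"))    # 100 -- the best possible value
```

As in the Angel puzzle, the disagreement is entirely about which function gets to do the scoring: the one Maltrion has now, or the one he would have after the change.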