cubefox

Interesting! I have a few remarks, but my reply will have to wait a few days as I have to finish something.

The way I think about it: The utility maximizer looks for the available action with the highest utility and only then decides to do that action. A decision is the event of setting the probability of the action to 1, and, because of that, its utility to 0. It's not that an agent decides for an action (sets it to probability 1) because it has utility 0. That would be backwards.

There seems to be some temporal dimension involved, some "updating" of utilities. Similar to how the principle of conditionalization formalizes classical Bayesian updating when some evidence E is observed: it sets P(H) to the new value P(H|E), and (or because?) it sets P(E) to 1.

A rule for utility updating over time, on the other hand, would need to update both probabilities and utilities, and I'm not sure how it would have to be formalized.

I'm not perfectly sure what the connection with Bayesian updates is here. In general it is provable from the desirability axiom that

U(a) = ∑_{ω∈Ω} P(ω|a)U(a∧ω)

This is because any A (e.g. a) is logically equivalent to (A∧B)∨(A∧¬B) for any B, which also leads to the "law of total probability". Then we have a disjunction which we can use with the desirability axiom. The denominator cancels out and gives us P(ω|a) in the numerator instead of P(a∧ω), which is very convenient because we presumably don't know the prior probability of an action P(a). After all, we want to figure out whether we should do a (= make P(a)=1) by calculating U(a) first. It is also interesting to note that a utility maximizer (an instrumentally rational agent) indeed chooses the actions with the highest utility, not the actions with the highest *expected* utility, as is sometimes claimed.
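As a sanity check, this partition formula can be verified numerically on a toy model in which U(X) is the expectation of a basic utility over worlds given X, normalized so the tautology gets utility 0. All worlds, probabilities, and utility values below are invented for illustration:

```python
# Toy Jeffrey-style model: propositions are sets of mutually exclusive
# "worlds"; U(X) is the expectation of a basic utility u given X,
# with u normalized so that the tautology has utility 0.
P = {"w1": 0.1, "w2": 0.2, "w3": 0.3, "w4": 0.4}    # world probabilities
u = {"w1": 5.0, "w2": 1.0, "w3": -1.0, "w4": -1.0}  # invented values
mean_u = sum(P[w] * u[w] for w in P)
u = {w: u[w] - mean_u for w in u}  # now U(⊤) = 0

def prob(X):
    return sum(P[w] for w in X)

def U(X):  # "news value" of proposition X
    return sum(P[w] * u[w] for w in X) / prob(X)

a = {"w1", "w2"}  # an "action" proposition
# U(a) computed directly vs. via the partition formula
lhs = U(a)
rhs = sum((prob(a & {w}) / prob(a)) * U(a & {w})
          for w in P if prob(a & {w}) > 0)
print(round(lhs, 6), round(rhs, 6))  # both 2.333333
```

The two values agree, as the derivation predicts; any other choice of worlds and utilities would work equally well.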

Yes, after you do an action you become certain you have done it; its probability becomes 1 and its utility 0. But I don't see that as counterintuitive, since "doing it again", or "continuing to do it", would be a different action, which does not have utility 0. Is that what you meant?

Well, the "expected value" of something is just the value multiplied by its probability. It follows that, if the thing in question has probability 1, its value is equal to its expected value. Since A∨¬A is a tautology, it is clear that E(U(A∨¬A)) = U(A∨¬A).

Yes, this fact is independent of A, but this shouldn't be surprising, I think. After all, we are talking about the utility of a *tautology* here, not about the utility of A itself! In general, P(A∨B) is usually not 1 (A and B are only presumed to be mutually exclusive, not necessarily exhaustive), so its utility and expected utility can diverge.

In fact, in his book "The Logic of Decision", Richard Jeffrey proposed for his utility theory that the utility of any tautology ⊤ is zero: U(⊤) = 0. This should make sense, since learning a tautology has no value for us, neither positive nor negative. This assumption also has other interesting consequences. Consider his "desirability axiom", which he adds to the usual axioms of probability to obtain his utility theory:

If A and B are mutually exclusive, then U(A∨B) = (P(A)U(A) + P(B)U(B)) / (P(A) + P(B)). (Alternatively, this axiom is provable from the expected utility hypothesis I posted a few days ago, by dividing both sides of the equation by P(A∨B).)

If we combine this axiom with the assumption U(⊤) = 0 (tautologies have utility zero), it is provable that if P(A) = 1 then U(A) = 0. Jeffrey explains this as follows: Interpreting utility subjectively as degree of desire, we can only desire things we don't have, or more precisely, things we are not certain are true. If something is certain, the desire for it is already satisfied, for better or for worse. Another way to look at it is that the "news value" of a certain proposition is zero. If the utility of a proposition is how good or bad it would be if we learned that it is true, then learning a certain proposition doesn't have any value, positive or negative, since we knew it all along. So it should be assigned the value 0.
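The proof is short; a sketch, using only the desirability axiom and the U(⊤) = 0 assumption:

```latex
% A and \neg A are mutually exclusive and A \lor \neg A = \top, so
0 = U(\top) = \frac{P(A)\,U(A) + P(\neg A)\,U(\neg A)}{P(A) + P(\neg A)}
% If P(A) = 1, then P(\neg A) = 0: the second term of the numerator
% vanishes and the denominator is 1, leaving 0 = U(A).
```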

Another provable consequence is this: If U(A) = U(¬A) (with A not necessarily being certain), then U(A) = 0. In other words, if we don't care whether A is true or not, if we are indifferent between A and ¬A, then the utility of A is zero. This seems highly plausible.

Yet another provable consequence is that we actually obtain a negation rule for utilities: U(¬A) = −U(A) · P(A)/P(¬A). In other words, the utility of the negation of A is the utility of A times its negative odds.
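A quick numeric check of the negation rule, on an invented three-world toy model where U(X) is the expectation of a normalized basic utility given X:

```python
# Check U(¬A) = −U(A)·P(A)/P(¬A) on made-up numbers.
P = {"w1": 0.25, "w2": 0.25, "w3": 0.5}
u = {"w1": 4.0, "w2": -2.0, "w3": -0.5}
mean_u = sum(P[w] * u[w] for w in P)
u = {w: u[w] - mean_u for w in u}  # normalize so U(⊤) = 0

def prob(X): return sum(P[w] for w in X)
def U(X): return sum(P[w] * u[w] for w in X) / prob(X)

A, notA = {"w1"}, {"w2", "w3"}
print(U(notA), -U(A) * prob(A) / prob(notA))  # both -1.25
```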

I also wondered whether it is then possible to also derive other rules for utility theory, such as for U(A∨B) where A and B are not presumed to be mutually exclusive, or for U(A∧B). It would also be helpful to have a definition of conditional utility U(A|B), i.e. the utility of A under the assumption that B is satisfied (certain). Presumably we would then have facts like U(A|A) = 0.

Regarding the problem with the random variable: Since, as I understand it, the probabilities of the values of a random variable sum to 1, I think we would have to assign all random variables probability 1 if we interpret the probability of a random variable as the probability of the disjunction of its values, and consequently utility zero if we accept that tautologies have utility zero.

But I'm not very familiar with random variables, and I'm not sure we even need them in subjective utility theory, a theory of instrumental rationality where we deal with propositions ("events") which can be believed and desired (assigned a probability and a utility). A random variable does not straightforwardly correspond to a proposition, except for a binary random variable, whose two values correspond to a proposition and its negation.

Ah, thanks. I still find this strange, since in your case the arguments are events, which can be assigned specific probabilities and utilities, while the remaining term is apparently a random variable. A random variable is, as far as I understand, basically a set of mutually exclusive and exhaustive events. E.g. "the weather tomorrow" = {good, neutral, bad}. Each of those events can be assigned a probability (and they must sum to 1, since they are mutually exclusive and exhaustive) and a utility. So it seems it doesn't make sense to assign the random variable itself a utility (or a probability). But I might be just confused here...

Edit: It would make more sense, and in fact agree with the formula I posted in my last comment, if a random variable corresponded to the event that is the disjunction of its possible values. E.g. "the weather will be good or neutral or bad". In that case the probability of a random variable is always 1, such that the expected utility of the disjunction is just its utility, and my formula above is identical to yours.

I'm probably missing something here, but how is that a defined expression? I thought U takes as inputs events or outcomes or something like that, not a real number which could be multiplied with something else? It seems you treat the argument not as an event but as some kind of number? (I get E(U(A)) of course, since U returns a real number.)

The thing I would have associated with "expected utility hypothesis": If A and B are mutually exclusive, then E(U(A∨B)) = E(U(A)) + E(U(B)).
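This reading checks out numerically, taking E(U(X)) as shorthand for P(X)·U(X) and using an invented three-world toy model where U is an expected "news value" normalized so that U(⊤) = 0:

```python
# Check E(U(A∨B)) = E(U(A)) + E(U(B)) for mutually exclusive A, B.
P = {"w1": 0.2, "w2": 0.3, "w3": 0.5}
u = {"w1": 2.0, "w2": 1.0, "w3": -1.3}   # E[u] = 0.4 + 0.3 - 0.65 = 0.05
mean_u = sum(P[w] * u[w] for w in P)
u = {w: u[w] - mean_u for w in u}        # normalize so U(⊤) = 0

def prob(X): return sum(P[w] for w in X)
def U(X): return sum(P[w] * u[w] for w in X) / prob(X)
def EU(X): return prob(X) * U(X)         # "expected utility" of X

A, B = {"w1"}, {"w2"}                    # mutually exclusive events
print(round(EU(A | B), 6), round(EU(A) + EU(B), 6))  # the two sides agree
```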

Could you explain the "expected utility hypothesis"? Where does this formula come from? Very intriguing!

In Jeffrey's desirability formula you write P(p|i). But isn't this value always 1 for any i? Which would mean the term can be eliminated, since multiplying by 1 makes no difference? Assume p = "the die comes up even". So the partition of p is (the die comes up...) {2, 4, 6}. And P(p|i) = 1 for all i. E.g. P(even|2) = 1.

I guess you (Jeffrey) rather meant P(i|p)?

Similar recommendation to blog post writers: Try to include only relatively important links, since littering your post with links will increase effective reading time for many readers. Which will cause fewer people to read the (whole) post.

This is similar to post length: There is an urge to talk about everything somewhat relevant to the topic, respond to all possible objections and the like. But longer posts will, on average, be read by fewer people. There is a trade-off between being concise and being thorough.

Regarding the time stamp: Yeah, this is the right way to think about it, at least in the case of subjective utility theory, where utilities represent desires and probabilities represent beliefs, and it is also the right way to think about Bayesianism (subjective probability theory). U and P only represent the subjective state of an agent at a particular point in time. They don't say anything about how they should be changed over time. They only say that at any point in time, these functions (the agent's) should satisfy the axioms.

Rules for change over time would need separate assumptions. In Bayesian probability theory this is usually the rule of classical conditionalization or the more general rule of Jeffrey conditionalization. (Bayes' theorem alone doesn't say anything about updating. Bayes' rule = classical conditionalization + Bayes' theorem)
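The distinction can be made concrete in a few lines; the numbers below are invented for illustration:

```python
# Bayes' theorem is a static fact about one credence state;
# classical conditionalization is the separate diachronic rule that,
# upon observing E, the new credence in H becomes the old P(H|E)
# (and the new credence in E becomes 1).
P_H = 0.3           # prior P(H)
P_E_given_H = 0.8   # likelihood P(E|H)
P_E = 0.5           # prior P(E)

# Bayes' theorem gives the conditional probability...
P_H_given_E = P_E_given_H * P_H / P_E
# ...and conditionalization then sets the updated credence to it.
new_P_H = P_H_given_E
print(round(new_P_H, 6))  # 0.48
```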

Regarding the utility of a, you write the probability part in the sum is P(ω|a)−P(ω). But it is actually just P(ω|a)!

To see this, start with the desirability axiom:

U(A∨B) = (P(A)U(A) + P(B)U(B)) / (P(A) + P(B))

This doesn't tell us how to calculate U(A), only U(A∨B). But we can write A as the logically equivalent (A∧B)∨(A∧¬B). This is a disjunction, so we can apply the desirability axiom:

U(A) = U((A∧B)∨(A∧¬B)) = (P(A∧B)U(A∧B) + P(A∧¬B)U(A∧¬B)) / (P(A∧B) + P(A∧¬B))

This is equal to

U(A) = (P(A∧B)U(A∧B) + P(A∧¬B)U(A∧¬B)) / P(A).

Since P(A∧B)/P(A) = P(B|A), we have

U(A) = P(B|A)U(A∧B) + P(¬B|A)U(A∧¬B).

Since A was chosen arbitrarily, it can be any proposition whatsoever. And since in Jeffrey's framework we only consider propositions, all actions are also described by propositions. Presumably of the form "I now do x". Hence,

U(a) = P(B|a)U(a∧B) + P(¬B|a)U(a∧¬B)

for any B.

This proof could also be extended to longer disjunctions between mutually exclusive propositions apart from B and ¬B. Hence, for a set S of mutually exclusive propositions s,

U(a) = ∑_{s∈S} P(s|a)U(a∧s).

The set Ω, the "set of all outcomes", is a special case of S where the probabilities of the mutually exclusive elements ω of Ω sum to 1. One interpretation is to regard each ω as describing one complete possible world. So,

U(a) = ∑_{ω∈Ω} P(ω|a)U(a∧ω).

But of course this holds for any proposition, not just an action a. This is the elegant thing about Jeffrey's decision theory which makes it so general: he doesn't need special types of objects (acts, states of the world, outcomes, etc.) and definitions associated with those.

Regarding the general formula for U(A∨B). Your suggestion makes sense, I also think it should be expressible in terms of U(A), U(B), and U(A∧B). I think I've got a proof.

Consider (A∧B)∨(A∧¬B)∨(¬A∧B)∨(¬A∧¬B) = ⊤. The disjuncts are mutually exclusive. By the expected utility hypothesis (which should be provable from the desirability axiom) and by the U(⊤) = 0 assumption, we have

0 = E(U(A∧B)) + E(U(A∧¬B)) + E(U(¬A∧B)) + E(U(¬A∧¬B)).

Then subtract the last term:

−E(U(¬A∧¬B)) = E(U(A∧B)) + E(U(A∧¬B)) + E(U(¬A∧B)).

Now since E(U(A)) + E(U(¬A)) = 0 for any A, we have E(U(¬A)) = −E(U(A)). Hence, −E(U(¬A∧¬B)) = E(U(¬(¬A∧¬B))). By De Morgan, ¬(¬A∧¬B) = A∨B. Therefore

E(U(A∨B)) = E(U(A∧B)) + E(U(A∧¬B)) + E(U(¬A∧B)).

Now add E(U(A∧B)) to both sides:

E(U(A∨B)) + E(U(A∧B)) = 2E(U(A∧B)) + E(U(A∧¬B)) + E(U(¬A∧B)).

Notice that A = (A∧B)∨(A∧¬B) and B = (A∧B)∨(¬A∧B). Therefore we can write

E(U(A∨B)) + E(U(A∧B)) = E(U(A)) + E(U(B)).

Now subtract E(U(A∧B)), and we have

E(U(A∨B)) = E(U(A)) + E(U(B)) − E(U(A∧B)),

which is equal to

P(A∨B)U(A∨B) = P(A)U(A) + P(B)U(B) − P(A∧B)U(A∧B).

So we have

U(A∨B) = (P(A)U(A) + P(B)U(B) − P(A∧B)U(A∧B)) / P(A∨B),

and hence our

theorem:

U(A∨B) = (P(A)U(A) + P(B)U(B) − P(A∧B)U(A∧B)) / (P(A) + P(B) − P(A∧B)),

which we can also write as

U(A∨B) = P(A|A∨B)U(A) + P(B|A∨B)U(B) − P(A∧B|A∨B)U(A∧B).

Success!

Okay, now with U(A∨B) solved, what about the definition of U(A|B)? I think I got it:

U(A|B) := U(A∧B) − U(B)

This correctly predicts that U(A|A) = 0. And it immediately leads to the plausible consequence U(A∧B) = U(A|B) + U(B). I don't know how to further check whether this is the right definition, but I'm pretty sure it is.
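Both the theorem and the proposed conditional-utility definition can be checked numerically on an invented four-world toy model (one world per cell of the A/B partition), with U(X) the expectation of a normalized basic utility given X:

```python
# Numeric check of the U(A∨B) theorem and of U(A|B) := U(A∧B) − U(B).
# Worlds: "ab" = A∧B, "a" = A∧¬B, "b" = ¬A∧B, "n" = ¬A∧¬B.
P = {"ab": 0.1, "a": 0.2, "b": 0.3, "n": 0.4}
u = {"ab": 6.0, "a": 2.0, "b": -1.0, "n": -2.0}
mean_u = sum(P[w] * u[w] for w in P)
u = {w: u[w] - mean_u for w in u}  # normalize so U(⊤) = 0

def prob(X): return sum(P[w] for w in X)
def U(X): return sum(P[w] * u[w] for w in X) / prob(X)

A, B = {"ab", "a"}, {"ab", "b"}
AorB, AandB = A | B, A & B
lhs = U(AorB)
rhs = (prob(A) * U(A) + prob(B) * U(B) - prob(AandB) * U(AandB)) \
      / (prob(A) + prob(B) - prob(AandB))
print(round(lhs, 6), round(rhs, 6))  # the theorem holds

U_A_given_B = U(AandB) - U(B)  # proposed definition of U(A|B)
print(round(U(AandB), 6), round(U_A_given_B + U(B), 6))  # equal
```

Any other assignment of world probabilities and utilities gives the same agreement, which is what the proof predicts.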