This post is more about articulating motivations than about presenting anything new, but I think readers may learn something about the foundations of classical (evidential) decision theory as they stand.
Most people interested in decision theory know about the VNM theorem and the Dutch Book argument, and not much more. The VNM theorem shows that if we have to make decisions over gambles which follow the laws of probability, and our preferences obey four plausible postulates of rationality (the VNM axioms), then our preferences over gambles can be represented by an expected utility function. On the other hand, the Dutch Book argument assumes that we make decisions by expected utility, but perhaps with a non-probabilistic belief function. It then proves that any violation of probability theory implies a willingness to take sure-loss gambles. (Reverse Dutch Book arguments show that, indeed, following the laws of probability eliminates these sure-loss bets.)
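As a minimal sketch of the Dutch Book idea (the belief numbers and the function name `dutch_book_net` are illustrative assumptions), suppose an agent treats its degree of belief in an event as a fair price for a ticket paying $1 if the event occurs. If its beliefs in A and not-A sum to more than 1, a bookie who sells it both tickets collects more than the $1 that gets paid out, whichever way A turns out:

```python
from fractions import Fraction

def dutch_book_net(belief_a, belief_not_a):
    """Agent's net payoff in each world after buying a $1 ticket on A and on not-A.

    Assumption: the agent treats its belief in an event as a fair price
    for a ticket paying $1 if the event occurs.
    """
    cost = belief_a + belief_not_a   # total paid for the two tickets
    payoff_if_a = 1 - cost           # only the A ticket pays out
    payoff_if_not_a = 1 - cost       # only the not-A ticket pays out
    return payoff_if_a, payoff_if_not_a

# Incoherent beliefs: P(A) + P(not A) = 6/5 > 1.
incoherent = dutch_book_net(Fraction(3, 5), Fraction(3, 5))
# Coherent beliefs: P(A) + P(not A) = 1.
coherent = dutch_book_net(Fraction(3, 5), Fraction(2, 5))

print(incoherent)  # both entries negative: a sure loss
print(coherent)    # both entries zero: this particular book extracts nothing
```

If the beliefs summed to less than 1, the bookie would instead buy both tickets from the agent, with the same sure loss for the agent.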
So we can more or less argue for expected utility theory starting from probability theory, and argue for probability theory starting from expected utility theory; but clearly, this is not enough to provide good reason to endorse Bayesian decision theory overall. Subsequent investigations which I will summarize have attempted to address this gap.
But first, why care?
- Logical Induction can be seen as resulting from a small tweak to the Dutch Book setup, relaxing it enough that it could apply to mathematical uncertainty. Although we were initially optimistic that Logical Induction would allow significant progress in decision theory, it has proven difficult to get a satisfying logical-induction DT. Perhaps it would be useful to instead understand the argument for DT as a whole, and try to relax the foundations of DT in "the same way" we relaxed the foundations of probability theory.
- It seems likely to me that such a re-examination of the foundations would automatically provide justification for reflectively consistent decision theories like UDT. Hopefully I can make my intuitions clear as I describe things.
- Furthermore, the foundations of DT seem like they aren't that solid. Perhaps we've put blinders on by not investigating these arguments for DT in full. Even without the kind of modification to the assumptions which I'm proposing, we may find significant generalizations of DT are given just by dropping unjustified axioms in the existing foundations. We can already see one such generalization, the use of infinitesimal probability, by studying the history; I'll explain this more.
Justifying Probability Theory
Before going into the attempts to justify Bayesian decision theory in its entirety, it's worth mentioning Cox's theorem, which is another way of justifying probability alone. Unlike the Dutch Book argument, it doesn't rely on a connection between beliefs and decisions; instead, Cox makes a series of plausible assumptions about the nature of subjective belief, and concludes that any approach must either violate those assumptions or be essentially equivalent to probability theory.
There has been some controversy about holes in Cox's argument. Like other holes in the foundations which I will discuss later, it seems one conclusion we can draw by dropping unjustified assumptions is that there is no good reason to rule out infinitesimal probabilities. I haven't understood the issues with Cox's theorem yet, though, so I won't remark on this further.
This is an opinionated summary of the foundations of decision theory, so I'll remark on the relative quality of the justifications provided by the Dutch Book vs Cox. The Dutch Book argument provides what could be called consequentialist constraints on rationality: if you don't follow them, something bad happens. I'll treat this as the "highest tier" of argument. Cox's argument relies on more deontological constraints: if you don't follow them, it seems intuitively as if you've done something wrong. I'll take this to be the second tier of justification.
Justifying Decision Theory
Before we move on to attempts to justify decision theory in full, let's look at the VNM axioms in a little detail.
The set-up is that we've got a set of outcomes $\mathcal{O}$, and we consider lotteries over outcomes which associate a probability $p_i$ with each outcome (such that $0 \leq p_i \leq 1$ and $\sum_i p_i = 1$). We have a preference relation over lotteries, $<$, which must obey the following properties:
- (Completeness.) For any two lotteries $A, B$, either $A < B$, or $B < A$, or neither, written $A \sim B$. ("$A < B$ or $A \sim B$" will be abbreviated as "$A \leq B$" as usual.)
- (Transitivity.) If $A \leq B$ and $B \leq C$, then $A \leq C$.
- (Continuity.) If $A \leq B \leq C$, then there exists $p \in [0,1]$ such that a gamble $D$ assigning probability $p$ to $A$ and $(1-p)$ to $C$ satisfies $B \sim D$.
- (Independence.) If $A < B$, then for any $C$ and $p \in (0,1]$, we have $pA + (1-p)C < pB + (1-p)C$.
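To make the axioms concrete, here is a small sketch of the easy direction only: an agent that ranks lotteries by expected utility satisfies independence and continuity on one example. It does not demonstrate the VNM theorem itself, which goes the other way. The outcome names, utility numbers, and helper names (`expected_utility`, `mix`) are all hypothetical.

```python
from fractions import Fraction as F

# Hypothetical utilities over three outcomes (illustrative numbers only).
u = {"bad": F(0), "ok": F(6), "good": F(10)}

def expected_utility(lottery):
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(p * u[o] for o, p in lottery.items())

def mix(p, a, b):
    """The compound lottery pA + (1-p)B, flattened to a single lottery."""
    outcomes = set(a) | set(b)
    return {o: p * a.get(o, 0) + (1 - p) * b.get(o, 0) for o in outcomes}

# Three degenerate lotteries, ranked A < B < C by expected utility.
A = {"bad": F(1)}
B = {"ok": F(1)}
C = {"good": F(1)}

# Independence: A < B implies pA + (1-p)C < pB + (1-p)C for p in (0, 1].
p = F(1, 3)
independence_holds = expected_utility(mix(p, A, C)) < expected_utility(mix(p, B, C))

# Continuity: with A <= B <= C, solve q*u(A) + (1-q)*u(C) = u(B) for q.
q = (expected_utility(C) - expected_utility(B)) / (expected_utility(C) - expected_utility(A))
D = mix(q, A, C)
continuity_holds = expected_utility(D) == expected_utility(B)

print(independence_holds, continuity_holds, q)
```

Exact fractions are used so that the indifference check in the continuity step is not disturbed by floating-point rounding.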
Transitivity is often considered to be justified by the money-pump argument. Suppose that you violate transitivity for some $A, B, C$; that is, $A \leq B$ and $B \leq C$, but $C < A$. Then you'll be willing to trade away $A$ for $B$ and then $B$ for $C$ (perhaps in exchange for a trivial amount of money). But, then, you'll have $C$; and since $C < A$, you'll gladly pay (a non-trivial amount) to switch back to $A$. I can keep sending you through this loop to get more money out of you until you're broke.
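The loop can be sketched as a tiny simulation. The assumptions here are illustrative: the agent makes the two weakly preferred swaps for free, pays a fixed positive fee for the strictly preferred swap back to its starting point, and the function name `run_money_pump` is made up for this example.

```python
def run_money_pump(wealth_cents, fee_cents=1):
    """Cycle an agent with A <= B, B <= C, C < A until it cannot pay the fee.

    Assumptions (illustrative): the agent accepts the two weakly preferred
    swaps for free and pays `fee_cents` (which must be positive) for the
    strictly preferred swap from C back to A.
    """
    holding = "A"
    next_item = {"A": "B", "B": "C", "C": "A"}  # each swap the agent accepts
    cycles = 0
    while True:
        if holding == "C":                 # the strictly preferred swap costs money
            if wealth_cents < fee_cents:
                break                      # the agent is broke; the pump stops
            wealth_cents -= fee_cents
            cycles += 1
        holding = next_item[holding]
    return wealth_cents, cycles, holding

final_wealth, cycles, holding = run_money_pump(wealth_cents=100)
print(final_wealth, cycles, holding)
```

The agent ends with no money and an item it could have held from the start, which is the sense in which the violation is exploitable.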
The money-pump argument seems similar in nature to the Dutch Book argument; both require a slightly unnatural setup (making the assumption that utility is always exchangeable with money), but both yield strong consequentialist justifications for rationality axioms. So, I place the money-pump argument (and thus transitivity) in my "first tier" along with Dutch Book.
Completeness is less clear. According to the SEP, "most decision theorists suggest that rationality requires that preferences be coherently extendible. This means that even if your preferences are not complete, it should be possible to complete them without violating any of the conditions that are rationally required, in particular Transitivity." So, I suggest we place this in a third tier, the so-called structural axioms: those which are not really justified at all, except that assuming them allows us to prove our results.
"Structural axioms" are a somewhat curious artefact found in almost all of the axiom-sets which we will look at. These axioms usually have something to do with requiring that the domain is rich enough for the intended proof to go through. Completeness is not usually referred to as structural, but if we agree with the quotation above, I think we have to regard it as such.
I take the axiom of independence to be tier two: an intuitively strong rationality principle, but not one that's enforced by nasty things that happen if we violate it. It surprises me that I've only seen a tier-one justification for one of the four VNM axioms (transitivity). Actually, I suspect that independence could be justified in a tier-one way; it's just that I haven't seen it. (Developing a framework in which an argument for independence can be made just as well as the money-pump and Dutch Book arguments is part of my goal.)
I think many people would put continuity at tier two, a strong intuitive principle. I don't see why, personally. For me, it seems like an assumption which only makes sense if we already have the intuition that expected utility is going to be the right way of doing things. This puts it in tier three for me; another structural axiom. (The analogs of continuity in the rest of the decision theories I'll mention come off as very structural.)
Leonard Savage took on the task of providing simultaneous justification of the entire Bayesian decision theory, grounding subjective probability and expected utility in one set of axioms. I won't describe the entire framework, as it's fairly complicated; see the SEP section. I will note several features of it, though:
- Savage makes the somewhat peculiar move of separating the objects of belief ("states") and objects of desire ("outcomes"). How we go about separating parts of the world into one or the other seems quite unclear.
- He replaces the gambles from VNM with "acts": an act is a function from states to outcomes (he's practically begging us to make terrible puns about his "savage acts"). Just as the VNM theorem requires us to assume that the agent has preferences on all lotteries, Savage's theorem requires the agent to have preferences over all acts; that is, all functions from states to outcomes. Some of these may be quite absurd.
- As the paper Actualist Rationality complains, Savage's justification for his axioms is quite deontological; he is primarily saying that if you noticed any violation of the axioms in yourself, you would feel there's something wrong with your thinking and you would want to correct it somehow. This doesn't mean we can't put some of his axioms in tier 1; after all, he's got a transitivity axiom like everyone else. However, on Savage's account, it's all what I'd call tier-two justification.
- Savage certainly has what I'd call tier-three axioms, as well. The SEP article identifies P5 and P6 as such. His axiom P6 requires that there exist world-states which are sufficiently improbable so as to make even the worst possible consequences negligible. Surely it can't be a "requirement of rationality" that the state-space be complex enough to contain negligible possibilities; this is just something he needs to prove his theorem. P6 is Savage's analog of the continuity axiom.
- Savage chooses not to define probabilities on a sigma-algebra. I haven't seen any decision-theorist who prefers to use sigma-algebras yet. Similarly, he only derives finite additivity, not countable additivity; this also seems common among decision theorists.
- Savage's representation theorem shows that if his axioms are followed, there exists a unique probability distribution and a utility function which is unique up to a positive affine transformation, such that the preference relation on acts coincides with the ordering by expected utility.
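In symbols, the conclusion of the representation theorem can be sketched as follows. The notation is introduced here for illustration and is not Savage's own: $f, g$ are acts taking finitely many outcome values, $P$ is the derived probability on sets of states, and $u$ is the derived utility on outcomes.

```latex
f \leq g
\quad\Longleftrightarrow\quad
\sum_{o} P\bigl(\{s : f(s) = o\}\bigr)\, u(o)
\;\leq\;
\sum_{o} P\bigl(\{s : g(s) = o\}\bigr)\, u(o)
```

Uniqueness of the utility up to rescaling then means that any other representing utility has the form $u' = a\,u + b$ with $a > 0$.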
In contrast to Savage, Jeffrey's decision theory makes the objects of belief and the objects of desire the same. Both belief and desire are functions of logical propositions.
The most common axiomatization is Bolker's. We assume that there is a boolean field, with a preference relation $\leq$, following these axioms:
- $\leq$ is transitive and complete. $\leq$ is defined on all elements of the field except $\bot$. (Jeffrey does not wish to require preferences over propositions which the agent believes to be impossible, in contrast to Savage.)
- The boolean field is complete and atomless. More specifically:
  - An upper bound of a (possibly infinite) set of propositions $S$ is a proposition implied by every proposition in that set. The supremum of $S$ is an upper bound which implies every upper bound. Define lower bound and infimum analogously. A complete Boolean algebra is one in which every set of propositions has a supremum and an infimum.
  - An atom is a proposition other than $\bot$ which is implied by itself and $\bot$, but by no other propositions. An atomless Boolean algebra has no atoms.
- (Law of Averaging.) If $A \wedge B = \bot$,
  - If $A < B$, then $A < A \vee B < B$
  - If $A \sim B$, then $A \sim A \vee B \sim B$
- (Impartiality.) If $A \wedge B = \bot$ and $A \sim B$, then if $A \vee C \sim B \vee C$ for some $C$ where $A \wedge C = B \wedge C = \bot$ and not $C \sim A$, then $A \vee C \sim B \vee C$ for every such $C$.
- (Continuity.) Suppose that $X$ is the supremum (infimum) of a set of propositions $S$, and $A < X < B$. Then there exists $C \in S$ such that if $D \in S$ is implied by $C$ (or where $X$ is the infimum, implies $C$), then $A < D < B$.
The central axiom to Jeffrey's decision theory is the law of averaging. This can be seen as a kind of consequentialism. If I violate this axiom, I would either value some gamble $A \vee B$ less than both its possible outcomes $A$ and $B$, or value it more. In the first case, we could charge an agent for switching from the gamble $A \vee B$ to $A$; this would worsen the agent's situation, since one of $A$ or $B$ was true already, $A \leq B$, and the agent has just lost money. In the other case, we can set up a proper money pump: charge the agent to keep switching to the gamble $A \vee B$, which it will happily do whichever of $A$ or $B$ comes out true.
So, I tentatively put axiom 3 in my first tier (pending better formalization of that argument).
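One way to see why averaging is the natural constraint: in the expected-utility representation that Jeffrey's theory aims at, the desirability of a disjunction of incompatible propositions is the probability-weighted average of their desirabilities, so it has to land between them. A minimal sketch, with hypothetical numbers and a made-up function name, assuming both propositions have positive probability:

```python
from fractions import Fraction as F

def desirability_of_disjunction(p_a, v_a, p_b, v_b):
    """V(A or B) for mutually exclusive A, B as a probability-weighted average.

    Assumes p_a + p_b > 0, since desirability is not defined for
    propositions the agent considers impossible.
    """
    return (p_a * v_a + p_b * v_b) / (p_a + p_b)

# Hypothetical probabilities and desirabilities with A < B.
p_a, v_a = F(1, 4), F(2)
p_b, v_b = F(1, 2), F(8)

v_or = desirability_of_disjunction(p_a, v_a, p_b, v_b)
print(v_a < v_or < v_b)  # the disjunction ranks strictly between A and B
```

When the two desirabilities are equal, the same average returns that common value, which is the indifference clause of the axiom.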
I've already dealt with axiom 1, since it's just the first two axioms of VNM rolled into one: I count transitivity as tier one, and completeness as tier three.
Axioms 2 and 5 are clearly structural, so I place them in my third tier. Bolker is essentially setting things up so that there will be an isomorphism to the real numbers when he derives the existence of a probability distribution and utility function from the axioms.
Axiom 4 has to be considered structural in the sense I'm using here, as well. Jeffrey admits that there is no intuitive motivation for it unless you already think of propositions as having some kind of measure which determines their relative contribution to expected utility. If you do have such an intuition, axiom 4 is just saying that propositions whose weight is equal in one context must have equal weight in all contexts. (Savage needs a similar axiom which says that probabilities do not change in different contexts.)
Unlike Savage's, Bolker's representation theorem does not give us a unique probability distribution. Instead, we can trade between utility and probability via a certain formula. Probability zero events are not distinguishable from events which cause the utilities of all sub-events to be constant.
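As I understand it from Jeffrey's presentation (treat the exact form as an assumption to check against the source), the trade-off rescales desirability by a fractional linear map and reweights probability to compensate: $V' = (aV + b)/(cV + d)$ and $P' = P \cdot (cV + d)$, subject to $ad - bc > 0$, $cV(A) + d > 0$ for every proposition $A$, and $cV(\top) + d = 1$. The sketch below checks the algebra on a hypothetical three-world toy space. Such a space has atoms, so it violates Bolker's axioms and only illustrates the formula.

```python
from fractions import Fraction as F
from itertools import combinations

# Toy space of three mutually exclusive "worlds" (illustrative numbers only).
prob = {"w1": F(1, 2), "w2": F(1, 4), "w3": F(1, 4)}
util = {"w1": F(0), "w2": F(4), "w3": F(8)}

def P(event):
    return sum(prob[w] for w in event)

def V(event):
    """Desirability: conditional expected utility given the event."""
    return sum(prob[w] * util[w] for w in event) / P(event)

top = frozenset(prob)
events = [frozenset(c) for r in (1, 2, 3) for c in combinations(sorted(prob), r)]

# Transformation parameters; d is fixed by the normalisation c*V(top) + d = 1.
a, b, c = F(1), F(0), F(1, 10)
d = 1 - c * V(top)
assert a * d - b * c > 0 and all(c * V(e) + d > 0 for e in events)

def P2(event):
    """Transformed probability."""
    return P(event) * (c * V(event) + d)

def V2(event):
    """Transformed desirability."""
    return (a * V(event) + b) / (c * V(event) + d)

# The new probability is normalised and additive over disjoint events.
normalised = P2(top) == 1
additive = P2(frozenset({"w1", "w2"})) == P2(frozenset({"w1"})) + P2(frozenset({"w2"}))
# Both pairs rank every pair of propositions the same way...
same_order = all((V(x) < V(y)) == (V2(x) < V2(y)) for x in events for y in events)
# ...even though the probabilities themselves have changed.
changed = any(P2(e) != P(e) for e in events)
print(normalised, additive, same_order, changed)
```

Two different probability assignments therefore represent the same preferences, which is the non-uniqueness described above.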
Zoltan Domotor provides an alternative set of axioms for Jeffrey's decision theory. Domotor points out that Bolker's axioms are sufficient, but not necessary, for his representation theorem. He sets out to construct a necessary and sufficient axiomatization. This necessitates dealing with finite and incomplete boolean fields. The result is a representation theorem which allows nonstandard reals; we can have infinitesimal probabilities, and infinitesimal or infinite utilities. So, we have a second point of evidence in favor of infinitesimal probability.
Although looking for necessary and sufficient conditions seems promising as a way of eliminating structural assumptions like completeness and atomlessness, it ends up making all axioms structural. In fact, Domotor gives essentially one significant axiom: his axiom J2. J2 is totally inscrutable without a careful reading of the notation introduced in his paper; it would be pointless to reproduce it here. The axiom is chosen to exactly state the conditions for the existence of a probability and utility function, and can't be justified in any other way -- at least not without providing a full justification for Jeffrey's decision theory by other means!
Another consequence of Domotor's axiomatization is that the representation becomes wildly non-unique. This has to be true for a representation theorem dealing with finite situations, since there is a lot of wiggle room in what probabilities and utilities represent preferences over finite domains. It gets even worse with the addition of infinitesimals, though; the choice of nonstandard-real field confronts us as well.
Conditional Probability as Primitive
In What Conditional Probabilities Could Not Be, Alan Hajek argues that conditional probability cannot possibly be defined by Bayes' famous formula, due primarily to its inadequacy when conditioning on events of probability zero. He also takes issue with other proposed definitions, arguing that conditional probability should instead be taken as primitive.
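The formula at issue is usually written as the ratio definition (the notation is the standard textbook one, not taken from Hajek's paper):

```latex
P(A \mid B) = \frac{P(A \wedge B)}{P(B)}, \qquad \text{defined only when } P(B) > 0.
```

When $P(B) = 0$ the right-hand side is $0/0$, so the definition is silent even where the conditional probability seems to have an obvious value. For instance, if $X$ is drawn uniformly from $[0,1]$, the event $X = 1/2$ has probability zero, yet intuitively $P(X = 1/2 \mid X = 1/2) = 1$.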
The most popular way of doing this is via Popper's axioms of conditional probability. In Learning the Impossible (Vann McGee, 1994), it's shown that conditional probability functions following Popper's axioms and nonstandard-real probability functions with conditionals defined according to Bayes' theorem are inter-translatable. Hajek doesn't like the infinitesimal approach because of the resulting non-uniqueness of representation; but, for those who don't see this as a problem but who put some stock in Hajek's other arguments, this would be another point in favor of infinitesimal probability.
In A unified Bayesian decision theory, Richard Bradley shows that Savage's and Jeffrey's decision theories can be seen as special cases of a more general decision theory which takes conditional probabilities as a basic element. Bradley's theory groups all the "structural" assumptions together, as axioms which postulate a rich set of "neutral" propositions (essentially, postulating a sufficiently rich set of coin-flips to measure the probabilities of other propositions against). He needs to specifically make an archimedean assumption to rule out nonstandard numbers, which could easily be dropped. He manages to derive a unique probability distribution in his representation theorem, as well.
OK, So What?
In general, I have hope that most of the tier-two axioms could become tier-one; that is, it seems possible to create a generalization of Dutch Book/money-pump arguments which covers most of what decision theorists consider to be principles of rationality. I have an incomplete attempt which I'll develop for a future post. I don't expect tier-three axioms to be justifiable in this way.
With such a formalism in hand, the next step would be to try to derive a representation theorem: how can we understand the preferences of an agent which doesn't fall into these generalized traps? I'm not sure what generalizations to expect beyond infinitesimal probability. It's not even clear that such an agent's preferences will always be representable as a probability function and utility function pair; some more complicated structure may be implicated (in which case it will likely be difficult to find!). This would tell us something new about what agents look like in general.
The generalized Dutch Book would likely disallow preference functions which put agents in situations they'll predictably regret. This sounds like a temporal consistency constraint; so, it might also justify updatelessness automatically or with a little modification. That would certainly be interesting.
And, as I said before, if we have this kind of foundation we can attempt to "do the same thing we did with logical induction" to get a decision theory which is appropriate for situations of logical uncertainty as well.
I am not optimistic about this project. My primary reason is that decision theory has two parts. First, there is the part that is related to this post, which I'll call "Expected Utility Theory." Then, there is the much harder part, which I'll call "Naturalized Decision Theory."
I think expected utility theory is pretty well understood, and this post plays around with details of a well understood theory, while naturalized decision theory is not well understood at all.
I think we agree that the work in this post is not directly related to naturalized decision theory, but you think it is going to help anyway.
My understanding of your argument (correct me if I am wrong) is that probability theory is to logical uncertainty as expected utility theory is to naturalized decision theory, and Dutch Books led to LU progress, so VNMish should lead to NDT progress.
I challenge this in two ways.
First, Logical Inductors look like Dutch Books, but this might be because things related to probability theory can be talked about with Dutch Books. I don't think that thinking about Dutch Books led to the invention of Logical Inductors (although maybe it would have if I had followed the right path), and I don't think that the post hoc connection provides much evidence that thinking about Dutch Books is useful. Perhaps whenever you have a theory, you can do this formal justification stuff, but formal justification does not create theories.
I realize that I actually do not stand behind this first challenge very much, but I still want to put it out there as a possibility.
Second, I think that in a way Logical Uncertainty is about resource bounded Probability theory, and this is why a weakening of dutch books helped. On the other hand, Naturalized Decision Theory is not about resource bounded Expected Utility Theory. We made a type of resource bounded Probability theory, and magically got some naturalistic reasoning out of it. I expect that we cannot do the same thing for decision theory, because the relationship is more complicated.
Expected Utility Theory is about your preferences over various worlds. If you follow the analogy with LI strongly, if you succeed, we will be able to extend it to having preferences over various worlds which contain yourself. This seems very far from a solution to naturalized decision theory. In fact, it does not feel that far from what we might be able to easily do with existing Expected Utility Theory plus logical inductors.
Perhaps I am attacking a straw man, and you mean “do the same thing we did with logical induction” less literally than I am interpreting it, but in this case there is way more special sauce in the part about what you do to generalize expected utility theory, so I expect it to be much harder than the Logical Induction case.
I think there's a sense in which I buy this but it might be worth explaining more.
My current suspicion is that "agents that have utility functions over the outcome of the physics they are embedded in" is not the right concept for understanding naturalized agency (in particular, the "motive forces" of the things that emerge from processes like abiogenesis/evolution/culture/AI research). This concept is often argued for using Dutch Book arguments (e.g. VNM). I think these arguments are probably invalid when applied to naturalized agents (if taken literally they assume something like a "view from nowhere" and unbounded computation, etc). As such, re-examining what arguments can be made about coherent naturalized agency while avoiding inscription errors* seems like a good path towards recovering the correct concepts for thinking about naturalized agency.
*I'm getting the term "inscription error" from Brian Cantwell Smith (On the Origin of Objects, p. 50):
I think most of our disagreement actually hinges on this part. My feeling is that I, at least, don't understand EU well enough; when I look at the foundations which are supposed to argue decisively in its favor, they're not quite as solid as I'd like.
If I was happy with the VNM assumption of probability theory (which I feel is circular, since Dutch Book assumes EU), I think my position would be similar to this (linked by Alex), which strongly agrees with all of the axioms but continuity, and takes continuity as provisionally reasonable. Continuity would be something to maybe dig deeper into at some point, but not so likely to bear fruit that I'd want to investigate right away.
However, what's really interesting is justification of EU and probability theory in one stroke. The justification of the whole thing from only money-pump/dutch-book style arguments seems close enough to be tantalizing, while also having enough hard-to-justify parts to make it a real possibility that such a justification would be of an importantly generalized DT.
All I have to say here is that I find it somewhat plausible outside-view; an insight from a result need not be an original generator of the result. I think max-margin classifiers in machine learning are like this; the learning theory which came from explaining why they work was then fruitful in producing other algorithms. (I could be wrong here.)
I don't think naturalized DT is exactly what I'm hoping to get. My highest hope that I have any concrete reason to expect is a logically-uncertain DT which is temporally consistent (without a parameter for how long to run the LI).
Suppose A < B but pA+(1-p)C > pB + (1-p)C. A genie offers you a choice between pA+(1-p)C and pB + (1-p)C, but charges you a penny for the former. Then if A is supposed to happen rather than C, the genie offers to make B happen instead, but will charge you another penny for it. If you pay two pennies, you're doing something wrong. (Of course, these money-pumping arguments rely on the possibility of making arbitrarily small side payments.)
Sure, it is structural, but your description of structural axioms made it sound like something it would be better if you didn't have to accept, in case they end up not being true, which would be very inconvenient for the theorem. But if the continuity axiom is not an accurate description of your preferences, pretending it is changes almost nothing, so accepting the continuity axiom anyway seems well-justified from a pragmatic point of view. See this and this (section "Doing without Continuity") for explanations.
This is annoying. Does anyone here know why they do this? My guess is that it's because their nice theorems about the finite case don't have straightforward generalizations that refer to sigma-algebras (I'm guessing this mainly because it appears to be the case for the VNM theorem, which only works if lotteries can only assign positive probability to finitely many outcomes).
Is it indeed the case that the VNM theorem cannot be generalized to the measure-theoretic setting?
Hypothesis: Consider $X$ a compact Polish space. Let $R \subseteq \mathcal{P}(X) \times \mathcal{P}(X)$ be closed in the weak topology and satisfy the VNM axioms (in the sense that $\mu \leq \nu$ iff $(\mu, \nu) \in R$). Then, there exists $u: X \to \mathbb{R}$ continuous s.t. $(\mu, \nu) \in R$ iff $E_\mu[u] \leq E_\nu[u]$.
One is also tempted to conjecture a version of the above where $X$ is just a measurable space, $R$ is closed in the strong convergence topology and $u$ is just measurable. However, there's the issue that if $u$ is not bounded from either direction, there will be $\mu$ s.t. $E_\mu[u]$ is undefined. Does it mean $u$ automatically comes out bounded from one direction? Or that we need to add an additional axiom, e.g. that there exists $\mu$ which is a global minimum (or maximum) in the preference ordering?
Both of your conjectures are correct. In the measurable / strong topology case, u will necessarily be bounded (from both directions), though it does not follow that the bounds are achievable by any probability distribution.
I described the VNM theorem as failing on sigma-algebras because the preference relation being closed (in the weak or strong topologies) is an additional assumption, which seems much more poorly motivated than the VNM axioms (in Abram's terminology, the assumption is purely structural).
I think that one can argue that a computationally bounded agent cannot reason about probabilities with infinite precision, and that therefore preferences have to depend on probabilities in a way which is in some sense sufficiently regular, which can justify the topological condition. It would be nice to make this idea precise. Btw, it seems that the topological condition implies the continuity axiom.
Actually, the way you formulated it, completeness seems quite clear. If completeness is violated then there are A and B s.t. A<B and B<A which is an obvious money-pump. It is transitivity that is suspect: in order to make the Dutch book argument, you need to assume the agent would agree to switch between A and B s.t. neither A<B nor B<A. On the other hand, we could have used ≤ as the basic relation and defined A<B as "A≤B and not B≤A." In this version, transitivity is "clear" (assuming appropriate semantics) but completeness (i.e. the claim that for any A and B, either A≤B or B≤A) isn't.
Btw, what would be an example of a relation that satisfies the other axioms but isn't coherently extensible?
Why do we need utility functions and expected utilities at all?