
*Regret Theory with General Choice Sets* by John Quiggin is a generalization of DT that is more of the sort I was initially hoping to produce. It doesn't try to justify probability theory (it assumes it). Like me, it considers sets of options rather than only binary choices. Unlike me, it requires that if the bookie makes sequential offers, the bookie must keep all previously-given offers in the set (where I only require the bookie to keep the option which the agent chooses). This blocks the money-pump argument for transitivity, but still allows significant constraints on preferences to be argued by money-pump.

The result of this modification to the setup is that the agent can have many different utility functions which are used for different choice-sets. The condition on this is that the utility function must stay the same whenever the "best achievable outcome" is the same. (It is best to see the paper for that notion.)

This isn't too related to your main point, but every ordered field can be embedded into a field of Hahn series, which might be simpler to work with than surreals.

That page discusses the basics of Hahn series, but not the embedding theorem. (Ehrlich, 1995) treats things in detail, but is long and introduces a lot of definitions. The embedding theorem is stated on page 23 (24 in the pdf).

As promised in the previous post, I develop my formalism for justifying as many of the decision-theoretic axioms as possible with generalized dutch-book arguments. (I'll use the term "generalized dutch-book" to refer to arguments with a family resemblance to dutch-book or money-pump.) The eventual goal is to relax these assumptions in a way which addresses bounded processing power, but for now the goal is to get as much of classical decision theory as possible justified by a generalized dutch-book.

I made some predictions about what I would and wouldn't be able to justify with generalized dutch-book. These turned out to be only partially correct. I was able to get a version of Jeffrey's evidential decision theory, but with nonstandard utility and probability functions and with fewer assumptions about the structure of the boolean algebra. Nonstandard probability and utility functions are not a very big generalization in themselves. This makes it a little less interesting, supporting Scott's comment. I still have some hope that more interesting things can be developed from this going forward, though.

I wouldn't be *too* surprised if this has all been done before and I just haven't been able to find it. Justifying all of the axioms in terms of generalized dutch-book arguments seems like an obviously desirable thing to do. Yet, I haven't seen such a complete justification of common axioms via generalized dutch-books; that is part of why I initially assumed it would not be possible. The closest thing I've heard about is Stuart Armstrong's post showing that immunity to a generalization of the money-pump argument is necessary and sufficient for the VNM axioms minus continuity. So, although the result here doesn't give as much of a generalization of decision theory as I might have hoped for, it's the nicest justification of evidential decision theory that I know of.

##Agents

The basic representation of an agent's preference is as follows:

I'm representing preferences on sets only so that I can argue that this reduces to binary preference. Forcing the choice function to return at least one item is like assuming the completeness of binary preference. This deprives me of the chance to apply Vanessa's money-pump argument for completeness, but I think "the agent has to make a choice" is a decent justification for completeness. I allow the agent to return *more* than one choice because to do otherwise would be equivalent to assuming anti-symmetry; this seems too strong. We want an agent to be able to be indifferent between propositions. (You can imagine that the agent would choose randomly when confronted with such choices, but the choice function encodes the set of choices they might make.)

##Money and Contracts

In order to make the money-pump argument, I'll need a concept of money. I assume that money comes from an ordered field M. To make dutch-book arguments, I also need contracts which pay money conditional on certain propositions. These can be represented as a pair, [P,m]∈B×M. The agent holds a multiset of contracts, since it's meaningful to hold several copies of the same contract. In fact, for one proof I'll need counts of contracts to come from M, so that we can have fractionally many, negatively many, and whatever else M provides. (This is a somewhat unfortunate complication, but I didn't find a way to do without it.) However, to avoid headaches, I consider only multisets with finitely many non-zero counts. So, putting these together, we will extend the preferences of the agent to triples giving a proposition, an amount of money, and a multiset of contracts: (P,m,c)∈(B−{⊥})×M×((B×M)→M). Such triples will be called *prospects*. I will still use the abbreviations ≺, ∼, and ⪯. I'll also write P+m to indicate the addition of money to a prospect. Multiset addition will be written ⊎, so for example c′=c⊎{[A,1]} means that c′ is like c except that the count of contract [A,1] is one greater.

Lemma 1 (constraints on value of money). We can extend Ch() on non-empty sets of propositions to a function Ch′() on non-empty sets of prospects, in a way which satisfies the following requirements:

Proof. Let M be the ordered field defined by rational-coefficient polynomials in ϵ, considering integer powers of ϵ and taking ϵ^0=1 and aϵ^n<bϵ^(n−1) for all n∈Z and a,b∈Q.

For a set of prospects S, let Ch′(S) be the set chosen by the following process: order the prospects first by the negative-power elements of the money alone, discarding any prospects which are non-maximal in this respect, to yield S′; apply Ch() to the set of propositions of prospects in S′, keeping all prospects whose propositions would be kept by Ch(); finally, order the remaining set by money (considering non-negative powers in the polynomials this time), keeping only the maximal elements. In other words, we first consider the infinite amounts of money; then, we consider what Ch() would do; finally, we consider the finite and infinitesimal amounts of money. This definition of Ch′() satisfies all properties above. □
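The three-stage process in the proof can be sketched in code. This is only an illustration under assumptions that are not in the text: money is stored as a dict from integer exponents of ϵ to rational coefficients, lower powers of ϵ are taken to dominate higher ones (so ϵ^−1 is infinite), contracts are left out of the prospect, and the names `sign`, `minus`, `maximal`, `infinite_part` and `ch_prime` are hypothetical.

```python
from fractions import Fraction

# Money is a dict {exponent: coefficient}: a polynomial in eps with integer
# exponents and rational coefficients. Assumption: lower powers of eps
# dominate, so eps**-1 is infinite and eps**1 is infinitesimal.

def sign(money):
    """Sign of a money value: sign of its lowest-exponent non-zero coefficient."""
    nonzero = {e: c for e, c in money.items() if c != 0}
    if not nonzero:
        return 0
    lead = nonzero[min(nonzero)]
    return 1 if lead > 0 else -1

def minus(a, b):
    """Coefficient-wise difference a - b."""
    out = dict(a)
    for e, c in b.items():
        out[e] = out.get(e, Fraction(0)) - c
    return out

def maximal(items, key):
    """Keep the items whose key(item) is not exceeded by any other item's."""
    return [x for x in items
            if all(sign(minus(key(y), key(x))) <= 0 for y in items)]

def infinite_part(money):
    """The negative-power (infinite) component of a money value."""
    return {e: c for e, c in money.items() if e < 0}

def ch_prime(prospects, ch):
    """Three-stage choice over (proposition, money) pairs.

    `ch` is the agent's native choice function: it maps a set of
    propositions to the subset it would choose.
    """
    # Stage 1: keep prospects whose infinite money component is maximal.
    s1 = maximal(prospects, key=lambda p: infinite_part(p[1]))
    # Stage 2: defer to the native choice function on propositions.
    kept = ch({p[0] for p in s1})
    s2 = [p for p in s1 if p[0] in kept]
    # Stage 3: break remaining ties by finite and infinitesimal money.
    return maximal(s2, key=lambda p: p[1])
```

Because everything surviving stage 1 has the same infinite component, comparing full money values in stage 3 gives the same result as comparing only the non-negative powers.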

Note that the given construction of Ch′() is not necessarily the one the agent will use; for example, most agents typically considered would be better-represented by taking M=R and assigning real values to propositions. Rather, this lemma is just to show that my assumptions about how the agent's preferences are extended to prospects aren't restricting the agent's native preferences on propositions at all. Any preferences on propositions can be extended to prospects in a way which obeys these rules. The same could not be said of other similarly natural assumptions, for example assuming all of the above while requiring real-valued money. We don't want to sneak in restrictions on the agent's preferences via assumptions about money; the money should *only* serve to facilitate dutch-book arguments.

Henceforth, Ch′() refers to *some* function which obeys the properties from lemma 1.

As for the question of why *these particular* assumptions are made rather than some other set, this is at the level of asking why an agent should care about dutch-book or money-pump arguments, and I'll make little attempt to answer that here. The assumptions are made with the express purpose of facilitating the proof. One way to look at it is that we're asking the agent hypothetical questions, augmenting the propositions the agent understands with totally imaginary objects called "money" and "contracts". The possibility of our assumptions about these being "wrong" is meaningless. Instead, the question is whether arguments employing such hypotheticals to conclude various principles of rationality are compelling.

##Games

The agent plays games with the bookie, as follows:

(Letting the bookie choose when the agent is indifferent is a convenience; we could consider the agent's response to be random, but we wouldn't want to consider an agent immune to dutch-book if it sometimes escaped them via random choice. We'd condemn the agent if it had a chance of being dutch-booked. So, it's simpler to just say that the bookie chooses.)

Fixing a particular bookie strategy and outcomes for each test round, we get a *play-out*. Suppose the agent starts off holding (A,m,c). Call the prospect which the agent *actually* holds at the end of the play-out Q. Then an agent experiences *regret* if, in every situation, Q is as bad as or worse than what it would have had if it had stuck with the starting prospect. More formally, regret is defined in two cases:

set P of all *possible* results if the agent had stuck with the starting prospect, and the bookie had performed the same tests, but the tests could go either way within the limits of logical possibility. In this case, the agent is only said to experience regret if Q is less than or equal to every element of P. The regret is strict if Q is strictly less than any element of P.

A bookie's strategy is a *generalized dutch-book* if the agent regrets every possible play-out, and strictly regrets at least one play-out.

My definition of regret is a bit clunky. The intuition behind splitting up the cases in this way can be understood by a couple of examples.

Suppose the agent starts with prospect (A,0,∅). The bookie offers the choice of trading this for (A,−10,{[B,1]}), which the agent accepts. This is interpreted as paying $10 for a $1 bet on B. The bookie then tests for B, which comes up true. The agent now holds (A∧B,−9,∅). Clearly, the agent should regret this outcome. If it had stuck with (A,0,∅), it would have (A∧B,0,∅) after a successful test for B, which is strictly preferred by my assumptions concerning money.
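The bookkeeping in this first example can be traced with a small sketch. The rule used for test rounds here is an assumption inferred from the example itself (a test conjoins its result onto the proposition and settles any contracts on the tested proposition); the representation and the name `resolve_test` are hypothetical.

```python
def resolve_test(prospect, atom, outcome):
    """Apply a test round for `atom` with result `outcome` (True/False).

    A prospect is (proposition, money, contracts), where the proposition is
    a frozenset of (atom, truth_value) literals read as a conjunction, and
    contracts is a dict {(atom, payout): count}.
    Assumed rule: contracts on the tested atom pay out if it came up true
    and are removed either way; the result is conjoined to the proposition.
    """
    prop, money, contracts = prospect
    remaining = {}
    for (bet_atom, payout), count in contracts.items():
        if bet_atom == atom:
            if outcome:
                money += count * payout
        else:
            remaining[(bet_atom, payout)] = count
    return (prop | {(atom, outcome)}, money, remaining)

A = frozenset({("A", True)})
start = (A, 0, {})                 # (A, 0, ∅)
traded = (A, -10, {("B", 1): 1})   # (A, −10, {[B,1]})

after_trade = resolve_test(traded, "B", True)   # money is now -9
baseline = resolve_test(start, "B", True)       # money is still 0
```

Both results have the proposition A∧B and no contracts left, so they differ only in money, which is what makes the comparison between them clear-cut.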

On the other hand, suppose that the agent starts with (A,0,∅) and is offered the trade of (A∧B,−1,∅), which it prefers. The bookie then tests B, which can only come out true. The agent now has (A∧B,−1,∅). In this case, it doesn't make sense to say that the agent could have had (A∧B,0,∅) if it had only declined the trade. B came out true *because of* the trade. To establish regret in this case, we would want both (A∧B,−1,∅)⪯(A∧B,0,∅) and (A∧B,−1,∅)⪯(A∧¬B,0,∅).

Still, I'm not totally happy comparing to the starting prospect in this way. It would be more natural to compare the agent to agents with differing preferences: an agent has (strict) regret if some other preferences would have given it (strictly) better outcomes, given the bookie's strategy. We then deem a system of preferences Ch′() irrational if some other system of preferences achieves superior results even by the judgement of Ch′(). However, that definition wouldn't work here; bookies could reward irrationality, making every preference system dutch-bookable. The definition I use avoids this.

##Arguments for Necessity

Here, I prove a number of properties on the assumption that the agent is not susceptible to generalized dutch-books.

We can show that all preferences reduce to binary preference (a version of Independence of Irrelevant Alternatives):

Theorem 1 (Reduction to Binary Preference). If Ch′() is not susceptible to a generalized dutch-book, then Ch′(S) is exactly the set of prospects P∈S such that Q⪯P for all Q∈S.

Proof. Suppose not. Then either there is an A∈Ch′(S) with a B∈S such that A≺B, or otherwise, there is an A∈S with no such B but A∉Ch′(S).

In the first case, the bookie can start the agent with (B,0,∅), present the agent with the set of choices S, and make the agent choose (A,0,∅). At this point, the bookie can offer (B,−m,∅), with m sufficiently small so as to be favorable. The agent is now strictly worse off than it started, contradicting the assumption that it has no generalized dutch-books.

In the second case, start the agent with A, and present the choice of S. The agent will choose something other than A. Now the bookie can charge to switch back, offering (A,−m,∅). Again, this should be impossible. □

Here's the classic money-pump:

Theorem 2 (Transitivity). If Ch′() is not susceptible to a generalized dutch-book, then whenever A⪯B and B⪯C, A⪯C.

Proof. Suppose not. Then we can find A,B,C such that A⪯B and B⪯C, but C≺A. The bookie can start the agent holding (A,0,∅) and offer the choice to switch to (B,0,∅) and then (C,0,∅). The agent either strictly prefers these or is indifferent; in either case, the bookie can get the agent to switch. Since C≺A, the bookie can now offer (A,−m,∅). By the properties of money, there exists an m small enough to ensure this trade is favorable. The agent is now strictly worse off than it started. □

Note that binary preference is complete by definition, so we have complete, transitive preferences. Also note that transitivity of ⪯ implies transitivity of ≺ and of ∼.
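The pump in this proof can be played out mechanically. The following is a toy sketch, not the formalism itself: the preference tables, the fee of 1/100, and the names `accepts` and `run_pump` are illustrative assumptions, and the fee is simply assumed to be small enough that the agent pays it for a strictly preferred trade.

```python
from fractions import Fraction

# Hypothetical intransitive agent: A ⪯ B and B ⪯ C, yet C ≺ A.
WEAKLY_PREFERRED = {("A", "B"), ("B", "C")}   # (x, y) means x ⪯ y
STRICTLY_PREFERRED = {("C", "A")}             # (x, y) means x ≺ y

def accepts(held, offered, fee):
    """Free trades are taken when weakly preferred (the bookie breaks ties);
    paid trades are taken only when strictly preferred."""
    if fee == 0:
        return (held, offered) in WEAKLY_PREFERRED | STRICTLY_PREFERRED
    return (held, offered) in STRICTLY_PREFERRED

def run_pump(start, offers):
    """Walk the agent through a sequence of (offered_proposition, fee) offers."""
    prop, money = start, Fraction(0)
    for offered, fee in offers:
        if accepts(prop, offered, fee):
            prop, money = offered, money - fee
    return prop, money

m = Fraction(1, 100)  # illustrative fee
final = run_pump("A", [("B", 0), ("C", 0), ("A", m)])
# The agent is back at A but has paid m: final == ("A", Fraction(-1, 100))
```

Repeating the same three offers drains another m each cycle, which is the sense in which the cycle is a pump.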

Now we prove a version of Jeffrey's law of averaging:

Theorem 3 (Averaging). Suppose Ch′() is not susceptible to a generalized dutch-book. If A∧B=⊥, then A⪯B implies A⪯A∨B⪯B.

Proof. Suppose not. Then by transitivity, either the disjunction is strictly preferred to both disjuncts, or both disjuncts are strictly preferred to it.

In the first case, suppose wlog that A⪯B. Start the agent with (B,0,∅). Offer the choice to switch to (A∨B,−m,∅) with m small enough to be appealing. The agent accepts. Then, test whether A. The agent either ends up with (A,−m,∅) or (B,−m,∅). In both cases, it experiences strict regret.

In the second case, suppose again that A⪯B wlog. Start the agent with (A∨B,0,∅) and charge m to switch to A. Then, test for A. This creates strict regret, since the agent would have ended up in either (A,0,∅) or (B,0,∅) if it did nothing, both of which are better than (A,−m,∅). □

Next, we want to make classic dutch-book arguments to establish probability laws. To do this, however, we need to establish some properties of Ch′(). Specifically, we can show from transitivity and completeness that prospects act like values which extend the ordered field M.

Definition. Suppose that M is isomorphic with a subfield of the surreal numbers. Fix one such subfield to identify with M. Also fix an arbitrary well-ordering of prospects, ⋖, which will act like the surreal ordering of stages. Then the *relative value* of prospect P=(A,m,c) relative to prospect Q=(B,n,d), denoted V_Q(P), measures the prices at which the agent will accept offers to switch: it is defined as the surreal number {L|R} where L is the set of monetary values v such that (B,n+v,d)≺P, plus those values V_Q(P′) where P′≺P and P′⋖P. R is the set of monetary values v such that P≺(B,n+v,d), plus those values V_Q(P′) where P≺P′ and P′⋖P. The *value* (non-relative) shall be defined as V(P)=V_(⊤,0,∅)(P).

(There are broad conditions under which M can be embedded into the surreal numbers as required here; in NBG set theory, this is always possible.)

Lemma 2. If M is isomorphic to a subfield of the surreal numbers, then the relative value given above is well-defined, and the closure of all prospect values under the field operations is an ordered field. Furthermore, V(P)<V(Q) if and only if P≺Q.

Proof. Due to completeness, any prospect A splits the set of values so far (i.e., M plus all A′⋖A) into disjoint sets worth less than, more than, and (possibly) the same as the trade of ⊤ for A. Call the first set L and the second R. Due to transitivity, it must be that any element from L is less than any element from R. We can conclude that V(A) is a well-formed surreal number. We can therefore take the closure under field operations to get a new subfield of the surreals, which will itself be an ordered field.

Suppose V(A)<V(B), with V(A)={L|R} and V(B)={L′|R′}. By the definition of ordering on surreal numbers, there must either be an n∈R such that n≤{L′|R′}, or an n∈L′ such that {L|R}≤n. If n∈M, let A′=(⊤,n,∅); otherwise, n is V(A′) for some A′⋖A. So, we have either A≺A′⪯B or A⪯A′≺B. In either case, A≺B by transitivity.

Now, suppose A≺B. Either A⋖B or B⋖A. In the first case, V(B) must have V(A) in its left set; in the second case, V(A) must have V(B) in its right set. Either way, V(A)<V(B). □

Ordinarily, after getting a notion of value like this we'd like to show that it is unique up to linear transformations or something similar. This type of result is very unlikely here, since the arbitrary choice of ⋖ can result in significant differences in V() (on top of the already non-unique choice of Ch′() to represent a given Ch()). Nonetheless, we have the first half of a representation theorem. We can use this to define probability.

For notational convenience, define the value of a contract V_P(c) to be the value of adding that contract to the contract set of prospect P, with V(c)=V_(⊤,0,∅)(c). We define probability via the value of $1 bets:

Definition. The probability of a proposition A, written Pr(A), is defined as the value V([A,1]).

Theorem 4 (Probability Laws). If Ch′() has no generalized dutch-books,

Proof. (Non-negativity.) Suppose Pr(A)<0. This means (⊤,0,{[A,1]})≺(⊤,0,∅). But since a strict preference implies a preference we're willing to pay for, (⊤,0,{[A,1]})⪯(⊤,−c,∅) for some c>0. A game which offers that trade and then tests A is a generalized dutch-book. The agent is worse off by c in the case where A comes out false, and worse off by 1+c if A comes out true. So, this can't happen.

(Normalization.) Suppose A is a tautology and Pr(A)<1. This is a preference the agent is willing to pay for, so (⊤,0,{[A,1]})≺(⊤,1−n,∅) for some n>0. But this implies that the agent can be generalized-dutch-booked by starting it out with (⊤,0,{[A,1]}), offering it the trade for (⊤,1−n,∅), and then testing for A (which will certainly come true); the agent could have had $1 rather than $(1−n). Similarly, if Pr(A)>1, we can start the agent with (⊤,1+n,∅) for some n>0, and get the agent to trade for (⊤,0,{[A,1]}), ultimately leaving it with only (⊤,1,∅) after testing A. So Pr(A)=1.

(Finite Additivity.) Suppose A∧B=⊥. Since A and B are mutually exclusive, the agent must be indifferent between the contract [A∨B,1] and the pair of contracts [A,1], [B,1]. (Holding that pair of contracts pays out in exactly the same way as holding [A∨B,1], so if they aren't valued equally, then the bookie can charge a small amount for switching from one to the other to make a dutch book.) However, the pair of contracts must be valued as V([A,1])+V([B,1]); otherwise, the amount of money the agent is willing to pay for a direct offer of the pair would be different from what the agent would be willing to pay if offered first [A,1] and then [B,1]. In the case where V([A,1])+V([B,1])>V({[A,1],[B,1]}), the bookie can start the agent out holding both [A,1] and [B,1], pay the agent an amount of money slightly more than V({[A,1],[B,1]}) but less than V([A,1])+V([B,1]) (a quantity of money whose existence is assured by order-density) to give them up, and then charge individually to give them back. In the reverse case, the bookie would pay for their removal individually and then charge to give them back jointly. Either way, we have a dutch book.

(Non-Dogmatism.) Suppose A≠⊥ and Pr(A)=0. The bookie can start the agent with (⊤,0,{[A,1]}) and offer (⊤,0,∅), which the agent is willing to take, and then test A. If A, the agent is worse off by one dollar; otherwise, the outcome is the same as it would have been. So, this is a generalized dutch book. □
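For reference, the four laws can be summarized as below. This summary is read off from the four parts of the proof (each part refutes the negation of one line), so the exact form is an inference from the proof rather than a quotation of the theorem statement.

```latex
\begin{align*}
&\text{Non-negativity:}    && \Pr(A) \ge 0 \\
&\text{Normalization:}     && \Pr(A) = 1 \quad \text{if } A \text{ is a tautology} \\
&\text{Finite additivity:} && \Pr(A \lor B) = \Pr(A) + \Pr(B) \quad \text{if } A \land B = \bot \\
&\text{Non-dogmatism:}     && \Pr(A) \neq 0 \quad \text{if } A \neq \bot
\end{align*}
```

Together, non-negativity and non-dogmatism give Pr(A)>0 for every A≠⊥, which is the form used later in the sufficiency argument.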

So, we've got our probability and utility functions. We just need to do a little more work to connect them to each other.

Lemma 3 (Exchange Rate). Probabilities are exchange rates between money and conditional bets: V([A,m])=mPr(A).

Proof. Remember that by a property of money, (A,m,d⊎{[B,x]})⪯(A,m+n,d) if and only if (A,m′,d′⊎{[B,x]})⪯(A,m′+n,d′) for all m′,d′. This implies that V_(A,n,d)(c) depends on only A and c. In particular, the value is independent of the number of copies of the contract already in d. Since n copies of a contract [A,m] will have equivalent payouts to one copy of a contract [A,nm], it must have the same value to avoid a generalized dutch book. Since the multiset of contracts allows the number of copies of a contract to be anything in M, this argument applies for any n∈M. Since M has multiplicative inverses, for any m but 0 we can take n=1/m, proving the result. The result also holds for m=0, since the value of a contract paying zero must be zero. □

In what follows, I will abbreviate V((A,0,∅)) as V(A).

Theorem 5 (Expected Utility). If propositions A and B are mutually exclusive and Ch′() has no generalized dutch-books, V(A∨B)Pr(A∨B)=V(A)Pr(A)+V(B)Pr(B).

Proof. Suppose V(A∨B)Pr(A∨B)<V(A)Pr(A)+V(B)Pr(B). Then for any m,m′,m′′∈M such that m<V(A∨B), m′>V(A), m′′>V(B) we have mPr(A∨B)<m′Pr(A)+m′′Pr(B). Taking the negative of both sides and applying the exchange rate lemma, we have V([A∨B,−m])>V([A,−m′])+V([B,−m′′]). This means there is a price, p, which the agent is willing to pay in order to trade the two contracts [A,−m′] and [B,−m′′] for the contract [A∨B,−m]. Choose m,m′,m′′ so that V(A∨B)−m<p, m′−V(A)<p, and m′′−V(B)<p. Now, an agent who pays p for the trade (starting from (⊤,0,{[A,−m′],[B,−m′′]}) and being offered (⊤,−p,{[A∨B,−m]})) is strictly worse off in every outcome after A and B have been tested, as compared with an agent who didn't. This contradicts our assumption that there are no generalized dutch-books.

Now suppose V(A∨B)Pr(A∨B)>V(A)Pr(A)+V(B)Pr(B). By a similar application of the lemma, we get V([A∨B,−m])<V([A,−m′])+V([B,−m′′]) and make the agent pay for the trade in the opposite direction. This also contradicts our assumption. So we must have the desired result. □
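A small numeric instance may help make the law concrete. This sketch assumes ordinary rational values in place of the nonstandard ones, uses finite additivity to get Pr(A∨B), and picks the numbers purely for illustration; `value_of_disjunction` is a hypothetical helper name.

```python
from fractions import Fraction

def value_of_disjunction(v_a, pr_a, v_b, pr_b):
    """Solve V(A∨B)·Pr(A∨B) = V(A)·Pr(A) + V(B)·Pr(B) for V(A∨B).

    Assumes A and B are mutually exclusive, so that finite additivity
    gives Pr(A∨B) = Pr(A) + Pr(B), and that this sum is non-zero.
    """
    pr_ab = pr_a + pr_b
    return (v_a * pr_a + v_b * pr_b) / pr_ab

# Illustrative numbers: V(A)=2, Pr(A)=1/4, V(B)=10, Pr(B)=1/2.
v = value_of_disjunction(Fraction(2), Fraction(1, 4), Fraction(10), Fraction(1, 2))
# v == Fraction(22, 3), which lies between V(A)=2 and V(B)=10
```

The result is a probability-weighted average of V(A) and V(B), so it automatically lands between them, in line with the averaging property of Theorem 3.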

This completes the proof that an agent immune to generalized dutch books must have preferences which can be represented by (nonstandard) probability and utility functions obeying the law of expected utility. This in itself does not tell us much, because it may be that I can argue all sorts of constraints from the assumption that no generalized dutch book exists; the condition could even be unsatisfiable. Next we want to see that preferences being representable as probability and utility functions is *sufficient*.

##Argument for Sufficiency

Theorem 6 (Sufficiency). If Ch() can be represented as taking the highest-V() option according to a (possibly nonstandard) V() on propositions (except ⊥) which obeys the law of expected utility as mediated by a (possibly nonstandard) Pr() function which follows the probability laws, then there exists a Ch′() which is immune to generalized dutch books.

Proof. Define a value on prospects, V′((A,m,c)) = V(A) + m + ∑_{[B,n]∈c} c([B,n])·n·Pr(B), where c([B,n]) gives the multiset count of the contract [B,n] in c. Let Ch′() choose the subset with maximal V′. This definition of V′ obeys a generalized expected value rule, in which the value of a prospect is the expected value of all possible outcomes after applying any combination of test rounds to the prospect. Now, if Ch′() is willing to trade during any choice rounds, it must be for a prospect with higher or equal expected value. This makes a generalized dutch-book impossible. There are two cases to consider, based on the definition of regret. Either the agent has traded for a prospect whose proposition is the same as the starting proposition, or the agent has traded for one which is different. In the first case, the probabilities of different test outcomes all remain the same. It is not possible that each outcome is the same or worse as a result of the trade, with at least one being worse; the expected utility would have been lower, and no trade would have been made. In the second case, the probabilities may change. However, due to non-dogmatism, all possibilities still have positive probability. It is therefore not possible that every outcome after trade is the same as or worse than every possible outcome of the prospect before trade, with at least one of the new worse than every old; this would assure a lower expected utility, preventing trade. □

Now, it follows from results in the previous section that a Ch′() which is immune to generalized dutch-books implies a Pr() and V() following the law of expected utility, which represent the original preferences Ch(). So, combined with theorem 6, we've got necessary and sufficient conditions for such a representation.
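The value V′ used in the proof of theorem 6 is straightforward to compute. Below is a minimal sketch, assuming rational-valued V() and Pr() given as lookup tables; the dictionary representation of the contract multiset and the names `prospect_value` and `ch_prime` are illustrative choices, not part of the formalism.

```python
from fractions import Fraction

def prospect_value(prospect, V, Pr):
    """V'((A, m, c)) = V(A) + m + sum over [B, n] in c of count * n * Pr(B).

    `contracts` maps (proposition, payout) to its multiset count, which may
    be fractional or negative, as in the text.
    """
    prop, money, contracts = prospect
    bets = sum(count * payout * Pr[b] for (b, payout), count in contracts.items())
    return V[prop] + money + bets

def ch_prime(prospects, V, Pr):
    """Choose every prospect whose V' is maximal."""
    best = max(prospect_value(p, V, Pr) for p in prospects)
    return [p for p in prospects if prospect_value(p, V, Pr) == best]

# Illustrative tables: V(⊤)=0 and Pr(B)=1/2.
V = {"T": Fraction(0)}
Pr = {"B": Fraction(1, 2)}
keep = ("T", Fraction(0), {})
cheap_bet = ("T", Fraction(-1, 4), {("B", 1): 1})   # pay 1/4 for a $1 bet on B
dear_bet = ("T", Fraction(-3, 4), {("B", 1): 1})    # pay 3/4 for the same bet
chosen = ch_prime([keep, cheap_bet, dear_bet], V, Pr)
# Only the bet priced below its expected payout is chosen: chosen == [cheap_bet]
```

An agent choosing this way only ever trades toward equal or higher V′, which is the property the proof relies on to rule out a generalized dutch-book.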

##Discussion

I still feel uncomfortable with certain aspects of this, especially the assumptions which I had to edit as I went to make the arguments go through. The most glaring case is the long list of assumptions about money. Other bits I feel uneasy about are the M-valued multisets of contracts and the definition of regret. It's not a flaw in the formal argument that worries me, but rather, the question of how much contrivance we can throw into the setup. With a few different choices in definitions, would I get a significantly different decision theory?

The overall structure of the argument is that we propose a set of thought experiments to an agent whose choice function (on non-thought-experiments) is Ch(). The agent is asked to construct a function Ch′() representing what it would do under the conditions of the thought experiments. It can construct any Ch′() it wants, so long as it obeys some assumptions we make as part of the thought experiment. However, if it gives us a Ch′() which violates some rule of classical decision theory, we give it back a generalized dutch-book illustrating a case where Ch′() seems "inconsistent": sub-optimal by its own standards.

From these thought experiments, the agent is not only supposed to arrive at a "consistent" Ch′(); it is supposed to be motivated to revise Ch() if necessary to do the job. Presumably, this is because "inconsistencies" in the thought experiments are supposed to indicate real inconsistencies in Ch() (where the system of preferences itself judges some other system to be better, in some sense). Of course, this isn't directly true.

Perhaps a better way of thinking about it is not that we're trying to convince an agent to change its own preferences, but that we're trying to convince an agent's designer to give it coherent preferences in the first place. Still, it's unclear why the designer should pay special attention to this class of hypothetical scenarios.

I did the whole argument this way to avoid "structural" assumptions, where the agent's beliefs are assumed to already contain things like fair coins in order to define probability, and so on. In the end, though, an agent may only find these hypotheticals convincing to the extent that they resemble the structure of situations it actually believes in.

Perhaps arguments based on money can be seen as making transitivity and other decision-theoretic properties "contagious": if there is something you care about approximately independently of everything else, which has properties of money, then you can argue for clean decision-theoretic properties for everything else based on that one thing.

However, some aspects seem strange even with this kind of thinking. One of the basic aspects of money-pump arguments, which I've faithfully replicated here, is that the agent apparently must "forget" previous choices, and decide only on the basis of the current choice in front of it. I think there is a strong case to be made that by requiring a choice to be made only as a function of the prospects on offer in a given round, we're not letting the agent fully understand the game it is playing. This leads to objections such as "you can refuse further transactions when you notice a money-pump" (for a discussion, see *Money Pumps, Diachronic and Synchronic*, Yair Levy).

In any case, these are questions which I hope to answer more deeply in the next stage. Now that I have a version of classical DT justified exclusively by generalized dutch-book, I'll be thinking about how to adapt this kind of thing to logical uncertainty and perhaps other MIRI problems. I won't promise another post in the series; it might not go anywhere, or I might decide that the expected value is low and I should do other things for a while. Hopefully this post has some value.