Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I recently posted about an optimality result for modal UDT, which shows that for every modal decision problem , there is a closed modal formula such that the version of modal UDT that searches for proofs in will perform optimally on .

Paul commented on this post and suggested a stronger version: For every modal decision theory and every provably extensional modal decision problem , modal UDT will do at least as well on as does if it is using a proof system that can prove what action chooses on this decision problem, and which outcome it obtains as a result. In this post, I give a detailed proof of this.

Prerequisite: An optimality result for modal UDT, and the prerequisites therein.


Let be the fixed point of and , and recall my notation for the sequence of formulas which has as the 'th entry, and as all of its other entries (with the length of the sequence being clear from context). Now suppose that is a true closed formula in the language of such that that is, proves that chooses action , and achieves the outcome as a result. (By saying that is a "true" formula we mean that its translation to the language of arithmetic is true about the standard natural numbers: .) It's ok to talk about and in the definition of and because fixed points are unique (up to provable equivalence), and can prove that the fixed point is in fact a fixed point.

My claim, then, is that will perform at least as well as on the decision problem ; that is, the outcome it achieves will be ranked .

Intuitively, this is straight-forward. searches through all pairs of outcomes and actions in lexicographical order, until it finds a pair such that it can prove that if it takes action , it will achieve outcome ; as soon as it finds such a pair, it takes action . (This is justified becaues it searches outcomes best-first, so it takes an action that leads to as good an outcome as it's able to prove it can get.) So if it can prove that taking action will lead to outcome , it will either take that action and get that outcome, or there's some pair such that it takes action and obtains outcome . (Remember that outcomes are numbered from best to worst.)

Let's go through the details of showing that it actually works out that way.


Let's write for the fixed point of with . The part in our argument that we need to check carefully is that UDT will in fact stop at the pair if it hasn't already stopped before that; i.e., If this is satisfied, then we're done: We know that there will be some pair such that outputs and such that and hence, since (a) is sound on , (b) , and (c) by assumption, , it follows that , i.e., achieves the outcome . Thus, let's check that does indeed prove .

To do so, we make use of provable extensionality, that is, of the fact that Since as a modal decision theory, is a p.m.e.e. sequence (provably mutually exclusive and exhaustive), proves that implies , i.e., . Hence, together with provable extensionality, we obtain (since by definition of ). But on the other hand, recall our initial assumption that again by provable extensionality, this implies and since by definition of , this simplifies to But together with our earlier result, this implies which is equivalent to as desired.

New to LessWrong?

New Comment
2 comments, sorted by Click to highlight new comments since: Today at 9:41 AM

Further simplification (which Benja and Marcello worked out): since every modal formula is decidable conditional on for large enough , you don't need special axioms for each modal decision problem and decision theory, you just need a strong enough consistency axiom. That's a pretty nifty optimality result.

(It requires the decidability result above, which currently is an unpublished folk theorem proved in an alternate draft of the modal combat paper, but we'll see if we can get that included in a nice peer-reviewed citable source.)

Two further notes:

  1. Said folk theorem is in fact shown in Boolos.

  2. Benja verified that the optimality result should work for chicken-playing modal UDT as well as descending-search-order modal UDT.