
Summary: This is a quick expository recap, with links, of the posts on this forum on the topic of updateless decision theory, through 3/19/15. Read this if you want to learn more about UDT, or if you're curious about what we've been working on lately!

Introduction

Updateless decision theory (henceforth UDT) is a proposed algorithm for making good decisions in a world that can contain predictions and other echoes of the agent's decision algorithm. It is motivated by a set of Newcomblike problems on which the classical decision theories (evidential decision theory and causal decision theory) have known failure modes. For more context and motivation for UDT, see the MIRI research document Toward Idealized Decision Theory.

Philosophically speaking, UDT looks over all possible strategies (maps from "agent observes X" to "agent does Y"), picks the global strategy that would be best for all copies of UDT in aggregate, and then performs the action that the strategy recommends for an agent in its position. (One explanation for choosing strategies rather than actions is that it defends against blackmail: if you are being blackmailed, then paying up appears to be the better action; but since the strategy of refusing to pay deters all savvy blackmailers, UDT knowably refuses to pay up and thus is rarely blackmailed. Or so the theory goes.)
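As a concrete (if wildly intractable) illustration, here is a minimal sketch of that policy-selection step in Haskell. Everything here is an assumption of the sketch: the finite lists of observations and actions, and especially the `utility` function, which is handed to us as an oracle answering "what happens if every copy of the agent follows this strategy?" Defining that function is where all the real difficulty lives.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- A strategy maps observations to actions.
type Strategy obs act = obs -> act

-- Enumerate every strategy, score each one with the world model,
-- and perform the action the best strategy assigns to what we saw.
udtAct :: Eq obs
       => [obs]                        -- all possible observations
       -> [act]                        -- all possible actions
       -> (Strategy obs act -> Double) -- world model: utility if all
                                       -- copies follow this strategy
       -> obs                          -- the observation we actually saw
       -> act
udtAct observations actions utility seen = best seen
  where
    -- every assignment of an action to each observation
    tables = mapM (const actions) observations
    -- look up what a table says to do upon observing o
    -- (assumes seen is one of the listed observations)
    toStrategy table o =
      table !! length (takeWhile (/= o) observations)
    best = maximumBy (comparing utility) (map toStrategy tables)
```

Note that the agent consults the winning strategy only at the observation it actually received, but each strategy is scored as a whole; that global scoring is what separates UDT from simply picking the action that looks best after updating.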

This leads to some interesting ambiguities, most importantly in the counterfactuals about "what would happen" if the deterministic UDT algorithm chose strategy B, when in fact it will end up choosing strategy A. These logical counterfactuals can be confronted straightforwardly in mathematical models, by trying to formally prove that if UDT selects strategy S, then the universe ends up in a state of utility U. But then one faces the problem of spurious counterfactuals ("if the agent chose X then Y would happen", provable only because we can prove that the agent does not choose X), which leads to further complications.
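To see how a spurious counterfactual can bite, here is a standard worked example (a folklore argument, not drawn from any one of the posts below). Write $A$ for the agent's action and $U$ for the resulting utility, and suppose the agent's code says: take action 2 if the sentence $S \equiv (A = 1 \to U = 0)$ is provable, and take action 1 otherwise. Then:

$$
\begin{aligned}
&\vdash \Box S \to (A = 2) &&\text{(by inspection of the agent's code)}\\
&\vdash \Box S \to (A \neq 1)\\
&\vdash \Box S \to S &&\text{(a conditional with a refutable antecedent is provable)}\\
&\vdash S &&\text{(by L\"ob's theorem)}
\end{aligned}
$$

So the agent provably takes action 2, no matter how good action 1 would actually have been: the "counterfactual" $S$ is provable only because the agent, trusting it, never takes action 1.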

The Gödel-Löb modal logic GL (also called provability logic) allows us to construct a good heuristic model of UDT in a mathematical universe, using Gödel numbering and quining to capture self-reference, and evaluating counterfactuals via unbounded proof searches (equivalently, a halting oracle). This "modal UDT" has some interesting properties: for instance, it is provably optimal on decision problems that "only care what it does", if given sufficiently powerful reasoning axioms. Moreover, the tools of Kripke semantics allow the actions of modal UDT and other modally defined agents to be computed in polynomial time!
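To give a feel for the Kripke-semantics trick, here is a minimal sketch of my own (much simpler than the linked implementation, which also handles formulas with variables and modal fixed points): for letterless formulas of GL, theoremhood can be decided by evaluating the formula at worlds 0, 1, ..., d of a finite linear frame, where d is the formula's modal depth and world w "sees" exactly the worlds below it.

```haskell
-- Letterless formulas of the provability logic GL.
data Formula = Bot | Imp Formula Formula | Box Formula

neg :: Formula -> Formula
neg a = Imp a Bot

-- Modal depth: the deepest nesting of boxes.
depth :: Formula -> Int
depth Bot       = 0
depth (Imp a b) = max (depth a) (depth b)
depth (Box a)   = 1 + depth a

-- Truth at world w of the linear Kripke frame in which
-- world w sees exactly the worlds v < w.
holds :: Int -> Formula -> Bool
holds _ Bot       = False
holds w (Imp a b) = not (holds w a) || holds w b
holds w (Box a)   = all (`holds` a) [0 .. w - 1]

-- By the normal-form theorem for letterless sentences, GL proves
-- a formula iff it holds at every world up to its modal depth.
theoremGL :: Formula -> Bool
theoremGL a = all (`holds` a) [0 .. depth a]

-- Examples:
--   theoremGL (Box Bot) == False
--     (GL does not prove that falsehood is provable)
--   theoremGL (Imp (Box (neg (Box Bot))) (Box Bot)) == True
--     (Löb's theorem, instance p := ⊥)
```

The linked implementation below extends this idea to the modal fixed points that define the agents themselves, which is what makes the whole evaluation pipeline polynomial-time.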

Lest we get carried away, modal UDT is neither a perfect instantiation of the philosophical UDT algorithm, nor is it optimal in problems involving universes that predict the agent's decisions via provability. Even so, it is an incredibly fruitful object for further study in decision theory!

Without further ado, here is an overview of all the posts on this forum thus far that relate to updateless decision theory:

Expository material

New research on modal UDT

  • Using modal fixed points to formalize logical causality, Vladimir Slepnev. I believe this is actually the first time that someone constructed a UDT algorithm in provability logic!

  • "Evil" decision problems in provability logic, Benja Fallenstein. For every modal decision theory, you can set up a fair decision problem (i.e. one that simply maps actions to outcomes rather than depending on the inner details of how the agent deliberates) which punishes the agent, such that other decision theories can get a better outcome. This is an important limitation on any optimality results, but as we shall see, it is not the final word.

  • An optimality result for modal UDT, Benja Fallenstein. For each fixed fair decision problem, there is a set of axioms (in this case, simply the true axioms about the mapping from actions to outcomes) such that modal UDT equipped with those axioms will achieve the best possible outcome.

  • Improving the modal UDT optimality result, Benja Fallenstein. We can use more general axioms than the ones in the previous post, and still achieve the best possible outcome.

  • Obstacle to modal optimality when you're being modalized, Patrick LaVictoire. The optimality result cannot carry over to even simple decision problems where the outcome depends on whether the agent provably takes a certain action.

  • On notation for modal UDT, Benja Fallenstein. This emphasizes that our use of modal agents and decision theories relies on fixed points of modal logic operators, and thus we should be careful about using functional notation.

  • An implementation of modal UDT, Benja Fallenstein. The polynomial-time algorithm for figuring out what a modal UDT agent does in a decision problem, now in a simple Haskell program!

New research on logical counterfactuals

  • Uniqueness of UDT for transparent universes, Tsvi Benson-Tilson. This is a different abstract model of UDT, using what we call "playing chicken with the universe": before reasoning about consequences, the agent attempts to prove, for each possible action, that it does not take that action; if any such proof is found, it immediately takes that action. This is more useful than it sounds, because it rules out certain spurious counterfactuals. (A minimal code sketch of the chicken rule appears after this list.)

  • A different angle on UDT, Nate Soares. An exploration of spurious counterfactuals, with an example (where UDT is used to analyze another agent, rather than to take an action itself).

  • Why conditioning on "the agent takes action a" isn't enough, Nate Soares. Continuing the exploration of spurious counterfactuals, with probabilistic agents.

  • Third-person counterfactuals, Benja Fallenstein. There is at least some set of assumptions (a "sufficiently informative" decision problem) for which modal UDT analyzes an algorithm (perhaps itself) adequately and avoids acting on spurious counterfactuals... or does it?

  • The odd counterfactuals of playing chicken, Benja Fallenstein. Nope, modal UDT is perfectly susceptible to spurious counterfactuals when analyzing other algorithms, even if the decision problem is sufficiently informative.
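As promised above, here is a minimal sketch of the chicken rule from Benson-Tilson's post, with the proof oracle and the sentence constructor left abstract (both are assumptions of the sketch; a real implementation would run a bounded proof search over the agent's own source code):

```haskell
-- "Playing chicken with the universe", as a schematic agent step.
-- `proves` is an abstract provability oracle, and `doesNotTake a`
-- names the sentence "this agent does not take action a"; both are
-- parameters of the sketch rather than real implementations.
chickenStep
  :: (sentence -> Bool)     -- proof oracle (assumed given)
  -> (action -> sentence)   -- builds "I do not take action a"
  -> [action]               -- available actions
  -> ([action] -> action)   -- ordinary UDT reasoning, as a fallback
  -> action
chickenStep proves doesNotTake actions reasonAboutConsequences =
  case [ a | a <- actions, proves (doesNotTake a) ] of
    (a:_) -> a  -- the theory "proved" we don't take a: take it anyway,
                -- which makes any such proof unsound
    []    -> reasonAboutConsequences actions
```

If the agent's theory is sound, the first branch is never actually taken; its value is that the theory can never soundly prove "I don't take a" for any available action, which blocks exactly the vacuous conditionals behind spurious counterfactuals.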

Other new research relevant to UDT
