Updateless Decision Theory

One valuable insight from EDT is reflected in "UDT 1.1" (see the article by McAllister in references), a variant of UDT in which the agent takes into account that part of its algorithm (its mapping from observations to actions) may be prespecified and not entirely in its control, so that it has to gather evidence and draw conclusions about part of its own mental makeup. The difference between UDT 1.0 and 1.1 is that UDT 1.1 iterates over policies (whole mappings from observations to actions), whereas UDT 1.0 iterates over individual actions.

Both UDT and Timeless Decision Theory (TDT) make decisions on the basis of what you would have pre-committed to. The difference is that UDT asks what you would have pre-committed to without the benefit of any observations you have made about the universe, while TDT asks what you would have pre-committed to given all the information you have observed so far. This means that UDT pays in Counterfactual Mugging, while TDT does not.
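The Counterfactual Mugging contrast can be made concrete with a little arithmetic. The sketch below assumes the usual formulation of the problem: a fair coin is flipped; on tails the agent is asked to hand over a small sum, and on heads it receives a large reward only if it is the kind of agent that would have paid on tails. The dollar amounts and the function names `updateless_value` and `updated_value` are illustrative assumptions, not part of any formal definition of UDT or TDT.

```python
# Toy model of Counterfactual Mugging (all numbers are assumed for illustration).
P_HEADS = 0.5      # fair coin
REWARD = 10_000    # paid on heads, only to agents whose policy is to pay on tails
COST = 100         # requested from the agent on tails


def updateless_value(pays_on_tails: bool) -> float:
    """Expected utility of a policy, judged before the coin is observed."""
    heads_payoff = REWARD if pays_on_tails else 0
    tails_payoff = -COST if pays_on_tails else 0
    return P_HEADS * heads_payoff + (1 - P_HEADS) * tails_payoff


def updated_value(pays_on_tails: bool) -> float:
    """Utility of the same choice after conditioning on having seen tails."""
    return -COST if pays_on_tails else 0


# Evaluated without the observation, paying is the better policy.
print(updateless_value(True), updateless_value(False))   # 4950.0 0.0

# Evaluated after updating on "tails", refusing looks better.
print(updated_value(True), updated_value(False))         # -100 0
```

Under these assumed numbers, the policy of paying wins when it is scored across both coin outcomes, while refusing wins once the heads branch has been discarded, which is the disagreement described above.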

Updateless Decision Theory (UDT) is a decision theory meant to deal with a fundamental problem in the existing decision theories: the need to treat the agent as a part of the world in which it makes its decisions. In contrast, in the most common decision theory today, Causal Decision Theory (CDT), the deciding agent is not part of the world model: its decision is the output of CDT, but the agent's decision in the world context is "magic", in that at the moment of deciding, no causal links feed into its chosen action. It acts as though its decision were causeless, as in some dualist free-will theories.

UDT specifies that the optimal agent is the one with the best algorithm (the best mapping from observations to actions) across a probability distribution of all world-histories. ("Best" here, as in other decision theories, means one that maximizes a utility/reward function.)
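A minimal sketch of this idea, assuming a finite set of observations, actions, and weighted worlds, is a brute-force search over every mapping from observations to actions. The helper `best_policy`, the two toy worlds, and their payoffs are hypothetical and chosen only for illustration; a real formalization would also have to handle logical uncertainty, which this sketch ignores.

```python
from itertools import product


def best_policy(observations, actions, worlds):
    """Return the policy (observation -> action) with the highest expected utility.

    `worlds` is a list of (probability, utility_fn) pairs, where
    utility_fn takes a whole policy and returns a number.
    """
    best, best_eu = None, float("-inf")
    # Enumerate every possible mapping from observations to actions.
    for choice in product(actions, repeat=len(observations)):
        policy = dict(zip(observations, choice))
        eu = sum(p * utility(policy) for p, utility in worlds)
        if eu > best_eu:
            best, best_eu = policy, eu
    return best, best_eu


# Hypothetical toy worlds (assumptions for illustration only):
# - a coordination world that pays 10 when the agent acts differently
#   on observation "1" than on observation "2";
# - a side world that pays 3 when the agent picks "B" on observation "1".
worlds = [
    (0.8, lambda pi: 10 if pi["1"] != pi["2"] else 0),
    (0.2, lambda pi: 3 if pi["1"] == "B" else 0),
]

policy, eu = best_policy(["1", "2"], ["A", "B"], worlds)
print(policy, eu)
```

In this toy setup the search settles on taking "B" on observation "1" and "A" on observation "2". The coordination payoff depends on the whole mapping rather than on any single action, so it is the policy as a unit that gets scored.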

This definition may seem trivial, but in contrast, CDT says that an agent should choose the best *option* at any given moment, based on the effects of that action. As in Judea Pearl's definition of causality, CDT ignores any causal links inbound to the decider, treating this agent as an uncaused cause. The agent is unconcerned about what evidence its decision may provide about the agent's own mental makeup, evidence which may suggest that the agent will make suboptimal decisions in other cases.

A robust theory of logical uncertainty is essential to a full formalization of UDT. A UDT agent must calculate probabilities and expected values on the outcome of its possible actions in all possible worlds (sequences of observations and its own actions). However, it does not know its own actions in all possible worlds. (The whole point is to derive its actions.) On the other hand, it does have some knowledge about its actions, just as you know that you are unlikely to walk straight into a wall the next chance you get. So, the UDT agent models itself as an algorithm, and its probability distribution about what it itself will do is an important input into its maximization calculation.