Motivation

Updateless Decision Theory (UDT) is a decision theory meant to deal with a fundamental problem in existing decision theories: dynamic inconsistency, i.e., having conflicting desires over time. In behavioral economics, humans are often modeled as hyperbolic discounters, meaning that rewards further away in time are seen as proportionately less important (so getting $100 one week from now is as good as getting $200 two weeks from now). This is dynamically inconsistent because the relative value of rewards changes as they get closer or further away in time. (Getting $100 one year from now sounds much less appealing than getting $200 one year plus one week from now.) This model explains some human behaviors, such as snoozing alarms repeatedly.[1]
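To make the preference reversal concrete, here is a small Python sketch. It assumes a deliberately simplified hyperbolic curve V = A / t, with t in days and t ≥ 1 (a simplification of the usual A / (1 + kt) form, chosen so that the $100/$200 indifference in the text holds exactly), and, for contrast, an exponential curve V = A·δ^t with a hypothetical δ = 0.9 per day. The function names and parameter values are illustrative assumptions, not part of UDT.

```python
import math

def hyperbolic(amount: float, days: float) -> float:
    """Simplified hyperbolic discounting (assumed form): value is inversely proportional to delay, days >= 1."""
    return amount / days

def exponential(amount: float, days: float, delta: float = 0.9) -> float:
    """Exponential discounting: every extra day multiplies value by the same factor delta (0.9 is hypothetical)."""
    return amount * delta ** days

def preferences(discount, delays=(1, 7, 365)) -> list[str]:
    """For each delay d, compare $100 at d days against $200 at d + 7 days."""
    result = []
    for d in delays:
        sooner = discount(100, d)
        later = discount(200, d + 7)
        if math.isclose(sooner, later):
            result.append("indifferent")
        else:
            result.append("sooner" if sooner > later else "later")
    return result

print("hyperbolic: ", preferences(hyperbolic))   # ['sooner', 'indifferent', 'later']
print("exponential:", preferences(exponential))  # ['sooner', 'sooner', 'sooner']
```

The hyperbolic agent's choice flips as both payoffs recede into the future, while the exponential agent gives the same answer at every horizon, because under exponential discounting the ratio between the two discounted values does not depend on d.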

The dynamic inconsistency inherent in hyperbolic discounting can be fixed by exponential discounting, amongst other possibilities. However, dynamic inconsistencies can still occur for other reasons. The two most common decision theories today, Causal Decision Theory (CDT) and Evidential Decision Theory (EDT), are both dynamically inconsistent about Counterfactual Mugging: they refuse Omega when actually faced with the problem, but if asked beforehand, they would see the value of agreeing.[2][3]
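The inconsistency in Counterfactual Mugging can be seen with a little arithmetic. The sketch below assumes the usual telling of the problem: Omega flips a fair coin; on tails it asks you for a payment, and on heads it pays you a reward only if you would have paid on tails. The amounts ($100 and $10,000) are hypothetical, and the code does not model CDT or EDT themselves; it only compares the policy as evaluated before the flip with the choice as evaluated after seeing tails.

```python
from fractions import Fraction

PAYMENT = 100        # hypothetical amount Omega asks for on tails
REWARD = 10_000      # hypothetical prize on heads, paid only to agents who would pay on tails
P_HEADS = Fraction(1, 2)

def ex_ante_value(pays_on_tails: bool) -> Fraction:
    """Expected value of committing to a policy before the coin is flipped."""
    heads = REWARD if pays_on_tails else 0
    tails = -PAYMENT if pays_on_tails else 0
    return P_HEADS * heads + (1 - P_HEADS) * tails

def value_after_tails(pays: bool) -> int:
    """Value of the choice once the agent already knows the coin came up tails."""
    return -PAYMENT if pays else 0

best_before = max([True, False], key=ex_ante_value)
best_after = max([True, False], key=value_after_tails)
print(ex_ante_value(True), ex_ante_value(False))  # 4950 0
print(best_before, best_after)                    # True False
```

Before the flip, the paying policy is worth $4,950 in expectation and refusing is worth $0; once tails is observed, paying simply loses $100, so an agent that re-decides at that point refuses.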

Getting this issue right is critical to building a safe artificial general intelligence, as such an AI must analyze its own behavior and that of any next generation it may build. Dynamically inconsistent AI systems have an incentive to engage in self-modification, but such self-modification is inherently risky. ...
