Reflective Decision Theory

Reflective decision theory is a term occasionally used for a decision theory that would allow an agent to act in ways that do not trigger regret. According to Causal Decision Theory, this regret is conceptualized as a reflective inconsistency: a divergence between the agent who took the action and the same agent reflecting on it afterward.

When considering thought experiments such as Newcomb’s Problem, it has been suggested that a sufficiently powerful AGI would be able to access its own source code and self-modify. This would allow the AGI to alter its own behavior and decision process, beating the paradox by precommitting to a certain choice. To understand the AGI's behavior in this and other situations, and to be able to implement it, we will have to create a reflectively consistent decision theory. In particular, reflective consistency would be needed to ensure that an AGI preserves a friendly value system throughout its self-modifications.
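
As a rough illustration of why precommitment matters in Newcomb's Problem, the sketch below compares the expected payoff of two committed policies. It uses the payoffs from the standard statement of the problem ($1,000 in the transparent box, $1,000,000 in the opaque box). The `accuracy` parameter, the function names, and the choice to condition the predictor's success on the agent's committed policy are assumptions made for this example, not part of any particular decision theory's formal definition.

```python
# Illustrative sketch only: standard Newcomb payoffs, hypothetical predictor model.
BOX_A = 1_000       # transparent box, always contains $1,000
BOX_B = 1_000_000   # opaque box, filled only if one-boxing was predicted


def expected_payoff(policy: str, accuracy: float) -> float:
    """Expected winnings for an agent precommitted to `policy`.

    `accuracy` is the assumed probability that the predictor correctly
    anticipates the agent's committed policy.
    """
    if policy == "one-box":
        # A correct prediction means the opaque box was filled.
        return accuracy * BOX_B
    if policy == "two-box":
        # A correct prediction means the opaque box was left empty.
        return BOX_A + (1 - accuracy) * BOX_B
    raise ValueError(f"unknown policy: {policy}")


def best_precommitment(accuracy: float) -> str:
    """Return the policy with the higher expected payoff under this model."""
    return max(("one-box", "two-box"),
               key=lambda policy: expected_payoff(policy, accuracy))


if __name__ == "__main__":
    for accuracy in (0.5, 0.9, 0.99):
        print(accuracy, best_precommitment(accuracy))
```

Under these assumptions, committing to one-boxing has the higher expected payoff whenever the predictor is right more than 50.05% of the time. An agent that could rewrite its own decision process would therefore, on this model, have reason to adopt that policy in advance and never regret it on reflection.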

Eliezer Yudkowsky has proposed a theoretical solution to the problem in his Timeless Decision Theory.
