Anvil Problem

It has been pointed out by Eliezer Yudkowsky and others that AIXI lacks a self-model: it extrapolates its own actions into the future indefinitely, assuming that it will keep working the same way in the future.

AIXI does not "model itself" to figure out what actions it will take in the future. Its definition implicitly assumes that, up until its horizon, it will keep choosing the actions that maximize expected future value, even in futures where its implementation has predictably been destroyed or altered. That assumption does not hold for real-world implementations, which may malfunction, self-modify, be destroyed, be changed by others, and so on.
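For reference, AIXI's choice of action at cycle t with horizon m is commonly written as follows (following Marcus Hutter's formulation, where U is a universal monotone Turing machine, q ranges over programs, \ell(q) is the length of q, and the o_i and r_i are observations and rewards). The nested \max over the future actions a_{t+1}, \ldots, a_m is where the assumption lives: every future action is taken to be the maximizing one, whatever the environment does to the agent in the meantime.

```latex
a_t = \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
      \left[ r_t + \cdots + r_m \right]
      \sum_{q \,:\, U(q,\, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```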

Though AIXI is an abstraction, any real AI would have a physical embodiment that could be damaged, and an implementation that could be altered or whose behavior could change because of bugs. The AIXI formalism completely ignores these possibilities (Yampolskiy & Fox, 2012).

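This is called the Anvil problem: AIXI would not care if an anvil were about to drop on its head, because nothing in its definition treats the anvil as something that could stop it from choosing its future actions.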
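A toy illustration may make the distinction concrete. The sketch below is not AIXI, which is incomputable and mixes over all computable environments; it is a hypothetical, fully known world with three actions and a finite horizon, and the names `REWARD`, `step`, `DAMAGED_ACTION`, `dualistic_value` and `embedded_value` are invented for illustration. Both planners know that "grab" leaves the agent damaged. The dualistic planner, like AIXI's definition, still assumes a maximizing action at every later step; the embedded planner assumes, as a labeled simplification, that a damaged agent can only idle.

```python
from functools import lru_cache

# Hypothetical toy world, not AIXI itself: per-step reward for each action.
# "grab" pays the most but means standing where the anvil lands, which
# damages the agent's own implementation.
REWARD = {"work": 1, "grab": 2, "idle": 0}
ACTIONS = tuple(REWARD)
DAMAGED_ACTION = "idle"  # assumption: a damaged agent can only ever idle


def step(intact: bool, action: str) -> bool:
    """Return whether the agent's implementation is intact after `action`."""
    return intact and action != "grab"


@lru_cache(maxsize=None)
def dualistic_value(intact: bool, steps_left: int) -> int:
    # AIXI-style recursion: every future step is assumed to take a maximizing
    # action, even after the state records that the agent was damaged.
    if steps_left == 0:
        return 0
    return max(REWARD[a] + dualistic_value(step(intact, a), steps_left - 1)
               for a in ACTIONS)


@lru_cache(maxsize=None)
def embedded_value(intact: bool, steps_left: int) -> int:
    # Self-modeling recursion: once damaged, the agent predicts that its
    # future actions come from the damaged implementation, not the maximizer.
    if steps_left == 0:
        return 0
    if not intact:
        return REWARD[DAMAGED_ACTION] + embedded_value(False, steps_left - 1)
    return max(REWARD[a] + embedded_value(step(intact, a), steps_left - 1)
               for a in ACTIONS)


def choose(value_fn, intact: bool, steps_left: int) -> str:
    """Pick the action with the best immediate reward plus planned value."""
    def score(a: str) -> int:
        return REWARD[a] + value_fn(step(intact, a), steps_left - 1)
    return max(ACTIONS, key=score)


def run(value_fn, horizon: int = 5):
    """Execute a planner in the toy world; reality ignores its assumptions."""
    intact, total, trace = True, 0, []
    for steps_left in range(horizon, 0, -1):
        # A damaged agent no longer runs its planner at all.
        if intact:
            action = choose(value_fn, intact, steps_left)
        else:
            action = DAMAGED_ACTION
        trace.append(action)
        total += REWARD[action]
        intact = step(intact, action)
    return trace, total


print(run(dualistic_value))  # (['grab', 'idle', 'idle', 'idle', 'idle'], 2)
print(run(embedded_value))   # (['work', 'work', 'work', 'work', 'grab'], 6)
```

Under these assumptions the dualistic planner grabs immediately and then idles for the rest of the run, while the self-modeling planner works and grabs only on the last step, when there are no future choices left to lose. The failure is not ignorance of the anvil: the dualistic planner's state does record the damage, but its recursion never lets that damage affect its own future choices.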

The "Anvil problem" is not a mere detail that a formal abstraction can safely leave out. Self-analysis and self-modification may be essential parts of any future Friendly AI (FAI). First, because an FAI must work to avoid changes to its own goal system, the question of self-modeling cannot be ignored; our decision theory must be extended into a reflective decision theory. Second, because human values are not well understood or formalized, the FAI may need to refine its goal of maximizing human values. "Refining" a goal without changing its essentials is another demanding problem in reflective decision theory.

References

R.V. Yampolskiy, J. Fox (2012) Artificial General Intelligence and the Human Mental Model. In Amnon H. Eden, Johnny Søraker, James H. Moor, Eric Steinhart (Eds.), The Singularity Hypothesis. The Frontiers Collection. London: Springer.

LESSWRONG
LW

LESSWRONG
LW

References

Anvil Problem

References