Edited by Alex_Altair last updated 27th Jun 2012

Utility indifference is a type of defense against an AI becoming unfriendly.

In creating an AGI, a utility function is explicitly or implicitly chosen. Imagine we set up a safeguard against the AGI acting against our intentions. Perhaps we surround the computer with explosives, so that we may destroy the AGI if it misbehaves. A sufficiently advanced AGI will realize this, and will quickly act to disarm the explosives. One way to prevent this would be to design its utility function so that it is indifferent to the explosives going off. That is, in any situation, the utility of the explosives going off would be equal to the utility of them not going off.
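One way to picture this is as a correction term added to the original utility function: a constant offset applied to the "explosives detonated" outcomes, sized so that the agent's conditional expected utility is the same whether or not the safeguard fires. The sketch below is a toy illustration under assumed simplifications (a finite set of outcomes with known probabilities); it is not Armstrong's actual construction, and the names `make_indifferent` and `cond_exp` are hypothetical.

```python
def make_indifferent(utility, outcomes):
    """Toy sketch of utility indifference.

    utility:  function mapping a state to a real-valued utility
    outcomes: list of (probability, state, detonated) triples,
              where detonated is True if the explosives go off

    Returns a corrected utility function that adds a constant
    offset to detonation states, chosen so that
    E[U' | detonated] == E[U' | not detonated].
    """
    def cond_exp(flag):
        # Expected utility conditional on the detonation flag.
        num = sum(p * utility(s) for p, s, d in outcomes if d == flag)
        den = sum(p for p, s, d in outcomes if d == flag)
        return num / den

    # Offset that equalizes the two conditional expectations.
    offset = cond_exp(False) - cond_exp(True)

    def corrected(state, detonated):
        return utility(state) + (offset if detonated else 0.0)

    return corrected
```

With this corrected utility, the agent gains nothing in expectation by disarming (or triggering) the explosives, so it has no instrumental incentive to interfere with the safeguard.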

Researcher Stuart Armstrong, of the Future of Humanity Institute, has published mathematical models of this idea.

Blog Posts

  • AI indifference through utility manipulation
  • Trapping AIs via utility indifference

External Links

  • Utility Indifference by Stuart Armstrong
Posts tagged Utility indifference

  • Trading off Lives by jefftk
  • Satisficers want to become maximisers by Stuart_Armstrong