Utility Indifference


Utility indifference is a type of defense against an AI becoming unfriendly.

In creating an AGI, a utility function is explicitly or implicitly chosen. Imagine we wanted to set up a safeguard against the AGI acting against our intentions. Perhaps we surround the computer with explosives, so that we may destroy the AGI if it misbehaves. A sufficiently advanced AGI will realize this, and will quickly act to disarm the explosives. One way to prevent this would be to design its utility function so that it was indifferent to the explosives going off. That is, in any situation, the utility of the explosives going off would be equal to the utility if they did not.
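As a toy illustration of the compensation idea (a minimal sketch, not Armstrong's formal construction; all names here are hypothetical), one can shift the utility of every "explosives detonated" outcome by a constant chosen so the agent's average utility is the same whether or not the explosives go off:

```python
# Toy sketch of utility indifference. We adjust a base utility function by
# adding a compensating constant to every detonation outcome, so the agent's
# mean utility is equal across the detonation / no-detonation cases and it
# has no incentive to disarm the explosives. Averages over a finite outcome
# list stand in for the conditional expectations of the real construction.

def indifferent_utility(base_utility, outcomes, detonated):
    """Return a utility equal to base_utility on normal outcomes, shifted
    by a constant on detonation outcomes to equalize the two means."""
    normal = [o for o in outcomes if not detonated(o)]
    boom = [o for o in outcomes if detonated(o)]
    # Compensation: gap between mean utility without and with detonation.
    compensation = (sum(base_utility(o) for o in normal) / len(normal)
                    - sum(base_utility(o) for o in boom) / len(boom))

    def u(outcome):
        return base_utility(outcome) + (compensation if detonated(outcome) else 0)
    return u

# Example: outcomes are (paperclips_made, detonated) pairs.
outcomes = [(10, False), (2, False), (0, True), (0, True)]
base = lambda o: o[0]  # the AGI only values paperclips
u = indifferent_utility(base, outcomes, detonated=lambda o: o[1])

normal_mean = sum(u(o) for o in outcomes if not o[1]) / 2
boom_mean = sum(u(o) for o in outcomes if o[1]) / 2
# normal_mean == boom_mean: the agent is indifferent to detonation.
```

Note that the adjusted utility leaves the agent's preferences among non-detonation outcomes untouched; only the relative value of detonation itself is neutralized.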

Researcher Stuart Armstrong, of the Future of Humanity Institute, has published mathematical models of this idea.


Blog Posts

External Links