Simpleton Gambit

The Simpleton Gambit is a thought experiment in which an agent has the opportunity to take an action that will give it an extremely high reward (a larger reward than it would be able to obtain by doing anything else), with the catch that it requires the agent to modify itself to be unable to take any further actions (aside from some default action). A utility-maximizing agent would accept the gambit, as it results in the highest possible utility.
However, if an agent mistakenly believes that it is facing the Simpleton Gambit, and it accepts, it would face a very low payoff as a result of its inability to take actions in the future. Ring and Orseau (2011)¹ discuss how various self-modifying agents would react to the Simpleton Gambit.

References

Ring and Orseau (2011)↩

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

See also

References

Simpleton Gambit

See also

References