Corrigibility doesn't always have a good action to take

Stuart_Armstrong

28th Aug 2018

AI Alignment Forum

In a previous critique of corrigibility, I brought up the example of a corrigible AI-butler placed in a situation where its actions would inevitably determine the human's values - it had no other option.

Eliezer pointed out that, in his view of corrigibility, there could be situations where the AI had no corrigible actions it could take - where, in effect, all it could do was say "I cannot act in a corrigible way here".
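As a rough illustration of that point, here is a minimal sketch in which corrigibility acts as a filter on the available actions, and that filter can leave nothing behind. Everything in it is hypothetical and introduced only for illustration: the function `choose_corrigible_action`, the `is_corrigible` predicate and the `utility` function are assumptions, not a definition of corrigibility from either post.

```python
from typing import Callable, Optional, Sequence


def choose_corrigible_action(
    actions: Sequence[str],
    is_corrigible: Callable[[str], bool],  # hypothetical predicate, assumed given
    utility: Callable[[str], float],       # hypothetical preference over actions
) -> Optional[str]:
    """Return the best action that passes the corrigibility filter.

    Returns None when no available action is corrigible, which stands in
    for the AI saying "I cannot act in a corrigible way here".
    """
    permitted = [a for a in actions if is_corrigible(a)]
    if not permitted:
        return None
    return max(permitted, key=utility)


# Butler-style case: every available action ends up determining the
# human's values, so the (assumed) filter rejects all of them.
butler_actions = ["serve tea", "serve coffee"]
print(choose_corrigible_action(butler_actions, lambda a: False, len))
```

In this toy framing the butler example does not break the filter; it is simply a case where the permitted set is empty.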

This makes corrigibility immune to the criticism in my previous post, while potentially opening the concept up to other criticisms: it's hard to see how a powerful agent, whose actions affect the future in many ways, including inevitably manipulating the human, can remain corrigible AND still do something. But that's a point for a more thorough analysis.
