
LESSWRONG
Corrigibility
Frontpage

Corrigibility doesn't always have a good action to take

by Stuart_Armstrong
28th Aug 2018
AI Alignment Forum

In a previous critique of corrigibility, I brought up the example of a corrigible AI butler in a situation where its actions would inevitably determine the human's values - it had no other option.

Eliezer pointed out that, in his view of corrigibility, there could be situations where the AI had no corrigible actions it could take - where, in effect, all it could do was say "I cannot act in a corrigible way here".

This makes corrigibility immune to my criticism in the previous post, while potentially opening the concept up to other criticisms: it's hard to see how a powerful agent, whose actions affect the future in many ways, including inevitably manipulating the human, can remain corrigible AND still do something. But that's a point for a more thorough analysis.
