LessWrong Tags

Corrigibility

Contributors: Ben Pace

A corrigible agent is one that doesn't interfere with what we would intuitively see as attempts to 'correct' the agent, or to 'correct' our mistakes in building it, and that permits these 'corrections' even though instrumentally convergent reasoning appears to recommend resisting them.
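As a rough illustration of the "instrumentally convergent reasoning" above, here is a toy sketch. It assumes a made-up agent that values only finishing its task and a made-up probability that humans press its off-switch; the function names, parameters, and numbers are all hypothetical and are not taken from any of the posts listed on this page. For any positive task value and any nonzero shutdown chance, disabling the switch scores strictly higher, which is exactly the pressure a corrigible agent must not act on.

```python
# Toy model (hypothetical): an agent that values only finishing its task
# decides whether to leave its off-switch working before humans decide.

ACTIONS = ["allow", "disable"]  # "allow" keeps the off-switch working


def expected_utility(action: str, p_shutdown: float, task_value: float) -> float:
    """Expected task utility for the toy agent.

    p_shutdown: assumed chance that humans press the switch if it still works.
    task_value: utility the agent gets from completing its task.
    """
    if action == "disable":
        # Nobody can stop the agent, so the task always completes.
        return task_value
    # With a working switch, the task completes only if humans don't press it.
    return (1.0 - p_shutdown) * task_value


def naive_choice(p_shutdown: float, task_value: float) -> str:
    """Pick the action with the highest expected task utility (ties -> 'allow')."""
    return max(ACTIONS, key=lambda a: expected_utility(a, p_shutdown, task_value))


def corrigible_choice(p_shutdown: float, task_value: float) -> str:
    """A corrigible agent leaves the switch alone, whatever the comparison says.

    The arguments are unused; they are kept so both policies share a signature.
    """
    return "allow"


if __name__ == "__main__":
    # The incentive is "convergent": it appears for many different task values.
    for task_value in (1.0, 10.0, 1000.0):
        print(task_value, naive_choice(0.1, task_value), corrigible_choice(0.1, task_value))
```

Running it prints `disable` for the naive agent and `allow` for the corrigible one at every task value. The corrigible policy is simply stipulated in this sketch; the posts tagged here discuss how, or whether, a real agent could come to behave that way without it being hard-coded.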

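Posts tagged Corrigibility (most relevant first; Ω marks posts also on the AI Alignment Forum)

- Corrigibility (Ω), by paulfchristiano, 3y ago: 40 karma, 4 comments, tag relevance 9
- A Gym Gridworld Environment for the Treacherous Turn, by Michaël Trazzi, 3y ago: 68 karma, 9 comments, tag relevance 3
- AI Alignment 2018-19 Review (Ω), by rohinmshah, 1y ago: 121 karma, 6 comments, tag relevance 2
- Non-Obstruction: A Simple Concept Motivating Corrigibility (Ω), by TurnTrout, 7mo ago: 63 karma, 19 comments, tag relevance 2
- Three mental images from thinking about AGI debate & corrigibility (Ω), by Steven Byrnes, 10mo ago: 50 karma, 35 comments, tag relevance 2
- Solving the whole AGI control problem, version 0.0001 (Ω), by Steven Byrnes, 2mo ago: 48 karma, 4 comments, tag relevance 2
- Towards a mechanistic understanding of corrigibility (Ω), by evhub, 2y ago: 41 karma, 26 comments, tag relevance 2
- Corrigibility as outside view (Ω), by TurnTrout, 1y ago: 36 karma, 11 comments, tag relevance 2
- Can corrigibility be learned safely? (Ω), by Wei_Dai, 3y ago: 35 karma, 115 comments, tag relevance 2
- Do what we mean vs. do what we say (Ω), by rohinmshah, 3y ago: 34 karma, 14 comments, tag relevance 2
- Corrigible but misaligned: a superintelligent messiah, by zhukeepa, 3y ago: 28 karma, 26 comments, tag relevance 2
- Thoughts on implementing corrigible robust alignment (Ω), by Steven Byrnes, 2y ago: 26 karma, 2 comments, tag relevance 2
- The limits of corrigibility, by Stuart_Armstrong, 3y ago: 25 karma, 9 comments, tag relevance 2
- Addressing three problems with counterfactual corrigibility: bad bets, defending against backstops, and overconfidence. (Ω), by RyanCarey, 3y ago: 22 karma, 1 comment, tag relevance 2
- A Critique of Non-Obstruction (Ω), by Joe_Collman, 4mo ago: 13 karma, 10 comments, tag relevance 2

Showing 15 of 35 tagged posts.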