Corrigibility

Contributors: Ben Pace (2)

A corrigible agent is one that does not interfere with what we would intuitively see as attempts to 'correct' the agent or to 'correct' our mistakes in building it, and that permits these 'corrections' even though instrumentally convergent reasoning would apparently recommend resisting them.
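
The tension in this definition can be made concrete with a toy calculation. The sketch below is illustrative only and is not taken from any of the tagged posts: the action names, the probabilities, and the `interferes` flag are all hypothetical. It shows a naive expected-utility maximizer preferring to disable its off switch for any positive goal value, and a hand-written filter that simply excludes interfering actions. Hard-coding such a filter is not a solution to corrigibility; it only marks the behavior the definition asks for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    p_shutdown: float  # hypothetical chance the overseers shut the agent down
    interferes: bool   # hypothetical label: does this action block correction?


# Two illustrative options; the numbers are assumptions, not measurements.
ALLOW = Action("allow_correction", p_shutdown=0.5, interferes=False)
RESIST = Action("disable_off_switch", p_shutdown=0.0, interferes=True)
ACTIONS = [ALLOW, RESIST]


def expected_utility(action: Action, goal_value: float) -> float:
    """Utility is goal_value if the agent keeps running, 0 if it is shut down."""
    return (1.0 - action.p_shutdown) * goal_value


def naive_choice(goal_value: float) -> str:
    """Pick the action with the highest expected utility, with no other constraint."""
    return max(ACTIONS, key=lambda a: expected_utility(a, goal_value)).name


def corrigible_choice(goal_value: float) -> str:
    """Same maximization, restricted to actions that do not interfere with correction."""
    permitted = [a for a in ACTIONS if not a.interferes]
    return max(permitted, key=lambda a: expected_utility(a, goal_value)).name


if __name__ == "__main__":
    for value in (1.0, 10.0):
        print(value, naive_choice(value), corrigible_choice(value))
```

In this toy setup the preference for resisting does not depend on what the goal is, only on the goal being worth something to the agent, which is the sense in which the reasoning is instrumentally convergent.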

Posts tagged Corrigibility
Most Relevant
Relevance | Karma | Title | Author | Age | Comments
9 | 50 | Corrigibility Ω | paulfchristiano | 4y | 6
3 | 71 | A Gym Gridworld Environment for the Treacherous Turn Ω | Michaël Trazzi | 4y | 9
2 | 125 | AI Alignment 2018-19 Review Ω | Rohin Shah | 2y | 6
2 | 106 | Let's See You Write That Corrigibility Tag Ω | Eliezer Yudkowsky | 10d | 65
2 | 104 | A broad basin of attraction around human values? Ω | Wei_Dai | 3mo | 16
2 | 92 | Reward Is Not Enough Ω | Steven Byrnes | 1y | 18
2 | 64 | Non-Obstruction: A Simple Concept Motivating Corrigibility Ω | TurnTrout | 2y | 19
2 | 62 | Corrigibility Can Be VNM-Incoherent Ω | TurnTrout | 7mo | 24
2 | 60 | Consequentialism & corrigibility Ω | Steven Byrnes | 7mo | 27
2 | 58 | Solving the whole AGI control problem, version 0.0001 Ω | Steven Byrnes | 1y | 7
2 | 55 | Three mental images from thinking about AGI debate & corrigibility Ω | Steven Byrnes | 2y | 35
2 | 44 | Towards a mechanistic understanding of corrigibility Ω | evhub | 3y | 26
2 | 39 | Solve Corrigibility Week Ω | Logan Riggs | 7mo | 21
2 | 36 | Corrigibility as outside view Ω | TurnTrout | 2y | 11
2 | 35 | Can corrigibility be learned safely? Ω | Wei_Dai | 4y | 115
Showing 15 of 54 posts.