LESSWRONG

If I were a well-intentioned AI...

Jan 22, 2020 by Stuart_Armstrong

I look at how some of the major problems in AI alignment (Goodhart problems, distributional shift, mesa-optimising, etc.) look from the perspective of a well-intentioned but ignorant AI, and whether this perspective can suggest methods of safety improvement.

If I were a well-intentioned AI... I: Image classifier
If I were a well-intentioned AI... II: Acting in a world
If I were a well-intentioned AI... III: Extremal Goodhart
If I were a well-intentioned AI... IV: Mesa-optimising