
AI Safety Subprojects

Sep 20, 2021 by Stuart_Armstrong

These are AI safety subprojects designed to elucidate "model splintering" and "learning the preferences of irrational agents".

Immobile AI makes a move: anti-wireheading, ontology change, and model splintering
AI, learn to be conservative, then learn to be less so: reducing side-effects, learning preserved features, and going beyond conservatism
AI learns betrayal and how to avoid it
Force neural nets to use models, then detect these
Preferences from (real and hypothetical) psychology papers
Finding the multiple ground truths of CoinRun and image classification