Shaping safer goals

Jul 01, 2020 by Richard_Ngo

How can we move the needle on AI safety? In this sequence I think through some approaches that don't rely on precise specifications; instead, they involve "shaping" our agents to think in safer ways and to have safer motivations. This is particularly relevant to the prospect of training AGIs in multi-agent (or other open-ended) environments.

Note that all of the techniques I propose here are speculative brainstorming; I'm not confident in any of them as research directions, although I'd be excited to see further exploration along these lines.

Posts in this sequence:

1. Multi-agent safety
2. Safety via selection for obedience
3. Competitive safety via gradated curricula
4. AGIs as collectives
5. Safer sandboxing via collective separation
6. Emergent modularity and safety