LESSWRONG

Iterated Amplification

Oct 29, 2018 by paulfchristiano

This is a sequence curated by Paul Christiano on one current approach to alignment: Iterated Amplification.

Preface to the sequence on iterated amplification (paulfchristiano)
Problem statement

The first part of this sequence clarifies the problem that iterated amplification is trying to solve, which is both narrower and broader than you might expect.

The Steering Problem (paulfchristiano)
Clarifying "AI Alignment" (paulfchristiano)
An unaligned benchmark (paulfchristiano)
Prosaic AI alignment (paulfchristiano)
Basic intuition

The second part of the sequence outlines the basic intuitions that motivate iterated amplification. I think that these intuitions may be more important than the scheme itself, but they are considerably more informal.

Approval-directed agents (paulfchristiano)
Approval-directed bootstrapping (paulfchristiano)
Humans Consulting HCH (paulfchristiano)
Corrigibility (paulfchristiano)
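The recursion behind Humans Consulting HCH is simple enough to sketch directly: a human answers a question, consulting further copies of the same human-plus-delegation process on subquestions, down to some depth budget. The sketch below is illustrative only, not anything from the posts; `human_answer` and `decompose` are hypothetical stand-ins for the human's two roles (answering directly, and breaking a question into subquestions).

```python
def hch(question, human_answer, decompose, depth):
    """Answer `question` as a human consulting copies of this same process.

    human_answer(question, sub_answers) -> answer, given answers to subquestions
    decompose(question) -> list of subquestions ([] if answerable directly)
    depth -> remaining budget of recursive consultations
    """
    if depth == 0:
        # Budget exhausted: the human must answer unaided.
        return human_answer(question, [])
    subquestions = decompose(question)
    # Each subquestion is posed to a fresh copy of the whole process.
    sub_answers = [hch(q, human_answer, decompose, depth - 1)
                   for q in subquestions]
    return human_answer(question, sub_answers)
```

With a toy decomposition (split a list in half, sum the halves' answers), `hch` computes a sum no single call to `human_answer` could handle alone, which is the basic hope: the tree can be smarter than any one node.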
The scheme

The core of the sequence is the third section. Benign model-free RL describes iterated amplification as a general outline into which we can substitute arbitrary algorithms for reward learning, amplification, and robustness. The first four posts all describe variants of this idea from different perspectives; if you find one of those descriptions clearest, I recommend focusing on that one and skimming the others.

Iterated Distillation and Amplification (Ajeya Cotra)
Benign model-free RL (paulfchristiano)
Factored Cognition (stuhlmueller)
Supervising strong learners by amplifying weak experts (paulfchristiano)
AlphaGo Zero and capability amplification (paulfchristiano)
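The amplify-then-distill loop these posts describe can be sketched, very loosely, as alternating two steps: an overseer solves tasks by decomposing them and delegating subtasks to the current model, and a training step compresses that amplified behavior back into a fast model. Everything below is an illustrative stand-in under simplifying assumptions; `decompose`, `combine`, and `train` are hypothetical placeholders, not an implementation from the sequence.

```python
def amplify(model, task, decompose, combine):
    """Overseer solves `task` by delegating subtasks to `model`."""
    subtasks = decompose(task)
    if not subtasks:
        return model(task)
    return combine(task, [model(t) for t in subtasks])

def distill(examples, train):
    """Train a fast model to imitate the slower amplified system."""
    return train(examples)

def iterated_amplification(model, tasks, decompose, combine, train, rounds):
    """Alternate amplification and distillation for `rounds` iterations."""
    for _ in range(rounds):
        # Collect (task, answer) pairs from the amplified overseer+model team,
        # then distill them into the next-generation model.
        examples = [(t, amplify(model, t, decompose, combine)) for t in tasks]
        model = distill(examples, train)
    return model
```

On a toy chain of tasks, each round lets the distilled model reach one step deeper than before, which mirrors the intended dynamic: capability accumulates across rounds even though each amplification step only adds a small amount of overseer work.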
What needs doing

The fourth part of the sequence describes some of the black boxes in iterated amplification and discusses what we would need to do to fill in those boxes. I think these are some of the most important open questions in AI alignment.

Directions and desiderata for AI alignment (paulfchristiano)
The reward engineering problem (paulfchristiano)
Capability amplification (paulfchristiano)
Learning with catastrophes (paulfchristiano)
Possible approaches

The fifth section of the sequence breaks down some of these problems further and describes some possible approaches.

Thoughts on reward engineering (paulfchristiano)
Techniques for optimizing worst-case performance (paulfchristiano)
Reliability amplification (paulfchristiano)
Security amplification (paulfchristiano)
Meta-execution (paulfchristiano)