Nov 10, 2018
This sequence describes iterated amplification, a possible strategy for using ML systems trained by gradient descent to build an AI that is actually trying to do what we want.
Iterated amplification is not intended to be a silver bullet that resolves all of the possible problems with AI; it’s an approach to the particular alignment problem posed by scaled-up versions of modern ML systems.
Iterated amplification is based on a few key hopes.
If all of these hopes panned out, then at every point in training “a team of the smartest agents we’ve been able to train so far” would be a suitable overseer for training a slightly smarter aligned successor. This could let us train very intelligent agents while preserving alignment (starting the induction from an aligned human).
Iterated amplification is still in a preliminary state and is best understood as a research program rather than a worked-out solution. Nevertheless, I think it is the most concrete existing framework for aligning powerful ML with human interests.
The purpose of this sequence is to communicate the basic intuitions motivating iterated amplification, to define iterated amplification, and to present some of the important open questions.
I expect this sequence to be most useful for readers who would like to have a somewhat detailed understanding of iterated amplification, and are looking for something more structured than ai-alignment.com to help orient themselves.
The sequence is intended to provide enough background to follow most public discussion about iterated amplification, and to be useful for building intuition and informing research about AI alignment even if you never think about amplification again.
The sequence will be easier to understand if you have a working understanding of ML, statistics, and online learning, and if you are familiar with other work on AI alignment. But it would be reasonable to just dive in and skip over any detailed discussion that seems to depend on missing prerequisites.
The sequence is not intended to be building towards a big reveal: after the first section, each post should stand on its own as addressing a basic question raised by the preceding posts. If the first section seems uninteresting you may want to skip it; if future sections seem uninteresting then it’s probably not going to get any better.
Some readers might prefer starting with the third section, while being prepared to jump back if it’s not clear what’s going on or why. (It would still make sense to return to the first two sections after reading the third.)
If you already understand iterated amplification you might be interested in jumping around the fourth and fifth sections to look at details you haven’t considered before.
The posts in this sequence link liberally to each other (not always in order) and to outside posts. The sequence is designed to make sense when read in order without reading other posts, following links only if you are interested in more details.
Tomorrow's AI Alignment Forum sequences post will be 'Future directions for ambitious value learning' by Rohin Shah, in the sequence 'Value Learning'.
The next post in this sequence will come out on Tuesday 13th November, and will be 'The Steering Problem' by Paul Christiano.