I recently gave a two-part talk on the big picture of alignment, as I see it. The talk is not-at-all polished, but contains a lot of stuff for which I don't currently know of any good writeup. Major pieces in part one:
- Some semitechnical intuition-building for high-dimensional problem-spaces.
- Optimization compresses information "by default"
- Resources and "instrumental convergence" without any explicit reference to agents
- A frame for thinking about the alignment problem that talks only about high-dimensional problem-spaces, without reference to AI per se.
  - The central challenge is to get enough bits-of-information about human values to narrow down a search-space to solutions compatible with human values.
  - Details like whether an AI is a singleton, tool AI, multipolar, oracle, etc. are mostly irrelevant.
- Fermi estimate: just how complex are human values?
- Coherence arguments, presented the way I think they should be done.
  - Also subagents!
Note that I don't talk about timelines or takeoff scenarios; this talk is just about the technical problem of alignment.
Here's the video for part one:
Big thanks to Rob Miles for editing! Also, the video includes some good questions and discussion from Adam Shimi, Alex Flint, and Rob Miles.