
Winding My Way Through Alignment

May 05, 2022 by David Udell

A sequence of my alignment distillations, written up as I work my way through AI alignment theory.

My rough guiding research algorithm is to focus on the biggest hazard in my current model of alignment, try to understand and explain that hazard and the proposed solutions to it, and then recurse.

This now-finished sequence is representative of my developing alignment model before I became substantially informationally entangled, in person, with the Berkeley alignment community. It's what I was able to glean from just reading a lot online.

Posts in this sequence:

1. HCH and Adversarial Questions
2. Agency and Coherence
3. Deceptive Agents are a Good Way to Do Things
4. But What's Your *New Alignment Insight,* out of a Future-Textbook Paragraph?
5. Gato as the Dawn of Early AGI
6. Intelligence in Commitment Races
7. How Deadly Will Roughly-Human-Level AGI Be?