AI Alignment: A Comprehensive Survey
We have just released an academic survey of AI alignment. We identify four main categories of alignment research:

1. Learning from feedback (e.g. scalable oversight)
2. Learning under distribution shift
3. Assurance (e.g. interpretability)
4. Governance

We mainly focused on academic references but also included some posts from LessWrong and...