UK AISI Alignment Team: Debate Sequence

May 07, 2025 by Benjamin Hilton

The UK AI Security Institute's Alignment Team focuses on research aimed at reducing safety and security risks from AI systems that autonomously pursue courses of action which could lead to egregious harm, and which are not under human control.

This sequence examines our initial focus: using scalable oversight to train honest AI systems, combining theory about training equilibria with empirical evidence about training outcomes.
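For readers new to debate as a scalable oversight method: the core idea is that two AI debaters argue opposing sides of a question over several rounds, and a weaker but trusted judge (e.g. a human) picks the more convincing answer from the transcript alone. As a rough illustration only, here is a minimal sketch of that basic loop in Python; the `Turn`, `run_debate`, and callback names are hypothetical, and the protocols discussed in this sequence (such as prover-estimator debate) differ from this classic setup in important details.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Turn:
    debater: str   # "A" or "B"
    argument: str  # natural-language claim or rebuttal

# Hypothetical type aliases: any callables with these signatures would do,
# e.g. thin wrappers around language model API calls.
DebaterFn = Callable[[str, List[Turn]], str]
JudgeFn = Callable[[str, List[Turn]], str]

def run_debate(
    question: str,
    debater_a: DebaterFn,
    debater_b: DebaterFn,
    judge: JudgeFn,
    num_rounds: int = 3,
) -> str:
    """Run alternating argument rounds, then ask the weaker, trusted
    judge to pick a winner ("A" or "B") from the transcript alone."""
    transcript: List[Turn] = []
    for _ in range(num_rounds):
        transcript.append(Turn("A", debater_a(question, transcript)))
        transcript.append(Turn("B", debater_b(question, transcript)))
    return judge(question, transcript)
```

In the original debate proposal, training proceeds by self-play: the debaters are optimised to win the judge's verdict, with the hope that honest, verifiable arguments form the winning strategy at equilibrium.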

Posts in this sequence:

1. UK AISI's Alignment Team: Research Agenda (Benjamin Hilton, Jacob Pfau, Marie_DB, Geoffrey Irving)
2. An alignment safety case sketch based on debate (Marie_DB, Jacob Pfau, Benjamin Hilton, Geoffrey Irving)
3. Dodging systematic human errors in scalable oversight (Geoffrey Irving)
4. Unexploitable search: blocking malicious use of free parameters (Jacob Pfau, Geoffrey Irving)
5. Prover-Estimator Debate: A New Scalable Oversight Protocol (Jonah Brown-Cohen, Geoffrey Irving)
6. The need to relativise in debate (Geoffrey Irving, Simon Marshall)