Scalable oversight is an approach to AI control ^[1]in which AIs supervise each other. Often groups of weaker AIs supervise a stronger AI, or AIs are set in a zero-sum interaction with each other.

Scalable oversight techniques aim to make it easier for humans to evaluate the outputs of AIs, or to provide a reliable training signal that can not be easily reward-hacked.

Variants include AI Safety via debate, iterated distillation and amplification, and imitative generalization.

^{^}
People used to refer to scalable oversight as a set of AI alignment techniques, but they usually work on the level of incentives to the AIs, and have less to do with architecture.