Outer Alignment

Outer alignment asks the question: "What should we aim our model at?" In other words, is the model optimizing for the correct reward, such that there are no exploitable loopholes? It is also known as the reward misspecification problem, because if we cannot correctly tell the AI what we want, then we have failed to specify our reward.
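To make the idea of an exploitable loophole concrete, here is a minimal toy sketch (a hypothetical illustration, not drawn from any real system): the designer intends the agent to finish a course, but specifies the reward as "points collected". Since a bonus point respawns mid-course, a reward-maximizing agent can loop on it forever instead of finishing.

```python
def run_episode(policy, steps=10):
    """Simulate a tiny course: states 0..3, where state 3 is the finish line.
    A bonus point respawns at state 1 every step -- this is the loophole."""
    state, points, finished = 0, 0, False
    for _ in range(steps):
        action = policy(state)              # +1 = move forward, -1 = move back
        state = max(0, min(3, state + action))
        if state == 1:
            points += 1                     # respawning bonus: gameable reward
        if state == 3:
            finished = True
    return points, finished

# Intended behaviour: walk straight to the finish.
intended = lambda s: 1
# Reward-hacking behaviour: oscillate around the bonus at state 1.
hacker = lambda s: 1 if s < 1 else -1

print(run_episode(intended))  # (1, True): few points, but finishes
print(run_episode(hacker))    # (5, False): more points, never finishes
```

The "hacker" policy earns more of the specified reward while entirely failing the designer's intent, which is exactly the gap outer alignment is about.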

To solve the outer alignment problem, some sub-problems that we would have to make progress on include specification gaming, value learning, and reward shaping/modeling. Some proposed solutions to outer alignment include scalable oversight techniques such as IDA (Iterated Distillation and Amplification), proposed by Paul Christiano, as well as adversarial oversight techniques such as debate.