Validator models: A simple approach to detecting goodharting — LessWrong