Don't design agents which exploit adversarial inputs — LessWrong