Don't design agents which exploit adversarial inputs — LessWrong