New paper shows truthfulness & instruction-following don't generalize by default — LessWrong