Shah and Yudkowsky on alignment failures — LessWrong