Aligning alignment with performance — LessWrong