An unaligned benchmark — LessWrong