Understanding the two-head strategy for teaching ML to answer questions honestly — LessWrong