How do I choose the best metric to measure my calibration? — LessWrong