Test Data for Calibration Analysis — LessWrong