Broken Benchmark: MMLU — LessWrong