Benchmark Study
10
Benchmark Study #1: MMLU (Pile, MCQ)
Bruce W. Lee
4mo
0
11
Benchmark Study #2: TruthfulQA (Task, MCQ)
Bruce W. Lee
4mo
2
2
Benchmark Study #3: HellaSwag (Task, MCQ)
Bruce W. Lee
4mo
4
6
Benchmark Study #4: AI2 Reasoning Challenge (Task(s), MCQ)
Bruce W. Lee
4mo
0
6
Benchmark Study #5: Social Intelligence QA (Task, MCQ)
Bruce W. Lee
3mo
0