AI Benchmarking
Posts tagged AI Benchmarking
2
23
Broken Benchmark: MMLU
awg
8mo
5
1
33
Introducing REBUS: A Robust Evaluation Benchmark of Understanding Symbols
Arjun Panickssery, agg
4mo
0
1
14
MMLU’s Moral Scenarios Benchmark Doesn’t Measure What You Think it Measures
corey morris
7mo
2
1
3
LLM Psychometrics: A Speculative Approach to AI Safety
pskl
3mo
4