AI Benchmarking
Posts tagged AI Benchmarking (sorted by Most Relevant)
Broken Benchmark: MMLU, by awg (1y): 24 karma, 5 comments, tag relevance 2
Introducing REBUS: A Robust Evaluation Benchmark of Understanding Symbols, by Arjun Panickssery, agg (11mo): 33 karma, 0 comments, tag relevance 1
Improving Model-Written Evals for AI Safety Benchmarking [Ω], by Sunishchal Dev, Marius Hobbhahn (2mo): 26 karma, 0 comments, tag relevance 1
Auto-Enhance: Developing a meta-benchmark to measure LLM agents’ ability to improve other agents [Ω], by Sam F. Brown, BasilLabib, Codruta (Coco) Lugoj, Sai Sasank Y (5mo): 20 karma, 0 comments, tag relevance 1
MMLU’s Moral Scenarios Benchmark Doesn’t Measure What You Think it Measures [Ω], by corey morris (1y): 18 karma, 2 comments, tag relevance 1
Workshop Report: Why current benchmarks approaches are not sufficient for safety?, by Tom DAVID, Pierre Peigné (14d): 3 karma, 1 comment, tag relevance 1
LLM Psychometrics: A Speculative Approach to AI Safety, by pskl (10mo): 3 karma, 4 comments, tag relevance 1