METR (org)
Edited by Ruby, last updated 1st Jul 2024
Formerly ARC Evals
Posts tagged METR (org), most relevant first:
METR's Observations of Reward Hacking in Recent Frontier Models, by Daniel Kokotajlo (8mo; 100 karma, 9 comments)
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity, by habryka (7mo; 97 karma, 43 comments)
AXRP Episode 47 - David Rein on METR Time Horizons, by DanielFilan (1mo; 21 karma, 0 comments)
Review of METR’s public evaluation protocol, by nahoj and JaimeRV (2y; 10 karma, 0 comments)
METR: Measuring AI Ability to Complete Long Tasks, by Zach Stein-Perlman (10mo; 242 karma, 106 comments)
ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks, by Beth Barnes (3y; 153 karma, 12 comments)
METR's Evaluation of GPT-5, by GradientDissenter (6mo; 145 karma, 15 comments)
Clarifying METR's Auditing Role, by Beth Barnes (2y; 108 karma, 1 comment)
Introducing METR's Autonomy Evaluation Resources, by Megan Kinniment and Beth Barnes (2y; 90 karma, 0 comments)
Interpreting the METR Time Horizons Post, by snewman (9mo; 70 karma, 13 comments)
Reactions to METR task length paper are insane, by Cole Wyeth (10mo; 67 karma, 43 comments)
METR is hiring!, by Beth Barnes (2y; 65 karma, 1 comment)
CoT May Be Highly Informative Despite “Unfaithfulness” [METR], by GradientDissenter (6mo; 64 karma, 3 comments)
ARC Evals: Responsible Scaling Policies, by Zach Stein-Perlman (2y; 40 karma, 10 comments)
Is METR Underestimating LLM Time Horizons?, by andreasrobinson (15d; 40 karma, 6 comments)