METR (org)

Edited by Ruby last updated 1st Jul 2024

Formerly ARC Evals

Posts tagged METR (org)
99 METR's Observations of Reward Hacking in Recent Frontier Models
Daniel Kokotajlo
5mo
9
97 Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
habryka
4mo
43
10 Review of METR’s public evaluation protocol
nahoj, JaimeRV
1y
0
242 METR: Measuring AI Ability to Complete Long Tasks
Ω
Zach Stein-Perlman
7mo
Ω
106
153 ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks
Ω
Beth Barnes
2y
Ω
12
141 METR's Evaluation of GPT-5
Ω
GradientDissenter
3mo
Ω
15
108 Clarifying METR's Auditing Role
Ω
Beth Barnes
1y
Ω
1
90 Introducing METR's Autonomy Evaluation Resources
Megan Kinniment, Beth Barnes
2y
0
70 Interpreting the METR Time Horizons Post
Ω
snewman
6mo
Ω
12
65 METR is hiring!
Beth Barnes
2y
1
64 CoT May Be Highly Informative Despite “Unfaithfulness” [METR]
Ω
GradientDissenter
3mo
Ω
3
59 Reactions to METR task length paper are insane
Cole Wyeth
7mo
43
40 ARC Evals: Responsible Scaling Policies
Zach Stein-Perlman
2y
10
26 Improved visualizations of METR Time Horizons paper.
LDJ
7mo
4
20 How far along Metr's law can AI start automating or helping with alignment research?
Q
Christopher King
7mo
Q
21