Jonas Moss — LessWrong

Jonas Moss

33

6y

(Updated) METR's data can't distinguish between trajectories (and 80% horizons are an order of magnitude off)

Update: Added GPT-5.2 to the main part of the text; this uses all data from v1.1. Added an appendix using all METR models, obtained by joining v1.0 and v1.1. Added an appendix with marginal vs. typical P(success) curves. Thanks to Thomas Kwa for telling me about this. TLDR: I reanalyzed the METR task...