Is METR Underestimating LLM Time Horizons?
TL;DR
* Using METR human-baseline data, I define an alternative LLM time-horizon measure: the longest time horizon over which an LLM exceeds human-baseline reliability (equivalently, the intersection point of the human and LLM logistic curves). This measure shows a much faster growth trend than METR's fixed-threshold trends:...
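To make the contrast between the two measures concrete, here is a minimal sketch. It assumes that each success curve, human or LLM, is a logistic function of log2(task length) with a hypothetical intercept `a` and slope `b`. This is an illustrative parameterization, not METR's fitting code. A fixed-threshold horizon solves for where one curve crosses a chosen reliability level, such as METR's 50%. The crossover measure instead solves for where the LLM and human curves are equal. Because the logistic function is strictly increasing, that happens exactly where their logits are equal. All function and parameter names are hypothetical.

```python
import math


def logistic(t, a, b):
    """Success probability at task length t, assumed logistic in log2(t)."""
    return 1.0 / (1.0 + math.exp(-(a + b * math.log2(t))))


def fixed_threshold_horizon(a, b, p=0.5):
    """Task length at which a single curve crosses reliability p (b != 0)."""
    x = (math.log(p / (1.0 - p)) - a) / b  # solve a + b*x = logit(p)
    return 2.0 ** x


def crossover_horizon(a_llm, b_llm, a_human, b_human):
    """Task length where the LLM and human curves intersect.

    Equal probabilities <=> equal logits:
        a_llm + b_llm * x == a_human + b_human * x,  with x = log2(t).
    Returns None when the slopes are equal (no single crossing).
    """
    if b_llm == b_human:
        return None
    x = (a_human - a_llm) / (b_llm - b_human)
    return 2.0 ** x


def llm_advantage_horizon(a_llm, b_llm, a_human, b_human):
    """Longest task length over which the LLM beats the human baseline.

    If the LLM curve falls off faster (b_llm < b_human), the LLM is ahead only
    below the crossover. Otherwise its advantage never ends (math.inf), or,
    for parallel curves, it is either always ahead (inf) or never ahead (0).
    """
    t = crossover_horizon(a_llm, b_llm, a_human, b_human)
    if t is None:
        return math.inf if a_llm > a_human else 0.0
    return t if b_llm < b_human else math.inf
```

With made-up parameters `a_llm=4.0, b_llm=-1.0` and `a_human=2.0, b_human=-0.5`, both logits reach zero at log2(t) = 4. The crossover is therefore at t = 16 (in whatever time unit t uses), and the LLM is the more reliable of the two for all shorter tasks. Under this parameterization, the crossover horizon moves whenever the LLM's slope or intercept changes relative to the human curve, even if its 50% point stays fixed.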
Jan 1840