Regarding the 3x/1.25x copy/speedup multipliers, do you have any back-of-the-envelope justification for why those are plausible? For example, based on the trend in algorithmic progress, would you expect roughly this much inference-efficiency gain to parallel the assumed 10x (or so) training-efficiency gain?
I did not claim that Claude is a "supercoder" or even human-level at coding; rather, the Claude addendum continued with: "to be clear, we shouldn't over-interpret this specific 444 billion figure" and "Realistically, this highlights that to really make accurate projections of the time to catch up with human horizons based on METR data, we need better human baselines." In my view, the natural takeaway is that Claude has now basically caught up with METR's existing human baselines, which METR has acknowledged were not that well incentivized. That does not mean it is better than properly incentivized software engineers.
However, per the sensitivity analysis, if we assume well-incentivized humans could do ~2x better than METR's baselines on METR's longest benchmark tasks, "then Claude 4.5 Opus has an intersection-based time horizon of only 35.9 minutes", i.e. far from human-level. So, as I said in the post, I do think this highlights the need for better human baselines for METR. But while the current horizon estimates are quite sensitive to the baselines, the estimated time to human-level doesn't actually shift that much with this stronger baseline: it moves from early 2026 to late 2026.
In general, the primary point of the post wasn't that the current baselines are good enough to make an accurate prediction of human-level horizons using METR data, but rather "my main takeaway from this analysis is probably that we shouldn't over-interpret the METR trends at fixed reliability as a direct marker of progress towards human-level software horizons" (because the METR metrics likely underestimate the rate of progress, since they use fixed reliabilities at all horizons).
I provided both theoretical and statistical arguments (e.g. AIC) in the post for why the human-relative time horizon trend is likely hyperbolic rather than exponential, and your comment does not address or acknowledge either of those arguments. Note that the post does argue that METR's own metrics are likely exponential, so the hyperbolic claim is specifically about the human-relative time horizon metrics (per the proposal in the post).
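For readers unfamiliar with what an AIC comparison looks like mechanically, here is a minimal sketch on purely synthetic data. It is not the analysis from the post: the functional forms (exponential `h = A*exp(k*t)` and hyperbolic `h = A/(t_c - t)`), the log-space least-squares fit, the grid search over `t_c`, and every number below are assumptions chosen for illustration.

```python
import math

def aic_least_squares(rss, n, k):
    # AIC for a least-squares fit with Gaussian errors, up to an additive
    # constant that is the same for both models: n*ln(RSS/n) + 2k.
    return n * math.log(rss / n) + 2 * k

def rss_exponential(ts, ys):
    # Exponential trend is linear in log space: y = a + b*t (ordinary least squares).
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return sum((y - (intercept + slope * t)) ** 2 for t, y in zip(ts, ys))

def rss_hyperbolic(ts, ys, step=0.1, max_steps=300):
    # Hyperbolic trend in log space: y = a - ln(t_c - t).
    # Grid-search the singularity time t_c (an illustrative choice);
    # for each candidate, the best offset a has a closed form.
    n = len(ts)
    best = math.inf
    for j in range(1, max_steps + 1):
        t_c = max(ts) + step * j
        a = sum(y + math.log(t_c - t) for t, y in zip(ts, ys)) / n
        rss = sum((y - (a - math.log(t_c - t))) ** 2 for t, y in zip(ts, ys))
        best = min(best, rss)
    return best

# Synthetic log-horizons generated from a hyperbola (A=10, t_c=12) plus a
# small deterministic alternating perturbation. Hypothetical data only.
ts = [float(i) for i in range(10)]
ys = [math.log(10.0 / (12.0 - t)) + 0.03 * (-1) ** i for i, t in enumerate(ts)]

n = len(ts)
aic_exp = aic_least_squares(rss_exponential(ts, ys), n, k=2)
aic_hyp = aic_least_squares(rss_hyperbolic(ts, ys), n, k=2)
print("AIC exponential:", aic_exp)
print("AIC hyperbolic: ", aic_hyp)
```

Both models have two fitted parameters here, so the AIC difference comes entirely from goodness of fit. The lower AIC is preferred, and because the synthetic data was generated from a hyperbola, the hyperbolic fit wins by construction. On real data the outcome depends on the data and on how the horizons are defined.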