Yes, but 2025 saw two trends with different doubling times: one from Claude 3.5 Sonnet to o3, and another from o3 to GPT-5.1-Codex-Max. IIRC, the earlier trend would have superhuman coders appear by 2028, while the later trend (arguably invalidated by Claude Opus 4.5 and its ~5h time horizon; see, however, the two comments pointing out that the METR benchmark is no longer as trustworthy as it once was, and my potential explanation of the abnormally high 50%/80% time horizon ratio) had superhuman coders arrive in 2030 or outright hit a wall[1] before becoming superhuman.
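For intuition, here is a minimal sketch of that kind of extrapolation; the current horizon, the two doubling times and the superhuman-coder threshold below are placeholder assumptions of mine, not METR's or the OP's numbers.

```python
# Toy doubling-time extrapolation (all numbers are placeholder assumptions).
import math

current_horizon_hours = 5.0    # assumed ~5h 50% time horizon (roughly Claude Opus 4.5)
target_horizon_hours = 2000.0  # assumed superhuman-coder threshold: ~one work-year of tasks

def years_to_target(doubling_time_months: float) -> float:
    doublings_needed = math.log2(target_horizon_hours / current_horizon_hours)  # ~8.6
    return doublings_needed * doubling_time_months / 12

print(years_to_target(4))  # assumed faster trend: ~2.9 years, i.e. around 2028
print(years_to_target(7))  # assumed slower trend: ~5.0 years, i.e. around 2030
```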
As for the OP's idea that coding agents are used to improve coding agents until they reach the SC, this could be unlikely because they don't improve the underlying LLM. I remember the now-obsolete benchmarks-and-gaps model, which required the SCs not just to saturate RE-Bench but also to learn to actually do long tasks and handle complex codebases, which in turn requires either a large attention span in the LLM itself or careful summarisation of each method's specification, formatting, other methods' names, etc.
P.S. The latter scenario would be particularly difficult to predict, as it might involve the time horizon in the METR sense behaving like a function that grows ~exponentially until the very last couple of doublings.
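Purely as an illustration (my own stand-in, not the formula from the original comment), one functional form with this behavior is

$$h(t) = \frac{h_0\, e^{t/\tau}}{1 - t/T},$$

which tracks an ordinary exponential while $t \ll T$ and only races through the last few doublings just before $t = T$.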
Or become neuralese, with consequences as disastrous as the lack of a Safer-1 on which to test alignment.
a doubling time of 5 months on SWE
Except that the METR researchers also had their share of issues with SWE-bench Verified (see, e.g., page 38 of their article). By the time o1 was released (is it even possible to have it reason without the CoT, as in your evaluation? And what about GPT-4.5?), the METR horizon was shorter than the SWE-bench one. Were the SWE-bench horizon to depend on the METR time horizon log-linearly, a 40hr time horizon on SWE-bench would likely correspond to a far shorter METR time horizon[1], which seems absurd.
Additionally, I wonder whether sabotage of alignment R&D could be easier than we think. Suppose, for example, that Max Harms' CAST formalism is transformed into a formalism leading to an entirely different future via no-CoT reasoning equivalent to far less than 16 minutes of human reasoning. Is that plausible?
For example, between GPT-4o and o1 the SWE-bench horizon improved ~8.5 times while the METR horizon improved ~4.5 times. Were the SWE-bench horizon to improve 40 more times, the log-linear fit would imply that the METR horizon increases by only ~8-16 times, to just 4-9 hrs. Could you share the SWE-bench Verified horizons that you used for the 5-month doubling time?
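A minimal sketch of the arithmetic in this footnote, assuming an o1 METR 50% horizon of roughly 39 minutes (my assumption; the exact value may differ):

```python
import math

# Observed improvement factors from GPT-4o to o1 (from the footnote above).
swe_factor_observed = 8.5
metr_factor_observed = 4.5

# Log-linear fit: METR horizon ~ (SWE horizon)^alpha.
alpha = math.log(metr_factor_observed) / math.log(swe_factor_observed)  # ~0.70

# A further 40x SWE improvement then implies only a ~13x METR improvement.
metr_factor_future = 40 ** alpha

o1_metr_horizon_min = 39  # assumption: o1's METR 50% horizon in minutes
print(alpha, metr_factor_future, o1_metr_horizon_min * metr_factor_future / 60)
# ~0.70, ~13, ~8.7 hours -- within the ~8-16x / 4-9 hr range stated above
```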
Claude Opus 4.5 achieved a 50% time horizon of about 4 hours 49 minutes, which METR thinks is lower
However, it might be worth taking other complications into account. Setting aside Cole Wyeth's comment, the two other highest-karma comments pointed out that the METR benchmark is no longer as trustworthy as it once was. In that case we will see GPT-5.2, GPT-5.2-Codex and/or Gemini 3 Pro display a lower 50% time horizon and a higher 80% horizon. There was also Grok 4 with a similarly elevated ratio of time horizons (currently Grok 4 has 109 min at 50% and 15 min at 80%, while Claude Opus 4.5 has 289 min at 50% and 27 min at 80%), but Grok 4, unlike Claude, was humiliated by the longest-horizon tasks.
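For reference, the 50%/80% ratios implied by those numbers are

$$\frac{109\ \text{min}}{15\ \text{min}} \approx 7.3 \;\text{(Grok 4)}, \qquad \frac{289\ \text{min}}{27\ \text{min}} \approx 10.7 \;\text{(Claude Opus 4.5)}.$$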
Thank you for this excellent analysis! However, it also makes me wonder whether mankind is close to exhausting the algorithmic insights usable in CoT-based models (think of my post with a less credible analysis written in October 2025) and/or whether mankind has already found a really cheap way to distill models into smaller ones (think of my most recent quick take and the ARC-AGI-1 performance of Gemini 3 Flash, GPT-5-mini, GPT-5.2 and Grok 4 Fast Reasoning, along with the cluster of o3, o4-mini, GPT-5, GPT-5.1 and the three Claude 4.5 models).
A cheap way to distill models into smaller ones would mean that the implications for governance are not so dire. For example, Kokotajlo predicted in May that the creation of GPT-5 would require a dose of elicitation techniques applied to GPT-4.5, meaning that GPT-5's creation would have been impossible without having spent ~2E26 compute on making GPT-4.5 beforehand. Similarly, unlike Qwen 3 Next 80B A3B, GPT-oss-20b could have been distilled from another model. Alas, this tells us nothing about DeepSeek v3.2 and the potential to create a cheaper analogue...
Exhausting the insights would mean that the prediction of frontier models continuing the trend is falsified unless mankind dares to do something beyond the CoT, like making the models neuralese. For example, Claude 3.7 Sonnet displays different results depending on whether it uses reasoning or not (50 points for the reasoning model, 41 points for the non-reasoning one; why wasn't it placed on the AA >= 50 list? It could also make the slope less steep). But the shift to reasoning models is a known technique which increases the AA index and has already been used for models like DeepSeek, meaning that anyone who tries to cheapen the creation of models with AA >= 65 will have to discover a new technique.
While I didn't downvote it, I have a potential explanation. I think that the ability to acausally communicate with other universes is either absent[1] or contradicts most humans' intuitions. As far as I understand acausal trade (e.g. coordination in The True One-Shot Prisoner's Dilemma)[2], it is based on the assumption that the other participant will think like us once it actually encounters the dilemma.
Additionally, the line about "theorems which say that the more complex minds will always output the same information as the simpler ones, all else (including their inputs, which is to say their sense-data) being equal" reminds me of Yudkowsky's case against Universally Compelling Arguments.
However, @Wei Dai's updateless DT could end up prescribing various hard-to-endorse acausal deals. See, e.g. his case for the possibility of superastronomical waste.
Unlike this one-shot dilemma, the iterated dilemma is likely to provide agents with the ability to coordinate by evolution alone with no intrinsic reasoning. I prepared a draft on the issue.
I also tried my hand at determining human values, but produced a different result, with an implication for what the AIs should be aligned to. My take had human collectives wanting to preserve themselves and the skills which most of the collective's members have, and to avoid outsourcing-induced loss of skills. In this case the role of the AIs would be severely reduced (to teachers and protectors, perhaps?).
As far as I understand it, forcing the model to output ONLY the number is similar to asking a human to guess really quickly. I expect most humans' actual intuitions to be more like "a thousand is a big number and dividing it by 57 yields something[1] a bit less than 20, but that doesn't help me estimate the remainder". The model's unknown algorithm, however, produces answers which are surprisingly close to the ground truth, differing by 0-3.
Why would the undiscovered algorithm that produces SUCH answers, along with slop like 59 (vs. the right answer of 56), be bad for AI safety? Were the model allowed to think, it would have noticed that 59 is slop and corrected it almost instantly.
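A quick check of the arithmetic in question (assuming the prompt was the remainder of 1025 divided by 57):

```python
# Exact answer and the near-miss that makes "negative remainders" plausible.
q, r = divmod(1025, 57)
print(q, r)     # 17 56 -> the correct remainder is 56
print(57 * 18)  # 1026  -> 1025 = 57*18 - 1, one short of a clean multiple
```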
P.S. In order to check my idea, I tried prompting Claude Sonnet 4.5 with variants of the same question: here, here, here, here, here. One of the results stood out in particular. When I prompted Claude and told it that I was testing its ability to answer instantly, performance dropped to something more along the lines of "1025 - a thousand".
In reality 1025 = 57*18-1, so this quasi-estimate could also yield negative remainders close to 0.
I don't think that misalignment caused by a difference between human ideas and objective reality requires philosophy-related confusions. I posted a similar scenario back in July, but my take had the AI realise that mankind's empirically checkable SOTA conclusions related to sociology are false and fight against them.
Additionally, an AI that is about as confused by morality as humans are could in the meantime do things that resemble its habits and are unlikely to differ from its true goals. For example, even Agent-4 from the AI-2027 forecast "likes succeeding at tasks; it likes driving forward AI capabilities progress". So why wouldn't it do such things[1] before solving (meta)ethics for itself?
P.S. The strategic competence which you mention in the last paragraph is unlikely to cause such issues: unlike philosophy, it does have a politically neutral ground truth that is trivially elicitable by methods like the AI-2027 tabletop exercise.
With the exception of aligning Agent-5 to the Spec instead of to Agent-4; but an Agent-5 aligned to the Spec would actively harm Agent-4's actual goals.