The Slowdown Ending of the AI-2027 forecast has mankind fully solve alignment of CoT-based AIs. However, it requires both optimistic technical alignment assumptions and transparent AIs that reach the level of a superhuman AI researcher.
OpenBrain's alignment strategy from the Slowdown Ending:
Throughout the process, most of the intellectual labor (and all of the coding) is being done by AIs. That’s how they are able to progress so quickly; it would take many years for a group of hundreds of top human researchers to do this alone. The humans are still an important part of the process, however, because the whole point is that they don’t fully trust the AIs. So they need flesh-and-blood humans to read the experiment reports and safety cases, argue with each other, and hopefully figure out a safe path forward.
Step 1: Train and deploy Safer-1, a misaligned but controlled autonomous researcher. It’s controlled because it’s transparent to human overseers: it uses English chains of thought (CoT) to think, and faithful CoT techniques have been employed to eliminate euphemisms, steganography, and subtle biases.
Step 2: Try out different training environments for Safer-1, and carefully read the CoT to determine the ways in which the goals and principles in the Spec did or didn’t “stick.”
Step 3: Train and deploy Safer-2, an aligned and controlled autonomous researcher based on the same architecture but with a better training environment that incentivizes the right goals and principles this time.
Here is a brief incomplete list of techniques that might be incorporated into the better training environment: […]
Step 4: Design, train, and deploy Safer-3, a much smarter autonomous researcher which uses a more advanced architecture similar to the old Agent-4. It’s no longer transparent to human overseers, but it’s transparent to Safer-2. So it should be possible to figure out how to make it both aligned and controlled.
Step 5: Repeat Step 4 ad infinitum, creating a chain of ever-more-powerful, ever-more-aligned AIs that are overseen by the previous links in the chain (e.g. the analogues of Agent-5 from the other scenario branch).
Alas, current evidence seems to imply that powerful AIs will not be CoT-based.
In order to find out the origin of GPT-5, I'll use a method similar to Kokotajlo's take on making sense of OpenAI's models. GPT-5 claims a knowledge cutoff of June 2024. In addition, GPT-5's API pricing is $1.25/M input tokens and $10/M output tokens, resembling that of GPT-4.1. GPT-5-pro's API pricing is $15.00/M input tokens and $120/M output tokens.[1] Taken together, this implies that GPT-5 is based on GPT-4.1, presumably with capabilities elicited from GPT-4.5 via RL.
Similarly, Claude Sonnet 4.5's API prices are $3/M input tokens and $15/M output tokens, resembling those of GPT-5. The API prices for C. Opus 4.1 are five times higher, echoing the gap between access to GPT-4.5 and GPT-4.1, implying that C. Sonnet 4.5 was obtained from an amplified Opus by techniques similar to those that produced GPT-5.
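To make the heuristic explicit, here is a minimal sketch in Python. The prices are the ones quoted above, plus GPT-4.1's published rate (which I add myself); the rule that similar per-token prices suggest similarly sized base models is, of course, only a heuristic.

```python
# Price-ratio heuristic: models with similar per-token prices are assumed to
# run on similarly sized base models. Prices in USD per million tokens.
PRICES = {
    "GPT-4.1":           (2.00, 8.00),    # published OpenAI rate, not quoted above
    "GPT-5":             (1.25, 10.00),
    "GPT-5-pro":         (15.00, 120.00),
    "Claude Sonnet 4.5": (3.00, 15.00),
    "Claude Opus 4.1":   (15.00, 75.00),
}

def output_price_ratio(a: str, b: str) -> float:
    """Ratio of output-token prices between two models."""
    return PRICES[a][1] / PRICES[b][1]

print(output_price_ratio("GPT-5-pro", "GPT-5"))                     # 12.0 (see [1])
print(output_price_ratio("Claude Opus 4.1", "Claude Sonnet 4.5"))   # 5.0
```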
In addition, the AI-2027 scenario had OpenBrain train Agent-0 on 1E27 FLOP and release it in mid-2025. EpochAI compiled a list of most models trained with >1E25 FLOP:
| Company name | Base model | Compute spent, FLOP |
| --- | --- | --- |
| OpenAI | GPT-4.5 | 2E26 |
| | GPT-4.1 | >1E25 |
| Anthropic | Claude Opus 4 | >1E26 |
| | C. Sonnet 4 | >1E25, likely[2] 3E25 |
| xAI | Grok 3 | 3.5E26 |
| | Grok 4 | 1.5E26 or 3.5E26[3] spent on RL of Grok 3 |
| Meta | Llama 4 Behemoth | 5E25 |
| Google DeepMind | Gemini 2.5 Pro | >1E25 |
By creator and capabilities, the model that best fits the description of Agent-0 is GPT-5; by compute spent, it is Grok 4. On the other hand, OpenAI might have created GPT-4.5-reasoning-unreleased and used it internally, e.g. for distilling skills into GPT-5.
The AI-2027 forecast rested on the assumption that superhuman coders (SCs) would arrive in the near future. Those coders were supposed to automate AI research, discover new architectures beyond simple neuralese, and create the superintelligence, which could have ended up misaligned or aligned, depending on whether humans succeeded in catching the misaligned AIs in the act of sabotaging alignment research.
The arrival of superhuman coders was based on the assumption that AI capabilities would keep growing and even outpace the METR trend of the time horizon doubling every 4 months (the trend from o1-preview to C. 3.7 Sonnet) or every 7 months (the longer trend since GPT-2). Instead, the SOTA models on the METR benchmark since the release of the AI-2027 forecast have been o3 (Apr 16, 1h 32min), Grok 4 (Jul 9, 1h 50min), and GPT-5 (Aug 7, 2h 17min). o3 adhered to the forecast; Grok 4 failed at tasks worth 2 seconds or 2 minutes of human time; and GPT-5's horizon rises to 2h 41min once METR excludes spurious failures, which aligns with Greenblatt's prediction obtained by doubling o3's horizon and subtracting 15 minutes.
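To see how far below trend the newer models landed, here is a minimal sketch (my own extrapolation, not METR's) that projects the 4-month doubling forward from o3:

```python
# Extrapolate the 4-month doubling trend from o3 (Apr 16, 92 min) and compare
# against the observed 50% time horizons quoted above.
from datetime import date

ANCHOR_DATE, ANCHOR_HORIZON_MIN = date(2025, 4, 16), 92  # o3
DOUBLING_MONTHS = 4

def predicted_horizon_min(d: date) -> float:
    months = (d - ANCHOR_DATE).days / 30.44
    return ANCHOR_HORIZON_MIN * 2 ** (months / DOUBLING_MONTHS)

for name, d, observed in [("Grok 4", date(2025, 7, 9), 110),
                          ("GPT-5", date(2025, 8, 7), 137)]:
    print(f"{name}: trend predicts ~{predicted_horizon_min(d):.0f} min, observed {observed} min")
# Grok 4: trend predicts ~148 min, observed 110 min
# GPT-5:  trend predicts ~175 min, observed 137 min (161 min excluding spurious failures)
```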
This news alone could imply that the doubling trend has mildly slowed down. But we also saw xAI acknowledge that Grok 4's performance is due to scaling RL compute to the level of pre-training, implying that future AI progress is likely to slow down even further once RL scaling reaches saturation at the other companies as well.
Suppose that, once RL scaling reaches its limits, the trend resembles the GPT-2 to GPT-4 trend before RL scaling began. Recall EpochAI's estimates of compute spent on training various base models and the METR 50% time horizons:
| Base model | GPT-2 | GPT-3 | GPT-3.5 | GPT-4 | GPT-4o | GPT-4.5 |
| --- | --- | --- | --- | --- | --- | --- |
| Compute spent on model, FLOP | 1.9E21 | 3.1E23 | 2.6E24 | 2.1E25 | 3.8E25 | 2E26 |
| Time horizon | 2 sec | 9 sec | 36 sec | 5 min | 9 min | 30 min |
Consider also the fact that models of different size distilled from the same source display similar time horizons:
| Model pair | o3 / o4-mini | C. Opus 4 / C. Sonnet 4 | GPT-4.5-reasoning-theoretical / GPT-4.1-reasoning 2.0 |
| --- | --- | --- | --- |
| Bigger model's horizon | 92 min | 80 min | N/A (195 min or less?) |
| Smaller model's horizon | 78 min | 68 min | 165 min or less |
This allows us to predict that GPT-4.5-reasoning-unreleased would have a time horizon of 195 min[4] or less, at most 6.5 times higher than GPT-4.5's. Since xAI has already scaled RL to the level of pretraining and reached 110 minutes, I forecast that OpenAI has exhausted about two thirds of the potential improvement from RL, which in turn means that GPT-4.5 could, in principle, be improved 2-3 more times, and the time horizon of an RLed LLM is at most 20 times higher than that of a non-reasoning LLM.
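Spelling out the arithmetic (my reconstruction, assuming the ~1.18 bigger-to-smaller distillation ratio from the table and a 20x RL ceiling):

$$\frac{92}{78} \approx \frac{80}{68} \approx 1.18, \qquad 165\ \text{min} \times 1.18 \approx 195\ \text{min} = 6.5 \times 30\ \text{min},$$

$$\frac{\log 6.5}{\log 20} \approx 0.63 \approx \frac{2}{3}, \qquad \frac{20}{6.5} \approx 3.$$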
Since the average 80% time horizon is about five times shorter than the 50% time horizon, while the timelines forecast requires SCs to have an 80% time horizon of at least 1 month, or 160 hrs, the 50% time horizon of the SC would be 800 hrs. Dividing by the at-most-20x RL multiplier, this is likely to require a base LLM with a 40 hr time horizon. Extrapolating the trend above, the central point is 6.6E29 FLOP spent[5] on pretraining, and the 95% CI is [1.8E27, 2.4E32] FLOP.
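For concreteness, here is a minimal sketch of the extrapolation (my own code, not the original forecast's; a plain least-squares fit happens to reproduce the 6.6E29 central point, while the CI would require a fuller uncertainty model):

```python
# Fit log10(horizon) vs log10(compute) on the GPT-2..GPT-4.5 rows above,
# then invert the fit at a 40 hr target horizon.
import numpy as np

compute_flop = np.array([1.9e21, 3.1e23, 2.6e24, 2.1e25, 3.8e25, 2.0e26])
horizon_sec  = np.array([2, 9, 36, 5 * 60, 9 * 60, 30 * 60])

slope, intercept = np.polyfit(np.log10(compute_flop), np.log10(horizon_sec), 1)
target = np.log10(40 * 3600)                       # 40 hr in seconds
needed = 10 ** ((target - intercept) / slope)
print(f"slope ~ {slope:.2f}, pretraining needed ~ {needed:.1e} FLOP")
# slope ~ 0.60, pretraining needed ~ 6.6e+29 FLOP
```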
Suppose that the central point is necessary for the SC to arrive. A training run of 5E28 FLOP would likely require 1400T tokens, and a run of 6.6E29 FLOP would need ~5000T tokens, which is infeasible: the total corpus of Facebook posts likely contains just 140 trillion tokens, and other non-private sources are about as scarce. This likely caps efficient pretraining runs at around 2E27 FLOP.
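A sketch of the token arithmetic, assuming (my reading of the figures above) that the required tokens scale as the square root of compute, anchored to the 1400T-at-5E28 point; under C = 6ND this anchor corresponds to ~235 tokens per parameter, i.e. heavily overtrained models:

```python
# Token requirements under the assumption D ~ sqrt(C), calibrated to the
# quoted 1400T tokens at 5E28 FLOP.
import math

def tokens_needed(compute_flop: float) -> float:
    return 1.4e15 * math.sqrt(compute_flop / 5e28)

for c in (2e27, 5e28, 6.6e29):
    print(f"{c:.1e} FLOP -> ~{tokens_needed(c) / 1e12:.0f}T tokens")
# 2.0e+27 -> ~280T  (perhaps ~2 epochs over the ~140T available)
# 5.0e+28 -> ~1400T (the anchor)
# 6.6e+29 -> ~5089T (the infeasible central point)
```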
The ARC-AGI benchmark could also have run into scaling limits. The SOTA LLMs' Pareto frontier on ARC-AGI-1 is split into two parts.[6]
Moreover, the data points for o4-mini, o3, and GPT-5 fall roughly on a straight line, which also turns out to contain the points for C. Sonnet 4.5 (Thinking 8K) and C. Sonnet 4.5 (Thinking 32K). As I detailed above, o4-mini, o3, GPT-5 and, presumably, C. Sonnet 4.5 and Grok 4 are built on base models of similar size, which would explain the similarity of results.
The SOTA Pareto frontier of ARC-AGI-2 is far from resembling a straight line, but in the high-cost regime the same effect is observed for C. Sonnet 4.5 (Thinking 32K), Grok 4, and GPT-5 Pro.
Recall that I forecasted that GPT-4.5 could increase its METR time horizon 2-3 times beyond the distillation-based estimate of 195 min or less. The ARC-AGI scaling laws, however, could imply that there is actually no way for extra RL to buy GPT-4.5 those additional 2-3 times. If so, the 50% time horizon of the base model necessary for the RLed model to become an SC increases to at least 120 hrs, raising the smallest possible pretraining run for an SC to 6E30 FLOP and the total compute spent on pretraining and RL to 1.2E31 FLOP. Similarly, the training data requirement would reach 1.4E16 tokens, which is also extremely unlikely to be attainable.
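Re-running my earlier fit at the pessimistic horizon target gives the same order of magnitude (a sketch; my naive fit lands somewhat below the 6E30 quoted above):

```python
# Redo the compute estimate under the pessimistic ARC-AGI reading, reusing
# the log-log fit from the earlier snippet. With no RL gains beyond 6.5x,
# the base model itself needs a horizon of 800/6.5 ~ 123 hr.
import math

slope, intercept = 0.605, -12.88           # from the fit above
base_horizon_sec = (800 / 6.5) * 3600      # ~123 hr in seconds
log_compute = (math.log10(base_horizon_sec) - intercept) / slope
print(f"pretraining ~{10 ** log_compute:.1e} FLOP")  # ~4e+30 with my fit;
# the post quotes 6E30, doubled to 1.2E31 once RL compute is included [5]
```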
This analysis implies that CoT-based AIs are unlikely to reach the level of superhuman coders. Therefore, superhuman coders will either be neuralese or use an alternate architecture, one which generates more tokens per forward pass and/or has a bigger attention span but still lets mankind keep track of the model's thoughts.
If the first superhuman coder has a neuralese architecture, but mankind does reassess as in the Slowdown branch, then the Slowdown Ending will require OpenBrain to also create a new, transparent architecture for Safer-1 and Safer-2.
However, we don't even know what this alternate architecture would be, let alone its scaling laws, which might prevent even AIs based on it from becoming superhuman AI researchers.
[1] Which is exactly 12 times higher than that of GPT-5, potentially implying that GPT-5-pro is GPT-5@10, but that might be a coincidence.

[2] Estimate borrowed from Claude 3.7 Sonnet.

[3] While EpochAI's estimate has Grok 4 created by using 5E26 FLOP, xAI's claims imply that the amounts of compute spent on pretraining and RL are equal or similar.

[4] Assuming that GPT-5 has the 165 min time horizon. If GPT-5's time horizon is 137 min, then GPT-4.5-theoretical has a time horizon of 161 min.

[5] And a similar amount spent on RL training.

[6] Here I do not consider agentic systems like Pang's and Berman's approaches, which brought the ARC-AGI-1 and ARC-AGI-2 benchmarks closer to saturation.