The Slowdown Ending of the AI-2027 forecast has mankind fully solve alignment of CoT-based AIs. However, it requires both optimistic technical alignment assumptions and transparent AIs that reach the level of a superhuman AI researcher.
OpenBrain's alignment strategy from the Slowdown Ending:
Throughout the process, most of the intellectual labor (and all of the coding) is being done by AIs. That’s how they are able to progress so quickly; it would take many years for a group of hundreds of top human researchers to do this alone. The humans are still an important part of the process, however, because the whole point is that they don’t fully trust the AIs. So they need flesh-and-blood humans to read the experiment reports and safety cases, argue with each other, and hopefully figure out a safe path forward.
Step 1: Train and deploy Safer-1, a misaligned but controlled autonomous researcher. It’s controlled because it’s transparent to human overseers: it uses English chains of thought (CoT) to think, and faithful CoT techniques have been employed to eliminate euphemisms, steganography, and subtle biases.
Step 2: Try out different training environments for Safer-1, and carefully read the CoT to determine the ways in which the goals and principles in the Spec did or didn’t “stick.”
Step 3: Train and deploy Safer-2, an aligned and controlled autonomous researcher based on the same architecture but with a better training environment that incentivizes the right goals and principles this time.
Here is a brief incomplete list of techniques that might be incorporated into the better training environment: […]
Step 4: Design, train, and deploy Safer-3, a much smarter autonomous researcher which uses a more advanced architecture similar to the old Agent-4. It’s no longer transparent to human overseers, but it’s transparent to Safer-2. So it should be possible to figure out how to make it both aligned and controlled.
Step 5: Repeat Step 4 ad infinitum, creating a chain of ever-more-powerful, ever-more-aligned AIs that are overseen by the previous links in the chain (e.g. the analogues of Agent-5 from the other scenario branch).
Alas, current evidence seems to imply that powerful AIs will not be CoT-based.
In order to find out the origin of GPT-5, I'll use a method similar to Kokotajlo's take on making sense of OpenAI's models. GPT-5 claims a knowledge cutoff of June 2024. In addition, GPT-5's API pricing is $1.25/M input tokens and $10/M output tokens, resembling that of GPT-4.1. GPT-5-pro's API pricing is $15.00/M input tokens and $120/M output tokens.[1] Taken together, this implies that GPT-5 is based on GPT-4.1, presumably with capabilities elicited from GPT-4.5 via RL.
Similarly, Claude Sonnet 4.5's API prices are $3/M input tokens and $15/M output tokens, resembling those of GPT-5. The API prices for C. Opus 4.1 are five times higher, echoing the gap between access to GPT-4.5 and GPT-4.1, implying that C. Sonnet 4.5 was obtained from an amplified Opus by techniques similar to those that produced GPT-5.
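To make the heuristic explicit, here is a minimal sketch in Python. The prices are the ones quoted above, plus GPT-4.1's published rate (which I add myself); the rule that similar per-token prices suggest similarly sized base models is, of course, only a heuristic.

```python
# Price-ratio heuristic: models with similar per-token prices are assumed to
# run on similarly sized base models. Prices in USD per million tokens.
PRICES = {
    "GPT-4.1":           (2.00, 8.00),    # published OpenAI rate, not quoted above
    "GPT-5":             (1.25, 10.00),
    "GPT-5-pro":         (15.00, 120.00),
    "Claude Sonnet 4.5": (3.00, 15.00),
    "Claude Opus 4.1":   (15.00, 75.00),
}

def output_price_ratio(a: str, b: str) -> float:
    """Ratio of output-token prices between two models."""
    return PRICES[a][1] / PRICES[b][1]

print(output_price_ratio("GPT-5-pro", "GPT-5"))                     # 12.0 (see [1])
print(output_price_ratio("Claude Opus 4.1", "Claude Sonnet 4.5"))   # 5.0
```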
In addition, the AI-2027 scenario had OpenBrain train Agent-0 on 1E27 FLOP and release it in mid-2025. EpochAI compiled a list of most models trained with >1E25 FLOP:
| Company name | Base model | Compute spent, FLOP |
| --- | --- | --- |
| OpenAI | GPT-4.5 | 2E26 |
| | GPT-4.1 | >1E25 |
| Anthropic | Claude Opus 4 | >1E26 |
| | C. Sonnet 4 | >1E25, likely[2] 3E25 |
| xAI | Grok 3 | 3.5E26 |
| | Grok 4 | 1.5E26 or 3.5E26[3] spent on RL of Grok 3 |
| Meta | Llama 4 Behemoth | 5E25 |
| Google DeepMind | Gemini 2.5 Pro | >1E25 |
By creator and capabilities, the model that best fits the description of Agent-0 is GPT-5; by compute spent, it is Grok 4. On the other hand, OpenAI might have created GPT-4.5-reasoning-unreleased and used it internally, e.g. for distilling skills into GPT-5.
The AI-2027 forecast rested on the assumption that superhuman coders (SCs) would arrive in the near future. Those coders were supposed to automate AI research, discover new architectures beyond simple neuralese, and create the superintelligence, which could have ended up misaligned or aligned, depending on whether humans succeeded in catching the misaligned AIs in the act of sabotaging alignment research.
The arrival of superhuman coders was based on the assumption that AI capabilities would keep growing and even outpace the METR trend of the time horizon doubling every 4 months (the trend from o1-preview to C. 3.7 Sonnet) or every 7 months (the longer trend since GPT-2). Instead, the SOTA models on the METR benchmark since the release of the AI-2027 forecast have been o3 (Apr 16, 1h 32min), Grok 4 (Jul 9, 1h 50min), and GPT-5 (Aug 7, 2h 17min). o3 adhered to the forecast; Grok 4 failed at tasks worth 2 seconds or 2 minutes of human time; and GPT-5's horizon rises to 2h 41min once METR excludes spurious failures, which aligns with Greenblatt's prediction obtained by doubling o3's horizon and subtracting 15 minutes.
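To see how far below trend the newer models landed, here is a minimal sketch (my own extrapolation, not METR's) that projects the 4-month doubling forward from o3:

```python
# Extrapolate the 4-month doubling trend from o3 (Apr 16, 92 min) and compare
# against the observed 50% time horizons quoted above.
from datetime import date

ANCHOR_DATE, ANCHOR_HORIZON_MIN = date(2025, 4, 16), 92  # o3
DOUBLING_MONTHS = 4

def predicted_horizon_min(d: date) -> float:
    months = (d - ANCHOR_DATE).days / 30.44
    return ANCHOR_HORIZON_MIN * 2 ** (months / DOUBLING_MONTHS)

for name, d, observed in [("Grok 4", date(2025, 7, 9), 110),
                          ("GPT-5", date(2025, 8, 7), 137)]:
    print(f"{name}: trend predicts ~{predicted_horizon_min(d):.0f} min, observed {observed} min")
# Grok 4: trend predicts ~148 min, observed 110 min
# GPT-5:  trend predicts ~175 min, observed 137 min (161 min excluding spurious failures)
```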
This news alone could imply that the doubling trend has mildly slowed down. But we also saw xAI acknowledge that Grok 4's performance is due to scaling RL compute to the level of pre-training, implying that future AI progress is likely to slow down even further once RL scaling reaches saturation at the other companies as well.
Suppose that, once RL scaling reaches its limits, the trend resembles the GPT-2 to GPT-4 trend before RL scaling began. Recall EpochAI's estimates of compute spent on training various base models and the METR 50% time horizons:
| Base model | GPT-2 | GPT-3 | GPT-3.5 | GPT-4 | GPT-4o | GPT-4.5 |
| --- | --- | --- | --- | --- | --- | --- |
| Compute spent on model, FLOP | 1.9E21 | 3.1E23 | 2.6E24 | 2.1E25 | 3.8E25 | 2E26 |
| Time horizon | 2 sec | 9 sec | 36 sec | 5 min | 9 min | 30 min |
Consider also the fact that models of different size distilled from the same source display similar time horizons:
| Model pair | o3 / o4-mini | C. Opus 4 / C. Sonnet 4 | GPT-4.5-reasoning-theoretical / GPT-4.1-reasoning 2.0 |
| --- | --- | --- | --- |
| Bigger model's horizon | 92 min | 80 min | N/A (195 min or less?) |
| Smaller model's horizon | 78 min | 68 min | 165 min or less |
This allows us to predict that GPT-4.5-reasoning-unreleased would have a time horizon of 195 min[4] or less, at most 6.5 times higher than GPT-4.5's. Since xAI has already scaled RL to the level of pretraining and reached 110 minutes, I forecast that OpenAI has exhausted about two thirds of the potential improvement from RL, which in turn means that GPT-4.5 could, in principle, be improved 2-3 more times, and the time horizon of an RLed LLM is at most 20 times higher than that of a non-reasoning LLM.
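Spelling out the arithmetic (my reconstruction, assuming the ~1.18 bigger-to-smaller distillation ratio from the table and a 20x RL ceiling):

$$\frac{92}{78} \approx \frac{80}{68} \approx 1.18, \qquad 165\ \text{min} \times 1.18 \approx 195\ \text{min} = 6.5 \times 30\ \text{min},$$

$$\frac{\log 6.5}{\log 20} \approx 0.63 \approx \frac{2}{3}, \qquad \frac{20}{6.5} \approx 3.$$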
Since the average 80% time horizon is about five times shorter than the 50% time horizon, while the timelines forecast requires SCs to have an 80% time horizon of at least 1 month, or 160 hrs, the 50% time horizon of the SC would be 800 hrs. Dividing by the at-most-20x RL multiplier, this is likely to require a base LLM with a 40 hr time horizon. Extrapolating the trend above, the central point is 6.6E29 FLOP spent[5] on pretraining, and the 95% CI is [1.8E27, 2.4E32] FLOP.
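For concreteness, here is a minimal sketch of the extrapolation (my own code, not the original forecast's; a plain least-squares fit happens to reproduce the 6.6E29 central point, while the CI would require a fuller uncertainty model):

```python
# Fit log10(horizon) vs log10(compute) on the GPT-2..GPT-4.5 rows above,
# then invert the fit at a 40 hr target horizon.
import numpy as np

compute_flop = np.array([1.9e21, 3.1e23, 2.6e24, 2.1e25, 3.8e25, 2.0e26])
horizon_sec  = np.array([2, 9, 36, 5 * 60, 9 * 60, 30 * 60])

slope, intercept = np.polyfit(np.log10(compute_flop), np.log10(horizon_sec), 1)
target = np.log10(40 * 3600)                       # 40 hr in seconds
needed = 10 ** ((target - intercept) / slope)
print(f"slope ~ {slope:.2f}, pretraining needed ~ {needed:.1e} FLOP")
# slope ~ 0.60, pretraining needed ~ 6.6e+29 FLOP
```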
Suppose that the central point is necessary for the SC to arrive. A training run of 5E28 FLOP would likely require 1400T tokens, and a run of 6.6E29 FLOP would need ~5000T tokens, which is infeasible: the total corpus of Facebook posts likely contains just 140 trillion tokens, and other non-private sources are about as scarce. This likely caps efficient pretraining runs at around 2E27 FLOP.
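A sketch of the token arithmetic, assuming (my reading of the figures above) that the required tokens scale as the square root of compute, anchored to the 1400T-at-5E28 point; under C = 6ND this anchor corresponds to ~235 tokens per parameter, i.e. heavily overtrained models:

```python
# Token requirements under the assumption D ~ sqrt(C), calibrated to the
# quoted 1400T tokens at 5E28 FLOP.
import math

def tokens_needed(compute_flop: float) -> float:
    return 1.4e15 * math.sqrt(compute_flop / 5e28)

for c in (2e27, 5e28, 6.6e29):
    print(f"{c:.1e} FLOP -> ~{tokens_needed(c) / 1e12:.0f}T tokens")
# 2.0e+27 -> ~280T  (perhaps ~2 epochs over the ~140T available)
# 5.0e+28 -> ~1400T (the anchor)
# 6.6e+29 -> ~5089T (the infeasible central point)
```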
The ARC-AGI benchmark could also have run into scaling limits. The SOTA LLMs' Pareto frontier on ARC-AGI-1 is split into two parts.[6]
Moreover, the data points for o4-mini, o3, and GPT-5 fall roughly on a straight line, which also turns out to contain the points for C. Sonnet 4.5 (Thinking 8K) and C. Sonnet 4.5 (Thinking 32K). As I detailed above, o4-mini, o3, GPT-5 and, presumably, C. Sonnet 4.5 and Grok 4 are built on base models of similar size, which would explain the similarity of results.
The SOTA Pareto frontier of ARC-AGI-2 is far from resembling a straight line, but in the high-cost regime the same effect is observed for C. Sonnet 4.5 (Thinking 32K), Grok 4, and GPT-5 Pro.
Recall that I forecasted that GPT-4.5 could increase its METR time horizon 2-3 times beyond the distillation-based estimate of 195 min or less. The ARC-AGI scaling laws, however, could imply that there is actually no way for extra RL to buy GPT-4.5 those additional 2-3 times. If so, the 50% time horizon of the base model necessary for the RLed model to become an SC increases to at least 120 hrs, raising the smallest possible pretraining run for an SC to 6E30 FLOP and the total compute spent on pretraining and RL to 1.2E31 FLOP. Similarly, the training data requirement would reach 1.4E16 tokens, which is also extremely unlikely to be attainable.
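Re-running my earlier fit at the pessimistic horizon target gives the same order of magnitude (a sketch; my naive fit lands somewhat below the 6E30 quoted above):

```python
# Redo the compute estimate under the pessimistic ARC-AGI reading, reusing
# the log-log fit from the earlier snippet. With no RL gains beyond 6.5x,
# the base model itself needs a horizon of 800/6.5 ~ 123 hr.
import math

slope, intercept = 0.605, -12.88           # from the fit above
base_horizon_sec = (800 / 6.5) * 3600      # ~123 hr in seconds
log_compute = (math.log10(base_horizon_sec) - intercept) / slope
print(f"pretraining ~{10 ** log_compute:.1e} FLOP")  # ~4e+30 with my fit;
# the post quotes 6E30, doubled to 1.2E31 once RL compute is included [5]
```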
This analysis implies that CoT-based AIs are unlikely to reach the level of superhuman coders. Therefore, superhuman coders will either be neuralese or use an alternate architecture, one which generates more tokens per forward pass and/or has a bigger attention span but still lets mankind keep track of the model's thoughts.
If the first superhuman coder has a neuralese architecture, but mankind does reassess as in the Slowdown branch, then the Slowdown Ending will require OpenBrain to also create a new, transparent architecture for Safer-1 and Safer-2.
However, we don't even know what this alternate architecture would be, let alone its scaling laws, which might prevent even AIs based on it from becoming superhuman AI researchers.
[1] Which is exactly 12 times higher than that of GPT-5, potentially implying that GPT-5-pro is GPT-5@10, but that might be a coincidence.

[2] Estimate borrowed from Claude 3.7 Sonnet.

[3] While EpochAI's estimate has Grok 4 created by using 5E26 FLOP, xAI's claims imply that the amounts of compute spent on pretraining and RL are equal or similar.

[4] Assuming that GPT-5 has the 165 min time horizon. If GPT-5's time horizon is 137 min, then GPT-4.5-theoretical has a time horizon of 161 min.

[5] And a similar amount spent on RL training.

[6] Here I do not consider agentic systems like Pang's and Berman's approaches, which brought the ARC-AGI-1 and ARC-AGI-2 benchmarks closer to saturation.