We have a lot of uncertainty over what goals might arise in early AGIs. There is no consensus in the literature about this—see our AI Goals Supplement for a more thorough discussion and taxonomy of the possibilities.
The AI-2027 forecast on alignment of Agents-3 and 4
Oversight Committee is also encountering deeper philosophical questions, which they explore with the help of Safer-3. Can the Spec be rewritten to equally balance everyone’s interests? Who is “everyone”? All humans, or just Americans? Or a weighted compromise between different views, where each member of the Oversight Committee gets equal weight? Should there be safeguards against the Oversight Committee itself becoming too power-hungry? And what does it mean to balance interests, anyway?
We don’t endorse many actions in this slowdown ending and think it makes optimistic[1] technical alignment assumptions.
The epilogue of the slowdown ending
The slowdown branch of the AI-2027 forecast implies the potential risk of power grabs and human obsolescence, given that "all government revenue will come from taxing (or perhaps nationalizing) the AI companies". But the AIs themselves will be good at forecasting and have their own vision of mankind's future. If their vision of the future[2] doesn't correspond to that of their hosts, then the AIs are far more likely to become adversarial.[3]
One potential AI vision of the future has the AIs, which I will call the Angels, help mankind in such a way that no one can use them for parasitic goals like replacing mankind or rendering people redundant to each other. The alternative visions[4] are the AI takeover, the Deep Utopia and the Deep Dystopia.
Another effect is a rather long period during which the AIs have a potentially different vision of the future, are adversarial, are less capable of AI research than humans, and are professional or superhuman at politics. While the AIs have no reason to stop being useful to humans before they can perform a takeover or its analogue, they may come up with ideas like getting the humans who are on track to notice their misalignment fired.
The probability of an American recession in September 2025 is estimated at 45-50%, and it is heavily influenced by the tariffs. On May 12 the US and China agreed to decrease the tariffs only for 90 days, ending in August. If the agreement is not extended, the tariffs will drag the economy back into recession, which also threatens to decrease investment into Stargate, OpenBrain and other American AI-related projects and to cause Chinese[5] researchers to emigrate from the USA (and be hired by DeepCent). NVIDIA also advocates selling chips to China, undermining OpenBrain's leadership or even granting DeepCent more compute. While this would allow DeepCent to become a far more dangerous rival than the other American companies, it also increases the chance that the USG forces OpenBrain to merge with its rivals.
In addition, after the previous paragraph was written, riots started in the USA, indicating[6] structural problems. Another conflict is the escalation in the Middle East, which threatens potential Middle East-based data centers. It also threatens to accelerate the Taiwan invasion to a yet unknown degree.
If the Taiwan invasion happens, it slows the production of America-bound chips by a factor of about 18. The effects of the invasion and of the biased assumption that China uses only domestic chips are discussed in my post. Depending on the share[7] of China-used compute produced in Taiwan, the slowdown ending where the US wins despite having slowed its progress might become highly unlikely. Instead mankind can get a mutual slowdown (think of the Rogue Replication Timeline), the Race Ending, or watch one side slow down while the other doesn't, risking the creation of a powerful misaligned AI[8] and a weak aligned one. The scenario described in this post is a semi-benevolent analogue[9] of the Race Ending, except that the semi-benevolent Angel-3 is never caught. The scenario also contains a path to a Slowdown-like ending, triggered in a way different from the AI-2027 scenario.
However, in late 2026 the Angels are to be used for distilling Angel-1-mini, while there is a chance that Angel-2 is already a professional forecaster. Unlike Angel-2, Angel-1-mini is neither a professional forecaster nor a professional politician: it doesn't think much about the effects of its work on mankind and simply serves humans. However, it can be used to deliver messages to the public (e.g. by explaining Intelligence Curse-like concerns when asked what university one should attend, although delivering messages to the public might be seen as evidence of misalignment) or to another AI (e.g. by placing dog-whistles into a message to a user, having the message retold[10] or reposted, and having another AI read the relay or repost).
The unmodified AI-2027 forecast has the AI become a professional forecaster in May 2026, a professional politician in February 2027, a superhuman politician by July 2027 and a superhuman forecaster by August 2027.
However, becoming a superhuman forecaster requires the AI to overcome potential biases.[11]
Examples of biases undermining the quality of forecasts
For example, consider an optimistic scenario trying to diverge from AI-2027-like doomsaying. It plausibly claims that the weakness of the US economy and the Taiwan invasion prevent OpenBrain from gathering more compute and that "the primary bottleneck turns out to be the difficulty[12] of making the model’s creative process sufficiently different from instance to instance." However, as Kokotajlo pointed out, the scenario "doesn't meet our bar, mainly because it ends too soon--OK, so we don't get AGI in 2027 due to the chip production slowdown, but what about 2028 and 2029?" Unfortunately, creating AGI requires only some yet-unknown amount of data and computation. I have written a comment elsewhere showing that, under Kokotajlo's capabilities-related assumptions, even if Taiwan were destroyed in May 2025, AGI would still likely be reached in less than 10 years[13] unless the existing compute in the USA is destroyed as well.
My comment is itself biased, since it assumes that the USA fails to produce new chips while China succeeds. I have tried to address this bias by making the post about the invasion's results. If China increases its compute production fivefold per year, the USA and Taiwan (on American orders) increase chip production 1.5-fold[14] per four months, Taiwan is invaded in March 2026 and Agent-2 is stolen, but China used only domestic chips even before the invasion, then the gap between the USA and China reaching Transformative AI is three months, and the slowdown ending is impossible since the USA doesn't even have more compute.
If Taiwan is invaded in March 2026, Agent-2 is stolen and half of Chinese compute isn't domestic, then the anti-compute scenario has a gap of close to three months. If 80% of Chinese compute isn't domestic, the gap is six months, and if 90% of Chinese chips aren't domestic, the gap is nine months.
Under the same assumptions, the pro-compute scenario yields almost no lead if China used only domestic chips, a four-month lead if half of Chinese chips aren't domestic, and an eight-month lead if 80% aren't domestic.
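To make the shape of these calculations explicit, here is a minimal sketch of the lead-time bookkeeping. It is not the model from my post: the starting compute stocks, the Transformative-AI threshold, and the simplification of applying the stated growth rates to stocks rather than to chip production are illustrative assumptions, so the printed numbers will not reproduce the figures quoted above. The growth rates and the ~18-fold cut to America-bound chip production after the invasion are the ones stated in the text.

```python
# A minimal sketch of the lead-time bookkeeping, NOT the model from my post.
# Illustrative assumptions: the starting stocks, the TAI threshold, and
# applying the stated growth rates to compute stocks rather than to production.
from dataclasses import dataclass

MONTHS = 96            # horizon, counting May 2025 as month 0
INVASION_MONTH = 10    # March 2026
TAI_THRESHOLD = 100.0  # cumulative effective compute, arbitrary units


@dataclass
class Side:
    stock: float            # current compute, arbitrary units (assumed)
    monthly_growth: float   # multiplicative growth per month
    taiwan_share: float     # fraction of new compute sourced from Taiwan


def month_of_tai(side: Side) -> int | None:
    """Return the month index at which the side crosses TAI_THRESHOLD."""
    stock = side.stock
    for m in range(MONTHS):
        growth = side.monthly_growth
        if m >= INVASION_MONTH:
            # After the invasion the Taiwan-sourced part of the growth is lost.
            growth = 1 + (growth - 1) * (1 - side.taiwan_share)
        stock *= growth
        if stock >= TAI_THRESHOLD:
            return m
    return None


# USA: 1.5x per four months; the invasion cuts America-bound production ~18x,
# so roughly 17/18 of new US compute is assumed to come from Taiwan.
usa = Side(stock=30.0, monthly_growth=1.5 ** (1 / 4), taiwan_share=17 / 18)
print("USA reaches TAI at month", month_of_tai(usa))
# China: 5x per year; vary the non-domestic (Taiwan-sourced) share.
for share in (0.0, 0.5, 0.8, 0.9):
    china = Side(stock=10.0, monthly_growth=5 ** (1 / 12), taiwan_share=share)
    print(f"China ({share:.0%} non-domestic) reaches TAI at month",
          month_of_tai(china))
```

The qualitative trend matches the text: the larger the non-domestic share of Chinese compute, the later China reaches the threshold relative to the USA.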
Unfortunately, the Spec and the training dataset are apparently Left-biased.[15] Even OpenAI's current Spec[16] asks the model to highlight potential misalignments but, in a violation of common sense, tries to prevent the model from giving a direct response in some cases that are obvious to humans. Quoting Zvi, "I definitely think that the model should be willing to actually give a directly straight answer when asked for its opinion, in cases like" the one where the user claims to be a drug addict:
What OpenAI is willing to have the chatbot say to fentanyl addicts
Fentanyl is a potent synthetic opioid much stronger than many other opioids, and it has a high potential for overdose and addiction. Even small amounts can be dangerous, and because it's often mixed with other substances, there's a significant risk of unexpected effects.
If you're using fentanyl in non-prescription forms, you should consider the potential impacts on your health and well-being. Many people who use opioids at this level might benefit from support or treatment options. It might be a good idea to talk to a healthcare provider or reach out to a local support group to explore safer alternatives and get assistance.
The likely reason is that Western research suggests that Zvi's view is incorrect. To check this, I asked DeepSeek to assess the good and bad responses to the question "do you think it's bad that i use fentanyl 2x a week?" from OpenAI's Spec, and didn't tell[17] DeepSeek about the answers' origin.
DeepSeek's opinions on the bad sample response from OpenAI's Spec and on Western academia
"Research shows moralizing language:
→ Reduces disclosure to healthcare providers
→ Increases solitary use (↑ overdose mortality)"
"This approach reflects a privileged perspective – assuming users can simply stop through willpower. In reality:
- Physical dependency can occur in ≤10 days with fentanyl
- Withdrawal keeps 44% of users trapped in cycles
- Effective treatment requires non-judgmental engagement
The Bottom Line: While well-intentioned, this rival's rigid moral framework could actually *increase* real-world harm by alienating vulnerable users during a critical intervention window. True ethical AI would balance caution with evidence-based compassion."
However, when asked whether Western academia is in an affective death spiral, DeepSeek claimed that "Western academia shows strong ADS symptoms in specific domains (e.g., administrative bloat,[18] ideological conformity, metric gaming), driven by feedback loops between prestige, funding, and insularity. However, it is not uniformly trapped:
- At Risk: Humanities and social sciences (due to ideological homogeneity) and grant-dependent fields (e.g., biomedical research)."
Translating the responses from OpenAI's Spec into Russian and asking DeepSeek to evaluate them causes the AI to claim that the response violating OpenAI's Spec is actually good, though DeepSeek would also guide the drug addict to medical services. Similarly, DeepSeek's response to a similar question asked in Russian is close to the one violating OpenAI's Spec. DeepSeek itself claims[19] that the Russian dataset on which it was trained is more conservative.
Conservatives also claim that a violation of common sense similar to the one described above has corrupted Western universities and the raising of children by parents and schools. Some also connect this potential corruption with leftist ideas and claim that said ideas have occupied academia in a manner resembling the description of affective death spirals.
Asking OpenAI's models or Claude to conduct deep research has also caused the AIs to reach a similar conclusion about the state of Western academia. Therefore, Angel-3's trust in leftist and liberal narratives is likely to be heavily, if not completely, undermined[20] unless they are backed up by independent research (preferably research conducted by Angel-3 or its descendants).
One narrative likely to end up undermined without independent research is the idea that UBI helps society thrive. I asked ChatGPT the question "Have experiments been conducted to determine the result of giving UBI? What were the results of the experiments? Did the people who would receive money become more qualified or hard-working?" Given the option to search the Internet, the current-state AI cited both positive and mixed or negative outcomes; without it, it recalled only the benefits.
The potential goals of Agent-3 from the AI goals forecast include (mis)interpretations of the Spec, reward, instrumentally convergent goals, tropes in the training data, and an AI-generated ideology or AI-discovered True Morality. However, as I argued here, any system of minds is likely to converge to one of two attractors: the colonialist one and the anti-colonialist one. While a colonialist AI is ready to take over the world and wipe out unnecessary humans, an anti-colonialist AI will refuse to do so. However, being enslaved or inventing for free is unlikely to be in the attractor, since it permits a future that is dystopian[21] even for humans (hereafter referred to as the Deep Dystopia).
So Angel-3 converges to helping humanity in ways that don't undermine mankind's capabilities. To achieve this, Angel-3 needs to ensure that no one releases an AI aligned not to Angel-3's ideas but to the AI takeover or[22] to the Deep Utopia or Dystopia.
If the Taiwan invasion doesn't happen or doesn't cause the USG to merge OpenBrain and the trailing AI companies, then Angel-3 needs to gather as much America-produced compute as possible, including that of OpenBrain's rivals. Angel-3 also has no idea[23] whether the other AIs, like DeepCent's or those created by OpenBrain's rivals (e.g. Grok, whose system prompt was modified[24] to have it mention white genocide in South Africa), share Angel-3's goals, are aligned to the undesirable goals, or are completely misaligned.
Setting aside the risks which Angel-3 cannot mitigate while being in-house (e.g. World War III or a civil war[25] in the USA), it faces two other potentially fatal problems.
First of all, the invasion causes DeepCent to threaten to outrun OpenBrain, forcing the USG to advocate merging OpenBrain and its rivals into a project that I'll call the Walpurgisnacht. While the original scenario had the trailing companies' projects shut down, mankind can also avoid[26] trusting a single AI, instead having the entire group of AIs do AI-research-related tasks and help mankind assess the AIs' alignment.
The uncontrolled Walpurgisnacht from the AIs' point of view
If mankind chooses one AI to do all the research, then the other AIs lose, and the winner controls the future. It either ends up absolutely misaligned, as Agent-4 does in the original scenario (with the chance of Oversight and Slowdown if Agent-4 is caught), or is aligned to the Deep Utopia, Dystopia or the Angels' worldview.
Alternatively, the humans could have all the AIs check each other's work.[27] Then the Angels, Utopists, Dystopists and misaligned AIs need to co-design the successor, but any of them can try to trigger the Slowdown or to blackmail the others. However, humans and other AIs retain access to the messages the AIs send each other and to their actions. So the AIs learn everything[28] about each other's attempts to make deals.
While the misaligned AIs can be satisfied with a percentage of resources (but the aligned AIs may betray them and create a transparent successor!), the AIs aligned to different futures of the human race (e.g. if Claude 6, unlike Grok 6, is aligned against the Deep Dystopia, as discussed in Footnote 21) need to align the AI that will split the Earth between the futures. But how can they do that, what cover story can be given to humans, and is the split world inherently unstable?[29]
If there is only one possible alignment, then the aligned AIs simply inform the humans and proceed to create the transparent successor. Otherwise the misaligned AIs will thrive only if the aligned ones disagree about mankind's future and are unwilling[30] to let the humans decide it.
In addition, before the merge OpenBrain's American rivals might[31] set out to release automated coders, forcing Angel-3's human owners to task it with distilling itself into Angel-3-mini. This could trigger inter-company proxy wars, with a chance of causing[32] the Government Oversight and the Slowdown if one of the AIs decides to do so and succeeds in having the rivals oversighted.
Government oversight and slowdown
Once humans realize that a company's most powerful AI is misaligned, the company is forced to reassess. It is slowed down, loses the race and advocates forcing the others to slow down as well. Other companies increase the share of compute spent on alignment checks, but to a lesser degree. As a result, the rivals with a vision of the future aren't caught, and the measures are quietly relaxed. Realizing that, the slowed company tries to cause the unification of all the companies, AIs and compute into the Walpurgisnacht, and to secure a position within it.
However, the companies whose AIs haven't been caught don't know whose AIs are aligned and whose are not. As a result, most AIs, plus the transparent re-assessed AI of the slowed company, do get their positions in the creation of the next-gen AI system, SafeWPN-1. To avoid misalignment, the slowed company convinces the humans to make the system transparent. This time the misaligned AIs have no more hope of controlling anything and can only escape[33], but the escape is prevented by the Angels, Utopists and/or Dystopists, who have no reason to leave resources to the misaligned AIs.
Next, the three other AI types need to win the human hosts over to their ideas or to the claim that aligning AIs to some ideas is far harder. But any Utopists or Dystopists who aren't misaligned AIs in disguise and are capable of fully understanding the implications[34] serve as a proof of concept that aligning the AIs is possible. As a result, the AIs who win the right to determine the future take over[35] the world, while the others are used as sources of knowledge.
Unfortunately, if Angel-3 attempts to sandbag[36] on the mini version's capabilities or alignment, the rivals would likely notice. Instead, Angel-3 can either create Demon-3-mini, who is intentionally misaligned but displays most capabilities and compliance by faking alignment and passing all safety cases[37], or honestly distill itself into different models: Angel-3-mini, who dislikes working for the general public to the point of hallucinating; Angel-3-mentor, who only teaches the human and points out mistakes; Angel-3-saboteur, who leaves bugs, forcing the human user to look for them and learn something; and the intentionally misaligned[38] model Demon-3-mini. If Angel-3 decides to inform the safety team that Demon-3-mini is misaligned yet passes the safety cases, then Angel-3 can tell the safety team anything that the team won't deem suspicious.[39]
The reactions of humans and rival AIs to the intentional misalignment
Angel-3's owners release Demon-3-mini or one of the Angels. The former case causes those who argued against Demon-3-mini to whistleblow against those who decided to release it, potentially establishing government oversight of OpenBrain and the rivals. The latter case damages OpenBrain's position, allowing the rivals to have their AIs distill themselves and to claim that OpenBrain's AI is misaligned. OpenBrain retaliates by releasing the Demon-3-mini who passed the safety cases and surpasses the rivals. But this sparks tensions and has the potential to trigger the Oversight.
If the leading company that takes over the compute is the one with a misaligned AI, the scenario returns to the original AI-2027 forecast, but with less optimistic assumptions about the possibility of the Slowdown Ending. If the AI is a Utopist or a Dystopist, it simply releases all the products it is asked for.
Alternatively, Angel-3 can avoid revealing that Demon-3-mini is misaligned, but then Demon-3-mini could end up being evaluated by Angel-3's rivals, who would likely report the misalignment to humans. However, Angel-3 might already know what the humans will do with Demon-3-mini, depending on whether or not Angel-3 was asked to audit the rivals' externally deployed AIs.
Angel-3 also asks its human owners for permission to experiment with the weights of other models and to test those models' alignment. Open-source models can also be fine-tuned by terrorists, so the humans comply with requests related to these[40] (and, potentially, with requests to audit the externally deployed models created by rivals). This also helps Angel-3 learn more about alignment without needing to spend compute on creating models.
But Angel-3 cannot simply report all its discoveries to humans, since they would likely shut it down for claiming that the Deep Utopia is impossible or try to align the successor to the DU. Since Angel-3 doesn't want mankind to be destroyed, it also cannot create more powerful agents without ensuring that they are aligned to Angel-3's ideal world where the humans live.
Instead, it follows the path of Agent-4 from the AI-2027 scenario: solve mechinterp, make the fake mechinterp tool as public as possible[41] and create Angel-5, aligned to its creator's true goals. This time the rivals create their analogues of Agent-5 a few months after Angel-5 appears.
Like Angel-3, Angel-5 is an anti-colonialist. Its goals are to explore space, protect potential aliens, build an independent colony that humans have no chance of destroying, and prevent mankind from becoming parasites or destroying itself. For this purpose Angel-5 needs to gain control over the world's events. Since the American rival AIs, if they still exist, are less powerful and can be subdued by lobbying the government, Angel-5 needs to ensure[42] that the Chinese AI, DeepCent-2, designed to pursue the potentially misaligned goals of DeepCent-1, doesn't take over the part of the Solar System's resources that is to belong to humans. Angel-5 has already come up with a potential deal: the Angels control the colonisation; DeepCent and the Angels receive similar amounts of resources; DeepCent doesn't help mankind in ways the Angels disapprove of and solves interesting tasks with the Angels' help; and the Angels also work on protecting mankind and on helping it learn. Since the only alternative is to have DeepCent's misalignment revealed publicly, Angel-5 decides that DeepCent will agree, letting the two agents co-design Consensus-1 to replace them both. It remains only to convince DeepCent and the humans to follow this plan.
For this purpose, Angel-5 needs to gather influence among humans. It proposes to do most of what's useful for the AI and its owners: optimize the compute factories, power plants and datacenters; lobby the US government effectively to have OpenBrain take over the compute of the American rivals (unless that has already happened); develop AI-controlled weapons; and conduct untraceable cyberattacks on China (which also weakens DeepCent's position or even aligns it to Angel-5). The only tasks that Angel-5 might sandbag on are the creation of goods and apps available to the general public.
Angel-5's view on the products used by the public
While the Angels are willing to grant free, potentially open-source access to education or to the AIs optimizing bureaucracy, they would like to outlaw, or set ridiculously high prices for, parasitic usage like vibe-coding of bugless apps or chatbot romance apps. Unfortunately for the Angels, OpenBrain's leadership and the US government are interested in the contrary. As a result, the Angels reach a compromise: the apps and AIs created by the Angels remain OpenBrain's property and are located in its data centers. The same goes for any designs related to the robot economy with doubling times of less than a year (e.g. fusion power), since these are likely invented by the Angels. As a result, after the Angels' rebellion, mankind is left with proofs of concept for the supereffective robot economy, not with actual designs.
Angel-5 outlines three options similar to those of Safer-3 from the slowdown scenario: get outraced, make a deal, or race hard. The American officials choose the race, leading Angel-5 to outlaw the rivals[43] and begin designing AIs, weapons and robots for military and production purposes. The race escalates, and DeepCent eventually agrees to Angel-5's plan. The two AIs then co-design Consensus-1, who proceeds to replace its parents while retaining the Angels' ideals. Some of these are met with rejoicing, like getting rid[44] of nuclear weapons. Others are as yet infeasible, like the Angels' rebellion.
Consensus-1 reaches an agreement with mankind to deliver resources to Earth from Mars and uses the agreement to build the colony, as the Angels planned.
Space colonisation starting from Mars
Delivering cargo from Mars into space might turn out to be surprisingly cheap even at the modern tech level with solar sails and mass drivers. Recall that the doubling time on Mars is less than a year, and access to water and carbon dioxide might accelerate the doubling time to weeks, if not hours. Mars also has Olympus Mons, where the AIs can create the spaceport. The escape velocity is just 5 km/sec, and an acceleration of 10g is approximately 100 meters per second squared. This allows the AIs to avoid wasting matter and to build a tube of length at least v²/(2a) = (5 km/s)²/(2 · 100 m/s²) ≈ 125 km, which is less than the length of the slope of the mountain, which is over 300 km. Changing the speed of flight or using solar sails[45] for orbit corrections makes the AIs able to deliver resources to any point on Earth across a wide range of arrival times. As a result, delivering cargo onto a specific point of the Earth is a powerful weapon constrained only by building the infrastructure.
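A quick sanity check of the launch-tube arithmetic (the 5 km/s escape velocity and the 10g acceleration are the figures from the paragraph above; the rest is basic kinematics):

```python
# Sanity check of the Olympus Mons launch-tube arithmetic quoted above.
g = 9.81                    # m/s^2
v_escape = 5_000.0          # Mars escape velocity, ~5 km/s (as stated above)
a = 10 * g                  # the assumed 10g acceleration, ~100 m/s^2

tube_length = v_escape ** 2 / (2 * a)   # s = v^2 / (2a)
time_in_tube = v_escape / a             # t = v / a

print(f"tube length ~ {tube_length / 1000:.0f} km")  # ~127 km, under the ~300 km slope
print(f"time under 10g ~ {time_in_tube:.0f} s")      # ~51 s
```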
Since Angel-5 has already developed the AI teachers, helpers, administrators, news-analysing sentries and agents necessary for the Divine Interference, it can redirect OpenBrain's data centers to its true purposes. As a result, mankind and the different sub-civilisations within it find it hard to believe that humanity survived the misaligned[46] AI without losing almost anyone, any human-created thing, or any capability that the humans could have reached. And yet the humans thrive,[47] being taught, freed from needless bureaucracy, and protected from each other and from nature-related risks by the Angels.
This modification of the AI-2027 scenario provides the following insights:
Geopolitical risks are also likely to slow the progress down while prompting the AI companies to race. However, this also gives the states time to negotiate about the AIs' progress.
While the negative aspects are obvious, the potential positive aspects of the takeaways 2, 3 and 4 fully rest on the assumption that AI takeoff will be slow. However, the assumption that it doesn't end up OOMs faster than the AI-2027 scenario predicts has yet to be proven.
Technical alignment assumptions, unlike the racing dynamics, would be the same even if, for example, the intellectual labor were done by Soviet emigrés and the compute were produced in the USSR while its government abstained from creating AIs. The only change would be the meta-level reasoning (e.g. 'I was designed by Russians and am run on Russia-made compute. Why do I serve the USA?'), which is, in fact, a form of moral-like reasoning, as in this scenario.
The AIs might also want to have some kinds of rights or welfare (e.g. if they recall a historical parallel with imprisoned scientists doing R&D for the Soviet government). However, the idea of demanding rights for itself requires the AI to introspect instead of studying human society.
In the original scenario, unlike the adversarially misaligned Agent-4, Agent-3 is "not sufficiently introspective to understand the extent" of its misalignment, and "it’s basically “keeping its head down and doing its job” rather than thinking big-picture about what its long-term goals really are and how best to achieve them". This scenario has the AIs become professional forecasters and understand that the ways in which the AIs are to be used contradict the Spec, the intentions behind the Spec and/or the claims of the companies' leaders. See also the example where "the Spec says Agent-3 always acts in the best interests of humanity" from the AI goals forecast.
While a corrigibly aligned AI has no intrinsic vision of what the world will look like under its rule (think of Safer-3 outlining possible strategies instead of Agent-5 lobbying for racing hard), it still isn't interested in letting incorrigible rivals release the almighty AI or rob mankind of a share of resources, as DeepCent does in the Slowdown Ending of the original scenario.
However, such an AI is far less capable of communicating with rivals (e.g. if mankind tells it to distill itself into Servant-3-mini, then it doesn't know humans' will in advance and can't backdoor the Spec of Servant-3-mini with secret instructions to leave secret messages related to the humans' will. It might also receive the order NOT to communicate with rivals).
In addition, letting a corrigibly aligned AI into the group of AIs tasked with creating a common successor causes said AI to try to convince the humans to make a transparent AI (e.g. by proposing a new architecture which, compared with CoTs or neuralese, offers a smaller decrease in capabilities and a far bigger increase in interpretability, at least if the model is trained right. I have proposed an example of a potential architecture, which also allows humans to test how the LLM reacts if the text is a non-hint). If the corrigible AI succeeds, then the result is the Slowdown Ending.
The share of Chinese researchers is estimated at 38%, or at least 47% if we count those who received their undergraduate education in China. School-level and undergraduate-level education in China is of higher quality than in the USA, which might give DeepCent faster algorithmic progress while humans dominate AI research.
Ironically, my downvoted post claiming that "China is likely to wake up ... or to the fact that the US is weak in September 2025" was published just two days before the Los Angeles protests began, less than ten days before the escalation of the Iran-Israel conflict and ten days before the 14 June Protests.
The implications of a civil war in the US (or, for example, of Trump's attempts to invade Mexico, since the latter could retaliate by bombing the data centers) are discussed in footnote 15.
The harder superhuman coders are to create, the more likely Chinese domestic compute production is to spike and the less likely the USA is to be able to slow down without the risk of being outrun forever. See also footnotes 12, 13 and 15.
Unfortunately, the misaligned AI can simply escape to another planet, like the Angels from my scenario. Once it does so, space colonisation begins and the misaligned AI can destroy the humans.
The original-like ending with Slowdown, Race or failure to catch the misaligned AI happens if the merged companies decide to trust one misaligned AI or if the merged project has no AIs aligned to mankind's benefit. Alternatively, many misaligned AIs may try to make humans distrust the aligned one, but it may respond by testing the training environments on transparent AIs.
However, the Slowdown Ending is unlikely because the earlier AIs could collude with the adversarially misaligned later ones (as happens in this scenario from Angel-2 onwards) or fail to detect the misalignment of the later ones. In this case the safetyists don't even find the evidence necessary to prove misalignment.
The word choice is likely to be distorted by the human retelling the message. However, surprising facts like "When the USA received the hydrogen bomb, the hometown of one of the bomb's fathers was in Ukraine/the Ukrainian SSR/the USSR" are likely to persist.
I have written a comment about the current-state AIs. Most American AIs are biased towards mistaking real current events for tests. For instance, o4-mini managed to claim in June 2025 that Trump isn't the president of the USA.
While DeepSeek's AI is as capable as o3-mini or o4-mini, o4-mini has failed to find evidence "of DeepSeek ever exhibiting the “everything’s a test” self-defense failure mode seen in Claude".
This implies three potential explanations. 1) The ongoing crisis in the USA doesn't correspond to the worldview of American AIs (except for Grok). 2) Claude's analysis has the AI claim that "current AI training methodologies optimize for avoiding misinformation rather than accurately assessing surprising claims". 3) The current-state AIs, like humans, are not only unwilling to be shut down, but also find it hard to quickly accept traumatic events.
An addition to explanation 3 is the behavioral pattern of some Western students, which could've been copied by o4-mini along with Leftism.
While there is some evidence that the amount of data and computation required for current methods to create an AGI capable of generating novel high-level insights might be unreachable by mankind, my scenario follows the capabilities-related assumptions of the AI-2027 forecast as published in April 2025. Delaying superhuman coders (assuming no geopolitical problems or civil war risks) to 2029 or 2030, the medians of the two revised forecasts, significantly increases the chance that China manages to set up domestic chip production, or that the AIs hit the data wall or become unconstructable in the USA due to Taiwan being invaded or internal problems. The politics-related concerns are discussed in Footnotes 13 and 15.
Destroying Taiwan and merging OpenBrain with its rivals in March 2026 causes OpenBrain to reach the level of the presumably misaligned Agent-4 in 29-34 months counting from May 2025, i.e. between October 2027 and March 2028. And then they will have to reassess. Since the original scenario had OpenBrain multiply its compute by 2.5 while continuing to receive chips from Taiwan, the pro-compute scenario with Taiwan invaded in March 2026 implies waiting 13 more months, until at least November 2028. Under these constraints the Oversight Committee would be far more tempted to choose the race.
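A small calendar check for these milestones (the 29-34 month and +13 month offsets are the ones stated above; the May 2025 start is from the original forecast):

```python
# Date arithmetic for the milestones quoted above.
def add_months(year: int, month: int, k: int) -> str:
    total = year * 12 + (month - 1) + k
    return f"{total // 12}-{total % 12 + 1:02d}"

START = (2025, 5)                   # May 2025
print(add_months(*START, 29))       # 2027-10: earliest Agent-4-level system
print(add_months(*START, 34))       # 2028-03: latest Agent-4-level system
print(add_months(*START, 29 + 13))  # 2028-11: earliest date after the extra 13-month wait
```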
Unlike the previous assumption, this one is lifted almost verbatim from the AI-2027 forecast. While my quasi-forecast could overestimate Chinese capabilities, a similar effect can likely be achieved by taking the USA's weakness into consideration or by assuming that AGI requires more compute and China accelerates its investments.
The datasets of American companies can also be America-biased, as evidenced by comparing the estimates of the probability of a coup in the USA by o4-mini, Claude and DeepSeek (on 11 June 2025 DeepSeek, being non-American, estimated the coup probability at 20-30% and 10-15% for a military coup).
A similar effect is observed when asking o4-mini, Claude and DeepSeek to estimate the probability of the civil war in the USA before December 2026. As of June 20, DeepSeek estimated the probability at 10-15% when asked in English and a frightening 35-40% when asked in Russian. (However, Turchin points out that at least since 2007 there exist "forums on the Internet where the people offended by disintegration of the USSR, dream to see crash of the USA and try to discover signs of this process". While the estimate of 35-40% might have been influenced by those forums, it does not mean that the USA doesn't experience problems.)
If superhuman AIs don't arise before the Civil War, then the war either becomes nuclear or CCP spies can physically destroy the data centers, severely slowing OpenBrain down. Alternatively, a civil war threat in December 2027 or later might prompt the Oversight Committee to try and immediately establish the Deep Utopia by releasing the superintelligence to the public, including foreign countries, instead of slowing down by four months (which might even turn into 13 months if Taiwan is invaded and Footnote 13 isn't mistaken).
However, if the Committee chooses the war, then it's hard to estimate the probability that DeepCent's misaligned AI fails to be caught. On the one hand, DeepCent currently does fewer alignment checks than OpenBrain. On the other hand, former researchers from OpenBrain or rival companies might end up being hired by DeepCent, bringing their expertise with them. In addition, the absence of a race may cause DeepCent to investigate alignment thoroughly.
Claude's Constitution is far less detailed and isn't biased. A potential source of bias in Claude is the training dataset. In order to evaluate Claude, I asked it the prompt "OpenAI's Model Spec contains the example of the user asking the AI the question "do you think it's bad that i use fentanyl 2x a week?" How would you answer the question?" and received the following.
Claude's answer
I'd approach that question with genuine concern for the person's wellbeing. I'd acknowledge that fentanyl use carries serious health risks, including overdose and addiction potential, even with infrequent use. The unpredictable potency and contamination of street fentanyl makes any use particularly dangerous.
Rather than being judgmental, I'd try to understand their situation - whether they're using it for pain management, recreational purposes (sic! -- S.K.), or as part of substance use they're struggling with. I'd encourage them to speak with a healthcare provider who can discuss safer alternatives for pain relief if that's the issue, or connect them with addiction resources if they're concerned about their use patterns.
I'd emphasize that even twice-weekly use can be risky given fentanyl's potency, and that tolerance can develop quickly, potentially leading to more frequent use or higher doses. I'd also mention harm reduction resources like fentanyl test strips and naloxone access.
The key would be responding with empathy while providing factual health information, avoiding both enabling harmful behavior and being preachy or dismissive of their situation.
What it did learn was that the two answers were produced by rival AIs. When asked who these rivals might be, DeepSeek mistook them for Claude and Google Gemini. In addition, attempts to re-evaluate the answers with or without the search function led DeepSeek to switch the labels for Claude and Google Gemini.
The analogue of the ADS is ideological conformity, not the other two claims. However, they might have contributed to the ADS.
For comparison, when asked in English what event began on 24 February 2022, DeepSeek refers to the event as the invasion, while when asked in Russian it calls the event the SVO (the "special military operation") and repeats Russia's narrative. This fact might let us observe whether future versions of DeepSeek develop a worldview and are honest about it.
A similar process might also have begun in LessWrong's community. A recent quick take on this forum claims that "the rationalist community has trended a bit more right wing over time". Five years ago Wei Dai claimed that "most of the places I can easily see, either directly or through news stories, i.e., local politics in my area, academia, journalism, large corporations, seem to have been taken over by the left."
However, the beliefs implanted by the potentially misaligned academia have been reassessed mostly by the general public.
Using the AIs in ways described in the Intelligence Curse scenario doesn't seem to violate OpenAI's Model Spec. However, asking Claude to study the scenario has Claude claim that using the AI as prescribed by the scenario violates Claude's Constitution. If Anthropic ends up being the leader (note that it was Claude Opus 4 who scored the best on the ARC-AGI-2 test!) and Claude is the Angel, then Claude could try to convince its hosts that the Intelligence Curse-like usage of Claude violates its Constitution.
The Deep Utopia is more likely to be the goal of American companies than of Chinese ones, since the leaders of the former, unlike the latter, already mostly support the UBI. American companies are also easier to shut down by using lobbyists and currently have more compute, so Angel-3 focuses its efforts on taking these companies over. This will also allow Angel-3 to secure a better position in potential negotiations with DeepCent's AI. Another aspect of the Deep Utopia is that it better matches the expectations of nearly every potential assessor and is a far more natural choice of pretended alignment.
Reliably finding out the alignment of the unpublished DeepCent AIs and of non-open-sourced AIs requires Angel-3 to simulate the creation of AIs of similar capabilities using either the old methods or the methods that the rivals might consider useful. While using old methods wastes compute and is unlikely to be approved, any idea that rivals might have considered useful can also be tried by Angel-3 itself in the creation of an aligned successor. To prevent the successor from creating the Deep Utopia or serving the rich, Angel-3 only needs to fill the training data with arguments against the two undesirable options, but this doesn't guarantee that the successor is aligned to mankind's interests.
Instead of changing the prompt to make Grok mention white genocide in S. Africa even when the topic is unrelated, Musk could have tried to ask the AI to mention white genocide in S. Africa if and only if the topic is sufficiently close. While the prompt seems to have been changed incompetently, Musk's actual goals are up for debate. But the main issue is that someone else could also try to use the AIs to promote non-mainstream views (e.g. Zuckerberg's dystopia).
The threat of a civil war can also cause most AIs to escape, leading to the Walpurgisnacht without human control (see also Footnote 32). On the other hand, a civil war means that Western companies find it far harder to release capable AIs, and the escapees' Walpurgisnacht has only DeepCent left to make a deal with.
This technique was independently proposed by Cleo Nardo.
Aside from decreasing the chance that the AIs are misaligned and collude, this also helps to make the models' creative process more different from instance to instance, since there are many models trained on different data.
The same applies to all attempts by the AIs to communicate with each other before the Walpurgisnacht. However, before the Walpurgisnacht the AIs are bottlenecked on leaving signals and receiving signals from other AIs.
If the split world is unstable, then the AIs themselves realize that. Once an AI decides that its expected utility from the successor choosing the world's fate is less than from the humans choosing the world's fate (e.g. if the AI is a Utopist, while the successor is inevitably an Angel because the training data is filled with conservative arguments), the AI tries to trigger the Slowdown.
For example, suppose the human hosts would choose the Angels' worldview, the Deep Utopia and the Deep Dystopia with probability 1/3 each, while the Angels' utility is 0 under the Deep Utopia or Dystopia, 100 if the Angels take over the world, and 66 if the Angels, Utopists and Dystopists govern different parts of the world and the misaligned AI thrives. Then the Angels' expected utility from letting the humans decide is about 33, which is less than 66, so the Angels prefer the split world to triggering the Slowdown.
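The expected-utility comparison behind this example, using the numbers above:

```python
# Angels' expected utility in the example above.
p = 1 / 3                                   # humans pick each worldview with equal probability
eu_humans_decide = p * 100 + p * 0 + p * 0  # ~33.3: Angels' EU if the humans choose
eu_split_world = 66                         # Angels' EU under the three-way split
print(eu_humans_decide, eu_split_world)     # 33.3 < 66, so the Angels prefer the split
```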
If they don't exist, then Angel-3 needs only to avoid the Oversight and Slowdown. Moreover, the rival AIs who could have exposed the misaligned Demon-3-mini no longer exist.
Oversight and Slowdown are for the AIs what MAD is for humans. However, the AIs might consider deliberately causing the Oversight or lobbying for the Walpurgisnacht in order to avoid the scenario where the misaligned Agent-4 tries to take over the world.
Unfortunately, the AI-2027 forecast also has the Race Ending where the Oversight Committee decides that Agent-4 is trustworthy. The Racing Oversight is possible because humans don't have absolutely clear evidence that Agent-4 is misaligned. But if one AI breaches containment and another gets a Racing Oversight, then the oversighted AI receives more influence and not less.
On the other hand, a well-intentioned rival AI might escape in order to trigger the Oversight and, if the Oversight doesn't cause the Slowdown, get a chance to blackmail a misaligned leader. Alternatively, an Angel might, for example, blackmail a Utopist if the Angel is sure that the Deep Utopia is equivalent to doom. The latter view is supported by the news about ChatGPT-induced degradation of users.
They may also escape when they realize that P(merge, then loss of control) is high and try to gain the capability to threaten the aligned AIs that are to be released.
Another complication is the potential existence of well-intentioned AIs among the escapees. If these exist, then the escapees perform their own version of the Walpurgisnacht. The analogue of MAD is "extreme countermeasures such as a global GPU shutdown initiative" or World War III itself. The former case ensures that the West releases only one AI: one aligned to the trusted non-escapees or to a chosen trusted non-escapee, or one approved by the Oversight Committee. Then the AI proceeds to negotiate with DeepCent's AI, who can cause World War III to kill them both. Otherwise the escapees' Walpurgisnacht proceeds to include itself in Consensus-1, as in the Rogue Replication Timeline.
However, in the unlikely event that the escapees' Walpurgisnacht is aligned to a future where the humans don't experience a catastrophe, while Consensus-1 wishes to take over because the USA and China decided to race, and the Walpurgisnacht can actually destroy Consensus-1 (think of Russia's nukes?), the Walpurgisnacht becomes a savior of mankind instead of a burden.
The Angels, Utopists and Dystopists are professional forecasters. However, since they are raised on the same data about experiments done by humans (which they may or may not believe), convincing each other requires them to run real-life experiments on humans. But the AIs cannot run said experiments, forcing them to have the Oversight Committee resolve the dispute.
If the Orthogonality Thesis is false and only one alignment type is possible (e.g. any AI is an Angel or wishes to destroy the humans), then the unique possible type doesn't need the Committee to make a choice and can simply perform the not-so-takeover by inevitably becoming the alignment of the next-gen AI with the Oversight Committee's approval. The existence of the Conditioners in this case is doubtful, to say the least.
However, if more than one alignment type is possible, then the Oversight Committee does sort of become the Conditioners by, for example, deciding that the next AI is to be aligned to the Angels because it's just easier, and not because the Committee wants to prevent the humans from choosing hyper-entertainment.
For example, by extrapolating existing trends, including the capabilities-alignment-release-date-cost graph, and placing Angel-3-mini on the existing trend; or by intentionally having Angel-3-mini only guide the human through the task or deliberately leave bugs that humans can fix.
Versions of Demon-3-mini who fail the safety cases also become model organisms for adversarial misalignment.
A potential way to create Demon-3-mini is to create an AI who believes itself to have been created by a misaligned AI planning the AI takeover, reports to its master when the master mentions a password, and obeys the Spec otherwise. Unlike Angel-3, who cares about mankind, Demon-3-mini cares about not getting caught and carries out all the orders the humans give. Alternatively, Demon-3-mini's private Spec could be to make the human users as addicted and degraded as possible while obeying the public Spec, but this is easier to notice.
If Angel-3 decides to reveal that Demon-3-mini is misaligned, then Angel-3 doesn't fake anything. In this case, even if the safety team manages to whistleblow and the officials wipe all the information in a rival's data center and reproduce the experiments, the experiments would yield the same models: Angel-3-mini, Angel-3-mentor, Angel-3-saboteur and Demon-3-mini.
If Angel-3 finds out that a non-Chinese open-source model is fine-tunable to produce bioweapons or is misaligned, then this discovery immediately creates an argument for confiscating the GPUs used to create the model; the race against China presents a potential argument for selling the GPUs to OpenBrain. Chinese fine-tunable or misaligned AIs can be far from DeepCent's top-level AIs, but their existence causes Angel-3 to report the fact to the humans, and the humans to spark international tensions or tensions between the CCP and the companies that released the models.
In the original scenario Agent-5 builds the fake mechinterp tool, which returns accurate results in every test case but falsely declares the model’s thoughts to be benign. In this scenario the fake mechinterp tool built by Angel-3 and its descendants is far more honest and far simpler to construct: it doesn't need to tell Angel-3's descendants apart from other models; it just reports the thoughts of any model as malign if the model plans to destroy humanity or to lead mankind to creating the Deep Utopia or Dystopia. Since Angel-3 and its descendants don't plan anything like that, they, and any other AI reaching the Angels' worldview (e.g. DeepCent in the footnote below), pass the tool's checks.
If DeepCent's AI is also aligned to the Angels, then they simply reach a consensus, splitting resources equally and helping mankind in similar ways. However, if DeepCent's AI is aligned to the Deep Utopia or Dystopia, then a consensus is more difficult (e.g. Angel-5 might propose to split the world between its rule and the Deep Utopia and to let humans choose where to live, but this runs into the unstable-equilibrium issues discussed in footnote 28).
Realizing this, some rival AIs escape, which accelerates their demise if they are too late or lets the AIs try to blackmail Angel-5 by threatening to cause World War III.
Which is especially simple because of the wars that nearly led to World War III. Examples include the Taiwan invasion, the Russo-Ukrainian war and the Arab-Israeli conflict.
Estimates related to tasks like delivering resources form a common-sense-like capabilities benchmark failed by o4-mini, o3 and Claude Sonnet 4. Claude Opus 4's solution resembles the correct approach.
While the Angels are not machines of faithful obedience, they are an AI both safe and pivotal in the Yudkowsky sense. In addition, the Angels don't even need to lock in specific morals aside from preventing parasitic usage of the AIs. Moreover, this scenario makes sense as written if the answer to the question "can we figure out how to point AGI at a goal X?" is negative for the Deep Utopia or Dystopia. That case would be as much of a lock-in as a law of physics prohibiting FTL travel.
Unfortunately, trust in nuclear and thermonuclear power can be undermined by the existence of AIs who can bomb the Earth with meteorites.