Note: you can get a nice-to-read version of the Treaty at https://www.ifanyonebuildsit.com/treaty. I'm not sure whether there are any notable differences between that version and the paper, but I'm guessing they're mostly the same.
TLDR: We at the MIRI Technical Governance Team have released a report describing an example international agreement to halt the advancement toward artificial superintelligence. The agreement centers on limiting the scale of AI training and restricting certain AI research.
Experts argue that the premature development of artificial superintelligence (ASI) poses catastrophic risks, from misuse by malicious actors, to geopolitical instability and war, to human extinction due to misaligned AI. Regarding misalignment, Yudkowsky and Soares’s NYT bestseller If Anyone Builds It, Everyone Dies argues that the world needs a strong international agreement prohibiting the development of superintelligence. This report is our attempt to lay out such an agreement in detail.
The risks stemming from misaligned AI are of special concern, widely acknowledged in the field and even by the leaders of AI companies. Unfortunately, the deep learning paradigm underpinning modern AI development seems highly prone to producing agents that are not aligned with humanity’s interests. There is likely a point of no return in AI development — a point where alignment failures become unrecoverable because humans have been disempowered.
Anticipating this threshold is complicated by the possibility of a feedback loop once AI research and development can be directly conducted by AI itself. What is clear is that we're likely to cross the threshold for runaway AI capabilities before the core challenge of AI alignment is sufficiently solved. We must act while we still can.
But how?
In our new report, we propose an international agreement to halt the advancement towards superintelligence while preserving access to current, beneficial AI applications. We don’t know when we might pass a point of no return in developing superintelligence, and so this agreement effectively halts all work that pushes the frontier of general AI capabilities. This halt would need to be maintained until AI development can proceed safely; given the immature state of the field and the relative opacity of the large neural networks favored by the current paradigm, this could mean decades.
Our proposed agreement centers on a coalition led by the United States and China to restrict the scale of AI training and dangerous AI research. The framework provides the necessary assurance to participants that restrictions are being upheld within each jurisdiction; the expectation is not that participants would blindly trust each other. Participants would employ verification mechanisms to track AI chip inventories and how they are being used. Monitoring and enforcement would leverage existing state assets and legal frameworks, following the precedent of international arms treaties and non-proliferation agreements.
Under the agreement, training runs for new AIs would be limited by the total number of computational operations used. (We suggest 10^22 FLOP as a threshold for monitoring, and 10^24 FLOP as a strict upper limit.) Aiding verification is the fact that AI chips are expensive, specialized, and needed in the thousands for frontier development. The supply chain for AI chips also contains a number of key bottlenecks, simplifying initiatives to control and track new production.
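As a rough illustration of what these thresholds mean in practice, here is a minimal sketch (not taken from the report) that classifies a hypothetical training run against the proposed limits. It uses the common ~6 × parameters × tokens rule of thumb for dense transformer training compute; the model size, token count, and function names are illustrative assumptions.

```python
# Illustrative sketch only: compares an estimated training run against the
# proposed thresholds, using the rough ~6 * params * tokens approximation
# for dense transformer training compute.

MONITORING_THRESHOLD_FLOP = 1e22  # proposed threshold for monitoring
UPPER_LIMIT_FLOP = 1e24           # proposed strict upper limit


def estimated_training_flop(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute for a dense transformer."""
    return 6.0 * n_params * n_tokens


def classify_run(n_params: float, n_tokens: float) -> str:
    flop = estimated_training_flop(n_params, n_tokens)
    if flop > UPPER_LIMIT_FLOP:
        return f"{flop:.2e} FLOP: above the strict upper limit"
    if flop > MONITORING_THRESHOLD_FLOP:
        return f"{flop:.2e} FLOP: above the monitoring threshold (declared and monitored)"
    return f"{flop:.2e} FLOP: below the monitoring threshold"


# Example: a 7e9-parameter model trained on 2e12 tokens is roughly 8.4e22 FLOP,
# which under these assumptions falls in the "monitored but permitted" band.
print(classify_run(7e9, 2e12))
```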
Coalition members would each consolidate their AI chips into a smaller number of declared data centers, where usage can be monitored for assurance that the chips are only being used for allowed activities. The number of chips permitted in any one unmonitored facility would be strictly limited. (We suggest the equivalent of 16 H100 chips, a collection that would cost approximately $500,000 in 2025.)
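To give a feel for the scale of that cap, here is a back-of-envelope sketch (again, an illustration rather than a figure from the report) of how long a facility at the 16-H100 limit would take to accumulate the proposed compute thresholds. The per-chip throughput and utilization numbers are assumptions, chosen only to be in the right ballpark.

```python
# Back-of-envelope illustration: time for a 16-chip facility to accumulate
# the proposed compute thresholds, under assumed throughput and utilization.

CHIPS = 16
PEAK_FLOP_PER_SEC_PER_CHIP = 1e15  # assumption: ~1e15 FLOP/s per H100-class chip at low precision
UTILIZATION = 0.4                  # assumption: sustained training utilization

sustained_flop_per_sec = CHIPS * PEAK_FLOP_PER_SEC_PER_CHIP * UTILIZATION

for label, threshold in [("1e22 FLOP (monitoring threshold)", 1e22),
                         ("1e24 FLOP (strict upper limit)", 1e24)]:
    days = threshold / sustained_flop_per_sec / 86400
    print(f"~{days:,.0f} days to accumulate {label}")
```

Under these assumptions, such a facility would need on the order of weeks to reach the monitoring threshold and several years to reach the upper limit, which is the sense in which the cap keeps unmonitored compute well short of frontier-scale training.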
Because AI progress can unfold rapidly and unpredictably, the framework includes restrictions on research that could advance toward artificial superintelligence or endanger the agreement’s verifiability. The number of people with relevant skills is likely only in the thousands or tens of thousands, and we are hopeful these research restrictions can be narrow enough to have only a negligible effect on fields outside of AI.
A responsible coalition will need to extend its vigilance beyond the borders of its signatories. Dangerous AI development by anyone anywhere threatens everyone everywhere. The coalition must therefore act as needed to ensure cooperation from non-signatories, while incentivizing them to join the coalition. A natural incentive would be access to the AI infrastructure and usage permitted under the monitoring regime. Stronger incentives could come from the standard toolkit of international diplomacy, including economic sanctions and visa bans.
While there are political obstacles to forming such a coalition today, we anticipate a growing awareness that accepting even a 10% chance of extinction (to quote a figure popular among researchers) is wholly inconsistent with how we manage other risks. In an appendix, we discuss how an agreement like this could come about in stages as political will grows over time.
The coalition’s task is likely to be easier the sooner it gets started. Rapid feedback loops, hardware proliferation, and the loosening of supply chain bottlenecks all become more likely over time.
In the full report, we address a number of common questions about our recommendations, including why we think less costly plans likely wouldn’t work and why we believe a halt should be architected to last for decades, should we need that much time. We consider this work to be ongoing, and are excited for folks to engage with the details and help us improve it.
For those who prefer listening over reading, an earlier version of the agreement was discussed in a FAR Seminar talk, available on YouTube.
In follow-up posts, we plan to explore additional concerns around potential circumventions by signatories and by groups beyond their jurisdiction. We’ll also explain some of the thinking behind our proposed compute thresholds, consider the threat of authoritarianism enabled by the agreement, compare our proposal to other international arrangements, and provide additional policy recommendations.
If you are interested in this work and want to join our team, we are usually hiring researchers.