New Report: An International Agreement to Prevent the Premature Creation of Artificial Superintelligence

peterbarnett; Aaron_Scher; David Abecassis; Brian Abeyta

LESSWRONG
LW

New Report: An International Agreement to Prevent the Premature Creation of Artificial Superintelligence — LessWrong

An International Agreement to Prevent the Premature Creation of Artificial Superintelligence

218 New Report: An International Agreement to Prevent the Premature Creation of Artificial Superintelligence

by peterbarnett, Aaron_Scher, David Abecassis, Brian Abeyta

18th Nov 2025

4 min read

218

TLDR: We at the MIRI Technical Governance Team have released a report describing an example international agreement to halt the advancement towards artificial superintelligence. The agreement is centered around limiting the scale of AI training, and restricting certain AI research.

Experts argue that the premature development of artificial superintelligence (ASI) poses catastrophic risks, from misuse by malicious actors, to geopolitical instability and war, to human extinction due to misaligned AI. Regarding misalignment, Yudkowsky and Soares’s NYT bestseller If Anyone Builds It, Everyone Dies argues that the world needs a strong international agreement prohibiting the development of superintelligence. This report is our attempt to lay out such an agreement in detail.

The risks stemming from misaligned AI are of special concern, widely acknowledged in the field and even by the leaders of AI companies. Unfortunately, the deep learning paradigm underpinning modern AI development seems highly prone to producing agents that are not aligned with humanity’s interests. There is likely a point of no return in AI development — a point where alignment failures become unrecoverable because humans have been disempowered.

Anticipating this threshold is complicated by the possibility of a feedback loop once AI research and development can be directly conducted by AI itself. What is clear is that we're likely to cross the threshold for runaway AI capabilities before the core challenge of AI alignment is sufficiently solved. We must act while we still can.

But how?

In our new report, we propose an international agreement to halt the advancement towards superintelligence while preserving access to current, beneficial AI applications. We don’t know when we might pass a point of no return in developing superintelligence, and so this agreement effectively halts all work that pushes the frontier of general AI capabilities. This halt would need to be maintained until AI development can proceed safely; given the immature state of the field and the relative opacity of the large neural networks favored by the current paradigm, this could mean decades.

Overview of the key elements of the agreement, covering the governance structure, chip controls, research restrictions, and nonproliferation and enforcement.

Our proposed agreement centers on a coalition led by the United States and China to restrict the scale of AI training and dangerous AI research. The framework provides the necessary assurance to participants that restrictions are being upheld within each jurisdiction; the expectation is not that participants would blindly trust each other. Participants would employ verification mechanisms to track AI chip inventories and how they are being used. Monitoring and enforcement would leverage existing state assets and legal frameworks, following the precedent of international arms treaties and non-proliferation agreements.

Under the agreement, training runs for new AIs would be limited by the total number of computational operations used. (We suggest 10²² FLOP as a threshold for monitoring, and 10²⁴ FLOP as a strict upper limit.) Aiding verification is the fact that AI chips are expensive, specialized, and needed in the thousands for frontier development. The supply chain for AI chips also contains a number of key bottlenecks, simplifying initiatives to control and track new production.

Coalition members would each consolidate AI chips into a smaller number of declared data centers, where their usage can be declared and monitored for assurance that they are only being used for allowed activities. The number of chips permitted in any one unmonitored facility would be strictly limited. (We suggest the equivalent of 16 H100 chips, a conglomeration that would cost approximately $500,000 USD in 2025).

Because AI progress can unfold rapidly and unpredictably, the framework includes restrictions on research that could advance toward artificial superintelligence or endanger the agreement’s verifiability. The number of people with relevant skills is likely only in the thousands or tens of thousands, and we are hopeful these research restrictions can be narrow enough to only negligibly affect fields outside of AI.

A responsible coalition will need to extend its vigilance beyond the borders of its signatories. Dangerous AI development by anyone anywhere threatens everyone everywhere. The coalition must therefore act as needed to ensure cooperation from non-signatories, while incentivizing them to join the coalition. A natural incentive would be access to the AI infrastructure and usage permitted under the monitoring regime. Stronger incentives could come from the standard toolkit of international diplomacy, including economic sanctions and visa bans.

While political obstacles exist to forming such a coalition today, we anticipate a growing awareness that accepting even a 10% chance of extinction (to quote a figure popular among researchers) is wholly inconsistent with how we manage other risks. In an appendix we discuss how an agreement like this could come about in stages as political will grows over time.

The coalition’s task is likely to be easier the sooner it gets started. Rapid feedback loops, hardware proliferation, and the loosening of supply chain bottlenecks all become more likely over time.

In the full report, we address a number of common questions about our recommendations, including why we think less costly plans likely wouldn’t work and why we believe a halt should be architected to last for decades, should we need that much time. We consider this work to be ongoing, and are excited for folks to engage with the details and help us improve it.

For those who prefer listening over reading, an earlier version of the agreement was discussed in a FAR Seminar talk, available on YouTube.

In follow-up posts, we plan to explore additional concerns around potential circumventions by signatories and by groups beyond their jurisdiction. We’ll also explain some of the thinking behind our proposed compute thresholds, consider the threat of authoritarianism enabled by the agreement, compare our proposal to other international arrangements, and provide additional policy recommendations.

If you are interested in this work and want to join our team, we are usually hiring researchers.

AI GovernanceMachine Intelligence Research Institute (MIRI)AI

Frontpage

218

Considerations for setting the FLOP thresholds in our example international AI agreement

5 comments52 karma

New Comment

23 comments, sorted by

top scoring

Click to highlight new comments since: Today at 10:04 AM

[-]Raemon3mo*327

Here's an attempt to recap the previous discussion about "Global Shutdown" vs "Plan A/Controlled Takeoff", trying to skip ahead to the part where we're moving the conversation forward rather than rehashing stuff.

Cruxes that seemed particularly significant (phrased the way they made most sense to me, which is hopefully reasonably ITT passing)

...

How bad is Chinese Superintelligence? For some people, it's a serious crux whether a China-run superintelligence would be dramatically worse in outcome than a democratic country.

...

"The gameboard could change in all kinds of bad ways over 30 years." Nations or companies could suddenly pull out in a disastrous way. If things go down in the near future there's fewer actors to make deals with and it's easier to plan things out.

...

Can we leverage useful work out of significantly-more-powerful-but-nonsuperhuman AIs? Especially since "the gameboard might change a lot", it's useful to get lots of safety research done quickly, and it's easier to do that with more powerful AIs. So, it's useful to continue to scale up until we've got the most powerful AIs can we can confidently control. (Whereas Controlled Takeoff skeptics tend to think AI that is capable of taking on the hard parts of AI safety research will already be too dangerous and untrustworthy)

...

Is there a decent chance an AI takeover is relatively nice? Giving the humans the Earth/solar system is just incredibly cheap from percentage-of-resources standpoint. This does require the AI to genuinely care about and respect our agency in a sort of complete way. But, it only has to care about us as a pretty teeny amount

[Edit: this was an interesting disagreement but I don't know anyone for whom it's strategically relevant, except in what arguments to publicize about whether if anyone built it, everyone would die]

...

And then, the usual "how doomed are current alignment plans?". My impression is "Plan A" advocates are usually expecting a pretty good chance things go pretty well if humanity is making like a reasonably good faith attempt at controlled takeoff, whereas Controlled Takeoff skeptics are typically imagining "by default this just goes really poorly, you can tell because everyone seems to keep sliding off understanding or caring about the hard parts of the problem")

...

All of those seem like reasonable things for smart, thoughtful people to disagree on. I do think some disagreement about them feels fishy/sus to me, and I have my takes on them, but, I can see where you're coming from.

Three cruxes I still just don't really buy as decision-relevant:

"We wouldn't want to pause 30 years, and then do a takeoff very quickly – it's probably better to do a smoother takeoff." Yep, I agree. But, if you're in a position to decide-on-purpose how smooth your takeoff is, you can still just do the slower one later. (Modulo "the gameboard could change in 30 years", which makes more sense to me as a crux). I don't see this as really arguing at all against what I imagined the Treaty to be about.
"We need some kind of exit plan, the MIRI Treaty doesn't have one." I currently don't really buy that Plan A has more of one than the the MIRI Treaty. The MIRI treaty establishes an international governing body that makes decisions about how to change the regulations, and it's pretty straightforward for such an org to make judgment calls once people have started producing credible safety cases. I think imagining anything more specific than this feels pretty fake to me – that's a decision that makes more sense to punt to people who are more informed than us.
Shutdown is more politically intractable than Controlled Takeoff. I don't currently buy that this is true in practice. I don't think anyone is expecting to immediately jump to either a full-fledged version of Plan A, or a Global Shutdown. Obviously, for the near future, you try for whatever level of national and international cooperation you can get, build momentum, do the easy sells first, etc. I don't expect, in practice, Shutdown to be different from "you did all of Plan A, and, then, took like 2-3 more steps, and by the time you've implemented Plan A in it's entirety, it seems crazy to me to assume the next 2-3 steps are particularly intractable."
1. I totally buy "we won't even get to a fully fledged version of Plan A", but, that's not an argument for Plan A over Shutdown.
2. It feels like people are imagining "naive, poorly politically executed version of Shutdown, vs some savvily executed version of Plan A." I think there are reasonable reasons to think the people advocating Shutdown will not be savvy. But, those reasons don't extend to "insofar as you thought you could savvily advocate for Plan A, you shouldn't be setting your sites on Shutdown."

[-]Thomas Larsen3mo180

Thanks, I thought this was a helpful comment. Putting my responses inline in case it's helpful for people. I'll flag that I'm a bit worried about confirmation bias / digging my heels in: would love to recognize it if I'm wrong.

How bad is Chinese Superintelligence? For some people, it's a serious crux whether a China-run superintelligence would be dramatically worse in outcome than a democratic country.

This isn't a central crux for me I think. I would say that it's worse, but that I'm willing to make concessions here in order to make alignment more likely to go well

"The gameboard could change in all kinds of bad ways over 30 years." Nations or companies could suddenly pull out in a disastrous way. If things go down in the near future there's fewer actors to make deals with and it's easier to plan things out.

This is the main thing for me. We've done a number of wargames of this sort of regime and the regime often breaks down. (though there are things that can be done to make it harder to leave the regime, which I'm strongly in favor of).

Can we leverage useful work out of significantly-more-powerful-but-nonsuperhuman AIs? Especially since "the gameboard might change a lot", it's useful to get lots of safety research done quickly, and it's easier to do that with more powerful AIs. So, it's useful to continue to scale up until we've got the most powerful AIs can we can confidently control. (Whereas Controlled Takeoff skeptics tend to think AI that is capable of taking on the hard parts of AI safety research will already be too dangerous and untrustworthy)

Yep, I think we plausibly can leverage controlled AIs to do existentially useful work. But not I'm confident, and I am not saying that control is probably sufficient. I think superhuman isn't quite the right abstraction (as I think it's pretty plausible we can control moderately superhuman AIs, particularly only in certain domains.), but that's a minor point. I think Plan A attempts to be robust to the worlds where this doesn't work by just pivoting back to human intelligence augemntation or whatever.

Is there a decent chance an AI takeover is relatively nice? Giving the humans the Earth/solar system is just incredibly cheap from percentage-of-resources standpoint. This does require the AI to genuinely care about and respect our agency in a sort of complete way. But, it only has to care about us as a pretty teeny amount

This is an existential catastrophe IMO and should desperately avoided, even if they do leave us a solar system or w/e.

And then, the usual "how doomed are current alignment plans?". My impression is "Plan A" advocates are usually expecting a pretty good chance things go pretty well if humanity is making like a reasonably good faith attempt at controlled takeoff, whereas Controlled Takeoff skeptics are typically imagining "by default this just goes really poorly, you can tell because everyone seems to keep sliding off understanding or caring about the hard parts of the problem")

I think the thing that matters here is the curve of "likelihood of alignment success" vs "years of lead time burned at takeoff". We are attempting to do a survey of this among thinkers in this space who we most respect on this question, and I do think that there's substantial win equity moving from no lead time to years or decades of lead time. Of course, I'd rather have higher assurance, but I think that you really need to believe the very strong version of "current plans are doomed" to forego Plan A. I'm very much on board with "by default this goes really poorly".

Three cruxes I still just don't really buy as decision-relevant:
"We wouldn't want to pause 30 years, and then do a takeoff very quickly – it's probably better to do a smoother takeoff." Yep, I agree. But, if you're in a position to decide-on-purpose how smooth your takeoff is, you can still just do the slower one later. (Modulo "the gameboard could change in 30 years", which makes more sense to me as a crux). I don't see this as really arguing at all against what I imagined the Treaty to be about.

huh, this one seems kinda relevant to me.

"We need some kind of exit plan, the MIRI Treaty doesn't have one." I currently don't really buy that Plan A has more of one than the the MIRI Treaty. The MIRI treaty establishes an international governing body that makes decisions about how to change the regulations, and it's pretty straightforward for such an org to make judgment calls once people have started producing credible safety cases. I think imagining anything more specific than this feels pretty fake to me – that's a decision that makes more sense to punt to people who are more informed than us.

If the international governing body starts approving AI development, then aren't we basically just back in the plan A regime? Ofc I only think that scaling should happen once people have credible safety cases. I just think control based safety cases are sufficient. I think that we can make some speculations about what sorts of safety cases might work and which ones don't. And I think that the fact that the MIRI treaty isn't trying to accelerate prosaic safety / substnatially slows it down is a major point against it, which is reasonable to summarize as them not having a good exit plan.

I'm very sypathetic to pausing until we have uploads / human intelligence augmentation, that seems good, and I'd like to do that in a good world.

Shutdown is more politically intractable than Controlled Takeoff. I don't currently buy that this is true in practice. I don't think anyone is expecting to immediately jump to either a full-fledged version of Plan A, or a Global Shutdown. Obviously, for the near future, you try for whatever level of national and international cooperation you can get, build momentum, do the easy sells first, etc. I don't expect, in practice, Shutdown to be different from "you did all of Plan A, and, then, took like 2-3 more steps, and by the time you've implemented Plan A in it's entirety, it seems crazy to me to assume the next 2-3 steps are particularly intractable."
I totally buy "we won't even get to a fully fledged version of Plan A", but, that's not an argument for Plan A over Shutdown.
It feels like people are imagining "naive, poorly politically executed version of Shutdown, vs some savvily executed version of Plan A." I think there are reasonable reasons to think the people advocating Shutdown will not be savvy. But, those reasons don't extend to "insofar as you thought you could savvily advocate for Plan A, you shouldn't be setting your sites on Shutdown."

This one isn't a crux for me I think. I do probably think it's a bit more politically intractable, but even that's not obvious because I think shutdown would play better with the generic anti-tech audience, while Plan A (as currently written) involves automating large fractions of the economy before handoff.

[-]Raemon3mo31

If the international governing body starts approving AI development, then aren't we basically just back in the plan A regime?

I think MIRI's plan is clearly meant to eventually build superintelligence, given that they've stated various times it'd be an existential catastrophe if this never happened – they just think it should happen after a lot of augmentation and carefulness.

A lot of my point here is I just don't really see much difference between Plan A and Shutdown except for "once you've established some real control over AI racing, what outcome are you shooting for nearterm?", and I'm confused why Plan A advocates see it as substantially different.

(Or, I think the actual differences are more about "how you expect it to play out in practice, esp. if MIRI-style folk end up being a significant political force." Which is maybe fair, but, it's not about the core proposal IMO.)

"We wouldn't want to pause 30 years, and then do a takeoff very quickly – it's probably better to do a smoother takeoff."
> huh, this one seems kinda relevant to me.

Do you understand why I don't understand why you think that? Like, the MIRI plan is clearly aimed at eventually building superintelligence (I realize the literal treaty doesn't emphasize that, but, it's clear from very public writing in IABIED that it's part of the goal), and I think it's pretty agnostic over exactly how that shakes out.

[-]aog3mo20

We've done a number of wargames of this sort of regime and the regime often breaks down.

I'd be curious to hear how it breaks down.

[-]Raemon3mo*20

Is there a decent chance an AI takeover is relatively nice?
> This is an existential catastrophe IMO and should desperately avoided, even if they do leave us a solar system or w/e.

Actually, I think this maybe wasn't cruxy for anyone. I think @ryan_greenblatt said he agreed it didn't change the strategic picture, it just changed some background expectations.

(I maybe don't believe him that he doesn't think it affects the strategic picture? It seemed like his view was fairly sensitive to various things being like 30% likely instead of like 5% or <1%, and it feels like it's part of an overall optimistic package that adds up to being more willing to roll the dice on current proposals? But, I'd probably believe him if he reads this paragraph and is like "I have thought about whether this is a (maybe subconscious) motivation/crux and am confident it isn't)

[-]ryan_greenblatt3mo30

Not a crux for me ~at all. Some upstream views that make me think "AI takeover but humans stay alive" is more likely and also make me think avoiding AI takeover is relatively easier might be a crux.

[-]Lukas Finnveden3mo20

I maybe don't believe him that he doesn't think it affects the strategic picture? It seemed like his view was fairly sensitive to various things being like 30% likely instead of like 5% or <1%, and it feels like it's part of an overall optimistic package that adds up to being more willing to roll the dice on current proposals?

Insofar as you're just assessing which strategy reduces AI takeover risk the most, there's really no way that "how bad is takeover" could be relevant. (Other than, perhaps, having implications for how much political will is going to be available.)

"How bad is takeover?" should only be relevant when trading off "reduced risk of AI takeover" with affecting some other trade-off. (Such as risk of earth-originating intelligence going extinct, or affecting probability of US dominated vs. CCP dominated vs. international cooperation futures.) So if this was going to be a crux, I would bundle it together with your Chinese superintelligence bullet point, and ask about the relative goodness of various aligned superintelligence outcomes vs. AI takeover. (Though seems fine to just drop it since Ryan and Thomas don't think it's a big crux. Which I'm also sympathetic to.)

[-]Thomas Larsen3mo11-2

Thanks for writing this paper.

Why do we need to halt for so long? In short, AI alignment is probably a difficult technical problem, and it is hard to be confident about solutions. Pausing for a substantial period gives humanity time to be careful in this domain rather than rushing. Pausing for a shorter amount of time (e.g., 5 years) might reduce risk substantially compared to the current race, but it also might not be enough. In general, world leaders should weigh the likelihood and consequence of different risks and benefits against each other for different lengths of a pause. Section 2 discusses some of the reasons why the AI alignment problem may be difficult. Generally, experts vary in their estimates of the difficulty of this problem and the likelihood of catastrophe, with some expecting the problem to be very hard [Grace et al., 2025, ControlAI, 2025, Wikipedia, 2025]. Given this uncertainty about how difficult this problem is, we should prepare to pause for a long time, 8 in case more effort is needed. Our agreement would allow for a long halt, even if world leaders later came to believe a shorter one was acceptable. We also contend that there are other problems which need to be addressed during a halt even if one presumes that alignment can be quickly solved, and these problems are also of an uncertain difficulty. These include risks of power concentration, human misuse of AIs, mass-unemployment, and many more. World leaders will likely want at least years to understand and address these problems. The international agreement proposed in this paper is primarily motivated by risks from AI misalignment, but there are numerous other risks that it would also help reduce.

I agree with a lot of this, but I do think this paper ambiguates a bit between "we need to halt for decades" and "we might need to halt for decades". I agree with the latter but not the former,.

I also think that in the cases where alignment is solvable sooner, then it might matter a lot that we accelerated alignment in the meantime.

I get that it's scary to have to try to bifurcate alignment and capabilities progress because governments are bad at stuff, but I think it's a mistake to ban AI research, because it will have very negative consequences on the rate of AI alignment research. I think that we should try hard to figure out what can be done safely (e.g. via things like control evals), and then do alignment work on models that we can empirically study that are as capable as possible while incurring minimal risks.

Serial time isn't the only input that matters: having smarter AIs is helpful as research assistants and to do experiments directly on the smarter AIs, having lots of compute to do alignment experiments is nice, having lots of money and talent going into AI alignment is helpful. I think you guys should emphasize and think about the function you are trying to maximize more clearly (i.e. how much do you really care about marginal serial time vs marginal serial time with smart AIs to do experiments on).

[-]yams3mo40

I'm struck by how many of your cruxes seem like things that it would actually just be in the hands of the international governing body to control. My guess is, if DARPA has a team of safety researchers, and they go to the international body, and they're like 'we're blocked by this set of experiments* that takes a large amount of compute; can we please have more compute?', and then the international body gets some panel of independent researchers to confirm that this is true, and the only solution is more compute for that particular group of researchers, they commission a datacenter or something so that the research can continue.

Like, it seems obviously true to me that people (especially in government/military) will continue working on the problem at all, and that access to larger amounts of resources for doing that work is a matter of petitioning the body. It feels like your plan is built around facilitating this kind of carveout, and the MIRI plan is built around treating it as the exception that it is (and prioritizing gaining some centralized control over AI as a field over guaranteeing to-me-implausible rapid progress toward the best possible outcomes).

*which maybe is 'building automated alignment researchers', but better specified and less terrifying

[-]Raemon3mo*20

Subcruxes of mine here:

I think, by the time any kind of international deal goes through, we will basically have already reached the frontier of what was safe, so it feels like splitting hairs, discussing whether the regime should want more capabilities in the immediate future.
- (surely it's going to take at least a year, which is a pretty long time, it's probably not going to happen at all, and 3 years to even get started is more like what I imagine when I imagine a very successful overton-smashing campaign)
I think there's tons of research augmentation you can do 1-3-year-from-now AI, that are more about leveraging the existing capabilities than getting fundamentally smarter
I don't buy that there's a way to get end-to-end research, or "fundamentally smarter" research assistants, that aren't unacceptably dangerous at scale. ~~(i.e. I believe you can train on more specific ).~~ (man I have no idea what I meant by that sentence fragment. sorry, person who reacted with "?")

Do those feel like subcruxes for you, or are there other ones?

[-]TsviBT3mo104

Nice. I've only read the quoted paragraphs from the main body:

Despite prohibitions on large-scale training, AIs could continue getting more capable via improvements to the algorithms and software used to train them. Therefore, the coalition adopts appropriate restrictions on research that contributes to frontier AI development or research that could endanger the verification methods in the agreement. These restrictions would cover certain machine learning-related research and may eventually expand to include other AI paradigms if those paradigms seem likely to lead to ASI. Coalition members draw on their experience restricting research in other dangerous fields such as nuclear and chemical weapons. These research restrictions aim to be as narrow as possible, preventing the creation of more capable general AI models while still allowing safe and beneficial narrow AI models to be created. The coalition encourages and makes explicit carve-outs for safe, application-specific AI activities, such as self-driving cars and other uses that provide benefits to society.

Verification that members are adhering to research restrictions is aided by the fact that relatively few people have the skills to contribute to this research—likely only thousands or tens of thousands, see Appendix A, Article IX. Nations verify compliance by way of intelligence gathering, interviews with the researchers, whistleblower programs, and more. This verification aims to use non-invasive measures to ensure that researchers are not working on restricted topics. Additionally, inspectors verify that the medium-scale training allowed by the agreement uses only approved methods (e.g., from a Whitelist) and does not make use of any novel AI methods or algorithms; these would be evidence that restricted research had taken place.

And then skimmed the appendix "ARTICLE VIII — Restricted Research: AI Algorithms and Hardware". I'm very happy to see any kind of proposal that includes a serious attempt to ban AGI research. I want to also mention "The Problem with Defining an "AGI Ban" by Outcome (a lawyer's take).".

My low-effort attempted very short summary of this aspect of the treaty is:

Part of the treaty sets up a committee that decides what counts as AGI research. That stuff is banned.

Is that basically right? (And there's more discussion of how you'd enforce that, and a few examples of non-AGI research are given, and some comparisons with nuclear bombs and Asilomar are given.)

I don't have a better idea, so this seems as promising as I can think of. I'm curious though for ideas about

Some sort of category that could actually work here.
- You need a category that's politically viable and also captures most or all of AGI research.
- It's ok but sad and also maybe politically more difficult to exclude a bunch of stuff that isn't AGI research.
- For example (some bad ideas to spur better ideas from others):
  - You could say "you have to declare a specific task you're trying to solve, like protein interactions or self-driving cars; the task can't be super broad / heterogeneous, like predicting text".
  - You could say "no running searches over very algorithmically rich spaces, or strongly enriching or speeding up existing searches".
  - You could say "your system has to demonstrate by lesion experiments that it requires domain-specific knowledge".
  - You could say "if other researchers can copy-paste your system into another domain, and it works well, it's banned".
  - You could say "if our red team is able to use your system to do X and Y and Z, then it's banned".
  - You could say "you have to state reasons why your research is interesting/promising, and those reasons have to sound like the promisingness comes from something domain-specific".
- It might be possible to targetedly make some boundaries more coordinatable-on. Cf. https://tsvibt.blogspot.com/2025/11/constructing-and-coordinating-around.html
And/or, some way of having really good governance.
- It seems like if the committee goes awry, it's very easy to pretend you've drawn a good boundary, but you haven't.
- It seems likely for the committee to go awry, given how much pressure there could be for less restriction.
- Curious for any thoughts on how to do that, e.g. examples of governance with significant responsibility to develop unclear policies that went well despite strong pressures.

[-]Aaron_Scher3mo40

Hey Tsvi, thanks for the ideas! The short answer is that we don't have good answers about what the details of Article VIII should be. It's reasonably likely that I will work on this as my next big project and that I'll spend a couple of months on it. If so, I'll keep these ideas in mind—they seem like a reasonable first pass.

[-]anaguma3mo80

Coalition members would each consolidate AI chips into a smaller number of declared data centers, where their usage can be declared and monitored for assurance that they are only being used for allowed activities. The number of chips permitted in any one unmonitored facility would be strictly limited. (We suggest the equivalent of 16 H100 chips, a conglomeration that would cost approximately $500,000 USD in 2025).

This seems a bit too low to me. According to epoch AI, the largest training runs currently take 5e26 FLOP, and so at 50% utilization this cluster would take^[1] ~ 5e26/(0.5*16*1e15) s = 6.25e10 s = 1981 years, far longer than the expected lifetime of the ban. Since current training runs have not yet caused a catastrophe, we should allow runs with less effective compute. For example, if want it to take ~50 years to reach frontier LLMs, and conservatively assuming a 10x algorithmic improvement over existing frontier methods, we should allow ~64 H100 GPU clusters.

^{^}
Not to mention that frontier training runs would be impossible on this cluster due to memory constraints.

[-]Aaron_Scher3mo122

Thanks for your comment! Conveniently, we wrote this post about why we pick the training compute thresholds we did in the agreement (1e24 is the max). I expect you will find it interesting, as it responds to some of what you're saying! The difference between 1e24 and 5e26 largely explains our difference in conclusions about what a reasonable unmonitored cluster size should be, I think.

You're asking the right question here, and it's one we discuss in the report a little (e.g., p. 28, 54). One small note on the math is that I think it's probably better to use FP8 (so 2e15 theoretical FLOP per H100 due to the emergence of FP8 training).

[-]anaguma3mo30

Yep, the threshold seems more reasonable now. Thanks for your work!

[-]Gurkenglas3mo40

We've had more than a 10x algorithmic improvement over the last 50 years.

[-]Raemon3mo40

Note: you can get a nice-to-read version of the Treaty on https://www.ifanyonebuildsit.com/treaty I'm not sure if there's any notable differences between that and the paper but I'm guessing it's mostly the same.

[-]David Abecassis3mo92

This paper represents an iteration over the version presented there. There are some key differences, and they include:

The governance approach has shifted away from the creation of a highly centralized international authority (which includes centralizing the verification efforts) to an international body which aids the coordination between states but leaves the verification heavy lift to the key members (i.e. US and China, perhaps others) and empowers their pre-existing intelligence gathering capacities.

It's an agreement, not a treaty. This is mostly rhetorical. Maybe the next version will be called a deal.

Our paper includes more appendices which we think are quite valuable. In particular, we present a staged approach and the agreement is merely the end step (or penultimate step before capabilities progress resumes).

We introduced a whitelist to the restricted research approach; it spells out things which people are explicitly allowed to do and gets updated over time.

[-]J Bostock3mo31

Excellent! This is a thing I will feel good about pointing people towards when they ask "but how would we pause AI development?"

[-]Raemon2mo20

I find the PDF kinda annoying to read, could we copy it over here?

[-]Jesper L.3mo10

The content in article XIII to XV is absolutely crucial to get roughly right.

From game theory perspective clear sanctions are extremely important.

I think even more explicity and specifics would be good there. But I could guess that the terms need to be heavily negotiated to pass.

// Regarding the whole core idea that signatories can regulate non-signatories, this seems absolutely wild politically. If the idea centers around a power coalition, then the withdrawal terms seem "undercooked" and the whole idea seems unrealistic to pass in the current era.

I mean, playing devil's advocat here, what superpower would sign on to shooting itself in the foot like this? Enabling an entity to which it yields its own power, which if successful, from which they can never take the power back?

USSR didn't even agree to US giving up its nukes to UN council back post-WW2. And current US seem very self-centered as well.

Do you have some estimation of the chances of the withdrawal terms passing?

[-]Aaron_Scher3mo30

Thanks for the comment. I think this is definitely one of the places that would both receive lots of negotiation, and where we don't have particular expertise. Given my lack of expertise, I don't have much confidence in the particular withdrawal terms.

One of the frames that I think is really important here is that we are imagining this agreement is implemented in a situation where (at least some) world leaders are quite concerned with ASI risk. As such, countries in the agreement do a bunch of non-proliferation-like activities to prevent non-parties from getting AI infrastructure. So the calculus looks like "join the agreement and get access to AI chips to run existing AI models" vs. "don't join the agreement and either don't get access to AI chips or be at risk of coalition parties disrupting your AI activities". That is, I don't expect 'refusing to sign' or withdrawing to be particularly exciting opportunities, given the incentives at play. (and this is more a factor of the overall situation and risk awareness among world leaders, rather than our particular agreement)

[-]Jesper L.3mo30

Yes, I see. This is exciting work. I hope you collect and receive a lot of feedback on those articles!

My main worry is hat you won't have buy-in to create an agreement in the first place. That's what I was trying to point at in the second half of my comment.

Let's start from the inception of the agreement.

The core idea is a bilateral inception between leading states, after all. Say China wants to adopt this. My question is: Why do you assume US would join? They could just accuse China of trying to slow them down.

But okay, say the (next) president of US is worried about ASI risk and so is China's leader. How do they pitch giving up power permanently to the rest of the politicians and business leaders that back them? What's the incentive to adopt this in the first place, before it has strong international backing?

The backlash could also be horrible from parties who do not worry, and who see it as a power grab or a blatant violation of international order. Honestly, even if they agree in spirit, it might be a hard sell as-is.

Trust is hard-won across countries and continents, when you lack a shared framework to build on, even during business as usual. I can say that even from my own humble experience. (In my work I coordinate with stakeholders in up to 15 different countries on a weekly basis.)

Moderation Log