So, Korea and Japan are building over 100x as much ship per worker-year as the US
Can you spell out why you think this? Do Korea and Japan produce ~100x more ships than the US?
I think a central consideration should be what can actually be enforced. For example, if it were true that ASI could be built with 1e20 FLOP (roughly 27 H100-hours), which I think is possible, a limit at that level would be very impractical to enforce, and so we should build our plan around not having to enforce it.
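For a sense of scale, here's the conversion behind that 27-H100-hours figure; the ~1e15 FLOP/s effective throughput per H100 is my own rough assumption:

```python
# Rough conversion behind the "1e20 FLOP ≈ 27 H100-hours" figure.
# Assumes ~1e15 FLOP/s effective throughput per H100 (an approximation).
total_flop = 1e20
h100_flop_per_s = 1e15

gpu_seconds = total_flop / h100_flop_per_s
print(gpu_seconds / 3600)  # ~27.8 H100-hours
```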
Thanks for writing this paper.
Why do we need to halt for so long? In short, AI alignment is probably a difficult technical problem, and it is hard to be confident about solutions. Pausing for a substantial period gives humanity time to be careful in this domain rather than rushing. Pausing for a shorter amount of time (e.g., 5 years) might reduce risk substantially compared to the current race, but it also might not be enough. In general, world leaders should weigh the likelihood and consequence of different risks and benefits against each other for different lengths of a pause. Section 2 discusses some of the reasons why the AI alignment problem may be difficult. Generally, experts vary in their estimates of the difficulty of this problem and the likelihood of catastrophe, with some expecting the problem to be very hard [Grace et al., 2025; ControlAI, 2025; Wikipedia, 2025]. Given this uncertainty about how difficult this problem is, we should prepare to pause for a long time, in case more effort is needed. Our agreement would allow for a long halt, even if world leaders later came to believe a shorter one was acceptable. We also contend that there are other problems which need to be addressed during a halt even if one presumes that alignment can be quickly solved, and these problems are also of an uncertain difficulty. These include risks of power concentration, human misuse of AIs, mass unemployment, and many more. World leaders will likely want at least years to understand and address these problems. The international agreement proposed in this paper is primarily motivated by risks from AI misalignment, but there are numerous other risks that it would also help reduce.
I agree with a lot of this, but I do think this paper equivocates a bit between "we need to halt for decades" and "we might need to halt for decades". I agree with the latter but not the former.
I also think that in the cases where alignment is solvable sooner, it might matter a lot that we accelerated alignment research in the meantime.
I get that it's scary to have to try to bifurcate alignment and capabilities progress, because governments are bad at stuff, but I think it's a mistake to ban AI research, because it will have very negative consequences for the rate of AI alignment research. I think that we should try hard to figure out what can be done safely (e.g. via things like control evals), and then do alignment work on models that we can empirically study and that are as capable as possible while incurring minimal risk.
Serial time isn't the only input that matters: smarter AIs are helpful both as research assistants and as subjects to run experiments on directly, having lots of compute for alignment experiments is nice, and having lots of money and talent going into AI alignment is helpful. I think you guys should think about, and state more clearly, the function you are trying to maximize (i.e. how much do you really care about marginal serial time vs. marginal serial time with smart AIs to do experiments on).
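To make that concrete, here is a purely illustrative toy version of such an objective; the functional form, exponents, and numbers are all made-up placeholders of mine, not anything from the post:

```python
# Toy model: alignment progress as a function of several inputs, not serial time alone.
# Every functional form and constant here is an illustrative assumption.
def alignment_progress(serial_years, ai_capability, compute, researchers):
    # Assume serial time is multiplied by a "research rate" that depends on how
    # capable the (controlled) AI assistants are, how much compute is available
    # for experiments, and how much talent is working on the problem.
    research_rate = researchers * (1 + ai_capability) * compute ** 0.3
    return serial_years * research_rate

# Under these made-up numbers, a long pause with weak AIs and little compute
# yields less total progress than a shorter period with capable-but-controlled
# AIs and ample experiment compute.
print(alignment_progress(serial_years=30, ai_capability=0.1, compute=1.0, researchers=1.0))
print(alignment_progress(serial_years=10, ai_capability=2.0, compute=10.0, researchers=3.0))
```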
Thanks, I thought this was a helpful comment. Putting my responses inline in case it's helpful for people. I'll flag that I'm a bit worried about confirmation bias / digging my heels in; I'd love to recognize it if I'm wrong.
How bad is Chinese Superintelligence? For some people, it's a serious crux whether a China-run superintelligence would lead to a dramatically worse outcome than one run by a democratic country.
This isn't a central crux for me, I think. I would say that it's worse, but I'm willing to make concessions here in order to make alignment more likely to go well.
"The gameboard could change in all kinds of bad ways over 30 years." Nations or companies could suddenly pull out in a disastrous way. If things go down in the near future there's fewer actors to make deals with and it's easier to plan things out.
This is the main thing for me. We've done a number of wargames of this sort of regime, and the regime often breaks down (though there are things that can be done to make it harder to leave the regime, which I'm strongly in favor of).
Can we leverage useful work out of significantly-more-powerful-but-nonsuperhuman AIs? Especially since "the gameboard might change a lot", it's useful to get lots of safety research done quickly, and it's easier to do that with more powerful AIs. So, it's useful to continue to scale up until we've got the most powerful AIs that we can confidently control. (Whereas Controlled Takeoff skeptics tend to think AI that is capable of taking on the hard parts of AI safety research will already be too dangerous and untrustworthy.)
Yep, I think we plausibly can leverage controlled AIs to do existentially useful work. But I'm not confident, and I am not saying that control is probably sufficient. I think superhuman isn't quite the right abstraction (as I think it's pretty plausible we can control moderately superhuman AIs, particularly ones that are superhuman only in certain domains), but that's a minor point. I think Plan A attempts to be robust to the worlds where this doesn't work by just pivoting back to human intelligence augmentation or whatever.
Is there a decent chance an AI takeover is relatively nice? Giving the humans the Earth/solar system is just incredibly cheap from a percentage-of-resources standpoint. This does require the AI to genuinely care about and respect our agency in a sort of complete way. But it only has to care about us a pretty teeny amount.
This is an existential catastrophe IMO and should be desperately avoided, even if they do leave us a solar system or w/e.
And then, the usual "how doomed are current alignment plans?" My impression is "Plan A" advocates are usually expecting a pretty good chance things go pretty well if humanity makes a reasonably good-faith attempt at controlled takeoff, whereas Controlled Takeoff skeptics are typically imagining "by default this just goes really poorly, and you can tell because everyone seems to keep sliding off understanding or caring about the hard parts of the problem".
I think the thing that matters here is the curve of "likelihood of alignment success" vs. "years of lead time burned at takeoff". We are attempting to do a survey of this among the thinkers in this space whom we most respect on this question, and I do think that there's substantial win equity in moving from no lead time to years or decades of lead time. Of course, I'd rather have higher assurance, but I think that you really need to believe the very strong version of "current plans are doomed" to forgo Plan A. I'm very much on board with "by default this goes really poorly".
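To gesture at the kind of curve I mean, here's a purely illustrative toy version; the baseline, ceiling, and rate constants are made up, not survey results:

```python
import numpy as np

# Illustrative-only toy curve of P(alignment success) vs. years of lead time
# burned at takeoff. The constants are invented to show what "substantial win
# equity from more lead time" could look like.
def p_success(lead_years, baseline=0.1, ceiling=0.7, rate=0.08):
    return baseline + (ceiling - baseline) * (1 - np.exp(-rate * lead_years))

for years in [0, 2, 5, 10, 30]:
    print(years, round(float(p_success(years)), 2))
```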
Three cruxes I still just don't really buy as decision-relevant:
"We wouldn't want to pause 30 years, and then do a takeoff very quickly – it's probably better to do a smoother takeoff." Yep, I agree. But, if you're in a position to decide-on-purpose how smooth your takeoff is, you can still just do the slower one later. (Modulo "the gameboard could change in 30 years", which makes more sense to me as a crux). I don't see this as really arguing at all against what I imagined the Treaty to be about.
Huh, this one seems kinda relevant to me.
"We need some kind of exit plan, the MIRI Treaty doesn't have one." I currently don't really buy that Plan A has more of one than the the MIRI Treaty. The MIRI treaty establishes an international governing body that makes decisions about how to change the regulations, and it's pretty straightforward for such an org to make judgment calls once people have started producing credible safety cases. I think imagining anything more specific than this feels pretty fake to me – that's a decision that makes more sense to punt to people who are more informed than us.
If the international governing body starts approving AI development, then aren't we basically just back in the Plan A regime? Of course, I think that scaling should only happen once people have credible safety cases; I just think control-based safety cases are sufficient. I think that we can make some speculations about what sorts of safety cases might work and which ones won't. And I think that the fact that the MIRI Treaty isn't trying to accelerate prosaic safety / substantially slows it down is a major point against it, which is reasonable to summarize as them not having a good exit plan.
I'm very sympathetic to pausing until we have uploads / human intelligence augmentation; that seems good, and I'd like to do that in a good world.
Shutdown is more politically intractable than Controlled Takeoff. I don't currently buy that this is true in practice. I don't think anyone is expecting to immediately jump to either a full-fledged version of Plan A or a Global Shutdown. Obviously, for the near future, you try for whatever level of national and international cooperation you can get, build momentum, do the easy sells first, etc. I don't expect, in practice, Shutdown to be different from "you did all of Plan A, and then took like 2-3 more steps", and by the time you've implemented Plan A in its entirety, it seems crazy to me to assume the next 2-3 steps are particularly intractable.
- I totally buy "we won't even get to a fully-fledged version of Plan A", but that's not an argument for Plan A over Shutdown.
- It feels like people are imagining "a naive, poorly politically executed version of Shutdown" vs. "some savvily executed version of Plan A." I think there are reasonable reasons to think the people advocating Shutdown will not be savvy. But those reasons don't extend to "insofar as you thought you could savvily advocate for Plan A, you shouldn't be setting your sights on Shutdown."
This one isn't a crux for me, I think. I do probably think it's a bit more politically intractable, but even that's not obvious, because I think shutdown would play better with the generic anti-tech audience, while Plan A (as currently written) involves automating large fractions of the economy before handoff.
I think I mostly am on board with this comment. Some thoughts:
Before I did a rapid growth of capabilities, I would want a globally set target of "we are able to make some kind of interpretability strides or evals that make us better able to predict the outcome of the next training run."
Another reaction I have is that a constraint on coordination will probably be "is the other guy running a blacksite which will screw us over?". So I think there's a viability bump at the point of "allow legal capabilities scaling at least as fast as the max-size blacksite that you would have a hard time detecting".
- I would want to do at least some early global pause on large training runs, to check if you are actually capable of doing that at all (in conjunction with some efforts to build international goodwill about it).
So I think this paragraph isn't really right, because "slowdown" != "pause", and slowdowns might still be really, really helpful and enough to get you a long way.
- One of the more important things to do, as soon as it's viable, is to stop uncontrolled production of more compute. (I'm guessing this plays out with some kind of pork deals for Nvidia and other leaders[2], where the early steps are "consolidate compute", and then they produce chips that are more monitorable, which they get to make money from but which are also sort of nationalized.) This prevents a big overhang.
I actually currently think that you want to accelerate compute production, because hardware scaling seems safer than software scaling. I'm not sure exactly what you mean by "in an uncontrolled fashion". If you mean "have a bunch of inspectors making sure new chips aren't being smuggled to illegal projects", then I agree with this; on my initial read I thought you meant something like "pause chip production until they start producing GPUs with HEMs in them", which I think is probably bad.
In other words, I think that you want to create a big compute overhang during a pause. The downside is obvious, but the upsides are:
(this comment might be confusing because I typed it quickly, happy to clarify if you want)
One framing that I think might be helpful for thinking about "Plan A" vs. "shut it all down" is: "Suppose that you have the political will for an n-year slowdown, i.e. after n years you are forced to hand off trust to superhuman AI systems (e.g. for n = 5, 10, 30). What should the capability progression throughout the slowdown be?" This framing forces a focus on the exit condition / plan to do handoff, which I think is an underdiscussed weakness of the "shut it all down" plan.
I think my gut reaction is that the most important considerations are: (i) there are a lot of useful things you can do with the AIs, so I want more time with the smarter AIs, and (ii) I want to scale through the dangerous capability range slowly and with slack (as opposed to rushing through it at the end of the slowdown).
Of course, this framing also ignores some important considerations, e.g. choices about the capability progression affect both the difficulty of enforcement/verification (in both directions: AI lie detectors / AI verification are helpful, while having AIs closer to the edge is a downside) and willingness to pay over time (e.g. scary demos or AI for epistemics might help increase WTP).
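As a purely illustrative sketch of considerations (i) and (ii), here are two toy capability schedules for a hypothetical n = 10 year slowdown that end at the same handoff level; all numbers and curve shapes are invented, not from the comment:

```python
import numpy as np

# Toy comparison of two capability schedules for a hypothetical 10-year slowdown,
# both ending at the same handoff capability. All numbers are illustrative.
years = np.linspace(0, 10, 11)
handoff_capability = 100.0

# Schedule A: climb gradually over the first 6 years (slack while crossing the
# dangerous range), then spend ~4 years studying near-handoff-level AIs.
schedule_a = handoff_capability * np.minimum(years / 6, 1.0)

# Schedule B: stay flat, then rush through the dangerous range in the last 2 years.
schedule_b = handoff_capability * np.clip((years - 8) / 2, 0.0, 1.0)

# Crude proxy for "time with smarter AIs": capability-years under each schedule.
print(np.trapz(schedule_a, years))  # ~700
print(np.trapz(schedule_b, years))  # ~100
```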
However, I also think that open agency approaches to transparency face two key difficulties: competitiveness and safety-of-the-components.[18]
I think a third key difficulty with this class of approaches is something like "emergent agency", i.e. each of the individual components seems to be doing something safe, but when you combine several of the agents, you get a scary agent. Intuition pump: each of the weights in a NN is very understandable (it's just a number) and not doing dangerous scheming, but the composition of them might be scary. Analogously, each of the subagents in the open agency AI might not be scheming, but a collection of these agents might be scheming.
Understanding the communications between the components may or may not be sufficient to mitigate this failure mode. If the understanding is "local", i.e. looking at a particular chain of reasoning and verifying that it is valid, this is probably not sufficient, as scary reasoning might be made up of a bunch of small, locally valid chains of reasoning that each look safe. So I think you want something like a reasonable global picture of the reasoning that the open agent is doing in order to mitigate "emergent agency".
I think this is kind of related to some versions of the "safety of the components" failure mode you talk about, particularly the analogy to a corporation passing memos around where the memos don't correspond to the "real reasoning" going on. However, it could be that the "real reasoning" emerges at a higher level of abstraction than the individual agents.
This sort of threat model leads me to think that if we're aiming for this sort of open agency, we shouldn't do end-to-end training of the whole system, lest we incentivize "emergent agency", even in cases where the individual components don't become less safe.
One upside of "shut it all down" is that it does in fact buy more time: in Plan A it is difficult to secure algorithmic secrets without extremely aggressive security measures, hence any rogue projects (e.g. nation-state blacksites) can just coast off the algorithms developed by the verified projects. Then, a few years in, they fire up their cluster and try to do an intelligence explosion with the extra algorithmic progress.
>superintelligence
Small detail: my understanding of the IABIED scenario is that their AI was only moderately superhuman, not superintelligent.
>proper global UBI is *enormously* expensive (h/t @yelizarovanna)
This seems wrong. There will be huge amounts of wealth post-ASI. Even a relatively small UBI (e.g. one funded by a 1% share of AI companies) would be enough to support way better QOL for everyone on Earth. Moreover, everything will become way cheaper because of efficiency gains downstream of AI. Even just at AGI, I think it's plausible that physical labour becomes something like 10x cheaper and cognitive labour something like 1000x cheaper.
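A quick toy calculation of the cost-deflation point, using the 10x and 1000x figures above plus an assumed (made-up) 50/50 cost split between physical and cognitive labour:

```python
# Toy arithmetic for the cost-deflation point. The 10x and 1000x figures come
# from the comment above; the 50/50 cost split is an illustrative assumption.
physical_share, cognitive_share = 0.5, 0.5
physical_deflator, cognitive_deflator = 10, 1000

new_cost = physical_share / physical_deflator + cognitive_share / cognitive_deflator
print(1 / new_cost)  # a ~20x cheaper basket of goods under these assumptions
```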