How to think about slowing AI

Zach Stein-Perlman

This post is part of the EA Forum AI Pause Debate Week. Please see this sequence for other posts in the debate.

Slowing AI^[1] is many-dimensional. This post presents variables for determining whether a particular kind of slowing improves safety. Then it applies those variables to evaluate some often-discussed scenarios.

Variables

Many variables affect whether an intervention improves AI safety.^[2] Here are four crucial variables at stake when slowing AI progress:^[3]

1. Time until critical systems are deployed.^[4] More time seems good for alignment research, governance, and demonstrating risks of powerful AI.
2. Length of crunch time. In this post, "crunch time" means the time near critical systems before they are deployed.^[5] More time until critical systems are deployed is good; more such time near critical systems is especially good. A lab is more likely to (be able to) pay an alignment tax for a critical system if it has more time to pay the tax for that system. Time near critical systems also seems especially good for alignment research and potentially for demonstrating risks of powerful AI and doing governance.
3. Safety level of labs that develop critical systems.^[6] This can be improved both by making labs safer and by differentially slowing unsafe labs.
4. Propensity to coordinate or avoid racing.^[7] This is associated with many factors, but plausible factors relevant to slowing AI seem to be that there are few leading labs, that they like/trust each other, and that they are all in the same country (or at least allied countries), in part because regulation is one possible cause of not-racing.

One lab's progress, especially on the frontier, tends to boost other labs. Labs leak their research both intentionally (publishing research and deploying models) and unintentionally.

Some interventions would differentially slow relatively safe labs (relevant to 3). Some interventions (especially policies that put a ceiling on AI capabilities or inputs) would differentially slow leading labs (relevant to 4). Both outcomes are worse than uniform slowing and potentially net-negative.

If something slows progress temporarily, after it ends progress may gradually partially catch up to the pre-slowing trend, such that powerful AI is delayed but crunch time is shortened (relevant to 1 and 2).^[8]
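The catch-up dynamic can be made concrete with a toy model. This is only a sketch under assumptions that are not in the post: progress is measured in arbitrary "capability units", the baseline rate is one unit per year, a pause halts progress completely, and afterwards progress runs at a faster `boost` rate until a fraction `catch_up` of the lost ground is recovered. The function names, parameters, and numbers are all hypothetical and chosen only for illustration.

```python
def time_to_reach(level, start, pause, boost, catch_up):
    """Year at which capability `level` is reached in the toy model.

    Assumptions (illustrative, not from the post): baseline progress is
    1 unit per year; progress halts for `pause` years beginning at year
    `start`; it then runs at rate `boost` (> 1) until a fraction
    `catch_up` of the lost ground is recovered, and returns to baseline.
    """
    if level <= start:                    # reached before the pause begins
        return level
    if boost <= 1 or catch_up <= 0:       # no catch-up: a pure delay
        return level + pause
    rebound = catch_up * pause / (boost - 1)   # years spent at the boosted rate
    level_after_rebound = start + boost * rebound
    resume = start + pause                # year at which progress resumes
    if level <= level_after_rebound:      # reached during the rebound
        return resume + (level - start) / boost
    return resume + rebound + (level - level_after_rebound)


def crunch_time(near, critical, **slowing):
    """Years between reaching the `near` level and the `critical` level."""
    return time_to_reach(critical, **slowing) - time_to_reach(near, **slowing)


# Hypothetical thresholds: "near critical" at level 4, "critical" at level 5.
no_slowing = dict(start=3.5, pause=0, boost=2, catch_up=0.5)
slowing = dict(start=3.5, pause=1, boost=2, catch_up=0.5)

print(time_to_reach(5, **no_slowing))  # 5.0
print(time_to_reach(5, **slowing))     # 5.5  (critical systems are delayed)
print(crunch_time(4, 5, **no_slowing)) # 1.0
print(crunch_time(4, 5, **slowing))    # 0.75 (crunch time is shortened)
```

In this toy model, crunch time shrinks only when the faster post-pause period overlaps the window between the two thresholds; a pause whose rebound finishes earlier delays critical systems without changing crunch time. Whether real progress behaves like this is exactly the kind of empirical question the toy model cannot settle.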

Coordination may facilitate more coordination later (relevant to 4).

Current leading labs (Google DeepMind, OpenAI, and maybe Anthropic) seem luckily safety-conscious (relevant to 3). Current leading labs seem luckily concentrated in America (relevant to 4).^[9]

Some endogeneities in AI progress may give rise to considerations about the timing of slowing. For example, the speed at which the supply of (ML training) compute responds to (expected) demand determines the effect of slowing soon on future supply. Or perhaps slowing affects the distribution of talent between dangerous AI paths, safe AI paths, and non-AI stuff. Additionally, some kinds of slowing increase or decrease the probability of similar slowing later.

Scenarios

Magic uniform slowing of all dangerous AI: great. This delays dangerous AI and lengthens crunch time. It has negligible downside.

A leading safety-conscious lab slows now, unilaterally: bad. This delays dangerous AI slightly. But it makes the lab irrelevant, thus making the labs that develop critical systems less safe and making the lab unable to extend crunch time by staying at the frontier for now and slowing later.

All leading labs coordinate to slow during crunch time: great. This delays dangerous AI and lengthens crunch time. Ideally the leading labs slow until risk of inaction is as great as risk of action on the margin, then deploy critical systems.

All leading labs coordinate to slow now: bad. This delays dangerous AI. But it burns leading labs' lead time, making them less able to slow progress later (because further slowing would cause them to fall behind, such that other labs would drive AI progress and the slowed labs' safety practices would be irrelevant).

Strong global treaty: great. A strong global agreement to stop dangerous AI, with good operationalization of 'dangerous AI' and strong verification, would seem to stop labs from acting unsafely^[10] and thus eliminate AI risk. The downside is the risk of the treaty collapsing and progress being faster and distributed among more labs and jurisdictions than otherwise.

Strong US regulation:^[11] good. Like "strong global treaty," this stops labs from acting unsafely—but not in all jurisdictions. Insofar as this differentially slows US AI progress, it could eventually cause AI progress to be driven by labs outside the regulation's reach.^[12] If so, the regulation—and the labs it slowed—would cease to be relevant, and it would likely have been net-negative: it would cause critical systems to be created by labs other than the relatively-safety-conscious currently-leading ones and cause leading labs to be more globally diffuse.

US moratorium now: bad. A short moratorium (unless succeeded by a strong policy regime) would slightly delay dangerous AI on net, but also cause progress to be faster for a while after it ends (when AI is stronger and so time is more important), increase the number of leading labs (especially by adding leading labs outside the US), and result in less-safe leading labs (because current leading labs are relatively safety-conscious). A long moratorium would delay dangerous AI, but like in "strong US regulation" the frontier of AI progress would eventually be surpassed by labs outside the moratorium's reach.

Which scenarios are realistic; what interventions are tractable? These questions are vital for determining optimal actions, but I will not consider them here.

Thanks to Rose Hadshar, Harlan Stewart, and David Manheim for comments on a draft.

^[1]
That is, slowing progress toward dangerous AI, or AI that would cause an existential catastrophe. Many kinds of AI seem safe, such as vision, robotics, image generation, medical imaging, narrow game-playing, and prosaic data analysis—maybe everything except large language models, some bio/chem stuff, and some reinforcement learning. Note that in this post, I assume that AI safety is sufficiently hard that marginal changes in my variables are very important.
^[2]
This post is written from the perspective that powerful AI will eventually appear and AI safety is mostly about increasing the probability that it will be aligned. Note that insofar as other threats arise before powerful AI or intermediate AI systems pose threats, it's better for powerful AI to arrive faster—but I ignore this here.
^[3]
See my Slowing AI: Foundations for more.
^[4]
In this post, a critical system is one whose deployment would cause an existential catastrophe if misaligned or be able to execute a pivotal act if aligned. This concept is a simplification: capabilities that could cause catastrophe are not identical to capabilities that could execute a pivotal act, 'cause catastrophe' and 'execute a pivotal act' depend on not just the system but also the world, 'catastrophe or not' and 'pivotal act or not' aren't really binary, and deployment is not binary. Nevertheless, it is a useful concept.
^[5]
This concept is a simplification insofar as "near critical systems" is not binary. Separately, note that some interventions could lengthen total time to critical systems but reduce crunch time or vice versa. For example, slowing now in a way that causes progress to partially catch up to the old trend later would lengthen total time but reduce crunch time.
Separately, I believe we are not currently in crunch time. I expect we will be able to predict crunch time decently well (say) a year in advance by noticing AI systems' near-dangerous capabilities.
^[6]
This concept is a simplification: non-lab actors may be central to safety, especially the creators of tools/plugins/scaffolding/apps to integrate with ML models.
^[7]
The other variables are implicitly about what happens by default, without much coordination.
^[8]
See my Cruxes for overhang.
^[9]
Coordination seems easier if leading labs are concentrated in a single state, in part because it can be caused by regulation. (Additionally, the AI safety community has relatively more influence over government in the US, so US regulatory effectiveness and thus US lead is good, all else equal.)
Observations about current leads are relevant insofar as (1) those leads will be sustained over time and (2) dangerous AI is sufficiently close that current leaders are likely to be leaders in crunch time by default.
On the risk of differentially slowing US labs, see my Cruxes on US lead for some domestic AI regulation.
^[10]
Or in terms of the above variables, a strong global treaty would delay dangerous AI, cause labs to be safer, and (insofar as it discriminates between safe and unsafe labs) differentially slow unsafe labs.
^[11]
I imagine "strong global treaty" and "strong US regulation" as including miscellaneous safety standards/regulations but focusing on oversight of large training runs, enforcing a ceiling on training compute and/or doing model evals during large training runs and stopping runs that fail an eval until the lab can ensure the model is safe.
^[12]
Labs outside US regulation's reach could eventually dominate AI progress due to some combination of the following (overlapping):
- The US fails to get a large coalition to join it
- Labs in coalition states can effectively move to non-coalition states to escape the regulation
- Labs in non-coalition states can quickly catch up to the frontier given slowed progress in the coalition
- Coalition export controls fail to deny compute to labs in non-coalition states
- Other attempted extraterritorialization of the regulation fails
- (Also just there being a substantial tradeoff between speed and (legible) safety, such that the regulation substantially slows the labs it affects)
- (Also just powerful AI being far off, such that outside labs have longer to catch up to the slowed coalition labs)

I think that it's important to be careful with elaborately modelled reasoning about this kind of thing, because the second-order political effects are very hard to predict but also likely to be extremely important, possibly even more important than the direct effect on timelines in some scenarios. For instance, you describe leading labs slowing down as bad (because the leading labs are 'safety conscious' and slowing down dilutes their lead). In my opinion, this is a very simplistic model of the likely effects of this intervention. There are a few reasons for this:

- Taking drastic unilateral action creates new political possibilities. A good example is Hinton and Bengio 'defecting' to advocating strongly for AI safety in public; I think this has had a huge effect in getting ML researchers and governments to take things seriously, even though the direct effect on AI research is probably negligible. For instance, Hinton in particular made me personally take a much more serious look at AI safety related arguments, and this has influenced me to try to re-orient my career in a more safety-focused direction. I find it implausible that a leading AI lab shutting themselves down for safety reasons would have no second-order political effects along these lines, even if the direct impact was small: if there's one lesson I would draw from COVID and the last year or so of AI discourse, it's that the Overton window is much more mobile than people often think. A dramatic intervention like this would obviously have uncertain outcomes, but could trigger unforeseen possibilities. Unilateral action that disadvantages the actor also makes a political message much more powerful. There's a lot of skepticism when labs like Anthropic talk loudly about AI risk because of the objection 'if it's so bad why are you making it'. While there are technical arguments one can make that there are good reasons to simultaneously work on safety and AI development, it makes communicating this message much harder and people will understandably have doubts about your motives.
- 'we can't slow down because someone else will do it anyway' - I actually think this is probably wrong: in a counterfactual world where OpenAI didn't throw lots of resources and effort into language models, I'm not actually sure someone else would have bothered to continue scaling them, at least not for many years. Research is not a linear process and a field being unfashionable can delay progress by a considerable amount; just look at the history of neural network research! I remember many people in academia being extremely skeptical of scaling laws around the time they were being published; if OpenAI hadn't pushed on it, it could have taken years to decades for another lab to really throw enough resources at that hypothesis if it had become unfashionable for whatever reason.
- I'm not sure it's always true that other labs catch up if the leading ones stop: progress also isn't a simple function of time; without people trying to scale massive GPU clusters you don't get practical experience with the kind of problems such systems have, production lines don't re-orient themselves towards the needs of such systems, and so on. There are important feedback loops in this kind of process that the big labs shutting down could disrupt, such as attracting more talent and enthusiasm into the field. It's also not true that all ML research is a monolithic line towards 'more AGI' - from my experience of academia, many researchers would have quite happily worked on small specialised systems in a variety of domains for the rest of time.

I think many of these arguments also apply to arguments against 'US moratorium now' - for instance, it's much easier to get other countries to listen to you if you take unilateral actions, as doing so is a costly signal that you are serious.

This isn't necessarily to say that I think a US moratorium or a leading lab shutting down would actually be a useful thing, just that I don't think it's cut and dried that it wouldn't. Consider what would happen if a leading lab actually did shut themselves down - would there really be no political consequences that would have a serious effect on the development of AI? I think that your argument makes a lot of sense if we are considering 'spherical AI labs in a vacuum', but I'm not sure that's how it plays out in reality.

This is good, thanks. In brief reply to your bullets:

- Yeah, agree; this seems complicated.
- I agree that progress isn't inevitable. But to some extent it's fine if you do-the-thing and don't publish your research. But to some extent ideas leak.
- I think LLMs are now sufficiently promising that if DeepMind, OpenAI, and Anthropic disappeared, the field would be set back a year or two but other labs would take their place.
