I think this seems right to me, just on the intuition that it's been far easier to enforce outright bans on nuclear and chemical weapons than to regulate how many or what kinds of such weapons countries may have. It's easier to draw and enforce a bright line in the sand and to minimize the number of actors who have to be regulated in more complex ways.
I think "Pivoting AI" makes more sense than "Pausing AI" or "Regulating AI". As far as I understand it, the demand for technological innovation cannot be "stopped" in any meaningful sense of the word.
There's not a lot of demand for human cloning. See https://wiki.aiimpacts.org/doku.php?id=responses_to_ai:technological_inevitability:incentivized_technologies_not_pursued:start
Many stakeholders are investing in whichever 'horse' best meets the demand for technological innovation, and according to many that horse is AI. What I am saying is that a more realistic solution might be to offer a better and safer technological alternative to invest in.
> Also, when I say “it’s easier”, what do I mean? Well, there are a few ways in which stopping is hard. I’d separate technical and incentive challenges from political challenges, and I’m setting aside political challenges, because I think we should be clear about what should happen and why, and then seek to accomplish it politically.
I'm definitely sympathetic to this idea, but I wonder how to apply it in practice. For example, it seems to me like a critic of your position could say the following:
By your own reasoning, it seems like what "should happen" under this standard is regulation rather than stopping. You argue that regulation would allow the realization of the benefits of AI but might not actually be safe. But it seems like the reasons why regulation would not be safe are themselves political. They follow from some actor deciding to do something unsafe (using AI in an unsafe way, contrary to their agreements), even though it is technically possible for them to do otherwise (they could just comply with the agreements). If regulation potentially realizes a higher benefit (due to the additional benefit of using AI safely), then by this standard, isn't it what "should happen", since we could seek to accomplish compliance with international agreements politically, by convincing the countries in question that complying is in their own best interests?
You might say that this is about their "incentives" rather than politics, but what's the difference? Why is failure to comply with an international agreement apolitical, while the decision of whether to make the agreement is political? If someone believes that it will be hard to make international agreements to stop AI because countries will have incentives against this, does that mean that those considerations now fall under "incentives" and thus count for purposes of determining whether stopping is "hard"?
Yeah, this is a good point. The way I've put it before is: when you are thinking about what should happen, you're basically imagining you have some sort of magic wand that makes it happen. But how powerful is the magic wand? I haven't thought this through to my satisfaction, so for now I'm just going based on intuitive notions of what is actually realistically achievable.
But one way of trying to define the limits of the "magic wand" here would be: You get to magically choose a policy to be adopted, but you don't get to magically control people's behavior afterwards. So if you want to get people to limit AI uses, your policy needs to deal with their potential incentives to do otherwise.
This means, IIUC, that the answer to your final question is "yes". But it's more a matter of perceived incentives here, IMO, see: https://therealartificialintelligence.substack.com/p/following-the-incentives
> If someone believes that it will be hard to make international agreements to stop AI because countries will have incentives against this, does that mean that those considerations now fall under "incentives" and thus count for purposes of determining whether stopping is "hard"?
> But one way of trying to define the limits of the "magic wand" here would be: You get to magically choose a policy to be adopted, but you don't get to magically control people's behavior afterwards. So if you want to get people to limit AI uses, your policy needs to deal with their potential incentives to do otherwise.
That makes sense to an extent. If I can summarize my understanding of your point: for purposes of understanding how enforceable a policy is, we assume that the policy is implemented and then analyze enforcement. We want to do this to separate the difficulty of implementing the policy from the question of post-implementation enforcement. Assuming that both stopping and regulating were implemented, you give reasons to believe that regulation would be harder to enforce. Is that correct?
The part I don't understand is how that relates to your conclusions at the end:
> Note that “Stopping AI is too hard, we need to regulate it in a different way instead” is not on the list.
Where the list is of "the coherent points of view available". I don't think this follows because something can be "hard" for non-enforcement reasons. So someone can coherently believe that regulation has non-enforcement advantages and enforcement-related disadvantages, with the advantages outweighing the disadvantages (relative to stopping). This seems entirely coherent to me (which isn't to say that I agree with it).
If the statement I quote above has an implicit (only as it relates to enforcement) attached, then I don't really understand what it means beyond the fact that if someone accepts your argument, they are in fact accepting your argument. The conclusion becomes almost tautological, such that it doesn't really seem to relate to coherence to me (because someone who disagrees probably disagrees with an earlier step in your argument).
I want to start with this provocative claim: Stopping AI is easier than regulating AI.
I often hear people say “Stopping is too hard, so we should do XYZ instead”, where XYZ is some other form of regulation, such as mandating safety testing. It seems like the purpose of safety testing would be to stop building AIs if we can’t get them to pass the tests, so unless that’s not the purpose, or proponents are confident that we can get them to pass the tests (and hopefully also confident that the tests work, which they quite likely do not…), this particular idea doesn’t make a lot of sense. But people might in general think that we can instead regulate the way AI is used, or something like that.
But I think this line of argument gets it exactly backwards. Stopping AI is easier than regulating it.
Why? Well, let’s dive in. First, I need to explain what I mean…
I mean, specifically, that stopping AI is an easier way to reduce the risks from AI to an acceptable level than other approaches to regulating AI.
The way I imagine stopping AI is actually a particular form of regulating AI, specifically via an international treaty along the lines of Systematically Dismantling the AI Compute Supply Chain.
Also, when I say “it’s easier”, what do I mean? Well, there are a few ways in which stopping is hard. I’d separate technical and incentive challenges from political challenges, and I’m setting aside political challenges, because I think we should be clear about what should happen and why, and then seek to accomplish it politically.
Besides politics, the main underlying issue preventing meaningful AI regulation is international competition, especially between the US and China.[1] So basically, I mean that stopping AI is the most effective way to address this key barrier to international cooperation, which is necessary to reduce AI risks to an acceptable level.
I believe in the fundamentals of AI, and I believe alignment is not doomed, so I believe that AI could indeed end up giving one nation or company control over the future. It’s still not clear that it’s rational to race to build AI, given the risks involved. But it seems hard for me to imagine a situation remaining stable while governments can’t be confident their adversaries aren’t building super powerful AI in secret.
Proposals to govern super powerful AI internationally while still building it suffer from a bunch of challenges that stopping it doesn’t. In short, approaches that instead try to regulate the development or use of AI to ensure it is safe and beneficial are harder to monitor and enforce, and hence more likely to fail.
Challenges
Let’s consider a hypothetical agreement between the US and China (leaving out the other countries for simplicity), and consider some of these challenges in detail.
Monitoring hardware
Suppose you have an agreement that allows AI to proceed in some particular “authorized” directions. How do you verify compliance? This basically boils down to: how can you be sure that no significant fraction of the world’s computing power is being used in unauthorized ways? This seems hard for a few reasons:
Agreeing on standards
If we move away from a static whitelist of authorized computations, we then need a process for determining which computations should be authorized. This is hard for a few reasons:
Violations of the agreement
It’s going to be easier to violate the agreement if there are a bunch of AIs and AI chips around that are being used according to the agreement. You just say “we’re done with this treaty”, and then start doing whatever you want with the ones you control. There are proposals to make it technically difficult to use AI chips in ways that aren’t authorized, but they aren’t mature or tested, and it’s likely that the US and/or China could find ways to subvert such controls.
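To make the “technical controls” idea concrete, here is a minimal sketch of what on-chip workload licensing might look like. This is a hypothetical design for illustration, not any real product or proposal: the names, the shared-key scheme, and the numbers are all assumptions.

```python
import hashlib
import hmac

# Hypothetical shared secret held by a licensing authority and burned into
# chip firmware. (Real proposals would use asymmetric keys and hardware
# attestation; this only shows the shape of the check.)
REGULATOR_KEY = b"illustrative-licensing-key"

def issue_license(workload_hash: bytes) -> bytes:
    """Licensing authority signs the hash of an authorized workload."""
    return hmac.new(REGULATOR_KEY, workload_hash, hashlib.sha256).digest()

def chip_will_run(workload_hash: bytes, license_tag: bytes) -> bool:
    """Firmware-side check: refuse any workload without a valid license tag."""
    expected = hmac.new(REGULATOR_KEY, workload_hash, hashlib.sha256).digest()
    return hmac.compare_digest(expected, license_tag)

job = hashlib.sha256(b"authorized training job spec").digest()
tag = issue_license(job)

print(chip_will_run(job, tag))           # True: authorized job runs
print(chip_will_run(job, b"\x00" * 32))  # False: forged tag is refused
```

The weakness described above is visible in the sketch itself: whoever physically controls the chip also controls the firmware performing the `chip_will_run` check, so a determined state actor could patch the check out. That is the sense in which such controls are immature and subvertible.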
Once a violation occurs, the other side might need to intervene rapidly to protect themselves. In the current paradigm, training a new, more powerful AI might take months, but that’s not a comfortable amount of time for resolving a tense international security dispute. And if all that’s required to be a threat is for an adversary to “fine-tune” an existing AI, or use it in an unauthorized way, lead time might be measured in days -- or even seconds.
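As a rough illustration of why these lead times differ so sharply, here is a back-of-envelope calculation. Every number (cluster size, chip speed, utilization, compute budgets) is an assumption chosen for illustration, not a measurement:

```python
SECONDS_PER_DAY = 86_400

def days_to_run(total_flop: float, num_chips: int,
                flop_per_chip_per_s: float, utilization: float) -> float:
    """Days needed to execute a given compute budget on a given cluster."""
    flop_per_day = num_chips * flop_per_chip_per_s * utilization * SECONDS_PER_DAY
    return total_flop / flop_per_day

# Assumed cluster: 50,000 accelerators at 1e15 FLOP/s each, 40% utilization.
# Assumed budgets: ~2e26 FLOP for a frontier training run, ~1e24 FLOP
# (0.5% of that) for a fine-tuning run. All figures are hypothetical.
print(f"training run: ~{days_to_run(2e26, 50_000, 1e15, 0.4):.0f} days")  # ~116 days, i.e. months
print(f"fine-tune:    ~{days_to_run(1e24, 50_000, 1e15, 0.4):.1f} days")  # ~0.6 days, i.e. hours
```

The point is the ratio rather than the exact figures: unauthorized use of existing models and chips can become a threat far faster than a from-scratch training run.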
On the other hand, if the infrastructure needed to build dangerous AI systems does not exist in any form, and a violator would need to build up the compute supply chain again, this would probably give other parties years to negotiate an arrangement that undoes the violation and doesn’t involve war.
Conclusion
Summing things up: if you are concerned that stopping AI altogether might be too hard to enforce, you should expect alternative approaches to international governance to be even harder. From this point of view, alternative approaches add unnecessary complexity and fail the KISS (“keep it simple, stupid”) design principle. They may provide more of an opportunity to capture the benefits of AI, but this doesn’t matter if they aren’t actually workable. If you believe international governance of AI is needed to reduce the risk to an acceptable level, the coherent points of view available seem to be:
I’m not sure which of these is right, but my money is on (3). Note that “Stopping AI is too hard, we need to regulate it in a different way instead” is not on the list.
[1] But this is also often used, politically, as an argument for why pausing is impossible. And this means that addressing this concern is also a big way to address the political barriers to pausing.