tl;dr: I know a bunch of EA/rationality-adjacent people who argue — sometimes jokingly and sometimes seriously — that the only way or best way to reduce existential risk is to enable an “aligned” AGI development team to forcibly (even if nonviolently) shut down all other AGI projects, using safe AGI. I find that the arguments for this conclusion are flawed, and that the conclusion itself causes harm to institutions that espouse it. Fortunately (according to me), successful AI labs do not seem to espouse this "pivotal act" philosophy.
[This post is also available on the EA Forum.]
How to read this post
Please read Part 1 first if you’re very impact-oriented and want to think about the consequences of various institutional policies more than the arguments that lead to the policies; then Parts 2 and 3.
Please read Part 2 first if you mostly want to evaluate policies based on the arguments behind them; then Parts 1 and 3.
I think all parts of this post are worth reading, but depending on who you are, I think you could be quite put off if you read the wrong part first and start feeling like I’m basing my argument too much on kinds-of-thinking that policy arguments should not be based on.
Part 1: Negative Consequences of Pivotal Act Intentions
Imagine it’s 2022 (it is!), and your plan for reducing existential risk is to build or maintain an institution that aims to find a way for you — or someone else you’ll later identify and ally with — to use AGI to forcibly shut down all other AGI projects in the world. By “forcibly” I mean methods that violate or threaten to violate private property or public communication norms, such as by using an AGI to engage in…
- cyber sabotage: hacking into competitors’ computer systems and destroying their data;
- physical sabotage: deploying tiny robotic systems that locate and destroy AI-critical hardware without (directly) harming any humans;
- social sabotage: auto-generating mass media campaigns to shut down competitor companies by legal means, or
- threats: demonstrating powerful cyber, physical, or social attack capabilities, and bargaining with competitors to shut down “or else”.
Hiring people for your pivotal act project is going to be tricky. You’re going to need people who are willing to take on, or at least tolerate, a highly adversarial stance toward the rest of the world. I think this is very likely to have a number of bad consequences for your plan to do good, including the following:
- (bad external relations) People on your team will have a low trust and/or adversarial stance towards neighboring institutions and collaborators, and will have a hard time forming good-faith collaboration. This will alienate other institutions and make them not want to work with you or be supportive of you.
- (bad internal relations) As your team grows, not everyone will know each other very well. The “us against the world” attitude will be hard to maintain, because there will be an ever-weakening sense of “us”, especially as people quit and move to other institutions and vice versa. Sometimes, new hires will express opinions that differ from the dominant institutional narrative, which might pattern-match as “outsidery” or “norm-y” or “too caught up in external politics”, triggering internal distrust that some team members might defect on the plan to forcibly shut down other projects. This will cause your team to get along poorly internally, and make it hard to manage people.
- (risky behavior) In the fortunate-according-to-you event that your team manages to someday wield a powerful technology, there will be pressure to use it to “finally make a difference”, or some other argument that boils down to acting quickly before competitors have a chance to shut you down or at least defend themselves. This will make it hard to stop your team from doing rash things that would actually increase existential risk.
Overall, building an AGI development team with the intention to carry out a “pivotal act” of the form “forcibly shut down all other A(G)I projects” is probably going to be a rough time, I predict.
Does this mean no institution in the world can have the job of preparing to shut down runaway technologies? No; see “Part 3: It Matters Who Does Things”.
Part 2: Fallacies in Justifying Pivotal Acts
For pivotal acts of the form “shut down all (other) AGI projects”, there’s an argument that I’ve heard repeatedly from dozens of people, which I claim has easy-to-see flaws if you slow down and visualize the world that the argument is describing.
This is not an argument that successful AI research groups (e.g., OpenAI, DeepMind, Anthropic) seem to espouse. Nonetheless, I hear the argument frequently enough to want to break it down and refute it.
Here is the argument:
- AGI is a dangerous technology that could cause human extinction if not super-carefully aligned with human values.
(My take: I agree with this point.)
- If the first group to develop AGI manages to develop safe AGI, but the group allows other AGI projects elsewhere in the world to keep running, then one of those other projects will likely eventually develop unsafe AGI that causes human extinction.
(My take: I also agree with this point, except that I would bid to replace “the group allows” with “the world allows”, for reasons that will hopefully become clear in Part 3: It Matters Who Does Things.)
- Therefore, the first group to develop AGI, assuming they manage to align it well enough with their own values that they believe they can safely issue instructions to it, should use their AGI to build offensive capabilities for targeting and destroying the hardware resources of other AGI development groups, e.g., nanotechnology targeting GPUs, drones carrying tiny EMP charges, or similar.
(My take: I do not agree with this conclusion, I do not agree that (1) and (2) imply it, and I feel relieved that every successful AI research group I talk to is also not convinced by this argument.)
The short reason why (1) and (2) do not imply (3) is that when you have AGI, you don’t have to use the AGI directly to shut down other projects.
In fact, before you get to AGI, your company will probably develop other surprising capabilities, and you can demonstrate those capabilities to neutral-but-influential outsiders who previously did not believe those capabilities were possible or concerning. In other words, outsiders can start to help you implement helpful regulatory ideas, rather than you planning to do it all on your own by force at the last minute using a super-powerful AI system.
To be clear, I’m not arguing for leaving regulatory efforts entirely in the hands of governments with no help or advice or infrastructural contributions from the tech sector. I’m just saying that there are many viable options for regulating AI technology without requiring one company or lab to do all the work or even make all the judgment calls.
Q: Surely they must be joking or this must be straw-manning... right?
A: I realize that lots of EA/R folks are thinking about AI regulation in a very nuanced and politically measured way, which is great. And, I don't think the argument (1-3) above represents a majority opinion among the EA/R communities. Still, some people mean it, and more people joke about it in an ambiguous way that doesn't obviously distinguish them from meaning it:
- (ambiguous joking) I've numerous times met people at EA/R events who were saying extreme-sounding things like "[AI lab] should just melt all the chip fabs as soon as they get AGI", who when pressed about the extremeness of this idea will respond with something like "Of course I don't actually mean I want [some AI lab] to melt all the chip fabs". Presumably, some of those people were actually just using hyperbole to make conversations more interesting or exciting or funny.
Part of my motivation in writing this post is to help cut down on the amount of ambiguous joking about such proposals. As the development of more and more advanced AI technologies is becoming a reality, ambiguous joking about such plans has the potential to really freak people out if they don't realize you're exaggerating.
- (meaning it) I have met at least a dozen people who were not joking when advocating for invasive pivotal acts along the lines of the argument (1-3) above. That is to say, when pressed after saying something like (1-3), their response wasn't "Geez, I was joking", but rather, "Of course AGI labs should shut down other AGI labs; it's the only morally right thing for them to do, given that AGI labs are bad. And of course they should do it by force, because otherwise it won't get done."
In most cases, folks with these viewpoints seemed not to have thought about the cultural consequences of AGI research labs harboring such intentions over a period of years (Part 1), or the fallacy of assuming technologists will have to do everything themselves (Part 2), or the future possibility of making evidence available to support global regulatory efforts from a broader base of consensual actors (see Part 3).
So, part of my motivation in writing this post is as a genuine critique of a genuinely expressed position.
Part 3: It Matters Who Does Things
I think it’s important to separate the following two ideas:
- Idea A (for “Alright”): Humanity should develop hardware-destroying capabilities — e.g., broadly and rapidly deployable non-nuclear EMPs — to be used in emergencies to shut down potentially-out-of-control AGI situations, such as an AGI that has leaked onto the internet, or an irresponsible nation developing AGI unsafely.
- Idea B (for “Bad”): AGI development teams should be the ones planning to build the hardware-destroying capabilities in Idea A.
For what it’s worth, I agree with Idea A, but disagree with Idea B:
Why I agree with Idea A
It’s indeed much nicer to shut down runaway AI technologies (if they happen) using hardware-specific interventions than with attacks that have big splash effects, like explosives or brainwashing campaigns. I think this is the main reason well-intentioned people end up arriving at this idea, and at Idea B, but I think Idea B has some serious problems.
Why I disagree with Idea B
A few reasons! First, there’s:
- Action Consequence 1: the action of having an AGI carry out or even prescribe such a large intervention on the world — invading others’ private property to destroy their hardware — is risky and legitimately scary. Invasive behavior is risky and threatening enough as it is; using AGI to do it introduces a whole range of other uncertainties, not least because the AGI could be deceptive or otherwise misaligned with humanity in ways that we don’t understand.
Second, before even reaching the point of taking the action prescribed in Idea B, merely harboring the intention of Idea B has bad consequences; echoing similar concerns as Part 1:
- Intention Consequence 1: Racing. Harboring Idea B creates an adversarial winner-takes-all relationship with other AGI companies, who will race to maintain (a) a degree of control over the future, and (b) the ability to implement their own pet theories on how safety/alignment should work, leading to more desperation, more risk-taking, and less safety overall.
- Intention Consequence 2: Fear. Via staff turnover and other channels, harboring Idea B signals to other AGI companies that you are willing to violate their property boundaries to achieve your goals, which will cause them to fear for their physical safety (e.g., because your incursion to invade their hardware might go awry and end up harming them personally as well). This kind of fear leads to more desperation, more winner-takes-all mentality, more risk-taking, and less safety.
Summary
In Part 1, I argued that there are negative consequences to AGI companies harboring the intention to forcibly shut down other AGI companies. In Part 2, I analyzed a common argument in favor of that kind of “pivotal act”, and found a pretty simple flaw stemming from fallaciously assuming that the AGI company has to do everything itself (rather than enlisting help from neutral outsiders, using evidence). In Part 3, I elaborated more on the nuance regarding who (if anyone) should be responsible for developing hardware-shutdown technologies to protect humanity from runaway AI disasters, and why in particular AGI companies should not be the ones planning to do this, mostly echoing points from Part 1.
Fortunately, successful AI labs like DeepMind, OpenAI, and Anthropic do not seem to espouse this “pivotal act” philosophy for doing good in the world. One of my hopes in writing this post is to help more EA/R folks understand why I agree with their position.
I'm surprised that there's not more pushback in this community on the idea of a "pivotal act" even being feasible on any reasonable timeline that wouldn't give up the game (i.e., reveal that you have AGI and immediately get seized by the nearest state power), in the same way that there's pushback on regulation as a feasible approach.
SUMMARY: Pivotal acts as described here are not constrained by intelligence, they're constrained by resources and time. Intelligence may provide novel solutions, but it does not immediately reduce the time needed for implementation of hardware/software systems in a meaningful way. Novel supply chains, factories, and machinery must first be designed by the AGI and then created by supporters before the AGI will have the influence on the world that is expected by proponents of the "pivotal act" philosophy.
I'm going to structure this post in two parts.
First, I will go through the different ideas posed in this thread as example "pivotal acts" and point out why they won't work, specifically by looking at exceptions that would cause the "pivotal act" to not reliably eliminate 100% of adversaries.
Then, I'll look at the more general statement that my complaints in part 1 are irrelevant because a superhuman AGI is by definition smarter than me, and therefore it'll do something that I can't think of, etc.
Part 1. Pivotal act <x> is not feasible.
This has the same issues as banning gain-of-function research. It's difficult to imagine a mass media campaign in the US having the ability to persuade or shut down state-sponsored AI research in another country, e.g. China.
Nation states (and companies) don't generally negotiate with terrorists. First, you'd have to be able to convince the states you can follow through on your threat (see arguments below). Second, you'd need to be able to make your threat from a position of security such that you're not susceptible to a preemptive strike, either delivered in the form of military or law enforcement personnel showing up at your facilities with search warrants OR, depending on what exactly you threatened and where you are located in the world, a missile instead.
Stuxnet worked by targeting hardware/software systems. Specifically, it targeted the hardware (the centrifuges) controlled via software (PLCs) and sent commands that would break the hardware by exceeding the design spec. The Iranian network would have been air-gapped, so the virus had to be delivered on site, either via a human technician performing the deployment, or via some form of social engineering like leaving a trapped USB device on the premises and hoping that an Iranian engineer would plug it into their network. Even that last vector can be circumvented by not allowing USB storage devices on computers attached to the network, which is absolutely a thing that certain networks do for security. By "not allow", I don't mean a piece of paper that says "don't do this"; I mean that the software stack running on the computers doesn't allow USB storage devices and/or the ports are physically inaccessible.
Let's assume for the sake of argument that the bug already exists and we just need to exploit it. How are you delivering your exploit? Are you assuming the other AI projects are connected to the public internet? Or do you first have to assume that you can break into their VPN, or worse, somehow get into an air-gapped network? When does your exploit take effect? Is it immediately? Or is it when someone tries to run some type of experiment? If it's immediately, then you risk early discovery when the exploit is being spread and you give your adversaries time to pull themselves from a network. If it's triggered on some external command, then you risk the command not being received and certain exploited systems failing to be destroyed as intended. If it's triggered by the system it has exploited, e.g. in response to some GPU usage threshold, then you run into the same issue where people will start posting online "my GPU melted after I ran a neural network, what's going on?"
Even the above discussion ignores the fact that Linux is not a monolithic entity and neither is Windows, or MacOS, so you probably need bugs for each OS or distro, and probably separate versions, and you're soon looking at hundreds of different exploits all of which need to be orchestrated at the same time to avoid early detection / avoidance by your adversaries. Add in the need to target specific libraries and you've got even more exploits to deal with, but that's still assuming that your adversaries use the public versions of libraries, vs using internal forks or private code entirely.
This isn't even getting into the most pressing problem of this hypothetical. Stuxnet could destroy the hardware -- not just break it until the bug was removed, but actually destroy it -- because it was targeting centrifuges, which are things that spin at super high rates (think 1000 revolutions per second for a big piece of machinery) and were thus susceptible to oscillations that would cause the system to physically tear itself apart. How are you going to destroy the GPUs in this hypothetical? Exploit some type of bug that bricks them? That isn't unheard of. The Amazon MMO "New World" reportedly bricked some high-end graphics cards. That particular example, though, was traced to a hardware fault on less than 1% of RTX 3090's created by the company EVGA, so you'd need a different exploit for the other 99%, plus the other graphics cards, plus the other manufacturers of those cards. If you can't identify a way for your software exploit to physically break the hardware, actually break it, then at best this is just a minor speed bump. Even if you can 100% of the time nuke the attached graphics card in a computer, companies have stock of extra computers and components in storage. You aren't destroying those, because they're not even in a computer right now.
Compare all of the above to Stuxnet: a single worm, designed to destroy a single hardware/software system, with the exact target environment (the hardware, and the software) known to the creators, and it still took 2 (?) nation states to pull it off, and crucial to our discussion of pivotal acts, it was not 100% effective. The best estimate is that maybe 10% of the Iran centrifuges were destroyed by Stuxnet.
See statements above about air-gaps / VPNs / etc. Pointing to anecdotal hacks of big companies doesn't work, because for your pivotal act you need to hit 100% of adversaries. You also need to deal with backups, including backups that are off-site or otherwise unconnected to the network, which is standard practice for corporations that don't want to care about ransomware.
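The "hit 100% of adversaries" requirement can be made concrete with a toy probability calculation. The numbers below are purely illustrative assumptions, not estimates of real exploit reliability:

```python
# Toy model: a "pivotal" cyber campaign must disable every adversary project.
# If each target is disabled independently with probability p, the chance of
# disabling all n targets is p**n. (All numbers are illustrative assumptions.)

def p_all_disabled(p_single: float, n_targets: int) -> float:
    return p_single ** n_targets

# Even a 99%-reliable exploit chain degrades badly at scale:
print(p_all_disabled(0.99, 30))   # ~0.74
print(p_all_disabled(0.99, 300))  # ~0.05
print(p_all_disabled(0.90, 30))   # ~0.04
```

The point of the sketch: reliability that sounds impressive per-target ("99% effective!") still leaves a near-certainty of missing someone once the target list grows, and the pivotal-act argument requires missing no one.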
Part 2. Magic isn't real.
These attacks all fall under a bucket I'll call "wizardry". In these attacks, we assume that a superintelligence can do things that defy our current understanding of physics, supply chains, factories, chip design, yield rates, etc. I don't mean that the superintelligence is able to solve problems faster or better than a human, because that trivially follows from the definition of "superintelligence". What I mean is that the superintelligence in these attacks is able to skip the part of the process that follows "think of a solution" -- implementation. For all hardware/software systems that I've worked on, coming up with a solution to a problem was probably less than 1% of the total engineering effort spent on bringing that solution into reality. The rest of the time is on implementation. Specifically, the rest of the time is spent on iterative loops of problem discovery and problem solving until you've designed a system that actually works in the real world.
Let's look at nanotechnology to start with, since it's a popular example.
Ok, so the first thing you need to do is develop working nanobots, because nanobots don't exist. And to do that, the superhuman AGI is going to think really hard, and design the following at a minimum:
The nanobots need to be designed to the constraints that make "melt the GPU factory" a realistic goal, so this means the superhuman AGI needs to be considering things like: how are they programmed, what are their actuators (for doing the melting), how do they sense the world (for seeing what to melt), what is their power supply (externally powered? if so, by what? what's that device? how is it not a failure point in this plan? if they're internally powered, how is that battery sized or where is the energy derived?), how are they controlled, what is their expected lifetime? When you're answering these questions, you need to reason about how much power is needed to melt a GPU factory, and then work backwards from that based on the number of nanobots you think you get into the factory, so that you've got the right power output + energy requirements per nanobot for the melting.
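As a minimal sketch of that "work backwards from power requirements" reasoning, here is a back-of-envelope lower bound on the energy needed just to melt the silicon in a single GPU die. The thermal constants are standard textbook approximations; the die mass is an assumption:

```python
# Back-of-envelope (illustrative): energy to melt the silicon in one GPU die,
# ignoring packaging, the rest of the board, and all heat losses -- i.e., a
# hard lower bound per target, not a realistic requirement.

C_SI = 0.71        # specific heat of silicon, J/(g*K), approx.
T_MELT = 1687.0    # melting point of silicon, K
T_AMBIENT = 298.0  # room temperature, K
L_FUSION = 1790.0  # latent heat of fusion of silicon, J/g, approx.
DIE_MASS_G = 1.5   # assumed mass of a large GPU die, grams

energy_per_gram = C_SI * (T_MELT - T_AMBIENT) + L_FUSION  # roughly 2.8 kJ/g
energy_per_die = energy_per_gram * DIE_MASS_G             # roughly 4.2 kJ

print(round(energy_per_gram), "J/g;", round(energy_per_die), "J per die")
```

Even this lower bound ignores heat conducted away into the package and board during the attack, which in practice pushes the real per-target energy budget far higher -- and that budget then has to be carried or harvested by the nanobots themselves.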
Now, you need to actually build that machinery. You can't use existing fabs because that would give away your novel designs; plus, these aren't going to be like any design we have in existence, since the scale you're working with here isn't something we've gotten working in our current reality. So you need to get permits to build a factory, and pay for the factory, and bring in-house all of the knowledge needed to create these types of machines. These aren't things you can buy. Each one of the machines you need is going to be a custom design. It's not enough to "think" of the designs; you'll still need an army of contractors and sub-contractors and engineers and technicians to actually build them. It's also not sufficient to avoid the dependency on human support by using robots or drones instead; that's not how industrial robots or drones work either. You'll have a different, easier bootstrap problem first, but still a bootstrap problem nonetheless.
If you're really lucky, the AGI was able to design machinery using standard ICs and you just need to get them in stock so you can put them together in house. Under that timeline, you're looking at 12-15 week lead times for those components, prior to the chip shortage. Now it's as high as 52+ week lead times for certain components. This is ignoring the time that it took to build the labs and clean rooms you'll need to do high-tech electronics work, and the time to stock those labs with equipment, where certain pieces of equipment like room-sized CNC equipment are effectively one-off builds from a handful of suppliers in the world with similar lead times to match.
If you're unlucky, the AGI had to invent novel ICs just for the machinery for the assembly itself, and now we get to play a game of Factorio in real life as we ask the AGI to please develop a chain of production lines starting from the standard ICs that we can acquire, up to those we need for our actual production line for the nanobots. Remember that we've still got 12-15 week lead times on the standard ICs.
Tangent: You might look at Amazon and think, well, I can buy a processor and have it here next day, why can't I get my electronic components that quickly? In a word: scale. You're not going to need a single processor or a single IC to build this factory. You're going to need tens of thousands. If you care about quality control on this hardware you're developing, you might even buy an entire run of a particular hardware component to guarantee that everything you're using was developed by the same set of machines and processes at a known point in time from a known factory.
The next problem is that you'll build all of the above, and then it won't work. You'll do a root cause analysis to figure out why, discover something you hadn't realized about how physics works in that environment, or a flaw in some of your components (bad manufacturer, bad lot, etc), update your designs, and go through the process all over again. This is going to take time. Not weeks, but months, or worse, years. If you have to buy new components, it's back to that 12-15 week lead time. If you want to try and avoid buying new components by cleverly desoldering the ones you have and reusing them, that is very difficult.
You can't avoid this process by just reading up on the existing literature of how Intel or Nvidia or some other company designs chips because that information 1.) isn't public, and 2.) isn't the whole story. The dirty secret of engineering documentation is that engineers hate writing documentation and all big hardware/software engineering projects are carried by the collective knowledge of the organization more so than whatever token documentation got produced as needed for interfacing with an external supplier. Example: Rocketdyne F-1 rocket engines.
During the time that you spend iterating in the real world to get to working designs, people are going to wonder about your massive factory, and your technicians are going to get drunk at a bar and brag about how they're developing cutting edge nanobots, and your competitors will realize you're doing something very novel, very odd, and very difficult to explain.
This is going to cost you literal billions (with a B) of dollars in hardware costs between the supplies you need to buy, the ICs you'll have on order, the custom machinery you'll be purchasing to build your own machines, etc. So there's another prerequisite here of either 1.) you are Apple / Google / Microsoft, or 2.) you're going to ask the AGI to make a bunch of money on the stock market. I actually believe option 2 is fairly achievable, e.g. see RenTech and the Medallion fund. That fund has averaged annual returns of 70% from 1994 through 2014. However, that's still a timescale of years and significant seed money (millions of dollars) before you'll have enough cash on hand to bankroll all of this R&D, unless you get external investment, but to get this much external investment you'll have to 1.) find someone with billions of dollars, and 2.) convince them that you have AGI, 3.) swear them to secrecy, and 4.) hope that they don't do some shenanigans like poaching your employees or having the government rain down regulation on you as a stall tactic while they develop their own AGI or try to steal yours.
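To put rough numbers on the "make money first" path: the seed and target amounts below are assumptions for illustration, and the 70% annual return is the Medallion figure cited above.

```python
# Sketch of the "bankroll the R&D via trading" timeline: years of compounding
# needed to turn seed capital into a multi-billion-dollar war chest.
# Seed ($10M) and target ($2B) are illustrative assumptions.

def years_to_target(seed: float, target: float, annual_return: float) -> int:
    years = 0
    while seed < target:
        seed *= 1.0 + annual_return
        years += 1
    return years

print(years_to_target(10e6, 2e9, 0.70))  # $10M -> $2B at 70%/yr: 10 years
print(years_to_target(10e6, 2e9, 0.20))  # same at a more typical 20%/yr: 30 years
```

Even under the generous 70% assumption, the compounding alone takes about a decade of sustained, undetected, market-beating returns -- which is the "timescale of years" point above, before any factory construction begins.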
The likely result here is an arms race where your competitors try to poach your employees ($2 million / year?) or other "normal" corporate espionage to understand what's going on. Example: When Google sued Uber for poaching one of their top self-driving car engineers.
Tangent: If you want the AGI to be robust to government-sponsored intervention like turning off the grid at the connection to your factory, then you'll need to invest in power sources at the factory itself, e.g. solar / wind / geothermal / whatever. All of these have permit requirements, and you'll get mired in bureaucratic red tape, especially if you try to do a stable on-site power source like oil, natural gas, or worse, nuclear. Energy storage isn't that great at the moment, so maybe that's another sub-problem for the AGI to solve first as a precursor to all of these other problems, so that it can run on solar power alone.
You might think that the superhuman AGI is going to avoid that iterative loop by getting it right on the very first time. Maybe we'll say it'll simulate reality perfectly, so it can prove that the designs will work before they're built, and then there's only a single iteration needed.
Let's pretend the AI only needs one attempt to figure out working designs: Ok, the AGI simulates reality perfectly. It still doesn't work the first time, because your human contractors miswired some pins during assembly, and you still need to spend X many months debugging and troubleshooting and rebuilding things until all of the problems are found and fixed. If you want to avoid this, you need to add a "perfect QA plan for all sub-components and auditing performed at all integration points" to the list of things that the AGI needs to design in advance, and pair it with "humans that can follow a QA plan perfectly without making human mistakes".
On the other hand: The AGI can only simulate reality perfectly if we had a theory of physics that could do so, which we don't. The AGI can develop its own theory, just like you and I could, but at some point the theorizing is going to hit a wall where there are multiple possible solutions, and the only way to see which solution is valid in our reality is to run a test, and in our current understanding of physics, the tests we know how to run involve constructing increasingly elaborate colliders and smashing together particles to see what pops out. While it is possible that there exists another path that does not have a prerequisite of "run test on collider", you need to add that to your list of assumptions, and you might as well add "magic is real". Engineering is about tradeoffs under constraints. Constraints like mass requirements given some locomotion system, or energy usage given some battery storage density and allowed mass, or the maximum force an actuator can provide given the allowed size for it to fit inside the chassis, etc. If you assume that a superhuman AGI is not subject to constraints anymore, just by virtue of that superhuman intelligence, then you're living in a world just as fictional as HPMOR.