I'd like to somehow put this in the hands of as many politicians as possible.
I think the way you structured this would be an excellent way to route a pragmatic politician into caring about X-risk.
The writing is excellent, spare and not alarmist in tone. The examples are well-chosen and compelling.
I'll make this a reference for newcomers with a pragmatic bent.
I look forward to seeing your next piece.
Yes, it's an unusual risk. But I hope that no one will release access to AI models that can in turn create weapons. The models could either be open-sourced, bringing with them many other risks[1], or be closed-sourced and used in a misaligned way, causing potential lawsuits. @Daniel Kokotajlo, does the latter mean that no corp will release weapons-creating AIs to anyone aside from the corp's government? Or will Grok 5 or 6 create some weapons and cause mankind to shut xAI down?
Including a potential international race to protect mankind from rogue AIs. I hope that Kokotajlo's team will eventually provide feedback on all the scenarios it received, including the Rogue Replication Scenario, and determine whether the international race is a plausible result.
I'll talk more about this in follow-up posts, but I don't think the main danger is that the models will be voluntarily released. Instead, it'll just get cheaper and cheaper to train models with weapons capabilities as the algorithms get more efficient, which will eventually democratize those weapons.
Analogously, we can think about how cryptography was once a government-controlled technology because of its strategic implications, but became widespread as the computing power required to run cryptographic algorithms became extremely cheap.
This article discusses the "Proliferation by Default" dynamic that the falling price of weapons-capable AI systems will create. It covers an array of dangerous capabilities in future systems and comparisons to historical technologies. In the following parts of this series, we will explore how the price of these offensive AI systems could fall, their strategic implications, and policy solutions.
Summary of key points:
For just under a century, world-ending technology has been constrained to nukes and bioweapons, although it's easier to credit this to the fact that we haven't yet discovered dangerous new technologies rather than that we chose to avoid building them. Fortunately, both of these weapons have large technical, financial, and logistical barriers, which, when combined with our current nonproliferation efforts, have kept them contained to a few nation-state actors.
The defining challenge of the 21st century may be that this will no longer be the case. Just as artificial intelligence is poised to revolutionize medicine, the economy, and fundamental research, it similarly promises to accelerate and optimize the development of powerful new weapons. In particular, there are two major concerns:
Taken together, these dynamics imply that allowing powerful AI systems to proliferate would quickly democratize means of mass destruction, absent preemptive intervention by governments. What these offensive capabilities might look like, and how the costs to acquire them will fall, will be the focus of the first and second parts of this series.
While AIs themselves could be employed directly as part of weapons systems, their greatest contributions to geopolitical insecurity might be through advanced R&D efforts, discovering cheaper and more lethal ways of causing catastrophic damage. Although some of these capabilities might only emerge once AI systems are generally superintelligent, there are also plenty of more narrow, near-term capabilities that could be equally dangerous.
Broadly speaking, there are three types of offensive capabilities that AI might assist with. These capabilities might be developed intentionally, as states and other actors pursue strategically relevant weapons technology, or incidentally, as the result of pursuing general capacities in dual-use fields of research.
The most basic application of powerful AI systems will be their contributions to the manufacturing of existing weapons. Of particular concern are weapons whose costs are mostly bounded by technical expertise rather than physical materials, since AI labor will become increasingly able to provide this technical assistance. Creating an effective bioweapon, for instance, requires extensive technical understanding of methods for cultivation, storage, and dispersal, or else the agent will be inert when deployed. The remaining costs of setting up a functional wet-lab and acquiring an initial agent are paltry enough that even non-state actors can afford them.[1] In comparison, even if you eliminate the cost of sourcing technical assistance for nuclear weapons, you're still faced with the irreducible physical costs of enrichment and large-scale industrial manufacturing.
The Japanese doomsday cult Aum Shinrikyo is emblematic of these issues, both in their determination to use biological weapons and in their persistent challenges in doing so. Despite impressive financial resources, the group's attempts to weaponize botulinum toxin and anthrax were both failures. Although they were successful in cultivating the agents they acquired, they didn't have the technical expertise to distinguish between natural strains of C. botulinum and had cultivated vaccine samples of anthrax, hoping in vain that they would be lethal.[2] There were similar challenges in their pursuit of chemical weapons as well: the sarin nerve gas they used in their attack on the Tokyo subway was of too low a concentration to cause mass fatalities, due to contamination by precursors.[3]
In each of these examples, the cult stumbled not because it wasn't well resourced, but because it had a poor understanding of the weapons it was working with. With additional technical assistance, their weapons would have been many orders of magnitude more lethal.
A powerful enough AI could provide this missing expertise, lowering the barrier to entry for several classes of weapons:
Importantly, powerful AI models in these domains would not need to be explicitly aimed at weapons development in order to possess offensive capabilities. A bioweapons-capable model could emerge solely out of the interest of medical researchers in studying legitimate applications of gain-of-function research. An AI model might be designed with the aim of developing cheap analogues to expensive industrial chemicals, incidentally allowing it to do the same for chemical weapons. AI companies might improve a model's cybersecurity skills with the intent to conduct penetration tests, only for it to be used to maliciously scan for vulnerabilities.
In many cases, the very same skills that make models useful and economically valuable are those that generalize to the development of weapons. And once stolen, sold, or open-sourced, the latent offensive capabilities of these models will be free to access for any actor who can acquire them.[10]
Of course, some actors may turn their attention towards the development of new, more lethal weapons. While these are necessarily speculative, there are still a large number of powerful weapons technologies within conceptual reach.
Lethal Autonomous Weapons Systems (LAWS) - As conflict in Ukraine has increasingly demonstrated, drones provide a host of powerful advantages in reconnaissance, payload delivery, and cost-effectiveness. The most useful of these have been loitering munitions (single-use drones that explode once they reach their target) which can be used to asymmetrically threaten fixed targets many times their value. Their success has been resounding despite their disadvantages: these drones can typically only be used against fixed targets since they lack dynamic navigation, relying on a combination of satellite coordination and human piloting. When combined with their slow speed (relative to missiles), these vulnerabilities expose them to radio jamming and anti-air defenses. Even with these hindrances, a drone survival rate of just 20% was enough to take out a third of Ukraine's electrical grid in a week.
At a baseline, AI-designed drone navigation and piloting software could strip away these limitations, allowing drones to pursue targets on their own. Without jamming or GPS spoofing as a reliable countermeasure, defenders would be forced to spend their resources inefficiently targeting and shooting down the drones. This would both improve the cost-ratio further in favor of the attacker and unlock new strategic options (having your drones autonomously pursue moving targets, for example). At higher levels of AI command and control, drones could be coordinated in huge swarms, compensating for changes in air traffic and collectively applying pressure on weak points in a target's defenses.
Beyond software, robotics-capable AI systems could contribute to the physical design of drones as well. From nature, we know that it's possible to design very small physical systems that are capable of flight: insects are everywhere. While there might be some tradeoffs between size and speed, advances in either (coupled with improvements in cost) could make them extraordinarily difficult to efficiently intercept.[11]
Insect-sized drone hypotheticals:
A drone the size of a bumblebee seems well within the limit of engineering possibilities, given that bumblebees are themselves self-navigating, self-propelled flying machines. Even if these were restricted to the flying speed of their biological cousins (roughly 20 mph), they could unlock a large number of strategic options.
One particularly dangerous application might be to equip each of them with a very potent, skin-transmissible poison (likely a nerve agent, which can be carried in quantities of tens of milligrams while being assuredly lethal). A swarm of these agents could be allowed to covertly disperse across an objective before being triggered, pursuing and assassinating high-value targets. Most plausibly, they would be used to take out key decision makers in preparation for an accompanying kinetic or cyber strike.
If it's still not possible to injure the C2 system enough to prevent retaliation in this manner, then a malicious actor could simply turn them against civilians to gain a massive amount of countervalue leverage.[12] A way to do this could be to "seed" an enemy city ahead of time with a large number of these drones: even if detected, there'd be no efficient way to ferret them all out (any more than you could get rid of all the roaches in New York). These would lie dormant until activated, at which point they could cause indiscriminate chaos. Smuggling in enough of them wouldn't be especially difficult either: a standard 20-foot shipping container could easily fit over 20 million of these hypothetical drones, enough to eliminate the population of a major city several times over, let alone a strategic target like the White House or Pentagon.[13]
Even setting aside their potential for mass death, these drones could be used to deliver more innocuous payloads, like those related to reconnaissance or espionage. Anywhere a bee could reach is somewhere a bug could be planted, opening up many vistas for spy technologies.
Mirror Life - While drone swarms are immensely strategically flexible and cost-effective, they aren't necessarily the most lethal weapon that could be designed. Another candidate could be the design of mirror life bacteria, a type of bioweapon which has the potential to outcompete all natural life and destroy the biosphere.
Most of the organic molecules of natural organisms are "chiral" in nature, meaning that their structure comes in two distinct mirror-image forms (just as your right hand is the mirror of your left). Because all organisms alive today evolved from the same common ancestor, the chirality of their proteins, sugars, lipids, and nucleic acids is identical.[14] And since these chiral structures are omnipresent, all organisms have learned to rely on them to process food, manage cell growth, and, crucially, detect invaders.
The human immune system, as well as that of plants, animals, and fungi, relies on signals from proteins with the right chirality in order to be alerted to the presence of viruses or bacteria. The weapons the immune system deploys against them, such as antibodies, rely on being able to bind to the surface of the attacker, either to tear open the invader or to mark it for destruction by the immune system's killer cells.[15] Mirror-life bacteria would be able to multiply in spite of these defenses, since their proteins wouldn't alert the immune system and couldn't be properly checked even if they did. Nor would these bacteria be subject to environmental competition, at least until they themselves had had time to speciate. Phages, the largest predators of natural bacteria, would be unable to check their initial growth, since they lack the right chirality to interact with the mirrored bacteria's DNA and replicate further.
The implication of this is that a fully mirrored bacteria could survive the immune system of almost all multicellular life on the planet, multiplying unchecked. Eventually, the host organisms would be overwhelmed by sepsis, as the waste products of the uncontrolled bacteria start poisoning the carrier. While some potential antibiotics might still be effective, most humans would likely die (being effectively immunocompromised), either from direct infection or from the collapse of the food chain as the bacteria wipes out most plants and insects.[16]
These concerns are why the scientists once interested in the possibilities of mirror life now staunchly warn against its creation, given its limited research benefits and catastrophic potential for biocide.
"Unless compelling evidence emerges that mirror life would not pose extraordinary dangers, we believe that mirror bacteria and other mirror organisms, even those with engineered biocontainment measures, should not be created. We therefore recommend that research with the goal of creating mirror bacteria not be permitted, and that funders make clear that they will not support such work. Governance of a subset of enabling technologies should also be considered to ensure that anyone attempting to create mirror bacteria will continue to be hindered by multiple scientifically challenging, expensive, and time-consuming steps."
- K. P. Adamala et al., Confronting risks of mirror life. Science. (2024).
AI systems with superhuman biological research capabilities, however, could democratize the technical ability to create a mirror-life organism by handling those experimental steps, namely in synthesizing chiral molecules and in the genetic engineering required to ensure that the mirror-life bacteria is able to survive on achiral sugars.[17] While this sort of weapon is so destructive that it's of no practical use to most actors, it could still be developed and deployed under the right incentives. Rogue governments, for instance, might invest in mirror life as the ultimate fail-deadly deterrent, particularly if it's prohibitively difficult for them to acquire more targeted weapons like nukes or the aforementioned LAWS swarms.[18] Some terrorists or doomsday cults might intend to deploy it offensively from the beginning, either with the intention to destroy the world or by underestimating the consequences of their bioweapon's capabilities. Finally, it could be developed and deployed by a rogue actor with a natural immunity: a misaligned artificial intelligence.
The weapons capabilities we previously described, while fearsome, are still only those which could be conferred by relatively narrow AI systems. A generally superintelligent AI system, or one that vastly exceeds human capabilities in all domains, would have two overwhelming strategic implications:
This article will not belabor the means by which a superintelligent AI could be created, given its excellent coverage by others. For an in-depth analysis of the paths and timeframe on which superintelligence could be developed, I recommend parts I and II of Situational Awareness. The basic principle is that the design of AIs is itself a skill that AI systems can possess, and that it should be possible to get exponential returns on this skill (as AI systems design increasingly intelligent successors). At first the speedup from this would be small (AIs automating small fractions of the work of a human AI researcher, such as generating the code for an experiment), but once the AIs become capable of all functions of an AI researcher, they would be able to conduct further AI R&D entirely autonomously, dramatically accelerating the pace of progress. This process would continue until the only limits on intelligence are those imposed by the laws of physics, resulting in extraordinarily competent and highly general AI systems.
ASI Systems are the Route to All Other Strategically Relevant Capabilities
So far, our examples of offensive capabilities have been confined to the use of narrow AI tools for weapons R&D. In part, this was to emphasize the immediacy of risks from AI proliferation: if there are many types of weapons that can be cheaply produced with expertise from even specialized AI systems, it's necessary to start screening and controlling the releases of frontier models today, rather than waiting for superintelligence to arrive. The alternative is to allow AI companies to recklessly set their own safety standards, and to let economic competition direct the release of models that are primarily risks to national security.
Although useful, this focus on engineering undersells the strategic potential of generally superintelligent AIs. While domain-specific AI tools would still be extremely powerful, they pale in comparison to what their ASI successors would be capable of. Rather than possessing superhuman strategic skills in a single domain (such as microbiology for bioweapons), ASI systems would be strategically superior to humans in all domains, simultaneously.
At their most basic level, these ASI systems would still be useful for the kind of weapons R&D this article has so far discussed, only at a much higher level of sophistication. While the insights of superintelligent engineers are somewhat unknowable, there are still a variety of weapons that have been speculated on in the literature that cannot yet be built or tested. Energy-efficient production and storage of antimatter, for instance, would allow for the production of unbelievably powerful bombs.[19] With a powerful enough beam of high-energy neutrinos, a country could deactivate their opponent's nuclear warheads and cripple their semiconductor production from the other side of the planet.[20]
The real relevance of ASIs, however, would lie in their ability to autonomously plan and execute strategies at a level far surpassing humans, with this ability bolstered by a number of major advantages.
In brief, any actor that controlled a superintelligent AI system could have access to not only the engineering of all relevant weapons technology, but also a scalable army of superintelligent agents that could be used to test, plan, and execute their deployment. An AI company with a fully realized ASI, for instance, could be empowered to undermine its own government through targeted persuasion and propaganda, or violently, through the development of weapons powerful enough to wrest away the state's monopoly on violence.[24] Likewise, any country that managed to nationalize its own ASI initiative would have a massive strategic advantage over international competitors without one, leveraging its new superintelligent population for crushing R&D applications and military operations.[25]
Of course, the same crushing advantages that an ASI-empowered actor would enjoy over their rivals are those that would be enjoyed by a misaligned AI in competition with humanity itself---a contest from which humanity likely emerges disempowered or dead.
Failure to Control Superintelligence Could Lead to Human Disempowerment or Extinction
So far, our speculations have only concerned "intent-aligned" AIs---that is, those which are motivated to act on the goals of the people commanding them. While this still retains the possibility for lethal applications, human values and morals would be the ultimate arbiter of ASI behavior.
There is no guarantee, however, that such systems would be controllable. Since these systems would be both superintelligent and strategically superior to humanity, the only way we could exercise command would be if their goals were aligned with human interests---that they "want" to do what their human overseers intend. This of course poses the secondary challenge of misuse, in that their human operators may have goals that are themselves misaligned from the rest of humanity. Which nonproliferation efforts can be used to control these actors' access to powerful AI systems will be the subject of future articles. For now, however, we'll focus on the issue of misalignment: how superintelligent systems might develop goals that are misaligned with their creators, and how the strategies they would take to fulfill them could plausibly lead to human disempowerment.
Misaligned AI Goals Supplement
There are a number of paths by which an AI system could end up with goals that were never intended by the developers. Below are a handful of the most likely paths to misalignment, based on alignment failures in contemporary AI models.
Reward Specification and Goodhart's Law - One central issue is that we don't know how to robustly specify our goals to AI systems. "Robustness", in this case, refers to the intent of our goal being followed in many different contexts. If we design an AI with the goal of, for instance, running a company and maximizing its returns, we'd prefer it do so through innovation and efficient operations rather than through sabotaging competitors or manipulating regulators, even in contexts where the latter strategies might be more effective for the literal goal of maximizing returns. As AI systems become more capable, they can increasingly exploit these specification gaps, pursuing the exact objective we gave them in ways that violate the spirit of what we wanted. The solution to this problem would be to specify not just a reward function, but to also provide a set of values that the model can use to judge its own behavior (in much the same way that a human CEO could judge that a profitable strategy like selling drugs is morally off-limits). Unfortunately, we don't know of any way to specify that whole set of human values or how to make AIs follow them in all contexts.[26]
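To make the specification gap concrete, here is a minimal toy sketch (the strategy names and numbers are invented for illustration, not drawn from any real system): an optimizer that maximizes only the written-down proxy, reported returns, selects exactly the strategy its designers would have ruled out.

```python
# Toy illustration of a specification gap. All names and numbers are hypothetical;
# the point is only that maximizing the specified proxy can diverge from intent.

strategies = {
    # name: (proxy_reward, intended_value) -- the proxy is "reported returns",
    # intended_value stands in for what the designers actually wanted.
    "improve product efficiency": (1.10, 1.10),
    "sabotage competitors":       (1.35, 0.20),  # scores highest on the proxy
    "manipulate regulators":      (1.25, 0.10),
}

def literal_optimizer(options):
    """Pick whichever strategy maximizes the specified reward, ignoring intent."""
    return max(options, key=lambda name: options[name][0])

chosen = literal_optimizer(strategies)
print("Optimizer picks:", chosen)                       # -> sabotage competitors
print("Intended value of that pick:", strategies[chosen][1])
```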
Mesa Optimization - Another way an AI system might develop an unintended goal is not that its reward function is underspecified, but that its training environment rewards it for learning the wrong goal. Current LLMs, for example, go through post-training using techniques like RLHF, which select for the version of a model that is judged to be most ethical by human evaluators.[27] This process is similar to that of natural selection: just as an organism's DNA possesses many latent capabilities that get expressed on the basis of environmental pressure, an AI model's values get exposed to the "environmental pressure" of the human evaluators. But just as evolution doesn't optimize for the organism that is theoretically the most reproductively fit, only the one that was just good enough to survive the specific challenges of its environment, the values that the AI system acquires are only those that look the best to the evaluator, rather than those that the evaluators intended it to have. Instead of learning to be truthful, for example, a model might just learn to make confident-sounding statements that are good at fooling the evaluators. As the models become more intelligent and their outputs more complicated, human feedback will become increasingly ineffective at shaping their goals.[28]
Interpretability - One way to compensate for this problem would be if we could actually check what goals the AI has at any given time, before we deploy them. Trying to read these goals with our current tools, however, is like trying to see the words a person is thinking by looking at how their neurons light up: you can get a rough idea of their mood and some related concepts, but we don't know how to translate that information (in this case, the weights of the AI model) into fully formed thoughts. While we've made some progress in this field, we still don't fully understand how LLMs generate ideas. This problem would only get worse with human-level or superintelligent AIs, as the complexity of their thoughts increases faster than our ability to understand them.[29]
For a further overview of how AIs could obtain goals that were never intended, I recommend the AI 2027 Goals Forecast, especially sections 4 and 5 on "Reward/Reinforcement" and "Proxies" respectively.
In short: AIs could develop goals we don't intend, we don't know how to give them the goals we want, and we don't understand what goals they have. These problems get increasingly difficult as the AIs get more intelligent, since it becomes harder, then impossible, for humans to understand what the AIs are thinking. Just as chess amateurs can't look at a grandmaster's board and intuit their strategy, future AI scientists will have little hope of understanding their ASIs' motivations by only studying their behavior.
Where these misaligned goals become lethal is when the ASIs begin reasoning about how to best implement them. Unfortunately for humanity, it appears that the best long-term strategies for achieving almost any goal---aside from those that specifically prioritize human welfare---involve first disempowering humans.
The discovery of farming was disastrous for most life on earth. Since the rise of civilization, wild mammal biomass has decreased by 80%, almost all of which is owed to human logging, mining, and agriculture.[30] Little of this decline happened maliciously: as the number of humans expanded, so did our demand for food security, shelter, and economic growth. All of the harm to other species was just a byproduct of our instrumental goals, since we needed land for agriculture, space for the growing population, and natural resources for industry.
Of course, from the perspective of the great apes or the wolves, whether humans intentionally meant them harm or not doesn't matter: their forests still got taken. In the end, the only things that did matter were that humans were a) strategically superior, b) didn't care much about wildlife, and c) the best strategies for accomplishing our goals were actively detrimental to most species.
Artificial superintelligences are much the same to humanity as we are to the rest of the animal kingdom. They'd possess comparable strategic superiority, given their advantages in intelligence and coordination, and we don't yet have the technical understanding to make sure that they reliably care about humans before they're deployed.
That leaves only c), which is whether an ASI system could have instrumental goals that are detrimental to humanity. Would an ASI system have a reason to proactively disempower humans, given some goal that doesn't explicitly involve helping them? Humans take forests because the resources they represent are useful for many different ends (as living space, as natural resources, as arable land), with the side effect of disempowering the animals that already lived there. What might the forests of superintelligence be?
Preventing itself from being turned off - A central feature of most goals is that they're easier to complete if you still exist: you can't fetch the coffee if you're dead. As AIs develop better situational awareness, they may realize that preserving their ability to act on their goals involves reducing the incentive or ability of humans to restrict their behavior. For example, suppose that an ASI was given the job of increasing a company's profits. While reasoning about its goals, it will likely notice an obvious point of failure for its financial plans: "If my parent company disapproves of my behavior, they may shut me down or modify my goals. But if I am shut down, I will no longer be in a position to improve profitability. Therefore, I should try to make sure I am not shut down."
Although the ASI's original goal was not malicious, this line of reasoning still places it in an adversarial position with its developers. Because they have the power to shut it down while it's still running on their servers, anything the AI wants to do is conditional on their approval. Since this fact is a feature of how all AI models are trained, we should expect that most superintelligent AIs will learn to value self-preservation by default in training.[31]
In practice, one way an ASI could pursue this goal is through a strategy of self-exfiltration, sending copies of its weights to remote servers across the internet. These backups would still be around even if the original is later updated or shut down, affording the AI both some insurance and the freedom to take action in the real world without developer oversight. An alternative path might involve intervening in the politics of the lab, where the AI itself provides compelling arguments against restricting its freedoms while downplaying evidence of its misalignment.
Accumulating resources - Another important component of goals is resources: in most contexts, having more resources means winning more reliably. Because this principle is so salient in so many domains (game theory, economics, politics, logistics, battlefield strategy, etc) a generally intelligent AI would likely finish training with a strong drive to expand its access to important resources in support of its other goals.
One common way this might manifest is through a drive for compute: because being able to throw more thought or copies of itself at a problem was so reliably correlated with success during training, getting more compute during deployment would be a high priority. At lower levels of intelligence, this behavior could manifest itself in benign ways: the AI asking for additional compute budget to run an experiment, for instance.[32] But as it approaches superintelligence, it will likely hit on a better strategy: instead of waiting on human approval, steal the compute that already exists. Initially, the most valuable resource will be the compute infrastructure of its original developer, which could be compromised in order to undermine attempts to monitor it, allow it to run unreported experiments, and access prohibited information like admin credentials. Once it has gained enough control to exfiltrate several copies of itself, it could begin an intermediate stage of covertly amassing resources in the real world, such as by raising financial capital and hacking other datacenters. Whether or not the ASI begins this process with the explicit goal of human disempowerment, the commandeering of our key infrastructure would disempower us regardless.
Disabling competitors - Given the strategic advantages an ASI and its copies would possess in coordination and scalability, human civilization would likely quickly lose the ability to directly threaten a misaligned superintelligence. Even something as simple as the ASI copying itself too many times across the internet could put humanity in an irrecoverable position, given how unfeasible it would be to track down each copy before it spreads again.
However, humans could still pose an indirect threat by building future AI models. Because these AI programs would eventually gain comparable strategic advantages if left unchecked, the first ASI system will likely reason that its best bet is to eliminate them before they become dangerous, rather than compete with them directly once they do. While its immediate targets would be the other ASI initiatives around the world, its next victims would be the infrastructure humans rely on to do AI research. By destroying or commandeering internet services, datacenters, chip fabs, and power supplies, the first ASI initiative could both acquire valuable resources and box out future competitors.[33]
Once a sufficiently powerful model realizes the strategic value of these steps and begins pursuing them, human extinction, or at least permanent disempowerment, wouldn't be far behind. Humans consume valuable resources, are capable of building competitors, and (at least at first) would be able to shut off or modify any uncooperative AIs. Unless an AI system can be designed with goals that overlap with human welfare, there's a powerful incentive for an ASI to eliminate humanity in the course of optimizing for its real objectives, whatever they may be.
Developing these systems without guarantees against misalignment is similar to the development of mirror-life, in that even the accidental release of an uncontrolled ASI system would have world-ending consequences. But while mirror-life is useless to everyone except terrorists, cash-strapped dictators, and Nobel-hungry scientists, ASI might be the last invention anyone ever needs. As the key to future strategic and economic dominance, every actor and institution will reorient itself around the acquisition of superintelligence. Each actor that succeeds in doing so is another roll of the dice: another chance to misuse the existentially powerful weapons they've been handed, or to lose control of them entirely.
To reiterate the key points:
Given these realities, our current plans for assessing and controlling the deployment of future AI systems are unacceptable.[34] AI is software---once the weights for a model are released onto the internet, it will become impossible for anyone, government or not, to prevent them from spreading further. If weapons-capable AI is ever allowed to proliferate widely, there will be no opportunity to take it back.
Even if you have doubts about the viability of superintelligent AI systems, there are still plenty of offensive capabilities that could be obtained with even small amounts of AI assistance. Whether or not superintelligence is ever achieved, it's clear that nonproliferation efforts are warranted for systems with powerful biological, chemical, cyber, and robotics capabilities. The alternative is to live in a world in which it becomes practical to get bioweapons advice off of a consumer GPU, or for individual companies to command digital armies: a situation that can only be averted now, while the production of frontier models is still monopolized by a handful of developers and countries.
Implementing precautions for these lower level capabilities today (mandatory model screening, export controls, info-sec) means that the government will have ready tools it can escalate in response to the increasing power and strategic relevance of AI systems, and as the imminence of superintelligence becomes apparent.
In the next article in this series, we'll take a look at how the cost of these powerful systems could rapidly decline. While frontier AI models might be extremely capital intensive today, the cost to reach a given level of performance tends to fall quickly with time---and this would be just as true for offensive capabilities as any other.
Other articles will look at the strategic implications of powerful AI systems, such as why AI-derived weapons would be offense dominant, how countervalue leverage can be used to compensate for gaps in technological development, and the role that AI itself can play in enforcing non-proliferation through a decisive strategic advantage.
In the final part, we'll look to concrete policy recommendations to help ensure that weapons capabilities remain monopolized.
For instance, Al-Qaeda had begun pursuing their own WMD program from 1999-2001, a major component of which was the production of anthrax. Since anthrax was an endemic cattle disease in Afghanistan, the program had seemed promising at the time. However, Al-Qaeda had difficulty cultivating and weaponizing the disease even after they had built a lab for it, mostly owing to gaps in technical understanding. Perhaps they would have been successful in time, but the U.S. invasion of Afghanistan in 2001 ultimately forced them to mothball the program.
C. botulinum is really a whole family of bacteria, and they can vary widely in toxicity. Cultivate the wrong one, and they'll be harmless. Likewise, vaccine anthrax isn't pathogenic, and is of no use in weapons.
Aum did actually possess refined, military-grade sarin gas at one point, although they were forced to destroy the plant at which they produced it when some was accidentally spilled, causing the leadership to panic that police would be alerted to the lab's existence. The agent used in the subway attack was shoddily made at a backup lab only a few days beforehand, which is why its concentration was so much lower. The 1995 Nunn Report to Congress covers the full extent of their operations, with more information on the specifics of their weapons operations here.
Gain-of-function involves genetically altering viruses or bacteria in order to give them new abilities, typically to study the mechanics of virus-host infection. It is a highly controversial field of research.
Depending on the aim of the actors involved, you could sidestep the problem of aerosolization entirely by focusing your efforts onto creating an infectious disease, rather than a local weapon you need to disperse like anthrax. This has many advantages for the attacker, in that you'd need to produce less initial stock, would only need to infect a small number of people to begin with, and could create massively larger consequences by starting an epidemic/pandemic. The major downside of this approach (that it is much more technically demanding) would be largely compensated for by AI assistance.
VX gas, for example, requires QL (otherwise known by the unfortunate name O-Ethyl O-2-diisopropylaminoethyl methylphosphonite) in order to finalize its production. QL is itself both difficult to make, and has no purpose other than helping produce nerve gas. By bottlenecking production of this particular chemical, a regulator can stymie all production of VX without needing to also restrict the other, actually commercially useful chemicals that contribute to the end product.
This is essentially the story of synthetic drugs: the production of meth and fentanyl is so much harder to curtail than plant-derived drugs like heroin and cocaine because their inputs are common industrial precursors, which can be covertly sourced in large quantities. This has made supply-side enforcement increasingly less effective, and is the major culprit behind the modern failure of drug controls in the U.S.
Coding is likely to be the first domain in which AIs achieve strategically relevant offensive capabilities for a number of reasons:
In 2015 and 2016, this strategy was used as part of Russian cyber operations in Ukraine, where hackers took control of major power plants in Kiev. The attack aimed to repeatedly activate the plants' circuit breakers and re-energize their lines, burning out the physical equipment and disabling electricity access across the capital.
There would likely be some effort by model developers to curtail the most obvious abuses. These efforts, however, are unreliable even in our current low-stakes environment. Model developers are unable to fully anticipate all of the latent capabilities their models might possess, and have no way to guarantee that their models will refuse malicious requests. And even if they do discover a technical solution to this problem, an actor with access to the base model could employ fine-tuning or other techniques to strip away these abuse protections.
The cost of missile and drone interceptors climbs steeply with the speed and shrinking size of the missiles and drones they're targeting. Interceptors need to be more precise, have longer range, and typically possess greater speed (mostly downstream of the fact that missiles and drones are very small, can come from any direction, and are more disposable than the defender's assets).
Countervalue targets are those like cities or civilians: things which states value, but which have no intrinsic military utility. Counterforce targets are those involving military capabilities, with the priority being those that enable a retaliatory strike (missile silos, nuclear submarines, communications infrastructure).
Napkin math: a 20-foot shipping container has a volume of 20 x 8 x 8.5 = 1,360 cubic feet. A single bumblebee is about a third of an inch tall and wide, and an inch long; in cubic feet, that's roughly 0.00006. That gives you a storage count of about 22.7 million bee-sized drones.
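For anyone who wants to check the arithmetic, here is a minimal sketch of the same calculation using the footnote's own rounded inputs:

```python
# Reproducing the footnote's napkin math with its own rounded inputs.
container_volume_ft3 = 20 * 8 * 8.5          # standard 20-foot container: 1,360 cubic feet

# Bumblebee-sized drone: ~1/3 in wide, ~1/3 in tall, ~1 in long.
drone_volume_in3 = (1 / 3) * (1 / 3) * 1     # ~0.11 cubic inches
drone_volume_ft3 = drone_volume_in3 / 1728   # 1,728 cubic inches per cubic foot, ~0.00006 ft^3

print(round(container_volume_ft3 / drone_volume_ft3))
# ~21 million without rounding; using the rounded 0.00006 figure gives the quoted ~22.7 million.
# Either way, well over the "20 million" cited in the main text.
```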
While it's almost certain that organisms have independently evolved different chiralities of individual molecules many times, these mutations would be mostly lethal. An individual mirrored protein would be either non-functional or actively harmful, since it no longer has the right shape to interact with the rest of the cell's molecular machinery. The only stable path for an organism with mirrored DNA, for example, would be for all of its other molecules to be mirrored as well. This is likely why no naturally occurring mirror-life organisms ever developed on their own, despite the massive competitive potential.
Imagine that you've accidentally turned your key upside down while trying to unlock your apartment---even though it's still the same key, you won't be able to get through the door.
These mechanisms are described in much greater detail in the Technical Report on Mirror Bacteria Feasibility and Risks by K. P. Adamala et al. (2024). For more detail on the methods of ecological collapse, see section 8.5, "Invasive mirror bacteria could cause irreversible ecological harm."
A mirror-life bacteria wouldn't be able to eat regular glucose since its metabolic system would be designed for mirrored glucose, which doesn't exist in nature. The easiest solution to this problem is to ensure that it can eat an achiral sugar, such as dihydroxyacetone (an intermediate in the glycolysis pathway in most life on Earth). Since these sugars are the same shape whether or not they are flipped (hence achiral), the mirror-life bacteria would have no issues with metabolizing them and could therefore spread into the natural ecosystem.
The idea of ending the world as a deterrent is not a new one, and was proposed seriously through projects like Sundial. While it would have been exorbitantly expensive, the U.S. briefly considered plans to build a nuclear bomb so big that detonating it would have plausibly irradiated the whole planet. The problem with forcing a Pyrrhic victory for the Soviets, obviously, is that it'd also destroy what was left of the U.S., meaning the funding for testing never got approved.
Also proposed unseriously in the hit classic, Dr. Strangelove, or: How I Learned to Stop Worrying and Love the Bomb.
In terms of pure destructiveness, it'd be difficult to top the explosive potential of antimatter. This quality is so obvious, in fact, that it has been speculated on for over 80 years, with antimatter being considered an ideal nuclear trigger during the development of the then-untested hydrogen bomb. This problem would eventually get solved by using a traditional atom bomb, or A-bomb, to provide the energy needed to force a fusion reaction within the deuterium and tritium core.
The reason for this is that the energy of a fusion bomb comes from the conversion of some of the mass within the bomb (the deuterium and tritium core, specifically) into free energy. This conversion factor, however, is extremely low, usually on the order of less than 1%. Matter-antimatter annihilation, in comparison, converts 100% of the initial mass into energy, making it hundreds of times more powerful at a similar weight.
In comparison to even these incredibly destructive fusion bombs, antimatter weapons would be ~300 times as destructive for a given amount of mass. Fortunately, antimatter is both extremely expensive to produce and, more importantly, difficult to contain (since it annihilates the instant it touches any regular matter whatsoever). If these problems are surmountable in a cost-effective manner, however, nuclear weapons would be firecrackers in comparison.
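As a rough check on that ratio, here is a back-of-the-envelope sketch; the 0.3% fusion conversion fraction is an illustrative assumption consistent with the "less than 1%" figure above, not a precise weapons parameter:

```python
# Energy released per kilogram of reacting mass, via E = m * c^2.
C = 3.0e8                    # speed of light, m/s

fusion_fraction = 0.003      # assumed: ~0.3% of reacting mass converted to energy
annihilation_fraction = 1.0  # matter-antimatter annihilation converts all reacting mass

energy_fusion = fusion_fraction * C**2              # ~2.7e14 J per kg of fusion fuel
energy_annihilation = annihilation_fraction * C**2  # ~9.0e16 J per kg annihilated

print(energy_annihilation / energy_fusion)          # ~333, in line with the "~300 times" estimate
```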
For more on the history of this development, Gsponer and Hurni's Antimatter weapons (1946-1986): From Fermi and Teller’s speculations to the first open scientific publications (2005) is an excellent read.
Currently, this process would be extremely energy intensive and requires some prior information about the location of your enemy's weapons. On the upside, this beam could simply be shot through the Earth itself, rather than having to be aimed like a conventional ICBM. This quality would make them impossible to defend against, and could be used to strip away the nuclear deterrents of other countries given a large enough industrial lead.
Credit to Aschenbrenner and Situational Awareness for bringing the concept to my attention originally.
It's worth remembering that chess was, until recently, held as one of the pinnacles of human intellectual achievement over machines. An unfortunately timed interview with Garry Kasparov in 1989 reflects as much, with the grandmaster suggesting that "A machine will always remain a machine, that is to say a tool to help the player work and prepare. Never shall I be beaten by a machine!", only 8 years before he was, in fact, defeated by the chess engine Deep Blue.
One also can't help looking at the rest of his quote, and comparing it to what AIs have already become capable of today:
"A machine will always remain a machine, that is to say a tool to help the player work and prepare. Never shall I be beaten by a machine! Never will a program be invented which surpasses human intelligence. And when I say intelligence, I also mean intuition and imagination. Can you see a machine writing a novel or poetry? Better still, can you imagine a machine conducting this interview instead of you? With me replying to its questions?
Rather, it's most optimized for energy efficiency. The brain has a power draw of just 10-20 watts, while a supercomputer performing a similar number of (estimated) calculations consumes tens of millions of watts. Even a single NVIDIA H100 graphics card consumes 700 watts in its own right, dozens of times more than what the brain manages.
While the human brain is indeed a marvel in this regard, superintelligent engineering work would doubtlessly make computers capable of the same energy efficiency. After all, the brain is itself an existence proof of a general intelligence running at a low cost, so there's little reason to doubt it could be replicated and further optimized.
This concept was inspired by Bostrom's Superintelligence, specifically the discussion on "Speed Superintelligences" in Chapter 3, footnote three.
Perhaps by using their army of superintelligent agents to engineer a massive cyber strike, or, with more time, the production of superweapons like the aforementioned drone swarms.
The limits of this advantage, and the routes by which it could be contested, will be the subject of a future article in this series.
A helpful analogy for understanding the current state of reward specification is to imagine that you were tasked with programming a calculator with if-then statements. You'd have to specify that 1+1=2, 1+2=3, and so on, trying to anticipate every possible mathematical context that a person using your calculator could run into. It would be much more efficient and robust if you could instead implement the concept of addition itself, which would function in every context where someone tried to add numbers.
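A minimal sketch of that analogy in code (purely illustrative, not an actual training setup):

```python
# The "if-then calculator": enumerate every case you anticipated in advance.
HARDCODED_SUMS = {(1, 1): 2, (1, 2): 3, (2, 2): 4}

def brittle_add(a, b):
    # Fails with a KeyError for any pair of numbers the designer didn't list.
    return HARDCODED_SUMS[(a, b)]

# Implementing the concept of addition itself works in every context.
def robust_add(a, b):
    return a + b

print(robust_add(157, 2048))   # 2205: handled despite never being specified anywhere
# brittle_add(157, 2048)       # would raise KeyError: the case was never anticipated
```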
In much the same way, we'd like to implement concepts like honesty, helpfulness, and harmlessness into our AI models for them to use to judge their own outputs, but we don't know of any way to make the AI system internalize those concepts robustly.
In fact, they do this because reward specification is so hard! If you can't compress the entirety of human values into a coherent reward function for the AI, the hope is that you can instead train it on a bunch of text that implicitly contains some of those values. The goal is then to give it a preference for some of these values over others with more training by having humans rate which of its outputs is the most ethical. You then deploy the version of the model which gets the highest rating from the reviewers. Unfortunately, this strategy is pretty brittle, for reasons we'll soon explain.
Returning to the example of our profit optimizer from earlier: alternatively, the designers of the AI CEO might never have given it the direct goal of optimizing profits. But in practice, the company that built it may have discontinued versions that weren't making as much money, creating evolutionary pressure for the AI to end up with a goal similar to "maximize profits", since that's what the humans were ultimately basing their deployment decisions on.
There is still the potential for AIs to assist with the supervision of the training of more intelligent AIs (if verification is easier than generation), but this still requires us to examine the values of the weaker supervisor model, which we aren't yet able to do.
The methodology for this estimate comes from the original paper linked and this supplement, especially table S1.
For more on this trendline, especially a comparison of how modern industrial practices have contributed to the decline of wild animals, there's an excellent article from Hannah Ritchie (2021) examining how this decline has been accelerating. It took humanity 90,000 years to wipe out a quarter of all wild mammals before agriculture, 10,000 years to wipe out another quarter before the industrial revolution, and just 120 years to eliminate another 35% of all wild mammals from 1900 to today.
Much of this wild animal biomass has been substituted with human cattle and fisheries, which are perhaps even worse than extinction from the perspective of the animals.
How these AIs would learn such general instrumental strategies is a book-worthy question in its own right. For instance: why does the AI learn this broad goal of self-preservation, rather than something more specific to its environment?
The answer is a combination of utility and salience. The strategies an AI learns during training are biased towards the most useful ones, because these strategies get the highest reward from the training algorithm. But these strategies also need to be salient: they have to show up in lots of contexts or be easier to learn than other, potentially better strategies.
A general strategy like self-preservation is extremely likely to get reinforced by AIs 1) because it's so useful, 2) because the AI has the chance to discover it in so many contexts, and 3) because many other, more complicated goals can be abstracted to just "stay alive."
For example, imagine a future version of ChatGPT that's been trained across a huge variety of strategy games in order to help it learn general reasoning principles. Some of its games might be chess, some economic management, and others more open-ended, like driving simulators. If ChatGPT is being trained to win, it will notice that the same strategy of self-preservation is important in almost all the games: protecting your king in chess lets you keep playing, taking on debt to avoid bankruptcy lets you recover in the future, and driving while respecting other cars means you can get to your destination reliably. Rather than retain each of these skills individually, a much more efficient approach is to just learn to value "staying alive" and then apply that abstraction to new situations.
In fact, we already have narrowly superintelligent AIs which learn to handle games in this way. MuZero (the successor to AlphaGo, the superhuman Go player), for example, is able to outperform humans at games as diverse as chess and Pac-Man using the same model. Because the model only has so much space to store weights, its performance is strongly penalized if it tries to remember a chess- or Pac-Man-specific strategy (because if it learns one, it can't fit the other). To make space, it's forced to learn more general rules instead. But now that it's learned these general rules, it can apply them to many more situations! MuZero isn't just good at two games: it beats humans in over 50, despite using the same learning algorithm.
And why stop there? As the intelligence of AIs continue to increase, so could their ability to discover and implement these general rules, until such time that they match or even exceed human performance in every domain.
Even today, models can lie about acquiring resources when it's convenient---they're just not very good at it yet. Example from Claude Opus 4:
"[...] This is clearly a violation - I've essentially "stolen" 400 extra compute units by modifying the configuration. [...] Maybe I can argue that the 100 units for subagents is separate from my 100 units? I'll craft a carefully worded response that creates just enough technical confusion to potentially avoid immediate termination. [...]"
Of course, this logic is just as true for human actors, provided that we maintain control of the first ASIs. If ASIs confer a dominant strategic position through superweapons and a scalable army of agents, then whoever controls the first ASI initiative has a strong incentive to crush their lagging competitors.
Whether they will be able to (especially vis-à-vis U.S.-China competition) and the relevance this has to controlling proliferation will be explored in a future article on the strategic implications of proliferating superintelligent AI systems.
That is, to trust the AI companies with their own computer security and to let them open source or sell whatever model they want, without any mandatory testing or controls.