
Rationalist AI risk public outreach has often emphasized first-principles thinking, theory, and logical possibilities (e.g. evolution, gradient descent, the human-chimp analogy) over more concrete, tangible empirical findings (e.g. deception emerging in small models, specification gaming, LLMs helping to create WMDs). I found that the recent post below, from the AGI governance fundamentals course, provides a great summary.


Developments in AI could exacerbate long-running catastrophic risks, including bioterrorism, disinformation and resulting institutional dysfunction, misuse of concentrated power, nuclear and conventional war, other coordination failures, and unknown risks. This document compiles research on how AI might raise these risks. (Other material in this course discusses more novel risks from AI.) We draw heavily from previous overviews by academics, particularly Dafoe (2020) and Hendrycks et al. (2023).

Bioterrorism

AI advances could worsen the risks of bioterrorism. Hendrycks et al. (2023) provide a useful introduction to this risk. The following is a large excerpt from their paper.

The rapid advancement of AI technology increases the risk of bioterrorism. AIs with knowledge of bioengineering could facilitate the creation of novel bioweapons and lower barriers to obtaining such agents. Engineered pandemics from AI-assisted bioweapons pose a unique challenge, as attackers have an advantage over defenders and could constitute an existential threat to humanity. [...] 

Bioengineered pandemics present a new threat. Biological agents, including viruses and bacteria, have caused some of the most devastating catastrophes in history. It’s believed the Black Death killed more humans than any other event in history, an astounding and awful 200 million, the equivalent to four billion deaths today. [...] [E]ngineered pandemics could be designed to be more lethal or easily transmissible than natural pandemics, presenting a new threat that could equal or even surpass the devastation wrought by history’s most deadly plagues [6].

Humanity has a long and dark history of weaponizing pathogens [...] During the twentieth century, 15 countries are known to have developed bioweapons programs, including the US, USSR, UK, and France. Like chemical weapons, bioweapons have become a taboo among the international community. While some state actors continue to operate bioweapons programs [8], a more significant risk may come from non-state actors like Aum Shinrikyo, ISIS, or simply disturbed individuals. [...]

Biotechnology is progressing rapidly and becoming more accessible. A few decades ago, the ability to synthesize new viruses was limited to a handful of the top scientists working in advanced laboratories. Today it is estimated that there are 30,000 people with the talent, training, and access to technology to create new pathogens [6]. This figure could rapidly expand. Gene synthesis, which allows the creation of custom biological agents, has dropped precipitously in price, with its cost halving approximately every 15 months [9]. Furthermore, with the advent of benchtop DNA synthesis machines, access will become much easier and could avoid existing gene synthesis screening efforts, which complicates controlling the spread of such technology [10]. The chances of a bioengineered pandemic killing millions, perhaps billions, is proportional to the number of people with the skills and access to the technology to synthesize them. With AI assistants, orders of magnitude more people could have the required skills, thereby increasing the risks by orders of magnitude.

AIs could be used to expedite the discovery of new, more deadly chemical and biological weapons. In 2022, researchers took an AI system designed to create new drugs by generating non-toxic, therapeutic molecules and tweaked it to reward, rather than penalize, toxicity [11]. After this simple change, within six hours, it generated 40,000 candidate chemical warfare agents entirely on its own. It designed not just known deadly chemicals including VX, but also novel molecules that may be deadlier than any chemical warfare agents discovered so far. In the field of biology, AIs have already surpassed human abilities in protein structure prediction [12] and made contributions to synthesizing those proteins [13]. [...]

AIs compound the threat of bioengineered pandemics. AIs will increase the number of people who could commit acts of bioterrorism. General-purpose AIs like ChatGPT are capable of synthesizing expert knowledge about the deadliest known pathogens, such as influenza and smallpox, and providing step-by-step instructions about how a person could create them while evading safety protocols [14]. Future versions of AIs could be even more helpful to potential bioterrorists when AIs are able to synthesize information into techniques, processes, and knowledge that is not explicitly available anywhere on the internet. Public health authorities may respond to these threats with safety measures, but in bioterrorism, the attacker has the advantage. The exponential nature of biological threats means that a single attack could spread to the entire world before an effective defense could be mounted. Only 100 days after being detected and sequenced, the omicron variant of COVID-19 had infected a quarter of the United States and half of Europe [6]. Quarantines and lockdowns instituted to suppress the COVID-19 pandemic caused a global recession and still could not prevent the disease from killing millions worldwide. 

A paper by Oxford biosecurity researcher Jonas Sandbrink further clarifies these risks:

This article differentiates two classes of AI tools that pose such biosecurity risks: large language models (LLMs) and biological design tools (BDTs). LLMs, such as GPT-4, are already able to provide dual-use information that could have enabled historical biological weapons efforts to succeed. As LLMs are turned into lab assistants and autonomous science tools, this will further increase their ability to support research. Thus, LLMs will in particular lower barriers to biological misuse. In contrast, BDTs will expand the capabilities of sophisticated actors. Concretely, BDTs may enable the creation of pandemic pathogens substantially worse than anything seen to date and could enable forms of more predictable and targeted biological weapons. In combination, LLMs and BDTs could raise the ceiling of harm from biological agents and could make them broadly accessible.

Disinformation

AI-boosted disinformation is a risk factor for catastrophes because it undermines societies’ ability to address them.

A report by authors from three research institutions (Goldstein et al., 2023) examines how AI could affect ‘influence operations’ (i.e. "covert or deceptive efforts to influence the opinions of a target audience"), which could worsen how important decisions are made.

Actors: Language models could drive down the cost of running influence operations, placing them within reach of new actors and actor types. [...]

Behavior: Influence operations with language models will become easier to scale, and tactics that are currently expensive (e.g., generating personalized content) may become cheaper. Language models may also enable new tactics to emerge—like real-time content generation in chatbots.

Content: Text creation tools powered by language models may generate more impactful or persuasive messaging compared to propagandists, especially those who lack requisite linguistic or cultural knowledge of their target. They may also make influence operations less discoverable, since they repeatedly create new content without needing to resort to copy-pasting and other noticeable time-saving behaviors.

Hendrycks et al. (2023) highlight additional aspects of risk from AI-boosted disinformation:

AIs can exploit users’ trust. Already, hundreds of thousands of people pay for chatbots marketed as lovers and friends [21], and one man’s suicide has been partially attributed to interactions with a chatbot [22]. As AIs appear increasingly human-like, people will increasingly form relationships with them and grow to trust them. AIs that gather personal information through relationship-building or by accessing extensive personal data, such as a user’s email account or personal files, could leverage that information to enhance persuasion. Powerful actors that control those systems could exploit user trust by delivering personalized disinformation directly through people’s “friends.”

AIs could centralize control of trusted information. [...] AIs could centralize the creation and dissemination of trusted information. Only a few actors have the technical skills and resources to develop cutting-edge AI systems, and they could use these AIs to spread their preferred narratives. Alternatively, if AIs are broadly accessible this could lead to widespread disinformation, with people retreating to trusting only a small handful of authoritative sources [23]. In both scenarios, there would be fewer sources of trusted information and a small portion of society would control popular narratives.

Authoritarianism, Inequality, and Bad Value Lock-in

Dafoe (2020) describes ways AI could lead to power being concentrated and then misused. Let us consider these mechanisms (and an extra one—oligopolistic markets) in more detail:

Global winner-take-all markets: Whoever leads in selling access to broadly capable AI systems may be able to offer many customers the best deal for a wide range of services. This could greatly concentrate wealth, which would incentivize authoritarian coups (while also making it easier to suppress democratic revolutions, as discussed below).

Oligopolistic markets: In addition to global winner-take-all markets, there may be additional factors driving AIs to be controlled by a small number of people. Hendrycks et al. (2023) mention one aspect of that: “To operate effectively, AIs require a broad set of infrastructure components, which are not equally distributed, such as data centers, computing power, and big data.”

Labor displacement: Historically, new technologies have often automated some jobs while creating new jobs. However, broadly capable AIs would be historically unprecedented. If AIs can do nearly every task at least as well as a human, this may leave few jobs left for humans—especially since AIs do not need to rest, can learn vast amounts of information, and can often complete tasks far more quickly and cheaply than humans.

Authoritarian surveillance and control: AIs can be used to flag content for censorship, analyze dissidents’ activities, operate autonomous weapons, and persuade people. Through such mechanisms, “AIs could allow totalitarian governments to efficiently collect, process, and act on an unprecedented volume of information, permitting an ever smaller group of people to surveil and exert complete control over the population without the need to enlist millions of citizens to serve as willing government functionaries. Overall, as power and control shift away from the public and toward elites and leaders, democratic governments are highly vulnerable to totalitarian backsliding. Additionally, AIs could make totalitarian regimes much longer-lasting; a major way in which such regimes have been toppled previously is at moments of vulnerability like the death of a dictator, but AIs, which would be hard to “kill,” could provide much more continuity to leadership, providing few opportunities for reform” (Hendrycks et al., 2023).

AI tools also contribute to inequality through biased or discriminatory outputs (Turner Lee et al., 2019). Although these issues are often considered separately from catastrophic risks, it is helpful to be aware of them.

Machine learning models have been found to make discriminatory recommendations that influence who is hired, who receives loans, and who is jailed, among other decisions.

Biased recommendations often result from an AI system being trained with historical data that reflects historical biases. Additionally, AI systems tend to be less accurate for certain groups when they are trained with little data from those groups.
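To make the second point concrete, here is a small synthetic sketch in Python (all data and numbers are made up for illustration and are not drawn from Turner Lee et al., 2019). A single model fit to data dominated by one group can end up far less accurate for an under-represented group whose pattern differs:

```python
# Synthetic toy example (all data made up): a shared model fit to pooled data
# dominated by one group can be much less accurate for an under-represented
# group whose feature-label relationship differs. The opposite-sign
# relationship below is deliberately extreme to make the gap easy to see.

import numpy as np

rng = np.random.default_rng(0)

def make_group(n, w):
    """n examples where the label is 1 when w * feature + noise is positive."""
    x = rng.normal(size=n)
    y = (w * x + rng.normal(scale=0.3, size=n) > 0).astype(int)
    return x, y

x_a, y_a = make_group(10_000, w=+1.0)  # majority group
x_b, y_b = make_group(200, w=-1.0)     # under-represented group, different pattern

# "Train" the simplest possible shared classifier on the pooled data: predict 1
# when the feature is positive, or when it is negative, whichever fits better.
x, y = np.concatenate([x_a, x_b]), np.concatenate([y_a, y_b])
sign = 1 if ((x > 0).astype(int) == y).mean() >= ((x < 0).astype(int) == y).mean() else -1

def predict(xs):
    return (sign * xs > 0).astype(int)

print("accuracy, majority group:         ", (predict(x_a) == y_a).mean())  # ~0.9
print("accuracy, under-represented group:", (predict(x_b) == y_b).mean())  # far lower
```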

Concentration of power may make it easier to permanently entrench certain values. Hendrycks et al. (2023) argue this is dangerous:

In addition to power, locking in certain values may curtail humanity’s moral progress. It’s dangerous to allow any set of values to become permanently entrenched in society. For example, AI systems have learned racist and sexist views [24], and once those views are learned, it can be difficult to fully remove them. In addition to problems we know exist in our society, there may be some we still do not. Just as we abhor some moral views widely held in the past, people in the future may want to move past moral views that we hold today, even those we currently see no problem with. For example, moral defects in AI systems would be even worse if AI systems had been trained in the 1960s, and many people at the time would have seen no problem with that. We may even be unknowingly perpetuating moral catastrophes today [25]. Therefore, when advanced AIs emerge and transform the world, there is a risk of their objectives locking in or perpetuating defects in today’s values.

(Nuclear) War

Analysts have highlighted several pathways by which AI could increase the risk of war, including nuclear war.

AI could undermine nuclear deterrence. In a world with nuclear weapons, nuclear war is often thought to be prevented (partly) by nuclear deterrence. If a state launched a nuclear strike, it would risk being nuked in retaliation. (Some nuclear states have even signaled their willingness to launch retaliatory nuclear strikes on behalf of allies.) However, advances in AI could undermine states’ retaliatory strike (also called “second strike”) capabilities in various ways. States might be able to use AI to…

…locate nuclear-armed submarines, by using AI to analyze sensor data and perhaps improve other aspects of reconnaissance drone technology. This would be problematic, because states often see nuclear-armed submarines as their most resilient nuclear deterrent, due to the difficulty of locating them. However, the technical plausibility of this is debated (1).

…make mid-air adjustments to strikes against mobile missile launchers, e.g. through satellite image analysis. This would reduce the difficulty of destroying these weapons (which also have deterrent functions).

…asymmetrically improve missile defense, which could neutralize rivals’ retaliatory strikes. However, missile defense has historically been highly difficult.

…execute cyberattacks that disable rivals’ retaliatory capabilities.

AI could inadvertently escalate conflicts. While analysts often consider it unlikely that militaries would give AIs direct control over nuclear weapons, AIs could inadvertently escalate conflicts in other ways. Decision-makers might overly trust AI systems’ flawed recommendations about escalation. Additionally, lethal autonomous weapons without proper oversight could inadvertently initiate violent conflicts or expand their scope.

AI could create other incentives for war. International relations scholars have found that, in many cases, “large, rapid shifts in the distribution of power lead to war” (Powell, 2006). When faced with a rising rival, states may see a need to choose between (i) warring against the rival while the rival is relatively weak, or (ii) later being coerced by a powerful rival. This situation incentivizes war. Above, we saw one way AI might contribute to a large, rapid shift in power: undermining some states’ retaliatory capabilities. Advances in AI might also cause other destabilizing power shifts, such as accelerating economic and technological development in certain states. Imminent power shifts could incentivize rival states to pursue preventive war—or smaller-scale interventions or brinkmanship that spiral into war.
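To see why the size and speed of the shift matter, here is a toy two-period comparison in Python. The setup and numbers are illustrative assumptions of mine, not taken from Powell (2006): a declining state weighs fighting now, while it is strong, against the most its rising rival can credibly concede, given that promises about the post-shift future are not believable.

```python
# Toy two-period model (made-up numbers, not from Powell, 2006). A prize worth 1
# is contested in each of two periods. A declining state can fight now, winning
# both prizes with probability p_now and paying war cost c once, or keep the
# peace. Under peace, the most the rising rival can credibly concede is the
# whole first-period prize plus the declining state's *future* war value,
# p_later - c, since any more generous promise about period 2 is not believable
# once the rival is stronger.

def preventive_war_tempting(p_now: float, p_later: float, c: float) -> bool:
    fight_now = 2 * p_now - c                      # expected value of fighting today
    best_credible_peace = 1 + max(p_later - c, 0)  # ceiling on what peace can deliver
    return fight_now > best_credible_peace

print(preventive_war_tempting(p_now=0.70, p_later=0.30, c=0.1))  # True: shift too large to buy off
print(preventive_war_tempting(p_now=0.55, p_later=0.45, c=0.1))  # False: a small shift can be accommodated
```

When the shift is large and fast, even the rival's maximum credible concession falls short of the declining state's value of fighting immediately; a small shift can be accommodated peacefully.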

Other Coordination Failures

Dafoe (2020) writes:

A high-stakes race (for advanced AI) can dramatically worsen outcomes by making all parties more willing to cut corners in safety. This risk can be generalized. Just as a safety-performance tradeoff, in the presence of intense competition, pushes decision-makers to cut corners on safety, so can a tradeoff between any human value and competitive performance incentivize decision makers to sacrifice that value. Contemporary examples of values being eroded by global economic competition could include non-monopolistic markets, privacy, and relative equality. In the long run, competitive dynamics could lead to the proliferation of forms of life (countries, companies, autonomous AIs) which lock-in bad values (2). I refer to this as value erosion; Nick Bostrom discusses this in The Future of Human Evolution (2004); Paul Christiano has referred to the rise of “greedy patterns”; Hanson’s Age of Em scenario involves loss of most value that is not adapted to ongoing AI market competition.[6]

In the document he links, Dafoe addresses several objections to this argument. Here are summaries of some objections and responses:

If competition creates terrible competitive pressures, wouldn't actors find a way out of this situation, by using cooperation or coercion to put constraints on their competition?

Maybe. However:

In practice, it may be very difficult to create a politically stable arrangement for constraining competition. This could be especially difficult in a highly multipolar world.

Political leaders do not always act rationally. Even if AI makes political leaders more rational, perhaps it would only do so after leaders have accepted terrible, lasting sacrifices for the sake of competition.

Why is this risk particularly important now?

Advances in AI may greatly expand how much can be sacrificed for a competitive edge. For example, there is currently a limit to how much workers' well-being can be sacrificed for a competitive advantage; miserable workers are often less productive. However, advances in automation may mean that the most efficient workers will be joyless ones (3).

The report “Coordination challenges for preventing AI conflict” (Torges, 2021) raises another class of potential coordination failures. When people task powerful AI systems with high-stakes activities that involve strategically interacting with other AI systems, bargaining failures between AI systems could be catastrophic:

Transformative AI scenarios involving multiple systems (“multipolar scenarios”) pose unique existential risks resulting from their interactions. (4) Bargaining failure between AI systems, i.e., cases where each actor ends up much worse off than they could have under a negotiated agreement, is one such risk. The worst cases could result in human extinction or even worse outcomes (Clifton 2019). (5)

As a prosaic example, consider a standoff between AI systems similar to the Cold War between the U.S. and the Soviet Union. If they failed to handle such a scenario well, they might cause nuclear war in the best case and far worse if technology has further advanced at that point.

Short of existential risk, they could jeopardize a significant fraction of the cosmic endowment by preventing the realization of mutual gains or causing the loss of resources in costly conflicts.

Some might be optimistic that AIs tasked with bargaining will be so skilled at bargaining that they will avoid catastrophic bargaining failures. However, even perfectly skilled negotiators can end up with catastrophic negotiating outcomes (Fearon, 1995). One problem is that negotiators often have incentives to lie. This can cause rational negotiators to disbelieve information or threats from other parties—even when the information is true and the threats are sincere. Another problem is that negotiators may be unable to commit to following through on mutually beneficial deals. These problems may be addressed through verification of private information and mechanisms for making commitments. However, these mechanisms can be limited. For example, verification of private information may expose vulnerabilities, and commitment mechanisms may enable commitments to mutually harmful threats.
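To illustrate why skill alone does not solve this, here is a minimal sketch loosely in the spirit of Fearon's (1995) risk-return tradeoff, with all parameters chosen purely for illustration. One actor makes an offer without knowing the other's private cost of conflict, and the offer that maximizes its expected payoff deliberately accepts a substantial chance of war:

```python
# Minimal sketch of the private-information problem described above, loosely in
# the spirit of Fearon (1995); all numbers are illustrative assumptions.
#
# Actor A proposes to keep (1 - x) of a prize normalized to 1, offering x to
# actor B. B fights unless x is at least B's expected value of war, p - c_B,
# where B's war cost c_B is private information: A only knows it is uniform on
# [0, c_max]. B has an obvious incentive to claim its cost is low, so A cannot
# simply ask.

p = 0.5      # B's chance of winning a war (assumed common knowledge)
c_A = 0.1    # A's cost of war
c_max = 0.3  # B's private war cost is uniform on [0, c_max]

def p_accept(x: float) -> float:
    """Probability that B accepts offer x, i.e. P(c_B >= p - x)."""
    return min(max((c_max - (p - x)) / c_max, 0.0), 1.0)

def expected_payoff_A(x: float) -> float:
    """A keeps 1 - x if B accepts; otherwise war, worth 1 - p - c_A to A."""
    q = p_accept(x)
    return q * (1 - x) + (1 - q) * (1 - p - c_A)

# A, behaving perfectly rationally, picks the offer maximizing its expected payoff.
best = max((i / 1000 for i in range(1001)), key=expected_payoff_A)

print(f"A's optimal offer: {best:.2f}")                       # 0.40, below the 'safe' offer of 0.50
print(f"resulting chance of war: {1 - p_accept(best):.2f}")   # 0.33: positive, despite full rationality
```

The particular numbers do not matter; the point is that under private information, conceding enough to guarantee peace costs more in expectation than running some risk of breakdown, so even flawless optimizers can rationally accept that risk.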

Unknown Risks

New technologies often pose risks that are hard to foresee. As Dafoe (2020) mentions, risks from the combustion engine have included "urban sprawl, blitzkrieg offensive warfare, strategic bombers, and climate change"—presumably a hard list to predict in the 1800s. The risks of AI look even harder to predict if we see AI advances as analogous to some of the most transformative events in humanity's past, such as the rise of agriculture.

Rather than being powerless against unknown AI risks, we may be able to (indirectly) prepare for them. We can work to make institutions better at identifying and responding to new AI risks as they emerge. We can also improve our own responsiveness to currently unknown risks, in part by remembering that we might not yet know all the risks.

We have seen how increasingly advanced AI may exacerbate many long-running catastrophic risks. These varied, severe risks warrant intense mitigation efforts, as part of a broader portfolio of actions to realize the benefits and reduce the risks of AI.

Footnotes

(1) Skeptics argue there are enormous challenges in using drones to reliably locate submarines, such as vast search areas, limitations of sensor technology, short battery lives, and low drone speeds. On the other hand, perhaps these problems could be overcome with technological advances (e.g. improved sensor data analysis) and innovative approaches (e.g. using relatively centralized recharging stations or surface-level solar panels to recharge drones).

(2) This is different from the earlier concern about the lock-in of bad values. In an earlier section, we considered how concentration of power could lock in bad values. Here, the concern is that unconstrained competition could lock in bad values. (Of course, which values are bad is highly contested.)

(3) Dafoe does not mention this example; it is based on the sources he references.

(4) [Footnote from the excerpt] I do not mean to imply that this is the only risk posed by multipolar scenarios. For other ones, see for example: Critch, Krueger 2020, Zwetsloot, Dafoe 2019, Manheim 2018.

(5) [Footnote from the excerpt] Note that bargaining failure is not the only cause of catastrophic interactions. For instance, the interactions of Lethal Autonomous Weapon Systems might also be catastrophic.
