This argument somehow sees the world as if there is a unified human party and a unified AI party. That seems to me unlikely.
The issue of unified AI parties is discussed but not resolved in section 2.2. There, I discuss some of the paths AIs may take to begin engaging in collective decision making. In addition, I flag that the key assumption is that one AI or multiple AIs acting collectively accumulate enough power to engage in strategic competition with human states.
This paper seems (to me) to do a very good job of laying out its assumptions and reasoning based on them. For that, I applaud it, and I'm very glad it exists. I specifically disagree with several of the key assumptions, like the idea that "human level" is a "wide 'middle' of the range of AI capabilities." I very much appreciate that it shows that, even granting no decisive strategic advantage, there is strong reason to believe the usual mechanisms that reduce and limit war among humans mostly won't apply.
At some point I'd like to see similar reasoning applied to the full decision tree of assumptions. I think most of the assumptions are unnecessary, in the sense that all the plausible options for that node lead to some form of "Humans most likely lose badly." I don't think most people would read such a thing, or properly understand it, but you can show them the big graph at the end that shows something like, "Out of 8192 paths AGI development could take, all but 3 lead to extinction or near-extinction, and the only one of those paths where we have control over which outcome we get is the one where we don't build AGI."
This all assumes that the AI is an entity that is capable of being negotiated with.
You can't negotiate with a hurricane, an earthquake, a tsunami, an incoming asteroid, or a supernova. The picture I have always had of "AI doom" is a disaster of that sort, rather than rival persons with vastly superior abilities. That is also a possibility, but working out how ants might negotiate with humans looks like a small part of the ants' survival problem.
[This post is the introduction to my full paper, available here https://philpapers.org/rec/GOLWAA. This post was partially inspired by a LW comment thread between @Matthew Barnett and @Wei Dai.]
Abstract. This paper offers the first careful analysis of the possibility that AI and humanity will go to war. The paper focuses on the case of artificial general intelligence, AI with broadly human capabilities. The paper uses a bargaining model of war to apply standard causes of war to the special case of AI/human conflict. The paper argues that information failures and commitment problems are especially likely in AI/human conflict. Information failures would be driven by the difficulty of measuring AI capabilities, by the uninterpretability of AI systems, and by differences in how AIs and humans analyze information. Commitment problems would make it difficult for AIs and humans to strike credible bargains. Commitment problems could arise from power shifts: rapid and discontinuous increases in AI capabilities. Commitment problems could also arise from missing focal points, where AIs and humans fail to effectively coordinate on policies to limit war. In the face of this heightened chance of war, the paper proposes several interventions. War can be made less likely by improving the measurement of AI capabilities, by capping improvements in AI capabilities, by designing AI systems to be similar to humans, and by allowing AI systems to participate in democratic political institutions.
Keywords: AI safety, the bargaining model, information failures, power shifts, focal points
1. Introduction
Many in the AI safety community have worried that future AI systems may enter into strategic conflict with humanity. Such AI systems may be misaligned, so that their goals conflict with humanity’s. In addition, the collective power of such systems could match or exceed the power of humanity. In such a future, AI systems may go to war with humanity. Here, we would have two powerful parties vying for control of scarce resources. The two parties may have very different values and very different perspectives on how to achieve their goals.
While conceptually possible, this risk scenario has a blind spot: most conflicts do not end in war. War offers each party a chance of victory, but also comes with costs: some resources will be spent on guns that could have been spent on butter; and engaging in war will lead to casualties and the destruction of infrastructure.
In the face of this simple fact, it is worth analyzing carefully whether AIs and humanity would be likely to go to war, even if their interests did conflict. Fortunately, there is a rich and interesting academic literature on the causes of war, which explains why wars sometimes happen despite their obvious costs. The history of warfare offers many lessons about the causes of war and peace. This paper surveys these causes of war, and identifies factors that could make AI/human war more or less likely. As we develop AI systems with capabilities that rival the powers of nation-states, we would do well to craft policies that are sensitive to these lessons. We can either choose now to learn lessons from our past, or we can choose to relearn those lessons in a new history of AI/human conflict.
The paper is oriented around the bargaining model of war (Fearon 1995). In the bargaining model, the two parties in a conflict face the choice of whether to strike a bargain for peace, or instead go to war. In each case, the parties will receive some share of a pot of resources. Going to war gives each party some chance of receiving the whole pot, less the cost of war. Striking a bargain provides a guarantee of a portion of the pot, and avoids the costs of war. In this model, peace is better for both parties than war, because it doesn’t destroy resources. War occurs when the parties cannot agree to a bargain. This happens when the parties cannot agree about their chances of military victory, or when the parties cannot trust one another to credibly abide by the terms of the deal.
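To make this logic concrete, here is a minimal sketch of the model in code. The setup and the toy numbers are my own illustration of the standard Fearon framework, not material from the paper: two parties divide a pot normalized to 1, party A wins a war with probability p, and each side pays a positive cost to fight.

```python
# Minimal sketch of the bargaining model of war (after Fearon 1995).
# Toy numbers are illustrative assumptions, not values from the paper.

def war_payoffs(p, cost_a, cost_b):
    """Expected payoffs from fighting: A wins the whole pot (size 1) with
    probability p; each side pays its own cost of war."""
    return p - cost_a, (1 - p) - cost_b

def bargaining_range(p, cost_a, cost_b):
    """Peaceful splits x (A's share of the pot) that both sides prefer to war:
    A accepts x >= p - cost_a, and B accepts x <= p + cost_b."""
    return p - cost_a, p + cost_b

# Example: A wins with probability 0.6, and war costs each side 0.1 of the pot.
print(war_payoffs(0.6, 0.1, 0.1))       # (0.5, 0.3) -- expected value of fighting
print(bargaining_range(0.6, 0.1, 0.1))  # (0.5, 0.7) -- any split in this range beats war for both
```

As long as both costs are positive, this range is nonempty: some peaceful split is, in principle, better than war for both parties.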
The paper focuses on three causes of war, which are particularly pronounced in AI/human conflict:

- Information failures: AI capabilities are hard to measure, AI systems are difficult to interpret, and AIs and humans may analyze information in very different ways, so the two sides may disagree about their chances of victory (a toy illustration appears just after this list).
- Power shifts: rapid and discontinuous increases in AI capabilities create commitment problems, making it hard for AIs and humans to strike credible bargains.
- Missing focal points: AIs and humans may lack the shared conventions needed to coordinate on policies that limit war.
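As a hedged illustration of the first cause, here is my own toy extension of the sketch above (not the paper's formalism): if each side forms its own estimate of A's chance of winning, mutual optimism can leave no split that both sides believe beats war.

```python
# Toy illustration of an information failure (mutual optimism), extending the
# sketch above. Probabilities and costs are assumed for illustration only.

def perceived_range(p_a, p_b, cost_a, cost_b):
    """A believes A wins with probability p_a; B believes A wins with probability p_b.
    A only accepts splits x >= p_a - cost_a, while B only offers x <= p_b + cost_b.
    Returns the range of mutually acceptable splits, or None if there is no overlap."""
    low, high = p_a - cost_a, p_b + cost_b
    return (low, high) if low <= high else None

print(perceived_range(0.6, 0.6, 0.1, 0.1))  # shared estimate -> (0.5, 0.7): a deal exists
print(perceived_range(0.8, 0.4, 0.1, 0.1))  # mutual optimism -> None: every offer looks worse than war
```

Section 3's worry is that hard-to-measure capabilities and uninterpretable systems make exactly this kind of divergence in estimates more likely between AIs and humans.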
The paper suggests several interventions to lower the chance of AI/human war. To deal with information failures, humanity should invest more in carefully monitoring AI capabilities, and in designing AI systems that analyze information in similar ways to humans. To deal with power shifts, humanity should cap increases in AI capabilities. To deal with missing focal points, humanity should increase points of similarity between AI and humanity; this could involve granting physical territory to AI systems. Finally, another path to promoting peace could be allowing AI systems to participate in democratic political institutions, either by granting citizenship to AI systems in existing countries, or by creating a democratic AI state.
This paper is part of a larger project focused on cultural alignment. Alignment is the task of designing AI systems that share human values. Existing work on alignment has been technical, figuring out how to control and monitor the inner goals of AI systems. This paper instead takes a cultural approach to alignment. In this framework, we design optimal social institutions for AI/human interaction that promote peaceful cooperation rather than violent conflict. Here, the question is not how to directly intervene on an AI system to give it a particular goal. Instead, the question is how to build a world in which AIs are incentivized to cooperate effectively with humans regardless of their particular goals.
One theme of the paper is the fragility of culture. The relative stability of human society rests on a fragile web of institutions, related to effective communication of information, stable balances in relative power, and a rich supply of focal points for coordination. If AI systems are not designed with these cultural institutions in mind, there is a significant chance that these institutions will not generalize to AI/human conflict. Machine learning engineers will invent AI agents from whole cloth. They will do so with no particular knowledge of culture and history. This creates a special kind of risk. Long-term human safety may depend on occupying a very particular point in cultural space, reached by evolutionary processes. If we can’t find that point quickly, we may not be able to produce peaceful equilibria between AIs and humans in time.
In this way, our analysis offers a different route than usual to the conclusion that AI systems pose a catastrophic risk to humanity: the risk that they will enter into a violent war with us. The problem is that there is a substantial risk that the usual causes of peace between conflicting parties will be absent from AI/human conflict. In pursuing these questions, we draw on a rich body of research about the causes of war, with special emphasis on contributions from Schelling (1960, 1966), Jervis (1978), Fearon (1995), and Levy and Thompson (2010). One of our goals is to build a bridge between academic research on war and the AI safety community.
The paper also opens up many new questions for future research. Many of these questions involve the optimal design of social institutions for AI systems. What are the possible paths to an AI state? What kind of political institutions would such a state have? To what extent can AI systems be incorporated as citizens in existing human states? So far, such questions have been completely neglected by the AI safety community, and by political scientists. One goal of this paper is to open up these questions for further consideration.
Section 2.1 begins by introducing the AI systems of interest in the paper, artificial general intelligence, and explaining why such systems might pose a catastrophic risk to humanity. Section 2.2 goes on to lay out paths that AI systems might take to engage in the kind of collective action required for war. Section 2.3 lays out the bargaining model of war. Section 3 is the central contribution of the paper, arguing that AI and humanity are relatively likely to go to war. Here, the focus will be on three causes of war: information failures, power shifts, and missing focal points. Section 4 turns towards interventions that lower the chance of AI/human conflict.
[For the whole paper, see https://philpapers.org/rec/GOLWAA]