[This is part 1 of a 5-part sequence on security and cryptography areas relevant for AI safety, published and linked here a few days apart]
AI safety in practice relies not only on the AI system being aligned, but also on it being unable to discover internal computer security vulnerabilities itself and on it not being stealable by attackers. The latter problem may be harder than one would assume at first sight. Here are a few reasons why, discussed at length by Mark S. Miller, Christine Peterson, and me in the Defend Against Cyber Threats chapter in Gaming the Future:
In Information security considerations for AI and the long term future, Jeffrey Ladish and Lennart Heim anticipate intense competition surrounding the development of AGI, leading to considerable interest from state actors. They assign a high likelihood to advanced threat actors targeting organizations that are involved in AGI development, supply critical resources to AGI companies, or possess strategically important information, in order to gain an advantage in AGI development.
In Defend Against Cyber Threats in Gaming the Future, we suggest that this threat may be worse than often acknowledged, due to ‘first strike instabilities’: if the possibility of an AGI takeover merely becomes credible and believed to be imminent, this poses a significant risk in itself. If one nation-state actor becomes aware that another actor is about to develop an AGI capable of taking over the world, they may choose to preemptively destroy it. Critically, this risk exists even if creating an AGI is impossible, but simply believed to be possible.
It’s possible the attacker would merely try to steal the AGI capability or destroy the AGI-creating actor rather than launch a nationwide attack. But the nation hosting the AGI-creating actor would likely retaliate, and the attacking nation knows this. Given how high the stakes may be believed to be at that point, the attacker may not want to take that risk. We live in a world where multiple militaries have nuclear weapon delivery capabilities, so public AGI arms races may preemptively re-create game-theoretic dynamics that resemble Cold War nuclear game theory.
If the attacker chooses a cyber-attack, it would introduce several relatively new characteristics compared to a kinetic attack:
These differences could make the dynamics resulting from a cyber-attack to steal AGI capabilities more globally destabilizing than those of conventional kinetic weapons (including nuclear weapons), with which we at least have some experience.
It’s possible that nation-state actors move too slowly to realize that the AGI threat is imminent. But it’s also possible that they realize relatively late, realize they realized late, and angle for a desperate attempt.
Computer systems are vulnerable at multiple levels, including hardware, firmware, operating systems, and user behavior (I discuss user interface design in section 2). We have not, to date, figured out how to make these systems reliably secure in practice:
Zero-day exploits with potentially disastrous consequences are proliferating. In This Is How They Tell Me the World Ends, Nicole Perlroth provides a string of zero-day examples:
Nicole Perlroth points out that a main driver of the increase in zero-days is the international market for zero-day exploits. The NSA purchases these exploits from hackers using taxpayer money, not to inform affected companies and prompt them to patch the vulnerabilities, but rather to exploit them for its own purposes. Companies are often unable to match the prices offered by governments, and other nations are increasingly entering these markets.
In Malicious Use of AI, Miles Brundage and co-authors point out that AI progress may alleviate the existing tradeoff between the scale and efficacy of attacks. They point to potentially escalating risks due to
In Cyber, Nano, AGI Risks, Christine Peterson and co-authors add that AI could potentially generate software capable of analyzing targeted software and discovering previously unknown zero-day vulnerabilities. By integrating cutting-edge vulnerability-detection software into the deployed attack system, attackers may soon be able to identify and exploit vulnerabilities while in communication with the target, instead of relying solely on pre-existing attacks against known vulnerabilities.
Security exploits are likely to get worse with AI advances. While most AGI security risks arise from adversarial nation-state actors, AI may also lower the barriers to entry for malicious amateurs to cause harm. In Sparks of Artificial General Intelligence, Sébastien Bubeck and co-authors suggest that GPT-4’s increasing ability to generalize and interact can be harnessed for adversarial uses, such as computer security attacks. In I, Chatbot, Insikt Group points out that even ChatGPT already has the potential to empower script kiddies with limited programming skills to develop malware; such actors increasingly share ChatGPT-generated exploit proofs-of-concept on the dark web.
There are also cases in which AI may help security, for instance in AI-supported fuzzing (discussed in part 2). Traditionally, fuzzing involves generating a multitude of diverse inputs to an application with the goal of inducing a crash. Because each application accepts inputs in distinct ways, significant manual setup is required, and testing every conceivable input by brute force would be extremely time-consuming. Current fuzzers therefore employ randomized attempts and use various heuristics to prioritize the most promising candidates.
It’s possible that AI tools can aid in test case generation and be leveraged in the aftermath of fuzzing to assess whether the discovered crashes are exploitable or not. In theory, such tools could make it easier for companies to identify potential vulnerabilities that could be exploited in their systems, so they can address them prior to any malicious actors exploiting them. But it’s also possible that malicious actors will have access to such technology and will soon be able to uncover zero-day vulnerabilities on a large scale.
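The core fuzzing loop described above can be sketched in a few lines. This is a minimal illustrative sketch, not any real fuzzer's API: the toy target `parse_record`, the mutation operators, and the "crash" criterion (any unhandled exception) are all hypothetical stand-ins for what tools like AFL or libFuzzer do at far greater sophistication.

```python
import random

def parse_record(data: bytes) -> int:
    """Toy target: parses a length-prefixed record.
    Planted bug: it trusts the length byte in the header."""
    if len(data) < 1:
        raise ValueError("empty input")  # graceful, expected rejection
    length = data[0]
    payload = data[1:1 + length]
    # Bug: assumes the payload really is as long as the header claims.
    return payload[length - 1]  # IndexError on truncated/oversized lengths

def mutate(seed: bytes, rng: random.Random) -> bytes:
    """One random mutation: flip a bit, insert a byte, or delete a byte."""
    data = bytearray(seed)
    choice = rng.randrange(3)
    pos = rng.randrange(max(len(data), 1))
    if choice == 0 and data:
        data[pos] ^= 1 << rng.randrange(8)
    elif choice == 1:
        data.insert(pos, rng.randrange(256))
    elif data:
        del data[pos]
    return bytes(data)

def fuzz(target, seed: bytes, iterations: int = 10_000, rng_seed: int = 0):
    """Mutation-based fuzzing: return the first input that crashes the target."""
    rng = random.Random(rng_seed)
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except ValueError:
            pass  # handled error path, not a crash
        except Exception:
            return candidate  # unhandled exception counts as a crash
    return None

crash = fuzz(parse_record, b"\x03abc")
```

The AI-assisted variants discussed above would slot in at two points of this loop: generating better seeds and mutations than blind randomization, and triaging the discovered crashes for exploitability afterwards.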
If hard-to-attribute advanced security attacks can increasingly be mounted by smaller non-state actors, this would further destabilize the first-strike dynamics mentioned earlier.
Alongside insecure software, hardware supply chain risks are a major, often neglected, factor in computer security. Assurance that a system design is secure is only effective if the software runs on the intended hardware. However, this assumption may not always hold, as the hardware itself may contain a built-in trapdoor.
In Cyber, Nano, AGI Risks: Decentralized Approaches to Defense, Christine Peterson and co-authors highlight that each step in global supply chains presents an opportunity for potential compromise. State actors have actively engaged in efforts, like Project Bullrun, to weaken the global cryptography standards that form the foundation of the world's economy and security. Even when end-to-end encryption is employed and remains resistant to attacks, the hardware performing the encryption can often be easily infiltrated. There are already demonstrations of how to build extremely hard-to-detect exploitable trapdoors at the analog level.
The disclosure of user information by software companies due to national security letters served by the NSA raises concerns that the agency may have issued similar letters to hardware companies, such as Intel and AMD, demanding the installation of trapdoors in their hardware. These trapdoors could be activated at a later time.
It would be useful to have an open source processor design with a proof of security comparable to that of the seL4 software. seL4 is an operating system microkernel that not only relies on a formal end-to-end security proof but also withstood a DARPA red-team attack, a feat unmatched by any other software.
There are open source processor designs that, when run on a field-programmable gate array (FPGA), are fast enough to be practical for many applications. These processors could be combined with a layout algorithm that randomizes layout decisions for each hardware instance, making it virtually impossible for any corruption of the FPGA hardware to go unnoticed under electron-microscope inspection. This approach could prevent most instances of the processor from being successfully corrupted.
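The per-instance randomization idea can be sketched abstractly. This is a toy model, not a real place-and-route flow: the "netlist" of named cells, the 4×4 grid, and the seeding scheme are all hypothetical. The point it illustrates is that each manufactured unit gets its own placement, so an attacker who corrupts a fixed physical location hits different logic (or none of interest) on each unit, while any given unit's layout remains reproducible from its seed for verification.

```python
import random

# Hypothetical netlist: logic cells to be placed onto an FPGA grid.
CELLS = [f"lut{i}" for i in range(16)]
GRID = [(x, y) for x in range(4) for y in range(4)]

def place(instance_seed: int) -> dict:
    """Seeded randomized placement: the same seed reproduces the
    same layout (for auditing), while different instance seeds
    yield different cell-to-location assignments."""
    rng = random.Random(instance_seed)
    positions = GRID[:]
    rng.shuffle(positions)
    return dict(zip(CELLS, positions))

# Two hardware instances place identical logic at different coordinates.
instance_a = place(1)
instance_b = place(2)

# An attacker who tampers with a fixed grid location, say (0, 0),
# affects a different cell on each instance.
hit_a = [c for c, p in instance_a.items() if p == (0, 0)]
hit_b = [c for c, p in instance_b.items() if p == (0, 0)]
```

Real randomized layout must of course preserve timing and routing constraints, which is a large part of why, as noted below, credibly correct hardware is currently much more expensive than merely correct hardware.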
In practice, the techniques currently known for building machines that are credibly correct, such as randomized FPGA layout, are significantly more expensive than simply building correct machines. It's currently not feasible for any manufacturer to create hardware that is both credibly correct and competitive.
In AGI Coordination: Coordination And Great Powers, my co-authors and I point out that a few potentially promising security approaches exist, such as:
The main problem with security is not that there are no good approaches but that a multi-trillion dollar ecosystem is already built on the current foundations. It is very difficult to get adoption for something that needs to rebuild the entire ecosystem from scratch.
In the Defend Against Cyber Threats chapter in Gaming the Future, we propose that one potential approach to overcoming the significant barriers to adoption of a new, more secure system is to pursue a ‘genetic takeover’ strategy (a term borrowed from biology). This involves growing a new system within the existing one, without directly challenging it. The new system can coexist with the current one, functioning in a world dominated by it, and gradually become competitive until the old system becomes obsolete. For instance, the new personal computing ecosystem began as a complement to the old system before eventually displacing it.
For a genetic takeover to start, we may have to count on a non-catastrophic computer security event occurring that spooks us sufficiently to act. However, judging from how poorly we have reacted to the existing cyber attacks mentioned above, this is unlikely. It may be more likely that the panic following a big takedown will be channeled into propping up entrenched techniques that are little more than window dressing, unless actually secure approaches can be scaled and made economically competitive in time.
The computer security problem is already very bad, even without AI. Civilization is currently built upon foundations that are not only insecure but arguably insecurable. This creates a severe risk to humanity since so much of our infrastructure, such as the electric grid and internet, relies on these insecure systems. Because vulnerabilities are networked, local vulnerabilities compound into regional vulnerabilities, which compound into international vulnerabilities, increasing the risk of large-scale attacks.
If these systems were to be severely compromised, this could lead to potentially catastrophic failures of critical infrastructure. If severe offensive capabilities proliferate to non-state actors, making attribution of an attack even harder, this could also sharpen adversarial dynamics across nation-states in the dangerous ways described above.
Some AGI takeover scenarios involve the AGI breaking out of its confinement by bribing, blackmailing, or otherwise manipulating a human or a group of humans into releasing it onto the internet. Such scenarios imply it couldn’t effectively break out without human help. If we are concerned about humans (aided by AI) exploiting the software and hardware vulnerabilities discussed throughout this section, and we should be, we should be very concerned about what an AGI may be capable of. Worrying about a social-manipulation breakout scenario assumes a level of computer security that we don’t actually have. We could consider computer security a necessary condition for AI safety that we currently don’t meet.
[This is part 1 of a 5-part sequence on security and cryptography areas relevant for AI safety. Continue to part 2 on parallels between AI safety and the security mindset and problems.]