This post is authored by Jeffrey Ladish, who works on the security team at Anthropic, and Lennart Heim, who works on AI Governance with GovAI (more about us at the end). The views in the post are our own and do not speak for Anthropic or GovAI. This post follows up on Claire Zabel and Luke Muehlhauser’s 2019 post, Information security careers for GCR reduction.
We’d like to provide a brief overview of:
In a follow-up post, we will explore:
The bulk of existential risk likely stems from technologies humans can develop. Among candidate technologies, we think that AGI, and to a lesser extent biotechnology, are most likely to cause human extinction. Among technologies that pose an existential threat, AGI is unique in that it has the potential to permanently shift the risk landscape and enable a stable future without significant risks of extinction or other permanent disasters. While experts in the field have significant disagreements about how to navigate the path to powerful aligned AGI responsibly, they tend to agree that actors seeking to develop AGI should be extremely cautious in the development, testing, and deployment process, given that failures could result in catastrophe, including human extinction.
NIST defines information security as “The protection of information and information systems from unauthorized access, use, disclosure, disruption, modification, or destruction.”
We believe that safe paths to aligned AGI will require extraordinary information security effort for the following reasons:
Okay, but why is an extraordinary effort required?
Even though this is a challenging problem requiring extraordinary effort, we think these investments are worth making, and that current measures are insufficient.
It can be helpful to imagine concrete scenarios when thinking about AI security (the security of AI systems). Gwern recently wrote a compelling fictional take on one such scenario, including some plausible infosec failures, which we recommend checking out. Infosec people often think about systems in terms of attack surface: how complex is the system, how many components does it have, how attackable is each component? Developing a modern AI system like GPT-3 involves many researchers and developers (the GPT-3 paper had 31 authors!). Each developer has their own laptop and needs some amount of access to the models they work on. Usually, code runs on cloud infrastructure, and many components make up that infrastructure. Data needs to be collected and cleaned. Researchers need systems for training and testing models. In the case of GPT-3, an API was created to grant limited access to people outside the company.
Most of the components described here contain a staggering amount of complexity (see Figure 1). For example, the Linux kernel alone contains over 20 million lines of code. Each piece of hardware and software is a component that could be exploited. The developer could be using a malicious browser plugin that steals source code, or their antivirus software could be silently exfiltrating critical information. The underlying cloud infrastructure could be compromised, either because of underlying exploits in the cloud provider or because of misconfigurations or vulnerable software introduced by the organization.

And these are just the technical components. What’s not depicted are the humans creating and using the software and hardware systems. Human operators are generally the weakest part of a computing system. For example, developers or system administrators could be phished, phones used for multifactor authentication systems could be SIM-swapped, or passwords for key accounts could be reset by exploiting help desk employees. This is generally described as social engineering. In short, modern AI systems are very complex systems built by many people, and thus are fundamentally difficult to secure.
Securing a modern AI project against attacks from general cybercriminals, such as ransomware operators and extortionists, is difficult but not extraordinarily difficult. Securing a system as complex as a modern AI system against a state actor or an Advanced Persistent Threat (APT) actor is extremely difficult, as the examples in the reference class failures section demonstrate. We believe it is quite likely that state actors will increasingly target organizations developing AGI, for the reasons listed below:
The US, China, Russia, Israel, and many other states are currently acting as if modern and near-term AI capabilities have the potential to improve strategic weapons systems, including unmanned aerial vehicles (UAVs), unmanned underwater vehicles (UUVs), submarine detection, missile detection systems, and hacking tools.
Some of these areas have already been the subject of successful state hacking activities. Open Philanthropy Project researcher Luke Muehlhauser compiles several examples in this document, including: “In 2011-2013, Chinese hackers targeted more than 100 U.S. drone companies, including major defense contractors, and stole designs and other information related to drone technology… Pasternack (2013); Ray (2017).”
Espionage aimed at strategic AI technologies will not necessarily target labs working on AGI, especially if AGI itself is not recognized as a strategic technology, perhaps because it is perceived as too far off to be useful. However, it seems likely that AGI labs will develop more useful applications as they get closer to capable AGI systems, and that some of these applications will touch on areas states care about. For example, China used automated propaganda to target Taiwan’s elections in 2018 and could plausibly be interested in stealing language models like GPT-3 for this purpose. Code generation technologies like Codex and AlphaCode might have military applications as well, perhaps as hacking tools.
In addition to the direct military utility of AI systems, state-sponsored espionage is also likely to occur for the purpose of economic competition. AI companies have raised $216.6B in investment, with at least a couple of billion raised by companies specifically pursuing AGI development, which makes them valuable economic targets. As already outlined, stealing IP related to AGI development itself, or to the critical resources it requires, is of interest to malicious actors such as states and companies. If IP violations are hard to detect or enforce against, industrial espionage becomes especially attractive.
The most concerning reason that states may target AGI labs is that they may believe that AGI is a strategically important technology. Militaries and intelligence agencies have access to the same arguments that convinced the EA community that AI risk is important. Nick Bostrom discusses the potential strategic implications of AGI in Superintelligence (2014, ch. 5). As with the early development of nuclear weapons, states may fear that their rivals will develop AGI before them.
Note that it is the perception of state actors that matters in these cases. States may perceive new AI capabilities as having strategic benefits even if they do not. Likewise, if AI technologies have the potential for strategic impact but state actors do not recognize this, then state espionage is less of a concern.
Beyond organizations that develop AGI systems, we think that organizations that either (a) supply critical resources to AGI labs or (b) research and develop AGI governance strategies are also at risk:
In trying to understand the difficulty of securing AI systems, it is useful to look at notable failures in securing critical information and at ways to mitigate them. There is no shortage of examples; for a more comprehensive list, see this shared list of examples of high-stakes information security breaches by Luke Muehlhauser. Below, we present some recent examples, ranging from highly technical state-sponsored attacks to relatively simple but devastating social engineering attacks.
One recent example of a highly technical attack is the Pegasus 0-click exploit. Developed by NSO Group, a software firm that sells “technology to combat terror and crime to governments only”, the exploit enabled its operators to gain full control of a victim’s iPhone (reading messages and files, eavesdropping, etc.). It was used to spy on human rights activists and has been connected to the murder of Jamal Khashoggi. As outlined in this blogpost by Google’s Project Zero, the attack was highly sophisticated and required significant resources to develop, costing state actors millions of dollars to purchase.
At the other end of the spectrum, we have the 2020 Twitter account hijacking, in which hackers gained access to internal Twitter tools by manipulating a small number of employees into giving up their credentials. This allowed them to take over the Twitter accounts of prominent figures such as Elon Musk and Bill Gates, and all of this was probably done by a few teenagers.
Another recent attack, which also likely relied heavily on social engineering, is the hack of NVIDIA by the Lapsus$ group. This attack is especially interesting because NVIDIA is an important actor in the development of AI chips. If NVIDIA’s intellectual property (IP) is leaked and used by less cautious actors, it could accelerate competitors’ efforts to develop more powerful AI chips while violating IP norms. Notably, while NVIDIA is exactly the kind of target that interests state actors, there are early indications that this attack may have been carried out by a group of teenagers.
The Twitter account hijacking and the recent NVIDIA hack are notable because the resources required for those attacks were relatively small. More significant effort and good security practices could have mitigated these attacks, or at least made them significantly more expensive. Companies of strategic importance, or for the purposes of this article, organizations relevant to the development of AGI systems, should benefit from the best available security and operate at a security level comparable to that of national governments, given their importance for (national) security.
Good engineering involves thinking about how things can be made to work; the security mindset involves thinking about how things can be made to fail.
— Bruce Schneier
In addition to securing information that could play an outsized role in the future of humanity, we think the information security field also showcases ways of thinking that are essential for AI alignment efforts. These ideas are outlined in Eliezer Yudkowsky’s post Security Mindset and Ordinary Paranoia and the follow-up post Security Mindset and the Logistic Success Curve. The security mindset is the practice of looking at a system through the lens of adversarial optimization: not just looking for ways to exploit the system, but looking for systemic weaknesses that might be abused even if the path to exploitation is not obvious. In the second post, Eliezer describes the kind of scenario where the security mindset is crucial to building a safe system:
…this scenario might hold generally wherever we demand robustness of a complex system that is being subjected to strong external or internal optimization pressures… Pressures that strongly promote the probabilities of particular states of affairs via optimization that searches across a large and complex state space… Pressures which therefore in turn subject other subparts of the system to selection for weird states and previously unenvisioned execution paths… Especially if some of these pressures may be in some sense creative and find states of the system or environment that surprise us or violate our surface generalizations…
This scenario describes security-critical code like operating system kernels, and it describes AGI systems even more so. AGI systems are incredibly dangerous because they are powerful, opaque optimizers, and human engineers are unlikely to have a good understanding of the scope, scale, or target of their optimization power as they are being created. We believe this task is likely to fail without a serious application of the security mindset.
Applying the security mindset means going beyond merely imagining ways your system might fail. For example, if you’re concerned that your code-writing model-with-uncertain-capabilities might constitute a security threat, and you decide that isolating it in a Docker container is sufficient, then you have failed to apply the security mindset (see the sketch below). If your interpretability tools keep showing that your model says things its internal representations indicate are false, and you decide to use the output of the interpretability tools to train the model to avoid falsehoods, then you have failed to apply the security mindset.

The security mindset is hard to learn and harder to teach. Still, we think that more interplay between the information security community and the AI alignment community could help more AI researchers build skills in this area and improve the security culture of AI organizations.
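To make the container example concrete, here is a minimal, hypothetical sketch (using the docker-py client; the image name, limits, and command are illustrative, not a recommendation) of the kind of hardening someone might reach for when sandboxing model-generated code:

```python
import docker  # docker-py client for the local Docker daemon

client = docker.from_env()

# Run untrusted, model-generated code with the usual hardening flags applied.
output = client.containers.run(
    "python:3.11-slim",                                       # illustrative base image
    ["python", "-c", "print('model-generated code would run here')"],
    network_disabled=True,   # no network access
    read_only=True,          # read-only root filesystem
    cap_drop=["ALL"],        # drop all Linux capabilities
    pids_limit=64,           # cap the number of processes
    mem_limit="512m",        # cap memory usage
    user="nobody",           # run as an unprivileged user
    remove=True,             # clean up the container afterwards
)
```

Even with all of these flags, the container still shares the host kernel, so a single kernel or container-runtime escape undoes the whole arrangement. Treating a configuration like this as sufficient containment for a system that may be optimizing against you is exactly the failure of security mindset described above.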
We think that labs developing powerful AGI systems should prioritize building secure information systems to protect against the theft and abuse of these systems. In addition, other relevant actors, such as organizations working on strategic research or critical suppliers, are increasingly becoming targets and also need to invest in information security controls.

Our appeal to the AI alignment and governance community is to take information security seriously now, in order to build a firm foundation as threats intensify. Security is not a feature that’s easy to tack on later. There is ample low-hanging fruit: using up-to-date devices and software, end-to-end encryption, strong multi-factor authentication, and so on. Setting up these controls is an excellent first step with a high return on investment. However, robust information security controls will require investment and difficult tradeoffs. People are usually the weak point of information systems, so training and background checks are essential. Information security needs are likely to become much more demanding as AGI labs, and those associated with AGI labs, are targeted by increasingly persistent and sophisticated attacks. In the recent Lapsus$ attacks, personal accounts were often targeted to gain access to 2FA systems and compromise company accounts. Tools like Pegasus, already utilized by agencies in dozens of countries, could easily be leveraged against those working on AI policy and research.
While we could make many more specific security recommendations, we want to emphasize the importance of threat awareness and investment in secure infrastructure, rather than any specific control. That being said, if you do have questions about specific security controls, or want help making your org more secure, feel free to reach out! Part of our goal is to help organizations that are just starting to think about security get started, as well as to help existing organizations ramp up their security programs.
In this post, we’ve presented our argument for the importance of information security for the long-term future. In the next post, we’ll give some concrete suggestions for ways people could contribute to the problem, including:
In the meantime, you can engage with others on related discussions in the Information Security in Effective Altruism Facebook group.
Many thanks to Ben Mann, Luke Muehlhauser, Markus Anderljung, and Leah McCuan for feedback on this post.
It's probably not possible to prevent nation-state attacks without nation-state-level assistance on your side. Detecting and preventing moles is something that even the NSA/CIA haven't been able to fully accomplish.
Truly secure infrastructure would be hardware designed, manufactured, configured, and operated in-house, running formally verified software that is also designed in-house, where no individual has root on any of the infrastructure; instead, software automation manages all operations and requires M out of N people to agree before making changes, where M is greater than the expected number of moles in the worst case.
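As a toy illustration of the M-out-of-N idea (the names and numbers below are hypothetical, and a real system would verify cryptographically signed approvals rather than bare usernames), the gate enforced by the automation might look like this:

```python
# Hypothetical M-of-N change-control check: an infrastructure change only proceeds
# when at least REQUIRED_APPROVALS distinct, authorized people have signed off.
AUTHORIZED_APPROVERS = {"alice", "bob", "carol", "dave", "erin"}  # the N people
REQUIRED_APPROVALS = 3  # M, chosen to exceed the worst-case number of moles

def change_is_authorized(approvals: set[str]) -> bool:
    """Return True only if at least M distinct authorized approvers approved."""
    valid_approvals = approvals & AUTHORIZED_APPROVERS
    return len(valid_approvals) >= REQUIRED_APPROVALS

# Two approvals are not enough; the automation refuses to apply the change.
assert not change_is_authorized({"alice", "bob"})
assert change_is_authorized({"alice", "bob", "carol"})
```

Code review requirements (for example, branch protection rules that demand multiple reviewers) implement a weaker version of the same idea for software changes.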
If there's one thing the above model is, it's very costly to achieve (in terms of bureaucracy, time, expertise, money). But every exception to the list (remote manufacture, colocated data centers, ad-hoc software development, etc.) introduces significant risk of points of compromise which can spread across the entire organization.
The two FAANGs I've been at take the approach of trusting remotely manufactured hardware on two counts: explicitly trusting AMD and Intel not to be compromised, and establishing a tight enough manufacturing relationship with suppliers (plus doing their own evaluations of finished hardware) to have greater trust that backdoors won't be inserted in hardware. Both of them ran custom firmware on most hardware (chipsets, network cards, hard disks, etc.) to minimize that route of compromise. They also, for the most part, manage their own sets of patches for the open source and free software they run, and have large security teams devoted to finding vulnerabilities and otherwise improving their internal codebase. Patches do get pushed upstream, but they insert themselves very early in responsible disclosures to patch their own systems before public patches are available. Formal software verification is still in its infancy, so lots of unit and integration tests and red-team penetration testing make up for that a bit.
The AGI infrastructure security problem is therefore pretty sketchy for all but the largest security-focused companies or governments. There are best practices that small companies can adopt for infrastructure (what I tentatively recommend is "use G-Suite and IAM for security policy, turn on advanced account protection, use Chromebooks, and use GCP for compute", all of which gets 80-90% of the practical protections Googlers have internally), but rolling their own piecemeal is fraught with risk and also costly. There simply are no public solutions as comprehensive or as well-maintained as what some of the FAANGs have achieved.
On top of infrastructure sits the common jumble of machine-learning software pulled together from minimally policed public repositories to build a complex assembly of tools for training and validating models and running experiments. No one seems to have a cohesive story for ML operations, and there's a large reliance on big, complex packages from many vendors (drivers + CUDA + libraries + model frameworks, etc.) that is usually the opposite of security-focused. It doesn't matter if the infrastructure is solid when, for example, a Python notebook listens for commands on the public Internet in its default configuration. Writing good ML tooling is also very costly, especially if it keeps up with the state of the art.
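As one small example of how cheap part of the fix can be relative to the rest of the stack, a minimal hardening sketch for a Jupyter configuration file might look like the following (option names assume the classic Notebook server; newer Jupyter Server releases use c.ServerApp.* instead, and this is a sketch rather than a complete hardening guide):

```python
# jupyter_notebook_config.py -- a minimal sketch, not a complete hardening guide.
c = get_config()  # injected by Jupyter when it loads this config file

c.NotebookApp.ip = "127.0.0.1"             # listen on localhost only, never 0.0.0.0
c.NotebookApp.open_browser = False
c.NotebookApp.allow_remote_access = False  # refuse requests with non-local Host headers
c.NotebookApp.password_required = True     # require a hashed password, not just a URL token
```

Pinning dependencies with hash checking (pip's --require-hashes) addresses part of the "minimally policed repositories" problem, though it doesn't pay the real cost of reviewing what those packages actually do.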
AI Alignment is a hard problem and information security is similarly hard because it attempts to enforce a subset of human values about data and resources in a machine-readable and machine-enforceable way. I agree with the authors that security is vitally important for AGI research but I don't have a lot of hope that it's achievable where it matters (against hostile nation-states). Security means costs, which usually means slow, which means unaligned AGI makes progress faster.
One thing I wonder: does this reduce the effective spending a company would be willing to make on a large model, given the likelihood of any competitive advantage it’d lend being eroded via cybertheft?
Could this go some of the way to explaining why we don’t get billion-dollar model runs, as opposed to engineering-heavy research which is naturally more distributed and harder to steal?
I'd expect companies to mitigate the risk of model theft with fairly affordable insurance. Movie studios and software companies invest hundreds of millions of dollars into individual, easily copyable MPEGs and executable files. Billion-dollar models probably don't meet the risk/reward criteria yet. When a $100M model is human-level AGI, it will almost certainly be worth the risk of training a $1B model.
This could mitigate financial risk to the company, but I don't think anyone will sell existential risk insurance, or that it would be effective if they did.
In my experience, the moment you touch anything open source, or anything third party not explicitly designed from the ground up to be security-first, you are guaranteed to lose. You can try to harden Linux, but the whole architecture is designed for openness and extensibility, not for security. It also neglects one of the main principles of safe design: subtractive change (more often than not, a goal can better be achieved by removing or refactoring a component than by adding an extra one). To quote Gordon Bell:
“The cheapest, fastest, and most reliable components are those that aren't there.”
I agree that guarding against nation-state cyberattacks is extremely hard, but disagree that this is particularly relevant. The proportion of capability-pushing research that travels between ML researchers through espionage is quite low.
Edit: Obviously misaligned AIs should not be leaked.