AI is here, and AGI is coming. It's quite possible that any work being done now will be futile in comparison to reducing AI risk.

This is one of those things that's unsettling for me as someone who did a Ph.D. in a non-AI area of computer science.

But one of the main vectors by which a bootstrapping AGI could gain power is by hacking into other systems. And that's something I can do something about.

Not many appreciate this, but unhackable systems are very possible. Security vulnerabilities occur when there is some broken assumption or coding mistake. They are not omnipresent: someone has to put them there. Software has in general gotten more secure over the last few decades, and technologies that provide extremely high security guarantees have emerged. Consider the verified hypervisor coming out of Bedrock Systems; RockSalt, an unbreakable sandbox; or seL4, the verified kernel now being used in real safety-critical systems.
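To make the "broken assumption" point concrete, here is a minimal sketch (my own illustration, not code from any of the systems named above) of a classic path-traversal vulnerability and its fix. The bug is precisely an unstated assumption: that the caller-supplied filename stays inside the intended directory.

```python
import os

def read_user_file_unsafe(base_dir, filename):
    # Broken assumption: that `filename` stays inside base_dir.
    # A request for "../../etc/passwd" walks right out of it.
    return open(os.path.join(base_dir, filename)).read()

def read_user_file_safe(base_dir, filename):
    # Re-establish the assumption explicitly before trusting it.
    base = os.path.realpath(base_dir)
    path = os.path.realpath(os.path.join(base, filename))
    if os.path.commonpath([base, path]) != base:
        raise ValueError("path escapes base directory")
    return open(path).read()
```

Once the assumption is written down as a check, the vulnerability is simply gone from this function; verification tools push the same idea to the point where the absence of whole bug classes is machine-checked.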

Suppose we "solve" security by bringing the vulnerabilities in important applications to near zero. Suppose we also "solve" the legacy problem, and are able to upgrade a super-majority of old software, including embedded devices, to be similarly secure. How much will this reduce AI risk?

To be clear: I personally am mainly interested in assuming this will be solved, and then asking what the impact on AI safety would be. If you want to talk about how hard it is, then, well, I won't be interested, because I've given many lectures on closely related topics, although some others here may benefit from the discussion.

(When I call something verified or unbreakable, there are a number of technicalities about what exactly has been proven and what the assumptions are. E.g.: nothing I've mentioned provides guarantees against hardware attacks such as Row Hammer or instruction skipping. I'll be happy to explain these to anyone in great detail, but am more interested in discussion which assumes these will all be solved.)



10 Answers

Remember that security isn't primarily a technical problem. It's an economic/social/game theory problem.

It's not enough to be able to write safe code. You have to be able to deliver it at lower cost than non-safe code. And not lower cost in the long term, either. You have to be able to deliver total system functionality X next quarter at a lower cost. Every incremental change has to independently pass that economic filter. You also have to bear in mind that many of the costs of non-security tend to be externalized, whereas all of the costs of security tend to be internalized.

... unless, of course, you can find a way to change things so that people take a longer view and more of the costs are internalized. But those tend to demand politically difficult coercive measures.

There's also the problem of resistance from practitioners. The necessary discipline can be unpleasant, and there are big learning curves.

Changing what counts as "best practices" is hard.

Also, while I very much support formal methods, I think "unhackable" is overselling things by a lot. To get there, you'd have to be able to specify what would and would not be correct behavior for a big, ever-changing system with huge numbers of subsystems that are themselves complicated. And probably specify the correct behavior for every subsystem in the same general language. And, as you point out, there will always be issues that fall outside the model. The adversary, AI or otherwise, is not required to ignore Rowhammer just because you didn't think about it or couldn't model it.

I'm not saying give up, but I am saying don't underestimate the challenges...

I agree with just about everything you said, as well as several more criticisms along those lines that you didn't say. I am probably more familiar with these issues than anyone else on this website, with the possible exception of Jason Gross.

Now, suppose we can magic all that away. How much then will this reduce AI risk?

As others have written, I think you have to get very close to perfection before you get much of a win against the kind of AGI everybody here is worried about, because you have to assume that it can find very subtle bugs. Also, if you assume it has access to the Internet or any other large selection of targets, it will attack the thing that has not been hardened... so you have to get everything hardened before this very smart adversary pops up. But it sure can't hurt. And it would help other stuff, too. Hey, can I ask an almost unrelated question, which you're free to ignore, answer as a private message, or answer here? How good is formal verification for time and space these days?
I must disagree with the first claim. Defense-in-depth is very much a thing in cybersecurity. The whole "attack surface" idea assumes that, if you compromise any application, you can take over an entire machine or network of machines. That is still sometimes true, but continually less so. Think it's game over if you get root on a machine? Not if it's running SELinux.

I can speak only in broad strokes here, as I have not published in verification. My publications are predominantly in programming tools of some form, mostly program transformation and synthesis. There are two main subfields that fight over the term "verification": model checking and mechanized/interactive theorem proving. This is not counting people like Dawson Engler, who write very unsound static analysis tools but call them "verification" anyway. I give an ultra-brief overview of verification in []

I am more knowledgeable about mechanized theorem proving, since my department has multiple labs working in this area and I've taken a few of their seminars. But asking about the time/space of verification really only makes sense for the automated part. I attended CAV in 2015, went to a few model checking talks at ICSE 2016, and more recently talked to a friend on AWS's verification team about what some people there are doing with CBMC. Okay, and I guess I talked to someone who used to do model checking on train systems in France just two days ago. Outside of that exposure, I am super not-up-to-date with what's going on. But I'd still expect massive breakthroughs to make the news rounds over to my corner of academia, so I'll give my sense of the status quo: explicit-state enumeration can crush programs with millions or billions of states, while symbolic model checking routinely handles $10^{100}$ states. Those are both very small numbers.
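For readers unfamiliar with the explicit-state side: at its core, an explicit-state model checker is just exhaustive search over a transition system. A toy sketch of my own (not any real tool) makes clear why the state count is the binding constraint:

```python
from collections import deque

def check_invariant(initial, successors, invariant):
    """Explicit-state model checking in miniature: breadth-first
    enumeration of every reachable state, testing the invariant
    on each. Returns (holds, counterexample_state, states_seen)."""
    seen = {initial}
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        if not invariant(state):
            return False, state, len(seen)
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True, None, len(seen)

# Toy system: a counter that increments mod 7.
# The invariant "state < 7" holds in all 7 reachable states.
ok, bad, n = check_invariant(0, lambda s: [(s + 1) % 7], lambda s: s < 7)
```

Because every state is stored and visited individually, memory caps this approach at millions or billions of states; symbolic checkers get to numbers like $10^{100}$ by representing whole sets of states as formulas or BDDs instead.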
Hmm. It looks like my reply notifications are getting batched now. I didn't realize I'd set that up.

I've reordered some of this, because the latter parts get into the weeds a lot and may not be worth reading. I advise that anybody who gets bored stop reading there, because it's probably not going to get more interesting. For background, I haven't been doing security hands-on for the last few years, but I did it full time for about 25 years before that, and I still watch the space. I started out long enough ago that "cyber" sets my teeth on edge...

STATE OF PRACTICE IN DEFENSE

Well, yes, but... not that much less. A lot of what's done is, shall we say, "aspirational", and a lot of the rest works much better to reduce the rate of damage from human adversaries than it would to resist a total takeover by an AGI that had decided a given system was on the critical path for its success.

Today, if you're a real-world organization with a significant IT infrastructure, and you hire a skilled human penetration tester (or team), and you give them a reasonable amount of time, and you don't set artificially limiting "rules of engagement", they will almost always reach whatever objectives you set, by default full administrative access to all or almost all of your systems. All the changes over the last decade or two have not, in the end, appreciably reduced the chance of being "owned" by (sufficiently motivated) humans. The cost will be higher, but the probability of success is still close to one if somebody is willing to pay that cost. And the cost isn't really prohibitive. It's more the kind of cost increase that redirects an attacker to another victim than the kind that convinces them to get out of the penetration business altogether.

There are just too many possible ways to get in. The limiting factors are attacker time, expertise, and motivation. In the AGI scenario we're talking about, all three of those limits presumably get a lot less restrictive.

... and that's on the
Really appreciate this informative and well-written answer. Nice to hear from someone on the ground about SELinux instead of the NSA's own presentations.
I phrased my question about time and space badly. I was interested in proving the time and space behavior of the software "under scrutiny", not in the resource consumption of the verification systems themselves. It would be nice to be able to prove things like "this program will never allocate more than X memory", or "this service will always respond to any given request within Y time".
LOL! I know a few people who have worked in this area. Jan Hoffman and Peng Gong have worked on automatically inferring complexity. Tristan Knoth has gone the other way, including resource bounds in specs for program synthesis. There's a guy who did an MIT Ph.D. on building an operating system in Go, and as part of it needed an analyzer that can upper-bound the memory consumption of a system call. I met someone at CU Boulder working under Bor-Yuh Evan Chang who was also doing static analysis of memory usage, but I forget who. So, those are some things that were going on. Almost all of these are 5+ years old, and I have no more recent updates. I've gone to one of Peng's talks and read none of these papers.
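For a flavor of what such analyses do (this is my own toy illustration, not any of the systems just mentioned): a worst-case allocation bound can be computed compositionally over program structure, with branches taking a max and loops multiplying by a static trip-count bound.

```python
def alloc_bound(prog):
    """Worst-case allocation units for a toy AST:
    ('alloc', n)        allocate n units
    ('seq', a, b)       run a, then b
    ('if', a, b)        run a or b
    ('loop', k, body)   run body at most k times"""
    tag = prog[0]
    if tag == 'alloc':
        return prog[1]
    if tag == 'seq':
        return alloc_bound(prog[1]) + alloc_bound(prog[2])
    if tag == 'if':
        return max(alloc_bound(prog[1]), alloc_bound(prog[2]))
    if tag == 'loop':
        return prog[1] * alloc_bound(prog[2])
    raise ValueError(f"unknown node: {tag}")

# "This program will never allocate more than X memory":
prog = ('seq', ('alloc', 2),
        ('loop', 10, ('if', ('alloc', 3), ('alloc', 1))))
print(alloc_bound(prog))  # 2 + 10 * max(3, 1) = 32
```

Real tools face the hard parts this sketch dodges: loops without static bounds, heap aliasing, and deallocation, which is where the research above comes in.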

Good security is something that people will happily pay more for when it comes to PCs, smartphones and servers. I bought an iPhone in large part due to Apple's security claims. The big industry players are all making a push to solve the biggest security problems (kill the password!) and that gives me a lot of hope.

The real security nightmare is IoT devices. That probably will require political pressure to solve, since the IoT sub-industry has not responded to consumer pressure yet (people don't want lowest-cost shovelware in every appliance, and yet that is the only option for many categories like TVs.)

It's kind of hard to translate that into actual money changing hands, in ways that probably aren't as obvious when you're a buyer as when you're a seller. One question is "how much more?". It has to be more than the actual cost difference, which could be quite significant, especially in the early days of a new kind of development practice.

But the bigger problem is that the buyer usually can't really evaluate whether the security is there or not anyhow. It's a lot cheaper to claim to have security than to actually have it. The same information problem applies even to the very biggest corporate buyers. Even the best independent evaluators have trouble, even if they have full source code and are putting in a lot of time and effort (for which they usually want to get paid...).

On the ground, what almost every buyer sees is two (or more) products that claim to have "industry-leading security". If you're lucky, both may claim to have been built following one or more secure process or technology standards. They may even have been audited to those standards (by auditors whose credibility you don't really know). But the standards are incomplete and gameable, and it's hard to keep track of which ones cover what. In fact, some of the ones that get trumpeted the loudest are so narrow they're almost meaningless (looking at you, FIPS 140...). And it's almost impossible to create a complete standard that's generally applicable. Products are monstrously diverse and monstrously complicated, and attackers will directly target the areas your standard doesn't cover.

You may see a "track record" for a company or even a product... but past performance is very much not a guarantee of future results, and rumors are really unreliable. It's hard for the companies themselves to give useful information about either their own products or others'. Credible, hard-to-fake signals of your own code quality are really hard to define or generate; the standards are about as good as i
I disagree with most of what was said in this comment, but I'm intrigued by everything you've said about IoT. Do you think you could do me a solid and go into more detail about IoT e.g. consumer pressure, political pressure, IoT being less secure (in the context of Zero Trust [] parallel verification), and particularly shovelware?
Conor Sullivan:
I'm certainly not an expert in IoT, and I'm only reflecting my own experiences as a consumer.

Regarding the cost, I'd expect the road to AGI to deliver intermediate technologies that reduce the cost of writing provably secure code. In particular, I'd expect Copilot-like code generation systems to stay close to the leading edge of AI technology, if nothing else then because of their potential to deliver massive economic value.

Imagine some future version of Copilot that, in addition to generating code for you, also proves properties of the generated code. There might be reasons to do that beyond security: the requirement to provide specs and proofs in addition to code might make Copilot-like systems more consistent at generating correct programs.
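A crude stand-in for that workflow (all names here are invented for illustration): pair each generated function with an explicit spec, then check the spec mechanically. This sketch substitutes exhaustive checking over a bounded input domain where a real proving Copilot would emit a machine-checkable proof, but the interface, code plus spec plus checker, has the same shape:

```python
from itertools import product

def spec_sorted(xs, out):
    # Spec: the output is the sorted permutation of the input.
    return out == sorted(xs) and sorted(out) == sorted(xs)

def generated_sort(xs):
    # Pretend this body came from a Copilot-like generator.
    return sorted(xs)

def bounded_check(fn, spec, domain, max_len=4):
    """Check spec(xs, fn(xs)) for every input of length <= max_len
    drawn from domain. Returns a counterexample input, or None."""
    for n in range(max_len + 1):
        for xs in product(domain, repeat=n):
            xs = list(xs)
            if not spec(xs, fn(xs)):
                return xs
    return None
```

Here `bounded_check(generated_sort, spec_sorted, [0, 1, 2])` returns None, while handing it a buggy candidate like the identity function immediately surfaces a small counterexample. Forcing the generator to satisfy a spec at all is much of the value, independent of security.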

I second the other answers: even if we completely solved cybersecurity, there would be substantial AI risk just from the AI interacting with humans, via manipulation, etc.

That said, I think it would close a huge part of the attack surface for the AI. If, in addition to that, suddenly in 2032 we discover how to make humans invulnerable to manipulation, I would feel much better about running experiments with unaligned AI, boxing, etc.

So I'd say it's something like "vastly better cybersecurity is not enough to contain unaligned AGI, but any hope of containing unaligned AGI requires vastly better cybersecurity"

Bringing technical vulnerabilities to near zero probably won't change the AGI safety environment appreciably. It would reduce the attack surface somewhat, but that just leaves everything else. Even humans can easily induce other humans to do things that bypass security measures, and it should be presumed that a superhuman AGI can do that even better. There are also a great many unforced errors that humans make.

This sort of thing should be done as a matter of course, but it's expensive, and so far the magnitude of losses doesn't make it worthwhile. Yes, software failures cost probably tens of billions of dollars per year, but building software to those standards would cost on the order of trillions of dollars per year. That's a Fermi estimate based on verifiable software methodologies taking on the order of 5-10x longer than the usual move-fast-and-break-things process (down from the very much more that it takes currently), and on the order of 30 million software developers in the world whose effort would have to be multiplied to get the same output.

It is more likely that if such an approach became mandatory somehow, we would simply develop a lot less software, but that too would cost us a great deal in lost opportunities. Not all software is written for zero-sum marketing games (it just looks like it sometimes), and it wouldn't help a great deal against superhuman AGI anyway.

I suspect physics side channels[0] will be possible for AGI to exploit until we completely solve physics, and that it may always be possible to implement weird machines[1] on physics or biology. Consider physical / biological steganography of computation. Seeking feedback / instruction / comments from physicists / biologists.

I am skeptical that security is solvable. Even if you fix memory corruption, even if you fix business logic by creating programming languages that enable you to mathematically / formally specify the behavior of your application, the interaction of your application with reality, across the silicon/reality boundary, will almost always have leaky abstractions until we thoroughly understand physics and will always fail at the human behavior / game theory / social deception / hidden preferences level.

The current economic / systemic incentives for the construction of our computer / non-computer systems do not reward doing things "correctly" / "securely" for most use cases (notable exception: aviation, but cf. the Boeing 737 MAX[2]). This is a tremendous economic liability regardless of whether or not AGI exists. There are probably useful concrete actions (design a logic programming language usable by most existing developers to encode business logic by writing something that resembles math, or push forward static analysis / fuzzing research to eliminate entire classes of software vulnerability).




It does help somewhat, if your strategy is leveraged in ways that involve directing the attention of the cybersecurity field as a whole. It doesn't help much if your plan is to just hunt for vulnerabilities yourself.

Two things to disclaim. First: we are not within striking distance of making the security of the-internet-as-a-whole able to stand up to a superintelligence. All of the interesting work to be done is in contexts much narrower in scope, like test environments with small API surface area, and AI labs protecting their source code from human actors. And, second: all of the cases where cybersecurity helps wind up bottoming out in buying time for something else, not solving the problem directly.

There are two main scenarios where cybersecurity could wind up mattering.

Scenario 1: The leading lab gets close to the threshold, and tries to pause while they figure out alignment details before they crank up the compute. Some other party steals the source code and launches the unfinished AI prematurely.

Scenario 2: A prototype AGI in the infrahuman range breaks out of a test or training environment. Had it not broken out, its misalignment would have been detected, and the lab that was training/testing it would've done something useful with the time left after halting that experiment.

I wrote a bit about scenario 2 in this paper. I think work aimed at addressing this scenario more or less has to be done from inside one of the relevant major AI labs, since their training/test environments are generally pretty bespoke and are kept internal.

I see some people here saying scenario 1 might be hopeless due to human factors, but I think this is probably incorrect. As a proof-of-concept, military R&D is sometimes done in (theoretically) airgapped facilities where employees are searched for USB sticks on the way out. Research addressing scenario 1 probably looks like figuring out how to capture the security benefits of that sort of work environment in a way that's more practical and less intrusive.

You might be interested in “ML for Cyberdefense” from this research agenda:

This is a very strange approach, and something like it has never occurred to me. Cybersecurity has an overwhelmingly massive influence on the human side of AI safety, partly because all of today's AIs are built on computers that have to be protected from human hackers.

Regarding airgaps (and magnetic covert channels, which can penetrate Faraday cages by influencing current flows on an ordinary chip to communicate with an outside magnetometer), EY once wrote that:

If you have an untrustworthy general superintelligence generating English strings meant to be "reasoning/arguments/proofs/explanations" about eg a nanosystem design, then I would not only expect the superintelligence to be able to fool humans in the sense of arguing for things that were not true in a way that fooled the humans, I'd expect the superintelligence to be able to covertly directly hack the humans in ways that I wouldn't understand even after having been told what happened. So you must have some prior belief about the superintelligence being aligned before you dared to look at the arguments.

So it would only be helpful if exponential intelligence increase were slow, stunted, or interrupted during the catastrophe itself. Someone recently asked how dumb an AI would have to be in order to kill a lot of people if it started behaving erratically and began to model its overseers; this might be relevant to that.

I doubt your optimism about the level of security that is realistically achievable. Don't get me wrong: the software industry has made huge progress (at large cost!) in terms of security. Where before most stuff popped a shell if you looked at it funny, it now takes a large effort for many targets.

Further progress will be made.

If we extrapolate this progress, we will optimistically reach a point where impactful, reliable 0days are out of reach for most hobbyists and criminals, and remain the domain of the national-security apparatus of great powers.

But I don't see how raising this waterline will help with AI risk in particular?

As in: a godlike superintelligence is game over anyway. An AI that is as good at exploitation as the rest of humanity taken together is beyond what is realistically defensible against, in terms of widely deployed security levels. And an AI that doesn't reach that level without human assistance is probably not lethal anyway.

On the other hand, one could imagine pivotal acts by humans with limited-but-substantial AI assistance that rely on the lack of wide-spread security.

Pricing human-plus-weakish-AI collaborations out of the world-domination-via-hacking game might actually make matters worse, insofar as a weakish, non-independent AI might be easier to keep aligned.

A somewhat dystopian wholesale surveillance of almost every word written and said by humans, combined with AI that is good enough at text comprehension and energy efficient enough to pervasively and correctly identify scary-looking research and flag it to human operators for intervention is plausibly pivotal and alignable, and makes for much better cyberpunk novels than burning GPUs anyway (mentally paging cstross, I want my Gibson homage in form of a "Turing Police"/laundry-verse crossover).

Also, good that you mentioned Rowhammer. Rowhammer, and the DRAM industry's half-baked, pitiful response to it, are humankind's capitulation on "making at least some systems actually watertight".

While I can't quantify, I think secure computer systems would help a lot by limiting the options of an AI attempting malicious actions.

Imagine a near-AGI system with uneven capabilities compared to humans. Maybe its GPT-like (natural language interaction) and Copilot-like (code understanding and generation) capabilities pass humans but robotics lags behind. More generally, in virtual domains, especially those involving strings of characters, it's superior, but elsewhere it's inferior. This is all easy to imagine because it's just assuming the relative balance of capabilities remains similar to what it is today.

Such a near-AGI system would presumably be superhuman at cyber-attacking. After all, that plays to its strengths. It'd be great at both finding new vulnerabilities and exploiting known ones. Having impenetrable cyber-defenses would neutralize this advantage.

Could the near-AGI system improve its robotics capabilities to gain an advantage in the physical world too? Probably, but that might take a significant amount of time. Doing things in the physical world is hard. No matter how smart you are, your mental model of the world is a simplification of true physical reality, so you will need to run experiments, which takes time and resources. That's unlike AlphaZero, for example, which can exceed human capabilities quickly because its experiments (self-play games) take place in a perfectly accurate simulation.

One last thing to consider is that provable security has the nice property that you can make progress on it without knowing the nature of the AI you'll be up against. Having robust cyber-defense will help whether AIs turn out to be deep-learning-based or something else entirely. That makes it in some sense a safe bet, even though it obviously can't solve AGI risk on its own.

1 comment

The least secure part is not in the code, it is in between one's ears, and there is no hope of fixing that. So, there is basically zero chance that switching to unhackable versions of every app and of every IoT device(hah!) would reduce AI x-risk or even extend the timeline appreciably.