Malicious non-state actors and AI safety

by keti, 2 min read, 25th Apr 2021, 14 comments

AI
Frontpage

Here, I discuss the possibility of malicious non-state actors causing catastrophic suffering or existential risk. This may be a very significant but neglected issue.

Consider the sort of person who becomes a mass-shooter. They're malicious, and they're willing to incur large personal costs to cause large amounts of suffering. However, mass-shootings kill only an extremely small proportion of people. If such a person could acquire AGI, they would have the potential to cause vastly greater amounts of suffering, so that is what some may try to do.

It wouldn't surprise me if they'd succeed. To do so, they would need to do two things: acquire the information necessary to create AGI, and be the first one to use it to either destroy or take over the world. Both of these sound pretty possible to me.

A lot of artificial intelligence capability and alignment research is public. If the information necessary to create and/or control AGI is public, then it would be easy for a malicious actor to obtain it.

If the information is private, then a malicious non-state actor could potentially join the company or organization to gain access to it. If they act like decent people, it may be extremely difficult to detect malicious tendencies in them.

Even if they can't join the company or organization, they could still potentially just steal the information. Of course, the organization could try to protect their information. But malicious actors trying to take over the world may act quite differently than regular cyber criminals, so it's not clear to me that regular information security would defend against them. For example, such an actor would potentially try physically breaking into the places with the information as well as coercing people for it. Both of these would normally be too dangerous for a regular cyber criminal, so the organization might not sufficiently anticipate the threat.

If a malicious actor can acquire the information necessary for creating AGI, I wouldn't be surprised if they would be able to destroy or take over the world before anyone else could.

First, the malicious actor would need a sufficiently large amount of computational resources. Computers in botnets can be rented extremely cheaply, far more cheaply than purchasing the hardware. Alternatively, the actor could hack into massive numbers of computers on their own by distributing malware at scale. I wouldn't be surprised if a malicious non-state actor would be able to have, for a time, more processing power than the competing organizations working on AI have available.

A malicious actor could also make preparations to allow its AGI to take over the world as quickly as possible. For example, they could provide lots of raw materials and physics textbooks to the AI to make it able to create nanotechnology as quickly as possible. If non-malicious actors aren't worrying about this, this may provide a malicious actor with a large advantage.

A malicious actor could also try to get ahead of the other AIs by neglecting safety. Potentially, non-malicious people would carefully review and test any AGI system for safety, but a malicious actor would probably be willing to avoid doing so.

For a malicious actor to establish a singleton assuming a hard takeoff, basically three conditions would be necessary: there is at least one malicious actor, at least one such actor can acquire the code for the AGI, and at least one actor who obtained the information is able to use it to establish a singleton.

I think assigning a probability of 0.5 to each of those conditions would be reasonable. All seem quite plausibly correct, and quite plausibly incorrect. I'm not sure what could be argued to justify much lower probabilities than these.

Using my guesses above, that would place a probability of about 1/8 on a malicious person seizing control of the world, assuming a hard takeoff. Is that reasonable?
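
The 1/8 figure is the product of the three guesses. A minimal sketch of that arithmetic, assuming the three conditions can simply be multiplied (the variable names are illustrative, not from any source):

```python
from fractions import Fraction

# The three guesses from the post, one per condition (0.5 each).
p_malicious_actor_exists = Fraction(1, 2)
p_acquires_agi_code = Fraction(1, 2)
p_establishes_singleton = Fraction(1, 2)

# Assumption: the conditions are independent, or each guess is read as
# conditional on the previous conditions holding, so they multiply.
p_takeover = (
    p_malicious_actor_exists * p_acquires_agi_code * p_establishes_singleton
)

print(p_takeover)         # 1/8
print(float(p_takeover))  # 0.125
```

If the conditions are positively correlated, for example because the same lax security makes both theft and takeover easier, the true product could differ from this simple estimate.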

Currently, there doesn't seem to have been much work dealing with malicious non-state actors. The issue seems overly neglected to me. Am I right?

14 comments

The post Reducing long-term risks from malevolent actors is somewhat related and might be of interest to you. 

I refer you to Gwern's Terrorism Is Not About Terror:

Statistical analysis of terrorist groups’ longevity, aims, methods and successes reveal that groups are self-contradictory and self-sabotaging, generally ineffective; common stereotypes like terrorists being poor or ultra-skilled are false. Superficially appealing counter-examples are discussed and rejected. Data on motivations and the dissolution of terrorist groups are brought into play and the surprising conclusion reached: terrorism is a form of socialization or status-seeking.

and Terrorism Is Not Effective:

Terrorism is not about causing terror or casualties, but about other things. Evidence of this is the fact that, despite often considerable resources spent, most terrorists are incompetent, impulsive, prepare poorly for attacks, are inconsistent in planning, tend towards exotic & difficult forms of attack such as bombings, and in practice ineffective: the modal number of casualties per terrorist attack is near-zero, and global terrorist annual casualty have been a rounding error for decades. This is despite the fact that there are many examples of extremely destructive easily-performed potential acts of terrorism.

so any prospective murderer who was "malicious [and] willing to incur large personal costs to cause large amounts of suffering" would already have far better options than a mass shooting. Since we don't see them, I reject the "effective altruism hypothesis" and wouldn't bother worrying about maliciously non- or anti-aligned AI.

I'm not worried about the sort of person who would become a terrorist. Usually, they just have a goal like political change, and are willing to kill for it. Instead, I'm worried about the sort of person who becomes a mass-shooter or serial killer.

I'm worried about people who value hurting others for its own sake. If a terrorist group took control of AGI, then things might not be too bad. I think most terrorists don't want to damage the world, they just want their political change. So they could just use their AGI to enact whatever political or other changes they want, and after that not be evil. But if someone who just terminally values harming others, like a mass-shooter, took over the world, things would probably be much worse.

Could you clarify what you're thinking of when saying "so any prospective murderer who was "malicious [and] willing to incur large personal costs to cause large amounts of suffering" would already have far better options than a mass shooting"? What other, better options would they have that they don't do?

What other, better options would they have that they don't do?

Sorry, this is infohazard. You don't want someone to read this text and think "actually, this is a cool idea".

Instead, I'm worried about the sort of person who becomes a mass-shooter or serial killer. ... I'm worried about people who value hurting others for its own sake.

Empirically, almost or actually no mass-shooters (or serial killers) have this kind of abstract and scope-insensitive motivation. Look at this writeup of a DoJ study: it's almost always a specific combination of a violent and traumatic background, a short-term crisis period, and ready access to firearms.

I think the efforts to focus on issues of 'Mental Health' pay only lip service to this point. We live in a culture which relies on male culture to be about learning to traumatize others and learning to tolerate trauma, while at the same time decrying it as toxic male culture. Males are rightly confused these days, and the lack of adequate social services combined with a country filled with guns that continues to promote media of all sorts that celebrates violence as long as it's 'good violence', is a recipe for this kind of tragedy. Focusing on the individual shooters as being the problem isn't the answer. It is a systemic problem I believe.

This is a good point. I didn't know this. I really should have researched things more.

Even if there's just one such person, I think that one person still has a significant chance of succeeding.

However, more importantly, I don't see how we could rule out that there are people who want to cause widespread destruction and are willing to sacrifice things for it, even if they wouldn't be interested in being a serial killer or mass shooter.

I mean, I don't see how we have any data. I think that for almost all of history, there has been little opportunity for a single individual to cause world-level destruction. Maybe during the time around the Cold War someone could have managed to trick the USSR and USA into starting a nuclear war. Other than that, I can't think of many other opportunities.

There are eight billion people in the world, and potentially all it would take is one, with sufficient motivation, to bring about a really bad outcome. Since showing that no such person exists would require a conjunction over all eight billion people, I think it would be hard to rule this out.
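
To make the "conjunction over eight billion" point concrete: if each person independently had some tiny probability p of being such a person, the chance that at least one exists would be 1 - (1 - p)^N. The sketch below is only an illustration under that independence assumption, and the values of p are hypothetical placeholders, not estimates from any data.

```python
import math

def prob_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent people qualifies,
    given each qualifies with probability p: 1 - (1 - p)**n."""
    # log1p/expm1 keep precision when p is tiny and n is huge.
    return -math.expm1(n * math.log1p(-p))

WORLD_POPULATION = 8_000_000_000  # "eight billion", as in the comment

# Hypothetical per-person probabilities, for illustration only.
for p in (1e-11, 1e-10, 1e-9):
    result = prob_at_least_one(p, WORLD_POPULATION)
    print(f"p = {p:.0e}: P(at least one) = {result:.3f}")
```

Under these assumptions, even a per-person probability of one in ten billion gives a roughly 0.55 chance that at least one such person exists.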

So I'm still quite concerned about malicious non-state actors.

And I think there are some reasonably doable, reasonably low-cost things someone could do about this. Potentially just having very thorough security clearance before allowing someone to work on AGI-related stuff could make a big difference. And increasing the physical security of the AGI organization could also be helpful. But currently, I don't think people at Google and other AI organizations are worrying about this. We could at least tell them about this.

Personally, I think this topic is worth considering since the potential downside of malevolence + AGI is so terrifying. *I have low epistemic confidence in what I’m about to say because serious thinking on the topic is only a few years old, I have no particular expertise and the landscape will probably change radically, in unpredictable ways, between now and AGI. 

For a malicious actor to establish a singleton assuming a hard takeoff, basically three conditions would be necessary: there is at least one malicious actor, at least one such actor can acquire the code for the AGI, and at least one actor who obtained the information is able to use it to establish a singleton.

I think assigning a probability of 0.5 to each of those conditions would be reasonable. All seem quite plausibly correct, and quite plausibly incorrect. I'm not sure what could be argued to justify much lower probabilities than these.

I’m not sure this is the best way of framing the probability (but see*). I reckon: 

  • There are many people on the planet who would act malevolently given a real chance to get their hands on AGI. I’d say a conservative estimate would be the percentage of the population estimated to be psychopathic, which is 1%.
  • The vast majority of these people have a near-zero chance of getting anywhere near it. Just to throw a few numbers around wildly, maybe a very rich, very corrupt businessman would have 1% chance, while someone working on the core AGI development team could have as high as 50%. Then you’d have hackers, out-and-out criminals etc. to consider. This variable is so hard to even guess at because it depends how secret the project is, how seriously people are taking the prospect of AGI, and several other factors.
  • I’m agnostic on whether the 0.5 about the singleton should be higher or lower.

If security isn’t taken very seriously indeed, I don’t think we can disregard this. I’m concerned normalcy bias may cause us to be less prepared than we should be.

I'm not sure most people would have a near-zero chance of getting anywhere.  

If AGI researchers took physical security super seriously, I bet this would make malicious actors quite unlikely to succeed. But it doesn't seem like they're doing this right now, and I'm not sure they will start.

Theft, extortion, hacking, eavesdropping, and building botnets are things a normal person could do, so I don't see why they wouldn't have a fighting chance. I've been thinking about how someone could currently acquire private code from Google or some other current organization working on AI, and it sounds pretty plausible to me. I'm a little reluctant to go into details here due to informational hazards.

What do you think the difficulties would be that give most people a near-zero chance of getting anywhere? Is it the difficulty of acquiring the code for the AGI? Or of getting a mass of hacked computers big enough to compete with AGI researchers? Both seem pretty possible to me for a dedicated individual.

Hi! Missed your reply for a few days. Sorry, I'm new here.

I'm not sure most people would have a near-zero chance of getting anywhere.  

I think our disagreement may stem from our different starting points. I'm considering literally every person on the planet and saying that maybe 1% of them would act malevolently given AGI. So a sadistic version of me, say, would probably be in the 98th percentile of all sadists in terms of ability to obtain AGI (I know people working in AI, am two connections away from some really key actors, have a university education, have read Superintelligence etc.), yet my probability of success would be absolutely tiny – like 0.01% even if I tried my absolute hardest. That's what I mean when I say that most people would have a near-zero chance. There are maybe a few hundred (??) people in the world who we even need to consider.

Theft, extortion, hacking, eavesdropping, and building botnets are things a normal person could do, so I don't see why they wouldn't have a fighting chance.

I disagree. Theft and extortion are the only two (sort of) easy ones on the list imo. Most people can't hack or build botnets at all, and only certain people are in the right place to eavesdrop. 

But OK, maybe this isn't a real disagreement between us. My starting point is considering literally everybody on the planet, and I think you are only taking people into account who have a reasonable shot. 

How many people on the planet do you think meet the following conditions?

  1. Have a > 1% chance of obtaining AGI.
  2. Have malevolent intent.

yet my probability of success would be absolutely tiny – like 0.01% even if I tried my absolute hardest. That's what I mean when I say that most people would have a near-zero chance. There are maybe a few hundred (??) people in the world who we even need to consider

Could you explain how you came to this conclusion? What do you think your fundamental roadblock would be? Getting the code for AGI or beating everyone else to superintelligence?

 

How many people on the planet do you think meet the following conditions?

  1. Have a > 1% chance of obtaining AGI.
  2. Have malevolent intent.

It's important to remember that there may be quite a few people who would act somewhat maliciously if they took control of AGI, but I bet the vast majority of these people would never even consider trying to take control of the world. I think trying to control AGI would just be far too much work and risk for the vast majority of people who want to cause suffering.

However, there still may be a few people who want to harm the world enough to justify trying. They would need to be extremely motivated to cause damage. It's a big world, though, so I wouldn't be surprised if there were a few people like this.

I think that a typical, highly motivated malicious actor would have much higher than 1% probability of succeeding. (If mainstream AI research starts taking security against malicious actors super seriously, the probability of the malicious actors' success would be very low, but I'm not sure it will be taken seriously enough.)

I disagree. Theft and extortion are the only two (sort of) easy ones on the list imo. Most people can't hack or build botnets at all, and only certain people are in the right place to eavesdrop. 

A person might not know how to hack, build botnets, or eavesdrop, but they could learn. I think a motivated, reasonably capable individual would be able to become proficient in all those things. And they will potentially have decades to train before they would need to use those skills.

yet my probability of success would be absolutely tiny – like 0.01% even if I tried my absolute hardest. That's what I mean when I say that most people would have a near-zero chance. There are maybe a few hundred (??) people in the world who we even need to consider

Could you explain how you came to this conclusion? What do you think your fundamental roadblock would be? Getting the code for AGI or beating everyone else to superintelligence?

 

My fundamental roadblock would be getting the code to AGI. My hacking skills are non-existent and I wouldn't be able to learn enough to be useful even in a couple of decades. I wouldn't want to hire anybody to do the hacking for me as I wouldn't trust the hacker to give me my unlimited power once he got his hands on it. I don't have any idea how to assemble an elite armed squad or anything like that either.

My best shot would be to somehow turn my connections into something useful. Let's pretend I'm an acquaintance of Elon Musk's PA (this is a total fabrication, but I don't want to give any actual names, and this is the right ballpark). I'd need to somehow find a way to meet Elon Musk himself (1% chance), and then impress him enough that, over the years, I could become a trusted ally (0.5%). Then, I'd need Elon to be the first one to get AGI (2%) and then I'd need to turn my trusted position into an opportunity to betray him and get my hands on the most important invention ever (5%). So that's 20 million to one, but I've only spent a couple of hours thinking about it. I could possibly shorten the odds to 10,000 to one if I really went all in on the idea.
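
The "20 million to one" figure follows from multiplying the four step probabilities. A quick check of that arithmetic, treating each percentage as conditional on the previous step succeeding (the labels are illustrative shorthand for the steps described above):

```python
from fractions import Fraction

# Step probabilities quoted in the comment, as exact fractions.
steps = {
    "meet Elon Musk": Fraction(1, 100),                # 1%
    "become a trusted ally": Fraction(5, 1000),        # 0.5%
    "Elon is first to get AGI": Fraction(2, 100),      # 2%
    "betray him and seize the AGI": Fraction(5, 100),  # 5%
}

# Multiply the chain of conditional probabilities together.
p_success = Fraction(1)
for p in steps.values():
    p_success *= p

print(p_success)      # 1/20000000
print(1 / p_success)  # 20000000
```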

How would you do it? 

 

However, there still may be a few people who want to harm the world enough to justify trying. They would need to be extremely motivated to cause damage. It's a big world, though, so I wouldn't be surprised if there were a few people like this.

Here we agree. I think most of the danger will be concentrated in a few, highly competent individuals with malicious intent. They could be people close to the tech or people with enough power to get it via bribery, extortion, military force etc.

Has this been discussed in detail elsewhere? I only saw one other article relating to this.

I'm not sure if a regular psychopath would do anything particularly horrible if they controlled AGI. Psychopaths tend to be selfish, but I haven't heard of them being malicious. At least, I don't think a horrible torture outcome would occur. I'm more worried about people who are actually sadistic.

Could you explain what the 1% chance refers to when talking about a corrupt businessman? Is it the probability that a given businessman could cause a catastrophe? I think the chance would be a lot higher if the businessman tried. Such a person could potentially just hire some criminals to do the theft, extortion, or hacking. Do you think such criminals would also just be very unlikely to succeed? After all, attackers just need to find a single opening, while a non-malicious organization would need to defend against many.