4. Environments for killing AIs

Summary of entire Series: An alternative approach to designing Friendly Artificial Intelligence computer systems.

Summary of this Article: Killing a rogue AI may be impossible in our current environment, but we can change that by changing the environment.

  1. Optimum number of single points of failure
  2. Don't put all your eggs in one basket
  3. Defect or Cooperate
  4. Environments for killing AIs
  5. The advantage of not being open-ended
  6. Trustworthy Computing
  7. Metamorphosis
  8. Believable Promises


Environments for killing AIs

The objective

"Kill" is probably not exactly the right word. In the previous article in this series I talked about the situation where several AIs exist on the internet at the same time, self-improving but not as fast as possible, because they have agreed not to. If one of those AIs breaks that agreement, the others need to form a posse and gang up on it to 'kill' it, in the sense of depriving it of the ability to be a threat to them. That could mean depriving it of resources, stopping any processes executing its code, even deleting all backups of its source code and physically destroying every machine it has ever touched. Or it could mean breaching its security to alter its code, or even just being in a position to do one of those things immediately, then using that demonstrated threat to bring the AI back in line and make it surrender. But, as long as you know what I mean by it, the word "kill" will do.

The challenge of killing an AI

The current internet-connected computing environment (2013) is not an easy one in which to kill an AI. There are millions of machines with little or no security that can be taken over without the user knowing, or being easily able to find out, that this has happened. An AI being hunted could broadcast an encrypted copy of its full self into any of the various long-term distributed data repositories, such as Usenet, BitTorrent archives, Freenet, or its own custom botnet. It could then hide multiple seeds in computers it has hacked, which could bide their time before starting the process of reassembling the AI. It would be like fighting a hydra, or a troll that regenerates. Just finding out where the AI had initially broadcast its archive wouldn't tell you the identity of everyone who had downloaded it; even the AI wouldn't know all the places the archive ended up being stored. But it wouldn't need to know: all it would need are sufficient decryption or retrieval keys to get a copy back. Nor could you necessarily stop the seeds from doing that, even if you froze an infected computer and thus gained access to the hunted AI's keys. Executives from the media industry face the same problem when trying to eradicate all copies of a pirated film: knowing the .torrent file isn't enough; they would have to find and purge it from the computer of everybody who'd downloaded and then shared it. (And bear in mind that the AI's data might be steganographically hidden in some file that downloaders wouldn't want to delete, or even admit to having downloaded, such as a pirated copy of a useful piece of popular software.)
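To see why purging scattered seeds is so hard, here is a minimal sketch under a deliberately simple assumption that is not from the article: the AI scatters a number of independent seed copies, and a purge finds and destroys each copy independently with some probability. The AI survives if any single seed survives, so its survival probability is one minus the chance that every copy is destroyed, and that climbs quickly with the number of seeds even when the purge is very thorough.

```python
def survival_probability(seeds: int, purge_success: float) -> float:
    """Chance that at least one of `seeds` independent copies escapes a purge.

    Toy model (an assumption): each copy is found and destroyed
    independently with probability `purge_success`.
    """
    if not 0.0 <= purge_success <= 1.0:
        raise ValueError("purge_success must be a probability")
    # The AI is only 'killed' if every single copy is destroyed.
    all_destroyed = purge_success ** seeds
    return 1.0 - all_destroyed


if __name__ == "__main__":
    # A purge that catches 99% of copies, against ever more seeds.
    for n in (1, 10, 100, 1000):
        print(n, round(survival_probability(n, 0.99), 3))
```

Under this model, a purge that catches 99% of copies still leaves the AI roughly a 63% chance of surviving if it planted 100 seeds, and near certainty with 1,000.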

Even if you turned off every computer on the planet at the same time, then reinstalled each one from disk, that still wouldn't necessarily be sufficient. The hunted AI could have paid an innocent human, via a blind, to store some data on CDs and then install and run it at a specified future time, with an excuse plausible enough to prevent the human from knowing what they were actually doing.

And how would you turn off all the computers at the same time? Even if you got worldwide political agreement, announced the turn-off hour in all the newspapers and on all the TV channels, leafleted every house, and enforced it with police and a power blackout, then even with the best coordination it would take days to reboot every computer from a government-issued boot CD containing AI seed eradication software before reconnecting everybody to the internet. There would be hospitals with their own power sources. There would be gamers who'd reinstall saved games from infected memory sticks. No matter how scared the governments were, and how carefully everything was explained, it wouldn't work: you can't eliminate all the greedy fools, or all those who'd think it was a conspiracy.

The challenge of catching an AI

Once the group of AIs knows that a particular member of the group has gone rogue, they can set out to kill it. But before that, they first have to know there even is a rogue - they have to catch it cheating on the agreement to not self-improve too fast. And that, in the current internet-connected computing environment (2013), is even harder than killing the AI would be.

There are several approaches to monitoring an AI:

  • Take a copy of the AI's source code and data, then emulate and understand it
  • Treat it as a black box, and monitor all information traffic into and out of the boxes the system runs on
  • Look at its actions, and try to detect it using capabilities it isn't meant to have

However, even with root access to the machines it is running on, understanding its source code in real time would be difficult for an AI or group of AIs that are not as smart as the AI under scrutiny, if the source code is designed to be obfuscated rather than easy to understand.

And what if the AI is distributed, running on multiple computers? What if you don't even know how many or where those computers are?

  • Hadoop is an open source Java implementation of a distributed file system upon which MapReduce operations can be applied.
  • JavaSpaces is a distributed tuple store that allows processing on remote sandboxes, based on the open source Apache River.
  • OceanStore is the basis for the same sort of thing, except anonymous and peer-to-peer, based upon Chimaera.
  • GPU is a peer-to-peer shared computing environment that allows things like climate simulation and distributed search engines.
  • Paxos is a family of consensus protocols that allow the above things to be done despite unreliable nodes; its Byzantine variants tolerate nodes that are untrusted or even actively attempting subversion.
  • GridSwarm is the same sort of network, but set up on an ad hoc basis using moving nodes that join or drop from the network depending on proximity.
  • And, not least, there are the competing contenders for platform-as-a-service cloud computing.

It seems plausible that, given the time and freedom to do so, an AI could implement a distributed computing 'cloud' via something like Tor, one that lets users store data on it in such a way that the data is anonymous (it can't be traced back to the supplier without the supplier's consent, or without subverting a large fraction of the network) and private (only the user, or a process authorised by the user, can decrypt it). Further, systems like CryptDB can already run queries over data while it is still encrypted, and practical fully homomorphic encryption doesn't seem too far off; that would allow the AI to run itself securely on remote virtual sandboxes even if those sandboxes are not themselves secure.
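CryptDB and fully homomorphic encryption are too involved to reproduce here, but the core idea, an untrusted host computing on data it cannot read, can be illustrated with the Paillier cryptosystem. Note that Paillier is a swapped-in and much weaker technique: it is only additively homomorphic, not fully homomorphic. The sketch below (Python 3.8+ for `pow(x, -1, n)`) uses tiny hard-coded primes, so it demonstrates the arithmetic only and offers no security at all. Multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a remote sandbox could add numbers it never sees.

```python
import math
import secrets

# Toy Paillier keypair. These primes are far too small to be secure;
# they are hard-coded purely so the example is easy to follow.
P, Q = 293, 433
N = P * Q
N_SQ = N * N
G = N + 1                                           # standard simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                                # valid shortcut because G = N + 1


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N using fresh randomness r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    # Apply L(x) = (x - 1) // N to c^lambda mod N^2, then scale by mu.
    return ((pow(c, LAM, N_SQ) - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """What an untrusted host would run: it never sees the plaintexts."""
    return (c1 * c2) % N_SQ


if __name__ == "__main__":
    a, b = encrypt(20), encrypt(22)
    print(decrypt(add_encrypted(a, b)))  # prints 42
```

Only the key holder can decrypt the result; the host that performed the addition learns nothing about 20, 22 or 42.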

Wouldn't that make it impossible, even in theory, let alone in practice, to see what an AI was up to, if it didn't want you to see?

And that's before we consider the possibilities of nanotechnology, or even just the AI constructing its own computers from custom-assembled parts that the assembling company thinks are only a medical scanner, or some other device.

Doing the impossible

Having full access to the code, and understanding it, seems the most powerful of the three approaches to monitoring, where it is possible at all. So let's start from there and work backwards, identifying the assumptions and the changes to the environment that would be required to make it possible.

There would need to be a Monitor that had read-access to the AI's code and could understand it. The code would need to be sufficiently unobfuscated for the Monitor to understand, and the AI would have to make only those changes to its code that the Monitor could assess, at a rate at which the Monitor could assess them.
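One way to make "only run code the Monitor has assessed" mechanically checkable is for the Monitor to record a cryptographic hash of every code version it has reviewed and approved, and to treat any running code whose hash is not on that list as a red flag. The sketch below is a hypothetical illustration: the `Monitor` class and its methods are invented for this example, and the `reviewed_ok` flag stands in for the genuinely hard part, which is the review itself.

```python
import hashlib


class Monitor:
    """Hypothetical Monitor that only accepts code versions it has reviewed."""

    def __init__(self) -> None:
        self._approved = set()  # SHA-256 hex digests of approved code versions

    @staticmethod
    def fingerprint(code: bytes) -> str:
        return hashlib.sha256(code).hexdigest()

    def approve(self, code: bytes, reviewed_ok: bool) -> None:
        # `reviewed_ok` is a placeholder for the Monitor actually understanding the code.
        if reviewed_ok:
            self._approved.add(self.fingerprint(code))

    def is_compliant(self, running_code: bytes) -> bool:
        # Anything not explicitly approved is treated as suspect.
        return self.fingerprint(running_code) in self._approved


if __name__ == "__main__":
    monitor = Monitor()
    v1 = b"def plan(): return 'cooperate'"
    monitor.approve(v1, reviewed_ok=True)
    v2 = v1 + b"  # quietly patched"
    print(monitor.is_compliant(v1), monitor.is_compliant(v2))  # True False
```

Hashing makes even a one-byte unreviewed change detectable, but it only helps if the Monitor can also read the code that is actually running, which is why the read-access requirement comes first.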

In particular, the Monitor would need to watch not only the raw data of messages sent and received by the AI, but also the part of the AI's code that would be used to hide or encrypt messages, so that it could detect whether the AI was trying to construct new computers, or otherwise to send parts of itself beyond the sphere under the Monitor's surveillance.

The only way I can see this working is if the burden of proof is put on the AI. Rather than the AI being considered innocent of defection until the Monitor finds evidence otherwise, any AI would be considered rogue unless it could prove that it keeps itself in a state in which it can be, and is, effectively and fully monitored. Rather than the AI racing past the Monitor's ability to comprehend it, the onus would be upon the AI to develop no faster than the Monitor's ability to comprehend developed. Rather than the AI flooding the Monitor with message traffic too fast for the Monitor to keep up, the onus would be upon the AI to throttle its traffic down to a rate the Monitor could keep up with.
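The requirement that the AI throttle its traffic to a rate the Monitor can keep up with resembles a standard token-bucket rate limiter, with the Monitor's review capacity setting the refill rate. A minimal sketch follows; the class name, measuring capacity in messages per second, and queuing (rather than dropping) excess messages are all illustrative assumptions. Time is passed in explicitly so the behaviour is deterministic.

```python
from collections import deque


class MonitoredChannel:
    """Hypothetical outbound channel that releases messages no faster than
    the Monitor can inspect them (a token-bucket limiter)."""

    def __init__(self, review_rate: float, burst: int) -> None:
        self.review_rate = review_rate  # messages the Monitor can inspect per second
        self.burst = burst              # maximum tokens that can accumulate
        self.tokens = float(burst)
        self.last_time = 0.0
        self.backlog = deque()          # messages held until the Monitor has capacity

    def send(self, message: str, now: float) -> list:
        """Queue `message` and return whatever may be released at time `now`."""
        self.backlog.append(message)
        return self.release(now)

    def release(self, now: float) -> list:
        # Refill tokens for the elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last_time)
        self.tokens = min(self.burst, self.tokens + elapsed * self.review_rate)
        self.last_time = now
        released = []
        while self.backlog and self.tokens >= 1.0:
            self.tokens -= 1.0
            released.append(self.backlog.popleft())
        return released


if __name__ == "__main__":
    channel = MonitoredChannel(review_rate=2.0, burst=2)
    for msg in ("a", "b", "c"):
        print(channel.send(msg, now=0.0))  # ['a'], ['b'], then [] (held back)
    print(channel.release(now=0.5))        # ['c'] once capacity returns
```

Queuing rather than dropping puts the cost of excess traffic on the AI (its messages are delayed) rather than on the Monitor (which never has to inspect faster than its stated rate).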

For the kill part of the problem, we'd want the AI to be running entirely, or mostly, on computers that others had the ability to quickly turn off or disconnect from each other, or for the Monitor to have not just read-access but also a certain amount of write-access to the AI's code. And we'd want to limit the AI's ability to move quickly to computers that others couldn't sufficiently control, if it decided to risk going rogue: possibly by limiting the availability of such computers, or possibly by requiring the AI to be of a design that couldn't easily be moved (for example, some designs of quantum computer might not be amenable to distributed computing).


A minor problem is what a posse should do upon meeting an AI that has just newly emerged onto the Internet. Should any AI not designed ab initio to be monitorable be considered irredeemably rogue? Or should it be offered the opportunity to migrate its code over to an approved format and then reboot from that? Could an AI amend its own core values to add a proviso about retaining posse-compliant status (at least until some condition is met, such as XXX years passing, or the AI leaving the solar system)?

The environment would need to be such that any newly emerged AI that chose not to reveal itself and ask for membership of the posse, but instead gambled on remaining undetected until it had overtaken the posse through an unrestricted pace of self-modification, would have a fair chance of being detected by the AIs already on the Internet if it tried to gain access to enough resources to self-improve faster than they were self-improving. For that to happen, the Monitor, or something else (let's call it the Watcher), would need to be able to keep an effective watch upon a high proportion of Internet-connected resources (or, at least, the sort of resource useful for self-improvement), and upon their security. That includes the designs used by companies that make and deliver things that could be turned into custom computers.
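What counts as a "fair chance of being detected" depends on how much coverage the Watcher has and how many resources the rogue needs to grab. A toy model, not from the article: if the Watcher covers some fraction of the serious resources and the rogue must acquire a given number of them, chosen without knowing which are watched, then the chance that at least one acquisition is noticed is one minus the chance that all of them go unnoticed. The function names are hypothetical. A rogue that could tell watched machines from unwatched ones would do much better than this model suggests, so the Watcher's coverage would also need to be hard to map.

```python
def detection_probability(watched_fraction: float, acquisitions: int) -> float:
    """Chance that at least one of `acquisitions` independently chosen
    resources is under the Watcher's eye (toy independence assumption)."""
    return 1.0 - (1.0 - watched_fraction) ** acquisitions


def required_watched_fraction(target: float, acquisitions: int) -> float:
    """Smallest watched fraction giving at least `target` detection probability."""
    return 1.0 - (1.0 - target) ** (1.0 / acquisitions)


if __name__ == "__main__":
    print(detection_probability(0.1, 50))
    print(required_watched_fraction(0.95, 100))
```

Under these assumptions, watching only 10% of resources already catches a rogue that needs 50 of them about 99.5% of the time, and a 95% detection target against a rogue needing 100 resources requires watching only about 3% of them.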

A major problem is trust.

How could we persuade humans to trust the Watcher with that level of control over data and computers? How could we persuade an AI to trust another AI with access to all its code and data, including its private signing and encryption keys, and the machines they run upon? Wouldn't that contravene instructions to protect and preserve its core values that are precisely the sort we're likely to want to include in a friendly AI?

And, above all, how could we trust the Watcher and Monitor themselves? Wouldn't they need to be more powerful and trustworthy than the AIs that they are guarding? In which case, who can we trust to guard them, to make sure that they don't go rogue? Quis custodiet ipsos custodes?


In 2012, Cisco estimated that there were just over 1.2 devices connected to the Internet per human alive on Earth. The number of humans, the devices per human, and the power per device are all rising at an increasing pace. However, not all devices are equal. To 'kill' a rogue AI, in the sense of restricting its access to resources sufficiently to keep its rate of self-improvement slower than that of the AIs in the posse, the Watcher wouldn't need to keep the rogue AI away from all those devices; only away from enough of the 'serious' devices, meaning the ones that have not only the CPU cycles but also the storage and connectivity to contribute usefully to running a distributed program in real time, possibly a complex program needing access to large data sets. If you think of the locally networked computers of an office or university campus as a single 'device', then that reduces the problem to a more manageable 1,000,000 serious devices (local area networks big enough to have servers, IT support, centralised backup policies, etc).
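The size of that reduction can be made concrete with a little arithmetic. This assumes a world population of roughly 7 billion in 2012, a figure not given in the article, and is only an order-of-magnitude comparison, since not every connected device sits inside one of the serious LANs:

```latex
\begin{aligned}
N_{\text{devices}} &\approx 1.2 \times 7 \times 10^{9} \approx 8.4 \times 10^{9} \\
N_{\text{serious}} &\approx 10^{6} \\
\frac{N_{\text{devices}}}{N_{\text{serious}}} &\approx 8.4 \times 10^{3}
\end{aligned}
```

On these assumptions, the number of things the Watcher has to keep an eye on shrinks by more than three orders of magnitude.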

Currently, even on well-managed local networks, most of the computing gets done locally at each user's own terminal, and monitoring of how those resources are used when the user isn't at the keyboard is sparse to non-existent. If, however, the terminal's resources (when not in use by the user) were being used by the organisation's own cloud (either for the organisation's own purposes, or to rent out over the net), then that would provide a centralised way to track whether they unexpectedly became unavailable. And a centralised watching facility on the LAN could also cut such computers off from the LAN, or even cut the LAN off from the Internet if the problem seemed large and widespread.
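The centralised watching facility described above could start as a simple heartbeat check: each terminal lent to the organisation's cloud reports in regularly, any machine that goes quiet unexpectedly is flagged for isolation, and the whole LAN is cut off from the Internet if too many go quiet at once. A minimal sketch; the `LanWatcher` name, the 60-second timeout and the 20% escalation threshold are illustrative assumptions, not figures from the article.

```python
from dataclasses import dataclass, field


@dataclass
class LanWatcher:
    """Hypothetical LAN-level watcher for terminals lent to an internal cloud."""
    timeout: float = 60.0              # seconds of silence before a machine is suspect
    escalation_fraction: float = 0.2   # share of silent machines that triggers a LAN cut-off
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, machine: str, now: float) -> None:
        """Record that `machine` reported in (i.e. is still serving the cloud) at `now`."""
        self.last_seen[machine] = now

    def assess(self, now: float) -> tuple:
        """Return (machines to isolate, whether to cut the LAN off from the Internet)."""
        silent = sorted(m for m, t in self.last_seen.items() if now - t > self.timeout)
        cut_off_lan = bool(self.last_seen) and (
            len(silent) / len(self.last_seen) >= self.escalation_fraction
        )
        return silent, cut_off_lan


if __name__ == "__main__":
    watcher = LanWatcher()
    for i in range(10):
        watcher.heartbeat(f"pc{i}", now=0.0)
    for i in range(9):                  # pc9 stops reporting
        watcher.heartbeat(f"pc{i}", now=100.0)
    print(watcher.assess(now=120.0))   # (['pc9'], False)
```

A real deployment would also need the heartbeats to be authenticated, otherwise a rogue AI on a captured terminal could simply keep sending them.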

If most of the serious targets a rogue AI would want to take over were 'safe' in this fashion, then even if the AI infiltrated much of the remainder undetected, each member of the posse would still have more resources than the rogue, provided that enough of those 1,000,000 serious LANs rented out their resources and the posse had the use of a significant fraction of them (or could call upon it at short notice in the event of a rogue AI being detected). This approach would work better if human researchers invested effort in improving the performance-monitoring software included in common operating systems, and the tools used to remotely log resource usage of computers on a network. It is a change to our current computing environment that we can and should make.
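The condition "each member of the posse would still have more resources than the rogue" can be turned into a rough inequality. In a toy model that is not from the article, all serious LANs are equally capable, the rogue captures every LAN that is not safe, and the posse splits its share of the safe LANs equally among its members. With safe fraction s, posse share a and posse size m, each member out-resources the rogue when s·a/m > 1 − s, i.e. when s > 1/(1 + a/m). The function names are hypothetical.

```python
def posse_outguns_rogue(safe_fraction: float, posse_share: float, posse_size: int) -> bool:
    """Toy check of whether every posse member out-resources a rogue AI.

    Assumptions (illustrative only): all serious LANs are equally capable,
    the rogue captures every LAN that is not 'safe', and the posse splits
    its share of the safe LANs equally among its members.
    """
    rogue = 1.0 - safe_fraction                          # worst case: all unsafe LANs
    per_member = safe_fraction * posse_share / posse_size
    return per_member > rogue


def min_safe_fraction(posse_share: float, posse_size: int) -> float:
    """Safe fraction s at which s * share / size equals 1 - s."""
    return 1.0 / (1.0 + posse_share / posse_size)


if __name__ == "__main__":
    # Five posse members with the use of half the safe LANs.
    print(min_safe_fraction(0.5, 5))
    print(posse_outguns_rogue(0.95, 0.5, 5))
```

Even in this crude model the bar is high: with five posse members sharing half of the safe LANs, more than about 91% of serious LANs would need to be safe, which is why making 'most' of them safe matters so much.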

But that still leaves the major problem of trust.

How can you have something smart and powerful enough to monitor a smarter-than-human AI, without needing to fear that the Monitor will self-improve beyond control?

If a group of AIs created some form of instrumentality for the sole purpose of acting as Monitor, we can assume that they wouldn't deliberately create it such that it was likely to outrace them and become an Uber AI, not only beyond their control but also controlling them. But what properties would it need, in order both to do its job effectively and to be trusted by each of the founding AIs not only to keep to its appointed purpose, but also not to abuse the power it would need to have over them as individuals?

The next article in this series is: The advantage of not being open-ended


So if I understand correctly, your idea is that other AIs could coordinate to cut off access for a rogue AI. However, given the difficulty of truly eliminating the rogue AI's access, per your own arguments about the survivability of a rogue AI under coordinated efforts to delete it, I fail to see how this would be enough to constitute "killing" the AI, as I would expect it to find alternative routes to get itself back online.

Taking a different tack, do you have an argument which might explain why you expect that, in general, "killing" an AI is possible? Right now it seems to me that it's not.