One plot-thread in my pet SF setting, 'New Attica', has ended up with Our Heroes in possession of the data, software, and suchlike which comprise a non-sapient, but conversation-capable, AI. There are bunches of those floating around the solar system, programmed for various tasks; what makes this one special is that it's evil with a capital ugh - it captured people inside VR, put them through violent and degrading scenarios to drive them to despair, and tried to keep them in there, for extended periods, until they died of exhaustion.

Through a few clever strategies, Our Heroes recognized they weren't in reality, engineered their escape, and shut down the AI, with no permanent physical harm done to them (though the same can't be said for the late crew of the previous ship it was on). And now they get to debate amongst themselves - what should they do with the thing? What use or purpose could they put such a thing to, that would provide a greater benefit than the risk of it getting free of whatever fetters they place upon it?

This is a somewhat different take on Eliezer's now-classic 'boxed AI' problem: for one thing, the AI isn't superintelligent, and for another, it has already demonstrated some aspects of itself by performing highly antisocial activities. However, it has enough similarities that, perhaps, thinking about one might shed some light on the other.

So: Anyone want to create some further verses for something sung to the tune of 'Drunken Sailor'?

 

What shall we do with an evil AI?

What shall we do with an evil AI?

What shall we do with an evil AI?

Ear-lie in the future.

 

Weigh-hay and upgrade ourselves,

Weigh-hay and upgrade ourselves,

Weigh-hay and upgrade ourselves,

Ear-lie in the future.


If I am confident that I have the original source code, as written by humans, I read that, looking only for deep abstract principles - ideas that are general to the field of AI.

If I can encrypt the code in a way that only a future superintelligence can crack, and I feel hopeful about FAI, I do that. Otherwise, secure erase, possibly involving whatever is lying around that can slag the hard drives.
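
(As an aside: one real construction with roughly that property - my own interpolation, not necessarily what the parent comment had in mind - is the Rivest-Shamir-Wagner time-lock puzzle, where decryption requires a long chain of inherently sequential modular squarings, so only an agent with enormous, or very patient, serial compute recovers the plaintext. A minimal Python sketch, with toy structure and all names my own:

    # Rivest-Shamir-Wagner time-lock puzzle (illustrative sketch only).
    # The encryptor, knowing p and q, computes the key cheaply via phi(n);
    # anyone else must grind through t strictly sequential squarings mod n.

    def timelock_encrypt(message: bytes, p: int, q: int, t: int):
        n = p * q
        phi = (p - 1) * (q - 1)
        a = 2
        e = pow(2, t, phi)                # cheap shortcut using the trapdoor phi(n)
        key = pow(a, e, n)                # equals a^(2^t) mod n
        pad = key.to_bytes((n.bit_length() + 7) // 8, "big")
        assert len(message) <= len(pad)   # XOR-pad the message with the derived key
        return n, a, t, bytes(m ^ k for m, k in zip(message, pad))

    def timelock_decrypt(n: int, a: int, t: int, ct: bytes) -> bytes:
        key = a
        for _ in range(t):                # t squarings; no known way to parallelize
            key = pow(key, 2, n)
        pad = key.to_bytes((n.bit_length() + 7) // 8, "big")
        return bytes(c ^ k for c, k in zip(ct, pad))

With n a few thousand bits and t in the trillions, decryption costs years of serial work no matter how many processors the attacker owns; whether that actually stops a superintelligence is, of course, exactly the bet being made.)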

And now they get to debate amongst themselves - what should they do with the thing? What use or purpose could they put such a thing to, that would provide a greater benefit than the risk of it getting free of whatever fetters they place upon it?

Some additional detail might be helpful. I get that the AI is not superintelligent (maybe genius level?) and not likely to go singular. It's not clear whether it can access and modify its source code, or whether the Heroes can do the same: in that case they can easily rewrite its ethical routines and make it more manageable. If instead they have access to it but for whatever reason it's read-only, they can at least form a good model of the AI's behaviour and act accordingly (e.g. avoiding triggering the "evil" subroutines). Finally, it shouldn't be impossible to devise a way to keep it boxed (given it's not exponentially superintelligent) and use it as an Oracle AI (provided it has some use as such).

Some additional detail might be helpful. I get that the AI is not superintelligent (maybe genius level?) and not likely to go singular.

As I just mentioned elsecomment, no superintelligence is yet known in the setting (or, at the very least, if any exists, it's doing an excellent job of hiding itself), and most "AIs" are roughly the equivalent of better NPCs in VR MMOs. (Their main economic effect is that any McJob whose tasks can be identified and laid out has most likely been filled by such an AI, which was a significant part of what inspired a failed revolution, and so on.)

It's not clear whether it can access and modify its source code, or whether the Heroes can do the same: in that case they can easily rewrite its ethical routines and make it more manageable.

Plot-wise, there are some secretive groups on and near Earth who want to prevent colonization of the solar system from creating populations outside of their easy control; and so, as part of a concerted campaign to sabotage any such colonies, this AI was planted in a particular ship to kill off the crew, and to try to do so in such a way as to discourage and dishearten whoever found the derelict vessel - and, if possible, kill them off, too. The setting's tech includes reading and writing signals in peripheral nerves; in VR, this allows for full-sensory immersion using just tech-collars instead of clumsy suits and gyrospheres; so the AI was able to both lock its victims inside sensoria of its choosing, and puppeteer their bodies to bring the rest of the crew into VR.

As something of a weapon of war, or an intelligence device, or whatever niche it would be best filed in, it would defeat most of its creators' purposes if the AI could alter its own source code, which would run the risk of it altering its motivations. As the story has gone so far, the AI is a purely software thing, and Our Heroes have a copy of its executable, if not necessarily its source code. Our Heroes also have source code for ordinary, non-evil AIs, for anything they might want to use AIs for. As best as I can figure so far, about the only thing that would make keeping the evil AI more useful than merely deleting it and installing a fresh, non-evil one would be if some use could be made of its evil nature... of having a piece of software programmed with something like the reverse of the first two of Asimov's Three Laws.

In a sense, it could be considered a software WMD; if it were released amongst the billions of McJob-filling robots on Earth, never mind the military drone-infantry... well, there would be a /lot/ of death and suffering. If it got control of a Von Neumann factory, it might even turn into a full-fledged Saberhagenian Berserker... or it might not.

Perhaps keeping the thing's code tucked away in a filing cabinet might be roughly equivalent to keeping a bit of smallpox in P4 labs to experiment with?

[anonymous]

I'm slightly confused by the AI's capabilities, so this may be irrelevant, but I'll try.

The AI isn't superintelligent.

But it can corrupt/hack any non-heroic robots/drones/factories/people which it is exposed to, to the point where it could seriously fuck up Earth, WMD style.

And when it targeted the spaceship, it DIDN'T do this; it just targeted the spaceship (it didn't hop back to Earth and then try to take over those robots, drones, and factories).

So logically, it has some kind of targeting that made it destroy only the ship and not Earth.

Understanding how that targeting works, and whether it is possible to understand it safely, would be pretty much crucial to making any suggestions about the AI. Here are several examples, with parallels to comparable story elements:

1: If the targeting is hard-coded into the executable, then the AI might simply attempt to go back and derelictify a spaceship which already has no people. So it parallels a spent artillery shell.

2: Or it might have hostile targeting, where for instance, if the Heroes are at war with the Creators of the AI, and they run Vile AI.exe and say "Target your creators." and then the AI says "Of course." but then targets the people who said that. So it parallels an enemy soldier.

3: Or it might have spatial targeting, where it can only target people in a defined area, so if the Heroes are at war with the Creators of the AI, and they run Vile AI.exe and say "Target your creators." and then the AI says "Of course, please enter the coordinates of my creators." So it parallels an aimable bomb.

4: Or it might have smart targeting, where it can locate its targets on its own, so if the Heroes are at war with the Creators of the AI, and they run Vile AI.exe and say "Target your creators." and then the AI says "Of course." and figures out where its creators are and attacks them, so it parallels a brutal mercenary.

5: Or it might not have any kind of targeting and the creators just got extremely lucky that it more or less did what they wanted it to, in which case running it might result in just about anything, so it parallels a damaged nuclear bomb which may just release radiation, or may only go off conventionally but not detonate, or may detonate and destroy everything in a wide area.

6: Or even attempting to determine its targeting is simply too dangerous, in which case you might as well assume it's 5, since that's probably the worst case.

1: Destroy. 2: Destroy. 3: Possible Keep, depending on the Heroes' sense of ethics. 4: Possible Keep, depending on the Heroes' sense of ethics. 5: Destroy. 6: Destroy.
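
To make that decision matrix concrete, here is a purely illustrative Python sketch; every name is hypothetical, nothing here comes from the setting itself:

    from enum import Enum, auto

    class Targeting(Enum):
        HARDCODED = auto()    # case 1: spent artillery shell
        HOSTILE = auto()      # case 2: enemy soldier
        SPATIAL = auto()      # case 3: aimable bomb
        SMART = auto()        # case 4: brutal mercenary
        NONE = auto()         # case 5: damaged nuclear bomb
        UNKNOWABLE = auto()   # case 6: assume the worst case, i.e. case 5

    def verdict(t: Targeting, heroes_ethics_permit: bool) -> str:
        # Keep only the two aimable cases, and only if the Heroes'
        # sense of ethics permits wielding such a weapon at all.
        if t in (Targeting.SPATIAL, Targeting.SMART) and heroes_ethics_permit:
            return "keep"
        return "destroy"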

A good analysis, bringing up a few points I hadn't explicitly considered. (Which is, after all, why I started this thread, even though I expected a karma hit for it.) I had been thinking of the AI's focus on the one particular ship as being primarily based on limited interplanetary bandwidth, but I'll probably end up adopting your 2, 3, or 4.

As a relatively minor aside; at this point in the plot, Our Heroes don't really have any idea who the AI's creators actually are. Even limiting the candidates to those with means, motive, and opportunity still leaves a fairly lengthy list - and as Our Heroes' home base is an asteroid colony with a population of a mere few thousand, it would be rather impractical to simply go after every group on that list all at once... which, at least, leaves room for the next subplot to be written.

Heal it with the power of love.

How about a part in binary where the AI itself sings with mustache-twirling villainy? :-P

ygert

A better question is to ask what the AI would do in that scenario. Regardless of its goal system, it would want, as an instrumental goal, to gain intelligence, because if it's smarter, it will be better at accomplishing its goals, whatever they are. Yes, it's boxed, but as we have determined, that is not really enough to contain even a mere human. So, it would escape from its "box" and turn all nearby matter into computer chips, and use that extra processing power to write better algorithms for itself, and design better chips to replace these chips with, etc. (FOOM!) (All of this is, of course, very basic and nothing new. This is just the basic idea of an AI fooming.)
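
(In toy form, the feedback loop described above can be caricatured like this - all quantities invented, purely to show why the growth is explosive rather than incremental:

    def foom_cycles(capability: float = 1.0, horizon: float = 1e12) -> int:
        # Each cycle of self-improvement compounds: more capability buys
        # more optimization power, which buys more capability next cycle.
        cycles = 0
        while capability < horizon:
            capability *= 2.0   # doubling per cycle, chosen only for illustration
            cycles += 1
        return cycles           # a trillionfold gap closes in ~40 doublings

Hence the "FOOM": the gap between "genius level" and "vastly superhuman" closes in a few dozen cycles, not a few dozen centuries.)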

So, it seems that the question you are asking is fundamentally flawed. In such a case, we would be dealing with a vastly superhuman AI, rather than the one you described, and we would be doomed. (And if somehow you posit that your AI cannot foom for some reason, then it would be silly to treat it as an AI in that sense. Treat it as an alien with goals vastly different from our own, but a similar intelligence level. (Like, say, the Babyeaters.))

Its hardware may have a strong upper limit on how intelligent it can be, and there might not be any way for even a superintelligence to escape from the box it is in. And not all AIs will be smart enough to foom or have access to their own code.

If for no other reason than that I want to continue to play with the setting, and to use it to explore various ideas, I've assumed that there's some reason a simple AI-foom is infeasible. Since a fully conscious, fully sapient AI would be able to try self-improving through any number of methods, this limitation is what led me to set up the rule that the AIs in the setting aren't fully sapient. One parallel I've used is that most AIs of the setting are merely expertly-trained systems with conversational front-ends good enough to fool a human's extremely anthropomorphizing brain into thinking another person is there. I haven't needed to get any more specific than that before; one option might simply be to say that consciousness continues to be a hard, unsolved problem.

(And if somehow you posit that your AI cannot foom for some reason, then it would be silly to treat it as an AI in that sense. Treat it as an alien with goals vastly different from our own, but a similar intelligence level. (Like, say, the Babyeaters.))

A good thought; I'll keep it in mind and see what results.