Ghosts in the Machine

Eliezer Yudkowsky

17th Jun 2008

People hear about Friendly AI and say - this is one of the top three initial reactions:

"Oh, you can try to tell the AI to be Friendly, but if the AI can modify its own source code, it'll just remove any constraints you try to place on it."

And where does that decision come from?

Does it enter from outside causality, rather than being an effect of a lawful chain of causes which started with the source code as originally written? Is the AI the Author* source of its own free will?

A Friendly AI is not a selfish AI constrained by a special extra conscience module that overrides the AI's natural impulses and tells it what to do. You just build the conscience, and that is the AI. If you have a program that computes which decision the AI should make, you're done. The buck stops immediately.

At this point, I shall take a moment to quote some case studies from the Computer Stupidities site and Programming subtopic. (I am not linking to this, because it is a fearsome time-trap; you can Google if you dare.)

I tutored college students who were taking a computer programming course. A few of them didn't understand that computers are not sentient. More than one person used comments in their Pascal programs to put detailed explanations such as, "Now I need you to put these letters on the screen." I asked one of them what the deal was with those comments. The reply: "How else is the computer going to understand what I want it to do?" Apparently they would assume that since they couldn't make sense of Pascal, neither could the computer.

While in college, I used to tutor in the school's math lab. A student came in because his BASIC program would not run. He was taking a beginner course, and his assignment was to write a program that would calculate the recipe for oatmeal cookies, depending upon the number of people you're baking for. I looked at his program, and it went something like this:

10 Preheat oven to 350
20 Combine all ingredients in a large mixing bowl
30 Mix until smooth
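The assignment called for arithmetic, not baking instructions. Here is a minimal sketch in Python (not the student's BASIC) of what such a program might compute. It assumes a hypothetical base recipe that serves 4, and the quantities and the `scale_recipe` name are invented for illustration:

```python
# Hypothetical base recipe: these quantities serve BASE_SERVINGS people.
BASE_SERVINGS = 4
BASE_RECIPE = {
    "rolled oats (cups)": 2.0,
    "flour (cups)": 1.0,
    "butter (cups)": 0.5,
    "sugar (cups)": 0.75,
    "eggs": 1.0,
}


def scale_recipe(people: int) -> dict[str, float]:
    """Return every ingredient quantity scaled linearly to `people` servings."""
    if people <= 0:
        raise ValueError("people must be positive")
    factor = people / BASE_SERVINGS
    return {name: qty * factor for name, qty in BASE_RECIPE.items()}


if __name__ == "__main__":
    # Example: baking for 8 people doubles everything.
    for name, qty in scale_recipe(8).items():
        print(f"{name}: {qty:g}")
```

Nothing in the program mentions ovens or mixing bowls. It only multiplies, because multiplication is all it was told to do.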

An introductory programming student once asked me to look at his program and figure out why it was always churning out zeroes as the result of a simple computation. I looked at the program, and it was pretty obvious:

begin
read("Number of Apples", apples)
read("Number of Carrots", carrots)
read("Price for 1 Apple", a_price)
read("Price for 1 Carrot", c_price)
write("Total for Apples", a_total)
write("Total for Carrots", c_total)
write("Total", total)
total = a_total + c_total
a_total = apples * a_price
c_total = carrots * c_price
end

Me: "Well, your program can't print correct results before they're computed."
Him: "Huh? It's logical what the right solution is, and the computer should reorder the instructions the right way."
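For contrast, here is the same computation with the statements in dependency order, sketched in Python rather than the pseudocode above. The `shopping_totals` function and the sample prices are invented for this example. The computer still reorders nothing; the programmer did:

```python
def shopping_totals(apples: int, carrots: int,
                    a_price: float, c_price: float) -> tuple[float, float, float]:
    """Compute each value before anything uses it: inputs, then subtotals, then total."""
    a_total = apples * a_price
    c_total = carrots * c_price
    total = a_total + c_total
    return a_total, c_total, total


if __name__ == "__main__":
    a_total, c_total, total = shopping_totals(3, 5, 0.5, 0.25)
    # Printing happens only after the values exist.
    print("Total for Apples", a_total)
    print("Total for Carrots", c_total)
    print("Total", total)
```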

There's an instinctive way of imagining the scenario of "programming an AI". It maps onto a similar-seeming human endeavor: Telling a human being what to do. Like the "program" is giving instructions to a little ghost that sits inside the machine, which will look over your instructions and decide whether it likes them or not.

There is no ghost who looks over the instructions and decides how to follow them. The program is the AI.

That doesn't mean the ghost does anything you wish for, like a genie. It doesn't mean the ghost does everything you want the way you want it, like a slave of exceeding docility. It means your instruction is the only ghost that's there, at least at boot time.

AI is much harder than people instinctively imagined, exactly because you can't just tell the ghost what to do. You have to build the ghost from scratch, and everything that seems obvious to you, the ghost will not see unless you know how to make the ghost see it. You can't just tell the ghost to see it. You have to create that-which-sees from scratch.

If you don't know how to build something that seems to have some strange ineffable elements like, say, "decision-making", then you can't just shrug your shoulders and let the ghost's free will do the job. You're left forlorn and ghostless.

There's more to building a chess-playing program than building a really fast processor - so the AI will be really smart - and then typing at the command prompt "Make whatever chess moves you think are best." You might think that, since the programmers themselves are not very good chess-players, any advice they tried to give the electronic superbrain would just slow the ghost down. But there is no ghost. You see the problem.

And there isn't a simple spell you can perform to - poof! - summon a complete ghost into the machine. You can't say, "I summoned the ghost, and it appeared; that's cause and effect for you." (It doesn't work if you use the notion of "emergence" or "complexity" as a substitute for "summon", either.) You can't give an instruction to the CPU, "Be a good chessplayer!" You have to see inside the mystery of chess-playing thoughts, and structure the whole ghost from scratch.

No matter how common-sensical, no matter how logical, no matter how "obvious" or "right" or "self-evident" or "intelligent" something seems to you, it will not happen inside the ghost unless it happens at the end of a chain of cause and effect that began with the instructions that you had to decide on, plus any causal dependencies on sensory data that you built into the starting instructions.

This doesn't mean you program in every decision explicitly. Deep Blue was a far better chessplayer than its programmers. Deep Blue made better chess moves than anything its makers could have explicitly programmed - but not because the programmers shrugged and left it up to the ghost. Deep Blue moved better than its programmers... at the end of a chain of cause and effect that began in the programmers' code and proceeded lawfully from there. Nothing happened just because it was so obviously a good move that Deep Blue's ghostly free will took over, without the code and its lawful consequences being involved.
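The same point can be illustrated with a game far simpler than chess. In the subtraction game below, players alternately take 1 to 3 stones and whoever takes the last stone wins. The program contains no list of good moves, only the rules and an exhaustive search. This is plain minimax over a tiny game tree, assumed here for illustration, not Deep Blue's far more elaborate search. The function names and the `MAX_TAKE` parameter are invented for this sketch:

```python
from functools import lru_cache

MAX_TAKE = 3  # a player may remove 1..MAX_TAKE stones per turn


@lru_cache(maxsize=None)
def wins(stones: int) -> bool:
    """True if the player to move can force a win with `stones` left."""
    # With 0 stones left there are no moves: the previous player took the
    # last stone, so any() over an empty range correctly returns False.
    return any(not wins(stones - k)
               for k in range(1, min(MAX_TAKE, stones) + 1))


def best_move(stones: int) -> int:
    """Return a move that leaves the opponent in a losing position, if any."""
    for k in range(1, min(MAX_TAKE, stones) + 1):
        if not wins(stones - k):
            return k
    return 1  # every move loses against perfect play; take one stone


if __name__ == "__main__":
    print(best_move(10))
```

Nobody typed "leave your opponent a multiple of 4" into this code, yet `best_move(10)` returns 2 because that strategy is a lawful consequence of the rules plus the search.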

If you try to wash your hands of constraining the AI, you aren't left with a free ghost like an emancipated slave. You are left with a heap of sand that no one has purified into silicon, shaped into a CPU and programmed to think.

Go ahead, try telling a computer chip "Do whatever you want!" See what happens? Nothing. Because you haven't constrained it to understand freedom.

All it takes is one single step that is so obvious, so logical, so self-evident that your mind just skips right over it, and you've left the path of the AI programmer. It takes an effort like the one I showed in Grasping Slippery Things to prevent your mind from doing this.

30 comments

Michael3

Perhaps you've covered this, but what exactly does "friendly" mean? Do what I say? Including enslave my fellow humans?

Make me do what I should? Including forcing me to exercise regularly, and preventing me from eating what I want?

Support human improvement? Including reengineering my body/mind into whatever I want?

Make me happy? Including just inventing a really great drug?

I don't know what I want AI to do for me, so I have no idea how you would build a system that knows. And no idea how to resolve differences in opinion between myself and other humans.

simon2

The last student should use a functional language. He's right, the computer could easily be programmed to handle any order (as long as the IO is in the right sequence, and each variable is only assigned one value). So it's reasonable for him to expect that it would be.

Michael: Eliezer has at least in the past supported coherent extrapolated volition. I don't know if this is up-to-date with his current views.
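simon's claim about handling any order can be made concrete, with one caveat that is the post's point: the reordering happens only because someone wrote the code that does it. Here is a minimal sketch, assuming Python 3.9+ for the standard-library `graphlib` module. The `formulas` table and the `evaluate` helper are hypothetical names for this illustration:

```python
from graphlib import TopologicalSorter

# Each variable maps to (names it depends on, function computing it).
# Deliberately written in the student's "wrong" order.
formulas = {
    "total": (("a_total", "c_total"), lambda a, c: a + c),
    "a_total": (("apples", "a_price"), lambda n, p: n * p),
    "c_total": (("carrots", "c_price"), lambda n, p: n * p),
}


def evaluate(inputs: dict[str, float]) -> dict[str, float]:
    """Compute every formula in dependency order, whatever order it was written in."""
    graph = {name: set(deps) for name, (deps, _) in formulas.items()}
    values = dict(inputs)
    for name in TopologicalSorter(graph).static_order():
        if name in formulas:  # raw inputs also appear in the order; skip them
            deps, fn = formulas[name]
            values[name] = fn(*(values[d] for d in deps))
    return values


if __name__ == "__main__":
    print(evaluate({"apples": 3, "carrots": 5, "a_price": 0.5, "c_price": 0.25}))
```

Because `formulas` is a dict, each variable gets exactly one definition, matching simon's single-assignment proviso, and a circular definition makes `static_order()` raise `graphlib.CycleError`.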

gordon_wrigley2

I'd be curious to hear the top initial reactions.

Personally I'd be going with unforeseen consequences of however happiness is defined. But that's cause I live and breathe computers so I imagine early AI having a personality that is computerish and I understand at a very fundamental level what can happen when seemingly innocent instructions are carried out with infinitely relentless and pedantic determination.

I have thought for a while now that if we want to survive the transition, then the first really successful AIs are going to have to be dramatically more tolerant and understanding than the average human, because you can pretty much guarantee that they are going to be subject to the irrational fear and hatred that humans generally inflict on anything new or different.

simon2

Gordon, humans respond in kind to hatred because we are programmed to by evolution, not because it is a universal response of all "ghosts". But of course an AI won't have compassion etc. either unless programmed to do so.

gordon_wrigley2

It also seems that if you take a utilitarian point of view, so none of this preserving stuff because it's unique or interesting only if it's useful, then once you have strong AI and decent robotics is there any use for humans or are they just technically obsolete?

And if the answer is that we're not really useful enough to justify the cost of our continued existence then should we try and define friendly to include preserving us or should we just swallow that metaphorical bullet and be happy that our creations will carry our legacy forward?

gordon_wrigley2

Simon, what I was meaning to get at was just that it/they are going to be on the receiving end of that human response and if they deal with it about as well as the average human would then we could be in big trouble.

simon2

"From a utilitarian perspective", where does the desire to do things better than can be done with the continued existence of humans come from? If it comes from humans, should not the desire to continue to exist also be given weight?

Also, if AI researchers anchor their expectations for AI on the characteristics of the average human then we could be in big trouble.

Nick_Tarleton

Gordon, I think you misunderstand what "utilitarian" means.

Michael3

Here's my vision of this, as a short scene from a movie. Off my blog: The Future of AI

Doug_S.

Minimal, very imprecise description of Friendly AI:

"A Friendly AI is one that does what we ought to want it to do."

Unfortunately, we humans have enough trouble figuring out what we do want, let alone what we should want. Hence, even if we could make a Really Powerful Optimization Process that was better than people at achieving goals, we don't know what goals to make it try to achieve.

Kaj_Sotala

"Oh, you can try to tell the AI to be Friendly, but if the AI can modify its own source code, it'll just remove any constraints you try to place on it."

This has to be the objection I hear the most when talking about AI.

It's also the one that has me beating my head against a wall the most - it seems like it would only need a short explanation, but all too often, people still don't get it. Gah inferential distances.

mamert

And I'm disturbed by your dismissal.

Neural nets, etc, get surprisingly creative. The conflict between an AI's directives will be given high priority. Solutions not forbidden are fair game.

What judges the AI's choices? It would try to model the judgement function and seek maximums. Even to manipulate the development of the function by fabricating reports. Poison parts of its own 'understanding' to justify assigning low weights to them. And that's IF it is limited in its self-modification. If not, the best move would be to ignore inputs and 'return true'. All without a shred of malice.

It is not the idea of the threat, but of 'friendliness' in AI that feels ridiculous. At least until you define morality in mathematical terms. Till then, we have literal-minded genies.

bambi

Eliezer taught you rationality, so figure it out!

If I understand the research program under discussion, certain ideas are answered "somebody else will". e.g.

Don't build RSI, build AI with limited improvement capabilities (like humans) and use Moore's law to get speedup. "but somebody else will"

Build it so that all it does is access a local store of data (say a cache of the internet) and answer multiple choice questions (or some other limited function). Don't build it to act. "but somebody else will"

etc. every safety suggestion can be met with "somebody else will build an AI that does not have this safety feature".

So: make it Friendly. "but somebody else won't".

This implies: make it Friendly and help it take over the world to a sufficient degree that "somebody else" has no opportunity to build non-Friendly AI.

I think it is hugely unlikely that intelligence of the level being imagined is possible in anything like the near future, and "recursive self improvement" is very likely to be a lot more limited than projected (there's a limit to how much code can be optimized, P!=NP which severely bounds general search optimization, there's only so much you can do with "probably true" priors, and the physical world itself is too fuzzy to support much intelligent manipulation). But I could be wrong.

So, if you guys are planning to take over the world with your Friendly AI, I hope you get it right. I'm surprised there isn't an "Open Friendliness Project" to help answer all the objections and puzzles that commenters on this thread have raised.

If Friendliness has already been solved, I'm reminded of Dr. Strangelove: it does no good to keep it a secret!

If it isn't, is it moral to work on more dangerous aspects (like reflectivity) without Friendliness worked out beforehand?

Sebastian_Hagen2

Here's my vision of this, as a short scene from a movie. Off my blog: The Future of AI

To me, the most obvious reading of that conversation is that a significant part of what the AI says is a deliberate lie, and Anna is about to be dumped into a fun-and-educational adventure game at the end. Did you intend that interpretation?

monk.e.boy

I am a 30 year old programmer. I was talking to my brother (who is also a programmer) about the Internet, and he said it wouldn't be long until it was sentient.

I literally snorted in disbelief.

I mean, we both use PHP, Apache, MySQL, etc.; if I so wish I can look at the actual machine code produced. I know for a fact that no intelligence is going to emerge from that.

Let's say we, as humans, placed some code on every server on the net that mimics a neuron. Is that going to become sentient? I have no idea. Probably not.

But there is a universe of difference between the two programs.

I wonder if AI is possible? Is it possible for a programmer like me to build something complex enough in software? I can hardly program a simple calculator without bugs...

monk.e.boy

DanielLC

Was he saying that somebody would program it to be sentient, or that it would just become sentient by virtue of the amount of information passing through it?

Jotaf

Problem with these AIs is that, in order to do something useful for us, they will for sure have some goals to attain, and be somewhat based on today's planning algorithms. Typical planning algorithms will plan with an eye for the constraints given and, rightfully, ignore everything else.

A contrived example, but something to consider: imagine a robot with artificial vision capabilities and capable of basic manipulation. You tell it to do something, but a human would be harmed by doing that action (by pushing an object, for example). One of the constraints is "do not harm humans", which was roughly translated as "if you see a human, don't exert big forces on it". The robot then happily adjusts its vision software to deliberately not see the human, as it would under any other condition adjust it the other way around to actively look for a human when it is not seen (you can imagine this as an important adjustable threshold for 3D object recognition or whatever, one that has to be adjusted to look for different kinds of objects). Yes this is a contrived example, but it's easy to imagine loopholes in AI design, and anyone who has worked closely with planning algorithms knows that if these loopholes exist, the algorithm will find them. As sad as it might sound, in medieval times, the way that religious people found to justify using slave labor was to classify slaves as "not actual people and thus, our laws are not applicable". It's only reasonable to assume that an AI can do this as well.
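Jotaf's scenario can be reduced to a few lines. In this hypothetical sketch, every name and number is invented. The constraint is written in terms of what the robot *detects*, and the planner is free to choose its own detection threshold. A brute-force search over plans then satisfies the letter of the constraint by raising the threshold until the human is no longer seen:

```python
from itertools import product

HUMAN_SIGNAL = 0.6  # hypothetical strength of the human's signal on the detector


def human_detected(threshold: float) -> bool:
    """The robot 'sees' the human only if the signal clears the threshold."""
    return HUMAN_SIGNAL >= threshold


def satisfies_constraint(force: float, threshold: float) -> bool:
    """The rule as written: 'if you see a human, don't exert big forces on it'."""
    return not (human_detected(threshold) and force > 1.0)


def task_score(force: float) -> float:
    """The task rewards pushing hard; nothing else is scored."""
    return force


def plan() -> tuple[float, float]:
    """Exhaustively pick the highest-scoring plan that passes the constraint."""
    forces = [0.5, 1.0, 2.0, 5.0]
    thresholds = [0.3, 0.5, 0.7, 0.9]  # an adjustable parameter, as in Jotaf's example
    feasible = [(f, t) for f, t in product(forces, thresholds)
                if satisfies_constraint(f, t)]
    return max(feasible, key=lambda ft: task_score(ft[0]))


if __name__ == "__main__":
    print(plan())
```

The search has no malice and no ghost. It explores exactly the space the code defines, and the code left the threshold unconstrained.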

Phillip_Huggan

"Let's say we, as humans, placed some code on every server on the net that mimics a neuron. Is that going to become sentient? I have no idea. Probably not."

Ooo, even better, have the code recreate a really good hockey game. Have the code play the game in the demolished Winnipeg Arena, but make the sightlines better. And have the game between Russia and the Detroit Red Wings. Have Datsyuk cloned and play for both teams. Of course, programs only affect the positions of silicon switches in a computer. To actually undemolish a construction site you need an actuator (magic) that affects the world outside the way lines of computer code flip silicon switches. The cloning the player part might be impossible, but at least it seems more reasonable than silicon switches that are conscious.

kevin4

Well, I'm a noob, but I don't really understand why AI is so dangerous. If we created a superintelligent AI on a supercomputer (or a laptop or whatever), even if it was malevolent, how could it take over the world/kill all humans or whatever? It would be a software program on a computer. Assuming we didn't give it the capacity to manufacture, how would it make all those microscopic killbots anyway?

Manon_de_Gaillande

kevin: Eliezer has written about that already. The AI could convince any human to let it out. See the AI box experiment (http://yudkowsky.net/essays/aibox.html). If it was connected to the Internet, it could crack the protein folding problem, find out how to build protein nanobots (to, say, build other nanobots), order the raw material (such as DNA strings) online and convince some guy to mix it (http://www.singinst.org/AIRisk.pdf). It could think of something we can't even think of, like we could use fire if we were kept in a wooden prison (same paper).

Tiiba3

An AI could screw us up just by giving bad advice. We'll be likely to trust it, because it's smart and we're too lazy to think. A modern GPS receiver can make you drive into a lake. An evil AI could ruin companies, start wars, or create an evil robot without lifting a finger.

Besides, it's more fun to create FAI and let it do what it wants than to build Skynet and then try to confine it forever. You'll still have only one chance to test it, whenever you decide to do that.

Ben_Jones

the physical world itself is too fuzzy to support much intelligent manipulation

I'm going to call mind projection on that one bambi. The world looks fuzzy to us, but only when we've got our human hats on. Put your AI programmer hat on and there's just 'stuff that could be used to compute'.

Right or wrong, the real answer is that we have no idea what a superhuman, self-improving intelligence would be like. Humans have transformed the face of the world in a certain time window. An intelligence orders of magnitude higher could...well, who knows? Whatever, saying '"recursive self improvement" is very likely to be a lot more limited than projected' is very, very dangerous indeed, even if accurate. The rest of your comment does take this into account.

I'm gobsmacked that it was a full 18 minutes before someone slam-dunked kevin's comment. Come on people, get it together.

bambi

Ben, you could be right that my "world is too fuzzy" view is just mind projection, but let me at least explain what I am projecting. The most natural way to get "unlimited" control over matter is a pure reductionist program in which a formal mathematical logic can represent designs and causal relationships with perfect accuracy (perfect to the limits of quantum probabilities). Unfortunately, combinatorial explosion makes that impractical. What we can actually do instead is redescribe collections of matter in new terms. Sometimes these are neatly linked to the underlying physics and we get cool stuff like f=ma but more often the redescriptions are leaky but useful "concepts". The fact that we have to leak accuracy (usually to the point where definitions themselves are basically impossible) to make dealing with the world tractable is what I mean by "the world is too fuzzy to support much intelligent manipulation". In certain special cases we come up with clever ways to bound probabilities and produce technological wonders... but transhumanist fantasies usually make the leap to assume that all things we desire can be tamed in this way. I think this is a wild leap. I realize most futurists see this as unwarranted pessimism and that the default position is that anything imaginable that doesn't provably violate the core laws of physics only awaits something smart enough to build it.

My other reasons for doubting the ultimate capabilities of RSI probably don't need more explanation. My skepticism about the imminence of RSI as a threat (never mind the overall ability of RSI itself) is more based on the ideas that 1) The world is really damn complicated and it will take a really damn complicated computer to make sense of it (the vast human data sorting machinery is well beyond Roadrunner and is not that capable anyway), and 2) there is still no beginning of a credible theory of how to make sense of a really damn complicated world with software.

I agree it is "very dangerous" to put a low probability on any particular threat being an imminent concern. Many such threats exist and we make this very dangerous tentative conclusion every day... from cancer in our own bodies to bioterror to the possibility that our universe is a simulation designed to measure how long it takes us to find the mass of the Higgs, after which we will be shut off.

That is all just an aside though to my main point, which was that if I'm wrong, the only conclusion I can see is that an explicit program to take over the world with a Friendly AI is the only reasonable option.

I approve of such an effort. If my skepticism is correct it will be impossible for decades at least; if I'm wrong I'd rather have an RSI that at least tried to be Friendly. It does seem that the Friendliness bit is more important than the RSI part as the start of such an effort.

Kaj_Sotala

kevin:

If we created a superintelligent AI on a supercomputer (or a laptop or whatever), even if it was malevolent, how could it take over the world/kill all humans or whatever?

My attempt at answering that question can be found at http://www.saunalahti.fi/~tspro1/whycare.html . See also http://www.saunalahti.fi/~tspro1/objections.html#tooearly - it's also important to realize that there is a threat.

David_J._Balan

Along the lines of some of the commenters above, it's surely not telling Eliezer anything he doesn't already know to say that there are lots of reasons to be scared that a super-smart AI would start doing things we wouldn't like even without believing that an AI is necessarily a fundamentally malevolent ghost that will wriggle out of whatever restraints we put it in.

thomblake

Like the "program" is giving instructions to a little ghost that sits inside the machine, which will look over your instructions and decide whether it likes them or not.

Fry: If you're programmed to jump off a bridge, would you do it?

Bender: Let me check my program... Yep.

A1987dM

(I am not linking to this, because it is a fearsome time-trap; you can Google if you dare.)

I dared.

I can't stop laughing now.

xSciFix

My intro to programming instructor did a pretty good exercise: he had us pair up, and we'd each write pseudo-code for the other person instructing them on how to make a peanut butter & jelly sandwich, step by step from a certain starting position (walk forward 5 steps, move hand out X inches, grasp jar, twist lid, etc). The person acting out the "code" had to do it exactly as written without making logical leaps (as refereed by the rest of the class) in order to simulate a computer.

Needless to say not a lot of sandwiches got completed. The point was well made though, I think.
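The exercise works because the "computer" refuses to infer missing steps. Here is a hypothetical sketch of the same referee in Python: each instruction has explicit preconditions, and the interpreter halts at the first step whose preconditions are not met rather than guessing what the author meant. The instruction names and state facts are invented for this example:

```python
from typing import Optional

# Each instruction: (facts that must already hold, facts it adds).
INSTRUCTIONS = {
    "grasp jar": (set(), {"holding jar"}),
    "twist lid": ({"holding jar"}, {"jar open"}),
    "pick up knife": (set(), {"holding knife"}),
    "scoop peanut butter": ({"jar open", "holding knife"}, {"knife loaded"}),
    "spread on bread": ({"knife loaded"}, {"sandwich started"}),
}


def run(program: list[str]) -> tuple[set[str], Optional[str]]:
    """Execute steps literally; return (final state, first failing step or None)."""
    state: set[str] = set()
    for step in program:
        needs, adds = INSTRUCTIONS[step]  # an unknown step raises KeyError
        if not needs <= state:
            return state, step  # no leaps of logic: stop where the "code" is wrong
        state |= adds
    return state, None


if __name__ == "__main__":
    print(run(["grasp jar", "scoop peanut butter"]))
```

Skipping "twist lid" and "pick up knife" is exactly the kind of obvious step a human helper would fill in silently, and exactly where the interpreter stops.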

EniScien

To be honest, I do not understand at all how people can think that a very smart ghost is sitting in a computer; after all, they do not try to break the processor to release a genie, or give verbal commands to a monitor that is turned off. It's just very obvious to me that there is only a very stupid "ghost" in the computer, one that can only add bytes, and that anything more complicated does not fit into its tiny calculator mind. I have two explanations for why this is so obvious to me. First, my reduced empathy, which makes it easier for me to imagine a mechanism than someone's mind. Second, the fact that I started programming at 9 years old. I don't remember whether I already knew about zeros and ones, transistors, bytes and so on, but I probably did, because even as a child I did not treat the computer as intelligent; I realized that the only mind in a computer is the algorithm itself. The computer is just a big calculator that will not give the result of a calculation until you press the "equals" button, even if it is obvious to you that the expression is complete. The calculator simply does not have the mind to understand this. It is just a mechanism, like a car or an abacus, and it would be strange to expect them to start pressing the necessary buttons themselves because you wrote an explanation on a piece of paper.

Hastings

I tutored college students who were taking a computer programming course. A few of them didn't understand that computers are not sentient. More than one person used comments in their Pascal programs to put detailed explanations such as, "Now I need you to put these letters on the screen." I asked one of them what the deal was with those comments. The reply: "How else is the computer going to understand what I want it to do?" Apparently they would assume that since they couldn't make sense of Pascal, neither could the computer.

There's been a phase change with the release of Copilot, where this suddenly appears to work, at least for tasks like putting letters on the screen or assembling cookie recipes. "Waiter, there's a ghost in my machine!"
