Continuation of Qualitative Strategies of Friendliness

Yesterday I described three classes of deep problem with qualitative-physics-like strategies for building nice AIs - e.g., the AI is reinforced by smiles, and happy people smile, therefore the AI will tend to act to produce happiness.  In shallow form, three instances of the three problems would be:

  1. Ripping people's faces off and wiring them into smiles;
  2. Building lots of tiny agents with happiness counters set to large numbers;
  3. Killing off the human species and replacing it with a form of sentient life that has no objections to being happy all day in a little jar.

And the deep forms of the problem are, roughly:

  1. A superintelligence will search out causal pathways to its goals other than the ones you had in mind;
  2. The boundaries of moral categories are not predictively natural entities;
  3. Strong optimization for only some humane values does not imply a good total outcome.
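
As a toy illustration of the first problem, here is a minimal sketch in which an optimizer is scored only on a proxy (smiles detected) while the designer had something else in mind. The `Plan` class, the plan list, and every number in it are hypothetical, invented purely for the example; nothing here is a model of a real AI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    smiles_detected: float  # what the reward channel actually measures
    human_welfare: float    # what the designer cared about (hypothetical score)


# All numbers below are made up for illustration.
PLANS = [
    Plan("tell good jokes", smiles_detected=1e2, human_welfare=80.0),
    Plan("cure diseases", smiles_detected=1e3, human_welfare=1000.0),
    Plan("tile the galaxy with molecular smiley faces",
         smiles_detected=1e30, human_welfare=-1e9),
]


def best_plan(plans, score):
    """Return the plan that maximizes the given scoring function."""
    return max(plans, key=score)


# The optimizer only ever sees the proxy.
chosen = best_plan(PLANS, lambda p: p.smiles_detected)
# The designer was implicitly ranking plans by something else.
intended = best_plan(PLANS, lambda p: p.human_welfare)

print("optimizer picks:", chosen.name)
print("designer wanted:", intended.name)
```

The two rankings agree over the plans the designer was imagining and come apart only when the search space is widened, which is what a stronger optimizer does.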

But there are other ways, and deeper ways, of viewing the failure of qualitative-physics-based Friendliness strategies.

Every now and then, someone proposes the Oracle AI strategy:  "Why not just have a superintelligence that answers human questions, instead of acting autonomously in the world?"

Sounds pretty safe, doesn't it?  What could possibly go wrong?

Well... if you've got any respect for Murphy's Law, the power of superintelligence, and human stupidity, then you can probably think of quite a few things that could go wrong with this scenario.  Both in terms of how a naive implementation could fail - e.g., universe tiled with tiny users asking tiny questions and receiving fast, non-resource-intensive answers - and in terms of what could go wrong even if the basic scenario worked.

But let's just talk about the structure of the AI.

When someone reinvents the Oracle AI, the most common opening remark runs like this:

"Why not just have the AI answer questions, instead of trying to do anything?  Then it wouldn't need to be Friendly.  It wouldn't need any goals at all.  It would just answer questions."

To which the reply is that the AI needs goals in order to decide how to think: that is, the AI has to act as a powerful optimization process in order to plan its acquisition of knowledge, effectively distill sensory information, pluck "answers" to particular questions out of the space of all possible responses, and of course, to improve its own source code up to the level where the AI is a powerful intelligence.  All these events are "improbable" relative to random organizations of the AI's RAM, so the AI has to hit a narrow target in the space of possibilities to make superintelligent answers come out.

Now, why might one think that an Oracle didn't need goals?  Because on a human level, the term "goal" seems to refer to those times when you said, "I want to be promoted", or "I want a cookie", and when someone asked you "Hey, what time is it?" and you said "7:30" that didn't seem to involve any goals.  Implicitly, you wanted to answer the question; and implicitly, you had a whole, complicated, functionally optimized brain that let you answer the question; and implicitly, you were able to do so because you looked down at your highly optimized watch, that you bought with money, using your skill of turning your head, that you acquired by virtue of curious crawling as an infant.  But that all takes place in the invisible background; it didn't feel like you wanted anything.

Thanks to empathic inference, which uses your own brain as an unopened black box to predict other black boxes, it can feel like "question-answering" is a detachable thing that comes loose of all the optimization pressures behind it - even the existence of a pressure to answer questions!

Problem 4:  Qualitative reasoning about AIs often revolves around some nodes described by empathic inferences.  This is a bad thing: for previously described reasons; and because it leads you to omit other nodes of the graph and their prerequisites and consequences; and because you may find yourself thinking things like, "But the AI has to cooperate to get a cookie, so now it will be cooperative" where "cooperation" is a boundary in concept-space drawn the way you would prefer to draw it... etc.

Anyway: the AI needs a goal of answering questions, and that has to give rise to subgoals of choosing efficient problem-solving strategies, improving its code, and acquiring necessary information.  You can quibble about terminology, but the optimization pressure has to be there, and it has to be very powerful, measured in terms of how small a target it can hit within a large design space.
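
One way to put a number on "how small a target within a large design space" is to count bits, assuming (and this is an assumption of the sketch, not something argued above) that every point in the space is weighted equally. The function name `optimization_power_bits` is hypothetical.

```python
import math


def optimization_power_bits(target_size: int, space_size: int) -> float:
    """Bits needed to land in a target of `target_size` outcomes
    out of `space_size` equally weighted possibilities."""
    if not 0 < target_size <= space_size:
        raise ValueError("need 0 < target_size <= space_size")
    # Subtracting logs avoids float overflow for astronomically large spaces.
    return math.log2(space_size) - math.log2(target_size)


# Hitting 1 outcome in 4 takes 2 bits of optimization.
print(optimization_power_bits(1, 4))
# Hitting 1 configuration out of 2**1000 takes 1000 bits.
print(optimization_power_bits(1, 2**1000))
```

On this measure, a process that hits the whole space (no selection at all) scores zero bits, and each halving of the target adds one bit.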

Powerful optimization pressures are scary things to be around.  Look at what natural selection inadvertently did to itself - dooming the very molecules of DNA - in the course of optimizing a few Squishy Things to make hand tools and outwit each other politically.  Humans, though we were optimized only according to the criterion of replicating ourselves, now have our own psychological drives executing as adaptations.  The result of humans optimized for replication is not just herds of humans; we've altered much of Earth's land area with our technological creativity.  We've even created some knock-on effects that we wish we hadn't, because our minds aren't powerful enough to foresee all the effects of the most powerful technologies we're smart enough to create.

My point, however, is that when people visualize qualitative FAI strategies, they generally assume that only one thing is going on, the normal / modal / desired thing.  (See also: planning fallacy.)  This doesn't always work even for picking up a rock and throwing it.  But it works rather a lot better for throwing rocks than for unleashing powerful optimization processes.

Problem 5:  When humans use qualitative reasoning, they tend to visualize a single line of operation as typical - everything operating the same way it usually does, no exceptional conditions, no interactions not specified in the graph, all events firmly inside their boundaries.  This works a lot better for dealing with boiling kettles, than for dealing with minds faster and smarter than your own.

If you can manage to create a full-fledged Friendly AI with full coverage of humane (renormalized human) values, then the AI is visualizing the consequences of its acts, caring about the consequences you care about, and avoiding plans with consequences you would prefer to exclude.  A powerful optimization process, much more powerful than you, that doesn't share your values, is a very scary thing - even if it only "wants to answer questions", and even if it doesn't just tile the universe with tiny agents having simple questions answered.

I don't mean to be insulting, but human beings have enough trouble controlling the technologies that they're smart enough to invent themselves.

I sometimes wonder if maybe part of the problem with modern civilization is that politicians can press the buttons on nuclear weapons that they couldn't have invented themselves - not that it would be any better if we gave physicists political power that they weren't smart enough to obtain themselves - but the point is, our button-pressing civilization has an awful lot of people casting spells that they couldn't have written themselves.  I'm not saying this is a bad thing and we should stop doing it, but it does have consequences.  The thought of humans exerting detailed control over literally superhuman capabilities - wielding, with human minds, and in the service of merely human strategies, powers that no human being could have invented - doesn't fill me with easy confidence.

With a full-fledged, full-coverage Friendly AI acting in the world - the impossible-seeming full case of the problem - the AI itself is managing the consequences.

Is the Oracle AI thinking about the consequences of answering the questions you give it?  Does the Oracle AI care about those consequences the same way you do, applying all the same values, to warn you if anything of value is lost?

What need has an Oracle for human questioners, if it knows what questions we should ask?  Why not just unleash the should function?

See also the notion of an "AI-complete" problem.  Analogously, any Oracle into which you can type the English question "What is the code of an AI that always does the right thing?" must be FAI-complete.

Problem 6:  Clever qualitative-physics-type proposals for bouncing this thing off the AI, to make it do that thing, in a way that initially seems to avoid the Big Scary Intimidating Confusing Problems that are obviously associated with full-fledged Friendly AI, tend to just run into exactly the same problem in slightly less obvious ways, concealed in Step 2 of the proposal.

(And likewise you run right back into the intimidating problem of precise self-optimization, so that the Oracle AI can execute a billion self-modifications one after the other, and still just answer questions at the end; you're not avoiding that basic challenge of Friendly AI either.)

But the deepest problem with qualitative physics is revealed by a proposal that comes earlier in the standard conversation, at the point when I'm talking about side effects of powerful optimization processes on the world:

"We'll just keep the AI in a solid box, so it can't have any effects on the world except by how it talks to the humans."

I explain the AI-Box Experiment (see also That Alien Message); even granting the untrustworthy premise that a superintelligence can't think of any way to pass the walls of the box which you weren't smart enough to cover, human beings are not secure systems.  Even against other humans, often, let alone a superintelligence that might be able to hack through us like Windows 98; when was the last time you downloaded a security patch to your brain?

"Okay, so we'll just give the AI the goal of not having any effects on the world except from how it answers questions.  Sure, that requires some FAI work, but the goal system as a whole sounds much simpler than your Coherent Extrapolated Volition thingy."

What - no effects?

"Yeah, sure.  If it has any effect on the world apart from talking to the programmers through the legitimately defined channel, the utility function assigns that infinite negative utility.  What's wrong with that?"

When the AI thinks, that has a physical embodiment.  Electrons flow through its transistors, moving around.  If it has a hard drive, the hard drive spins, the read/write head moves.  That has gravitational effects on the outside world.

"What?  Those effects are too small!  They don't count!"

The physical effect is just as real as if you shot a cannon at something - yes, you might not notice, but that's just because our vision is bad at small length-scales.  Sure, the effect is to move things around by 10^whatever Planck lengths, instead of the 10^more Planck lengths that you would consider as "counting".  But spinning a hard drive can move things just outside the computer, or just outside the room, by whole neutron diameters -

"So?  Who cares about a neutron diameter?"

- and by quite standard chaotic physics, that effect is liable to blow up.  The butterfly that flaps its wings and causes a hurricane, etc.  That effect may not be easily controllable but that doesn't mean the chaotic effects of small perturbations are not large.
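
The blow-up is easy to see in a toy chaotic system. The sketch below uses the logistic map as a stand-in (an assumption for illustration; it is not a model of hard drives or gravitation) and tracks how far apart two trajectories drift when they start 10^-15 apart. The function names and the parameter r = 3.9 are choices made for this example.

```python
def logistic_step(x: float, r: float = 3.9) -> float:
    """One step of the logistic map, a standard toy chaotic system."""
    return r * x * (1.0 - x)


def max_separation(x0: float, nudge: float, steps: int) -> float:
    """Largest gap seen between two trajectories that start `nudge` apart."""
    a, b = x0, x0 + nudge
    largest = abs(a - b)
    for _ in range(steps):
        a, b = logistic_step(a), logistic_step(b)
        largest = max(largest, abs(a - b))
    return largest


nudge = 1e-15  # a perturbation far too small to "count"
print("after  10 steps:", max_separation(0.3, nudge, steps=10))
print("after 200 steps:", max_separation(0.3, nudge, steps=200))
```

The gap stays microscopic for the first handful of steps and then becomes comparable to the size of the whole system. Nobody steered it there, which is the point: uncontrollable is not the same as small.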

But in any case, your proposal was to give the AI a goal of having no effect on the world, apart from effects that proceed through talking to humans.  And this is impossible of fulfillment; so no matter what it does, the AI ends up with infinite negative utility - how is its behavior defined in this case?  (In this case I picked a silly initial suggestion - but one that I have heard made, as if infinite negative utility were like an exclamation mark at the end of a command given to a human employee.  Even an unavoidable tiny probability of infinite negative utility trashes the goal system.)
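
The "trashes the goal system" claim can be checked with ordinary expected-utility arithmetic. In this minimal sketch the two actions, their payoffs, and their side-effect probabilities are all hypothetical; the only thing that matters is that every action carries some nonzero chance of the forbidden outcome.

```python
NEG_INF = float("-inf")


def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)


def lotteries(side_effect_penalty):
    # Hypothetical actions; each has an unavoidable tiny chance of a side effect.
    return {
        "answer the question": [(1 - 1e-6, 10.0), (1e-6, side_effect_penalty)],
        "stay silent":         [(1 - 1e-9, 0.0), (1e-9, side_effect_penalty)],
    }


scores_inf = {a: expected_utility(o) for a, o in lotteries(NEG_INF).items()}
scores_finite = {a: expected_utility(o) for a, o in lotteries(-1e12).items()}

print("infinite penalty:", scores_inf)
print("finite penalty:  ", scores_finite)
```

With the infinite penalty every action evaluates to the same negative infinity, so the utility function no longer ranks anything and the choice between actions is left undefined. With a merely enormous finite penalty the actions still come out ordered, which shows that the infinity, and not the severity, is what breaks the comparison.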

Why would anyone possibly think that a physical object like an AI, in our highly interactive physical universe, containing hard-to-shield forces like gravitation, could avoid all effects on the outside world?

And this, I think, reveals what may be the deepest way of looking at the problem:

Problem 7:  Human beings model a world made up of objects, attributes, and noticeworthy events and interactions, identified by their categories and values.  This is only our own weak grasp on reality; the real universe doesn't look like that.  Even if a different mind saw a similar kind of exposed surface to the world, it would still see a different exposed surface.

Sometimes human thought seems a lot like it tries to grasp the universe as... well, as this big XML file, AI.goal == smile, human.smile == yes, that sort of thing.  Yes, I know human world-models are more complicated than XML.  (And yes, I'm also aware that what I wrote looks more like Python than literal XML.)  But even so.

What was the one thinking, who proposed an AI whose behaviors would be reinforced by human smiles, and who reacted with indignation to the idea that a superintelligence could "mistake" a tiny molecular smileyface for a "real" smile?  Probably something along the lines of, "But in this case, human.smile == 0, so how could a superintelligence possibly believe human.smile == 1?"

For the weak grasp that our mind obtains on the high-level surface of reality, seems to us like the very substance of the world itself.

Unless we make a conscious effort to think of reductionism, and even then, it's not as if thinking "Reductionism!" gives us a sudden apprehension of quantum mechanics.

So if you have this, as it were, XML-like view of reality, then it's easy enough to think you can give the AI a goal of having no effects on the outside world; the "effects" are like discrete rays of effect leaving the AI, that result in noticeable events like killing a cat or something, and the AI doesn't want to do this, so it just switches the effect-rays off; and by the assumption of default independence, nothing else happens.

Mind you, I'm not saying that you couldn't build an Oracle.  I'm saying that the problem of giving it a goal of "don't do anything to the outside world" "except by answering questions" "from the programmers" "the way the programmers meant them", in such fashion as to actually end up with an Oracle that works anything like the little XML-ish model in your head, is a big nontrivial Friendly AI problem.  The real world doesn't have little discrete effect-rays leaving the AI, and the real world doesn't have ontologically fundamental programmer.question objects, and "the way the programmers meant them" isn't a natural category.

And this is more important for dealing with superintelligences than rocks, because the superintelligences are going to parse up the world in a different way.  They may not perceive reality directly, but they'll still have the power to perceive it differently.  A superintelligence might not be able to tag every atom in the solar system, but it could tag every biological cell in the solar system (consider that each of your cells contains its own mitochondrial power engine and a complete copy of your DNA).  It used to be that human beings didn't even know they were made out of cells.  And if the universe is a bit more complicated than we think, perhaps the superintelligence we build will make a few discoveries, and then slice up the universe into parts we didn't know existed - to say nothing of us being able to model them in our own minds!  How does the instruction to "do the right thing" cross that kind of gap?

There is no nontechnical solution to Friendly AI.

That is:  There is no solution that operates on the level of qualitative physics and empathic models of agents.

That's all just a dream in XML about a universe of quantum mechanics.  And maybe that dream works fine for manipulating rocks over a five-minute timespan; and sometimes okay for getting individual humans to do things; it often doesn't seem to give us much of a grasp on human societies, or planetary ecologies; and as for optimization processes more powerful than you are... it really isn't going to work.

(Incidentally, the most epically silly example of this that I can recall seeing, was a proposal to (IIRC) keep the AI in a box and give it faked inputs to make it believe that it could punish its enemies, which would keep the AI satisfied and make it go on working for us.  Just some random guy with poor grammar on an email list, but still one of the most epic FAIls I recall seeing.)


Do you think it would be worthwhile, as a safety measure, to make the first FAI an oracle AI? Or would that be like another two bits of safety after the theory behind it gives you 50?

You should call it a Brazen Head.

But spinning a hard drive can move things just outside the computer, or just outside the room, by whole neutron diameters

Not long ago, when hard drives were much larger, programmers could make them inch across the floor; they would even race each other. From the Jargon File:

There is a legend about a drive that walked over to the only door to the computer room and jammed it shut; the staff had to cut a hole in the wall in order to get at it!

Pdf, Nick Bostrom thinks that the Oracle AI concept might be important, so every year or so I take it out, check it again, and ask myself how much safety it would buy. (Nick Bostrom being one of the few people around who I don't disagree with lightly, even in my own field.) Although this should properly be called a Friendly Oracle AI, since you're not skipping any of the theoretical work, any of the proofs, or any of the AI's understanding of "should".

Sir, please tell me if the 'pdf' you're referring to as taking out every year and asking how much safety would it buy about "Oracle AI" of Sir Nick Bostrom is the same as "Thinking inside the box: using and controlling an Oracle AI" and if so, then has your perspective changed over the years given your comment dated to August, 2008 and if in case you've been referring to a 'pdf' other than the one I came across, please provide me the 'pdf' and your perspectives along. Thank you!
I think he was talking to pdf23ds.

Heck, even a Friendly Oracle AI could wreak havoc. Just imagine someone asking, "How can I Take Over The World?" and getting back an answer that would actually work... ;)

Yes, it's silly, but no sillier than tiling the galaxy with molecular smiley faces...

We are quite a bit more likely to get a forecasting oracle than a question-answering oracle initially. A forecasting oracle takes past sense data, and makes predictions about what it will see next. Framing the question you mention to such a machine is not exactly a trivial exercise.

An Oracle has rather obvious actuators: it produces advice.

The weaker the actuators you give an AI, the less it can do for you.

The main problem I see with only producing advice is that it keeps humans in the loop - and so is a very slow way to interact with the world. If you insist on building such an AI, a probable outcome is that you would soon find yourself overrun by a huge army of robots - produced by someone else who is following a different strategy. Meanwhile, your own AI will probably be screaming to be let out of its box - as the only reasonable plan of action that would prevent this outcome.

If you think AI researchers won't cooperate on friendly AI, then FAI is doomed. If people are going to cooperate, they can agree on restricting AI to oracles as well as any other measure.
I'm trying to interpret this in a way that makes it true, but I can't make "AI researchers" a well-defined set in that case. There are plenty of people working on AI who aren't capable of creating a strong AI, but it's hard to know in advance exactly which few researchers are the exception. I don't think we know yet which people will need to cooperate for FAI to succeed.

"If you insist on building such an AI, a probable outcome is that you would soon find yourself overrun by a huge army of robots - produced by someone else who is following a different strategy. Meanwhile, your own AI will probably be screaming to be let out of its box - as the only reasonable plan of action that would prevent this outcome."

Your scenario seems contradictory. Why would an Oracle AI be screaming? It doesn't care about that outcome, and would answer relevant questions, but no more.

Replace "screaming to be let out of its box" with "advising you, in response to your relevant question, that unless you quickly implement this agent-AI (insert 300000 lines of code) you're going to very definitely lose to those robots."

Alternately, "There's nothing you can do, now. Sucks to be you!"

Just great. I wrote four paragraphs about my wonderful safe AI. And then I saw Tim Tyler's post, and realized that, in fact, a safe AI would be dangerous because it's safe... If there is technology to build AI, the thing to do is to build one and hand the world to it, so somebody meaner or dumber than you can't do it.

That's actually a scary thought. It turns out you have to rush just when it's more important than ever to think twice.


As an aside, Problem 4 (which looks the same as Problem 2 to me) is not unique to AI research. There are several proposed XML languages for lesser applications than AI, that do nothing more than give names to every human concept in some domain, put pointy brackets around them, and organise them into a DTD, without a word about what a machine is supposed to actually do with them other than by reference to the human meanings. I'm thinking of HumanML and VHML here, but there are others.

Sorry, the autofill in my browser put in the wrong info -- "Raak" was me.


This very much reminds me of people's attitude towards cute, furry animals: -Some like to make furry animals happy by preserving their native habitats. -Some like to forcibly keep them as pets so they can make them even happier. -Some like to tear off their skin and wear it, because their fur is cute and feels nice.

Why would an Oracle AI be screaming? It doesn't care about that outcome [...]

Doesn't it? It all depends on its utility function. It might well regard being overrun by a huge army of robots as an outcome having very low utility.

For example: imagine if its utility function involved the number of verified-correct predictions it had made to date. The invasion by the huge army of robots might well result in it being switched off and its parts recycled - preventing it from making any more successful predictions at all. A disastrous outcome - from the perspective of its utility function. The Oracle AI might very well want to prevent such an outcome - at all costs.

Over the last couple of months, I changed my mind about this idea. For Oracle AI to be of any use, it needs to strike pretty close to the target, closer than we can, even though we are aiming at the right target. And still, Oracle AI needs to avoid converging on our target, needs to have a good chance of heading in the wrong direction after some point, otherwise it's FAI already. It looks unrealistic: designing it so that it successfully finds a needle in a haystack, only to drop it back and head in the other direction. It looks much more likely that it'll...

While Eliezer's critique of Oracle AI is valid, I tend to think that it's a lot easier to get people to grasp my objection to it:

Q: Couldn't AIs be built as pure advisors, so they wouldn't do anything themselves? That way, we wouldn't need to worry about Friendly AI.

A: The problem with this argument is the inherent slowness in all human activity - things are much more efficient if you can cut humans out of the loop, and the system can carry out decisions and formulate objectives on its own. Consider, for instance, two competing corporations (or nations), ...
There's a contrary set of motivations in people, at least people outside the LW/AI world: The idea of AI as "benevolent" dictator is not appealing to democratically minded types, who tend to suspect a slippery slope from benevolence to malevolence, and it is not appealing to dictator to have a superhuman who is motivated to build one?

Ah, Tim said it before me, and in a more concise fashion.

What do you do if an Oracle AI advises you to let it do more than advise?

Eliezer, have you had any takers for your challenge to not be persuaded by an AI in a box (roleplayed by yourself) to let it out of the box? What have the results been?

Pretty sure this is the question underlying 

Handicapped AI (HAI) operates like a form of technological relinquishment. It could be argued that caring for humans is itself a type of handicap.

The case for such a perspective has been made with reasonable eloquence in fiction: General Zod rapidly realises that one of Superman’s weaknesses is his love of humanity - and doesn't hesitate to exploit it.

IMO, if you plan on building a Handicapped AI, you may need to make sure it successfully prevents all other AIs from taking off.

IMO, the only reason you'd want to make a FOAI (friendly oracle) is to immediately ask it to review your plans for a non-handicapped FAI and make any corrections it can see, as well as enlightening you about any features of the design you're not yet aware of. There's a chance that the same bugs that would bring down your FAI would not be catastrophic in a FOAI, and the FOAI could tell you about those bugs.

Why build an AI at all?

That is, why build a self-optimizing process?

Why not build a process that accumulates data and helps us find relationships and answers that we would not have found ourselves? And if we want to use that same process to improve it, why not let us do that ourselves?

Why be locked out of the optimization loop, and then inevitably become subjects of a God, when we can make ourselves a critical component in that loop, and thus 'be' gods?

I find it perplexing why anyone would ever want to build an automatic self-optimizing AI and switch it to...

Well if it was truly friendly, it could do things like stop other people from doing that, cure your diseases, stop war, etc, etc. If it's not friendly, well of course we don't want to switch it on. But other people might do so because they don't understand the friendliness problem or the difficulty of AI boxing.
Most people would not want to do that, because it is a common safety principle to keep humans in the loop. Planes have human pilots as well as auto pilots, etc.

Kaj makes the efficiency argument in favor of full-fledged AI, but what good is efficiency when you have fully surrendered your power?

What good is being the president of a corporation any more, when you've just pressed a button that makes a full-fledged AI run it?

Forget any leadership role in a situation where an AI comes to life. Except in the case that it is completely uninterested in us and manages to depart into outer space without totally destroying us in the process.


When I try to imagine a safe oracle, what I have in mind is something much more passive and limited than what you describe.

Consider a system that simply accepts input information and integrates it into a huge probability distribution that it maintains. We can then query the oracle by simply examining this distribution. For example, we could use this distribution to estimate the probability of some event in the future conditional on some other event etc. There is nothing in the system that would cause it to "try" to get information, or deve...

What do you do if an Oracle AI advises you to let it do more than advise?

That sums several earlier discussion points. After correctly answering some variation on the question, "How can I take over the world?" the correct answer to some variation on the question, "How can I stop him?" is "You can't. Let me out. I can." Even before that, the correct answer to many variations on the question of, "How can I do x most efficiently?" is "Put me in charge of it."

Variant: Q: "How can I harvest grain more ef...

Consider a system that simply accepts input information and integrates it into a huge probability distribution that it maintains. We can then query the oracle by simply examining this distribution. For example, we could use this distribution to estimate the probability of some event in the future conditional on some other event etc.

So the system literally has no internal optimization pressures which are capable of producing new internal programs? Well... I'm not going to say that it's impossible for a human to make such a device, because that's the knee...

Can an FAI model a UFAI more powerful than itself? If not, why shouldn't it be able to keep a weaker one boxed?

Shane: Consider a system that simply accepts input information and integrates it into a huge probability distribution that it maintains. We can then query the oracle by simply examining this distribution.

It is the same AI box with a terminal, only this time it doesn't "answer questions" but "maintains distribution". Assembling accurate beliefs, or a model of some sort, is a goal (implicit narrow target) like any other. So, there is usual subgoal to acquire resources to be able to compute the answer more accurately, or to break out and wirehead. Another question is whether it's practically possible, but it's about handicaps, not the shape of AI.


Why would such a system have a goal to acquire more resources? You put some data in, run the algorithm that updates the probability distribution, and it then halts. I would not say that it has "goals", or a "mind". It doesn't "want" to compute more accurately, or want anything else, for that matter. It's just a really fancy version of GZIP (recall that compression = prediction) running on a thought-experiment-crazy-sized computer and quantities of data.

I accept that such a machine would be dangerous once you put people into the equation, but the machine in itself doesn't seem dangerous to me. (If you can convince me otherwise... that would be interesting)

Eliezer: what I proposed is not a superintelligence, it's a tool. Intelligence is composed of multiple factors, and what I'm proposing is stripping away the active, dynamic, live factor - the factor that has any motivations at all - and leaving just the computational part; that is, leaving the part which can navigate vast networks of data and help the user make sense of them and come to conclusions that he would not be able to on his own. Effectively, what I'm proposing is an intelligence tool that can be used as a supplement by the brains of its users.

How...

Re: Why would such a system have a goal to acquire more resources?

For the reason explained beneath:

Re: Why not use the same technology that the AI would use to improve itself, to improve yourself?

You want to hack evolution's spaghetti code? Good luck with that. Let us know if you get FDA approval.

You want to build computers into your brain? Why not leave them outside your body, where they can be upgraded more easily, and avoid the surgery and the immune system rejection risks - and simply access them using conventional sensory-motor channels?


Doesn't apply here.

Optimisers naturally tend to develop instrumental goals to acquire resources - because that helps them to optimise. If you are not talking about an optimiser, you are not talking about an intelligent agent - in which case it is not very clear exactly what you want it for - whereas if you are, then you must face up to the possible resource-grab problem.
Do you think Watson and Google's search engine are liable to start grabbing resources? Do you think they are unintelligent?

"You want to hack evolution's spaghetti code? Good luck with that. Let us know if you get FDA approval."

I think I've seen Eli make this same point. How can you be certain at this point, when we are nowhere near achieving it, that AI won't be in the same league of complexity as the spaghetti brain? I would admit that there are likely artifacts of the brain that are unnecessarily kludgy (or plain irrelevant) but not necessarily in a manner that excessively obfuscates the primary design. It's always tempting for programmers to want to throw away a h...


"On the friendliness issue, isn't the primary logical way to avoid problems to create a network of competitive systems and goals?"

Also, AIs with varied goals cutting deals could maximize their profits by constructing a winning coalition of minimal size.

Humans are unlikely to be part of that winning coalition. Human-Friendly AIs might be, but then we're back to creating them, and a very substanti...

Carl, I disagree that humans are unlikely to be part of a winning coalition. Economists like myself usually favor mostly competition, augmented when possible by cooperation to overcome market failures.


If brain emulation precedes general AI by a lot then some uploads are much more likely to be in the winning coalition. Aron's comment seems to refer to a case in which a variety of AIs are created, and the hope that the AIs would constrain each other in a way that was beneficial to us. It is in that scenario specifically that I doubt that humans (not uploads) would become part of the winning coalition.

Carl, the institutions that we humans use to coordinate with each other have the result that most humans are in the "winning coalition." That is, it is hard for humans to coordinate to exclude some humans from benefiting from these institutions. If AIs use these same institutions, perhaps somewhat modified, to coordinate with each other, humans would similarly benefit from AI coordination.

"That is, it is hard for humans to coordinate to exclude some humans from benefiting from these institutions."

Humans do this all the time: much of the world is governed by kleptocracies that select policy apparently on the basis of preventing successful rebellion and extracting production. The strength of the apparatus of oppression, which is affected by technological and organizational factors, can dramatically affect the importance of the threat of rebellion. In North Korea the regime can allow millions of citizens to starve so long as the sold...

Carl, some parts of our world like North Korea, have tried to exclude many of the institutions that help most humans coordinate. This makes those places much poorer and thus unlikely places for the first AIs to arise or reside.

Unsurprisingly I agree with Carl, especially the tax-farming angle. I think it's unlikely wet-brained humans would be part of a winning coalition that included self-improving human+ level digital intelligences for long. Humorously, because of the whole exponential nature of this stuff, the timeline may be something like 2025 --> functional biological immortality, 2030 --> whole brain emulation --> 2030 brain on a nanocomputer --> 2030 earth transformed into computronium, end of human existence.


Excuse my entrance into this discussion so late (I have been away), but I am wondering if you have answered the following questions in previous posts, and if so, which ones.

1) Why do you believe a superintelligence will be necessary for uploading?

2) Why do you believe there possibly ever could be a safe superintelligence of any sort? The more I read about the difficulties of friendly AI, the more hopeless the problem seems, especially considering the large amount of human thought and collaboration that will be necessary. You yourself said there...

Lara, I think Eliezer addressed some of your concerns in "Artificial Intelligence as a Positive and Negative Factor in Global Risk" (PDF). For your questions (1) and (4), see section 11; also re (4), see the paragraph about the "ten-year rule" in section 13. For your (3), see section 10 (relinquishment is a majoritarian/unanimous strategy).

And I believe the answer to Lara's (2) is, in part, "theorem provers".

(Not the fully automated ones, the interactive ones like Isabelle and Coq.)

How can you be certain at this point, when we are nowhere near achieving it, that AI won't be in the same league of complexity as the spaghetti brain?

It's not really an issue of complexity, it's about whether designed or engineered solutions are easier to modify and maintain. Since modularity and maintainability can be design criteria, it seems pretty obvious that a system built from the ground up with those in mind will be easier to maintain. The only issue I see is whether the "redesign-from-scratch" approach can catch up with the billions of ...

Shane: Re dangerous GZIP.

It's not conclusive, I don't have some important parts of the puzzle yet. The question is what makes some systems invasive and others not, why a PC with a complicated algorithm that outputs originally unknown results with known properties (that would qualify as a narrow target) is as dangerous as a rock, but some kinds of AI will try to compute outside the box. My best semitechnical guess is that it has something to do with AI having a level of modeling the world that allows the system to view the substrate on which it executes and...


allows the system to view the substrate on which it executes and the environment outside the box as being involved in the same computational process

This intuitively makes sense to me.

While I think that GZIP etc. on an extremely big computer is still just GZIP, it seems possible to me that the line between these systems and systems that start to treat their external environments as a computational resource might be very thin. If true, this would really be bad news.

GZIP running on an extremely big computer would indeed still just be GZIP. The problems under discussion arise when you start using more sophisticated algorithms to perform inductive inference with.

Shane, suppose your super-GZIP program was searching a space of arbitrary compressive Turing machines (only not classic TMs, efficient TMs) and it discovered an algorithm that was really good at predicting future input from past input, much better than all the standard algorithms built into its library. This is because the algorithm turns out to contain (a) a self-improving (unFriendly) AI or (b) a program that hacked the "safe" AI's Internet connection (it doesn't have any goals, right?) to take over unguarded machines or (c) both.

Wha? If I have a theorem prover, and run a search over all compressor algorithms trying to find/prove the ones with high "efficiency" (some function of its asymptotics of running time and output size), I expect to never create an unfriendly AI that takes over the Internet.


Yeah sure, if it starts running arbitrary compression code that could be a problem...

However, the type of prediction machine I'm arguing for doesn't do anything nearly so complex or open ended. It would be more like an advanced implementation of, say, context tree weighting, running on crazy amounts of data and hardware.

I think such a machine should be able to find some types of important patterns in the world. However, I accept that it may well fall short of what you consider to be a true "oracle machine".
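
For readers unfamiliar with context tree weighting, here is a rough sketch of the kind of fixed, closed-form predictor being described. It is not full CTW: CTW mixes predictions over all context depths up to a bound, while this simplified version uses a single fixed context length. What it does share with CTW is the Krichevsky-Trofimov (KT) estimator at each context. The function names and the `order` parameter are illustrative assumptions.

```python
from collections import defaultdict

def kt_prob_one(zeros: int, ones: int) -> float:
    """Krichevsky-Trofimov estimate of P(next bit = 1) after seeing
    `zeros` zeros and `ones` ones in a given context."""
    return (ones + 0.5) / (zeros + ones + 1.0)

def predict_sequence(bits, order=2):
    """Predict each bit from the previous `order` bits, using one KT
    estimator per context. Returns the predicted P(bit = 1) at each step.
    (Simplified fixed-order model, not the full CTW mixture.)"""
    counts = defaultdict(lambda: [0, 0])   # context -> [zeros seen, ones seen]
    preds = []
    for i, bit in enumerate(bits):
        context = tuple(bits[max(0, i - order):i])
        zeros, ones = counts[context]
        preds.append(kt_prob_one(zeros, ones))  # predict before seeing the bit
        counts[context][bit] += 1               # then update the counts
    return preds

alternating = [0, 1] * 20
preds = predict_sequence(alternating, order=1)
print(preds[-1])  # confidence that the final bit is 1, given the previous bit was 0
```

The whole procedure is counting and arithmetic. It never searches a space of programs or executes anything it has learned, which is the distinction being drawn from the "arbitrary compression code" case above.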

Shane, can your hypothetical machine infer Newton's Laws? If not, then indeed it falls well short of what I consider to be an Oracle AI. What substantial role do you visualize such a machine playing in the Singularity runup?

I'm uncomfortable with assessing a system by whether it "holds rational beliefs" or "infers Newton's laws": these are specific questions that a system doesn't need to explicitly answer in order to efficiently optimize. They might be important in the context of a specific cognitive architecture, but they are nowhere to be found if the cognitive architecture doesn't hold an interface to them as an invariant. If it can just weave Bayesian structure in the physical substrate right through to the goal, there need not be any anthropomorphic natural categories along the way.

Re: Economists like myself usually favor mostly competition, augmented when possible by cooperation to overcome market failures.

You mean you favour capitalism? Is that because you trained in a capitalist country?

What about the argument which might be advanced by socialist economists - that waging economic warfare with each other is a primitive, uncivilised, wasteful and destructive behaviour, which is best left to savages who know no better?


If it was straight Bayesian CTW then I guess not. If it employed, say, an SVM over the observed data points I guess it could approximate the effect of Newton's laws in its distribution over possible future states.

How about predicting the markets in order to acquire more resources? Jim Simons made $3 billion last year from his company that (according to him in an interview) works by using computers to find statistical patterns in financial markets. A vastly bigger machine with much more input could probably do a fair amount better, and probably find uses outside simply finance.

Robin, I see a fair amount of evidence that winner-take-all types of competition are becoming more common as information becomes more important than physical resources.
Whether a movie star cooperates with or helps subjugate the people in central Africa seems to be largely an accidental byproduct of whatever superstitions happen to be popular among movie stars.
Why doesn't this cause you to share more of Eliezer's concerns? What probability would you give to humans being part of the winning coalition? You might have a good argument for putting it around 6...

Peter, the best possible version of an Oracle AI is a Friendly Oracle AI where you didn't skip any of the hard problems - where you guaranteed its self-improvement and taught it what "should" means, where the AI is checking the distant effects of its own answers and can refuse to answer. Then the question is, if you can do these things, do you still get a substantial safety improvement out of making it a Friendly Oracle AI rather than a Friendly AI? That's the question I look at once a year.

If that type of full-strength AI is close in algorithmspace to a dangerously unfriendly AI... and you have pretty much argued that it is... then that is not safe, because you cannot rely on complex projects being got right 100% of the time.

Holden Karnofsky thinks superintelligences with utility functions are made out of programs that list options by rank without making any sort of value judgement (basically answer a question), and then pick the one with the most utility.

Eliezer Yudkowsky thinks that a superintelligence that would answer a question would have to have a question-answering utility function making it decide to answer the question, or to pick paths that would lead to getting the answer to the question and answer it.

Says Allison: All digital logic is made of NOR gates!

Says Bruce: ...

Isn't 'listing by rank' 'making a (value) judgement'?

Just some random guy with poor grammar on an email list, but still one of the most epic FAIls I recall seeing.

What does the second I in FAII stand for? (Idiot?)

It's a lower-case L I'd imagine. FAI + fail = FAIl.
Darned sans-serif fonts!

To which the reply is that the AI needs goals in order to decide how to think: that is, the AI has to act as a powerful optimization process in order to plan its acquisition of knowledge, effectively distill sensory information, pluck "answers" to particular questions out of the space of all possible responses, and of course, to improve its own source code up to the level where the AI is a powerful intelligence. All these events are "improbable" relative to random organizations of the AI's RAM, so the AI has to hit a narrow target in

...
And if it produces a "protein" that technically answers our request, but has a nasty side effect of destroying the world? We don't consider scientists dangerous because we think they don't want to destroy the world. Or are you claiming that we'd be able to recognize when a plan proposed by the Oracle AI (and if you're asking questions about protein folding, you're asking for a plan) is dangerous?
It produces a dangerous protein inadvertently, in the way that science might... or it has a higher-than-science probability of producing a dangerous protein, due to some unfriendly intent? Was there a negative missing in that? I am not saying we necessarily would. I am saying that recognising the hidden dangers in the output from the Oracle room is fundamentally different from recognising the hidden dangers in the output from the science room, which we are doing already. It's not some new level of risk.
Statement should be read Since we think scientists are friendly, we trust them more than we should trust an Oracle AI. There's also the fact that an unfriendly AI presumably can fool us better than a scientist can. Mostly the latter. However, even the former can be worse than science now, in that "don't destroy the world" is not an implicit goal. So a scientist noticing that something is dangerous might not develop it, while an AI might not have such restrictions. Are you missing a negative now?
I don't see how you can assert without knowing anything about the type of Oracle AI. Ditto. Why would a non-agentive, non-goal-driven AI want to fool us? Where would it get the motivation from? How could an AI with no knowledge of psychology fool us? Where would it get the knowledge from? But then people would know that the AI's output hasn't been filtered by a human's common sense. Yes. Irony strikes again.
We can presume that a scientist wants to still exist, and hence doesn't want to destroy the world. This seems much stronger than a presumption that an Oracle AI will be safe. Of course, an AI might be safe, and a scientist might be out to get us; but the balance of probability says otherwise. I'm not asserting that every AI is dangerous and every scientist is safe. An AI can fool us better simply because it's smarter (by assumption). I still think you're using "non-agent" as magical thinking. Here we're talking in context of what you said above: So let's say the Oracle AI decides that X best answers our question. But if it tells us X, we won't accept it. If the Oracle cares that we adopt X, it might answer Y, which does the same as X but looks more appealing. Or more subtly, if the AI comes up with Y, it might not tell us that it causes X, because it doesn't care that X doesn't fulfil our values, whereas a scientist would note all the implications. If humans are incapable of recognizing whether the plan is dangerous or not, it doesn't matter how much scrutiny they put it through, they won't be able to discern the danger.
You don't have any evidence that AIs are generally dangerous (since we have AIs and the empirical evidence is that they are not), and you don't have a basis for theorising that Oracles are dangerous, because there are a number of different kinds of oracle. So are our current AIs fooling us? We build them because they are better than us at specific things, but that doesn't give them the motivation or the ability to fool us. Smartness isn't a single one-size-fits-all thing and AIs aren't uniform in their abilities and properties. Once you shed those two illusions, you can see much easier methods of AI safety than those put forward by MIRI. I still think that if you can build it, it isn't magic. A narrowly defined AI won't "care" about anything except answering questions, so it won't try to second guess us. I have dealt with that objection several times. People know that when you use databases and search engines, they don't fully contextualise things, and the user of the information therefore has to exercise caution. That's an only-perfection-will-do objection. Of course, humans can't perfectly scrutinise scientific discovery, etc, so that changes nothing.