Dreams of Friendliness

[-]pdf23ds17y70

Do you think it would be worthwhile, as a safety measure, to make the first FAI an oracle AI? Or would that be like another two bits of safety after the theory behind it gives you 50?

[-]steven17y00

You should call it a Brazen Head.

[-]Aaron317y30

But spinning a hard drive can move things just outside the computer, or just outside the room, by whole neutron diameters

Not long ago, when hard drives were much larger, programmers could make them inch across the floor; they would even race each other. From the Jargon File:

There is a legend about a drive that walked over to the only door to the computer room and jammed it shut; the staff had to cut a hole in the wall in order to get at it!

[-]Eliezer Yudkowsky17y60

Pdf, Nick Bostrom thinks that the Oracle AI concept might be important, so every year or so I take it out, check it again, and ask myself how much safety it would buy. (Nick Bostrom being one of the few people around who I don't disagree with lightly, even in my own field.) Although this should properly be called a Friendly Oracle AI, since you're not skipping any of the theoretical work, any of the proofs, or any of the AI's understanding of "should".

0thetasafe9y

Sir, please tell me if the 'pdf' you're referring to as taking out every year and asking how much safety would it buy about "Oracle AI" of Sir Nick Bostrom is the same as "Thinking inside the box: using and controlling an Oracle AI" and if so, then has your perspective changed over the years given your comment dated to August, 2008 and if in case you've been referring to a 'pdf' other than the one I came across, please provide me the 'pdf' and your perspectives along. Thank you!

0DragonGod8y

I think he was talking to pdf23ds.

[-]Doug_S.17y30

Heck, even a Friendly Oracle AI could wreak havoc. Just imagine someone asking, "How can I Take Over The World?" and getting back an answer that would actually work... ;)

Yes, it's silly, but no sillier than tiling the galaxy with molecular smiley faces...

2timtyler15y

We are quite a bit more likely to get a forecasting oracle than a question-answering oracle initially. A forecasting oracle takes past sense data, and makes predictions about what it will see next. Framing the question you mention to such a machine is not exactly a trivial exercise.

[-]Tim_Tyler17y20

An Oracle has rather obvious actuators: it produces advice.

The weaker the actuators you give an AI, the less it can do for you.

The main problem I see with only producing advice is that it keeps humans in the loop - and so is a very slow way to interact with the world. If you insist on building such an AI, a probable outcome is that you would soon find yourself overrun by a huge army of robots - produced by someone else who is following a different strategy. Meanwhile, your own AI will probably be screaming to be let out of its box - as the only reasonable plan of action that would prevent this outcome.

-3TheAncientGeek10y

If you think AI researchers won't cooperate on friendly AI, then FAI is doomed. If people are going to cooperate, they can agree on restricting AI to oracles as well as any other measure.

0Brilliand10y

I'm trying to interpret this in a way that makes it true, but I can't make "AI researchers" a well-defined set in that case. There are plenty of people working on AI who aren't capable of creating a strong AI, but it's hard to know in advance exactly which few researchers are the exception. I don't think we know yet which people will need to cooperate for FAI to succeed.

[-]Carl_Shulman217y10

"If you insist on building such an AI, a probable outcome is that you would soon find yourself overrun by a huge army of robots - produced by someone else who is following a different strategy. Meanwhile, your own AI will probably be screaming to be let out of its box - as the only reasonable plan of action that would prevent this outcome."

Your scenario seems contradictory. Why would an Oracle AI be screaming? It doesn't care about that outcome, and would answer relevant questions, but no more.

[-]MarkusRamikin14y100

Replace "screaming to be let out of its box" with "advising you, in response to your relevant question, that unless you quickly implement this agent-AI (insert 300000 lines of code) you're going to very definitely lose to those robots."

2Luke_A_Somers13y

Alternately, "There's nothing you can do, now. Sucks to be you!"

[-]Tiiba217y80

Just great. I wrote four paragraphs about my wonderful safe AI. And then I saw Tim Tyler's post, and realized that, in fact, a safe AI would be dangerous because it's safe... If there is technology to build AI, the thing to do is to build one and hand the world to it, so somebody meaner or dumber than you can't do it.

That's actually a scary thought. It turns out you have to rush just when it's more important than ever to think twice.

[-]Raak17y00

As an aside, Problem 4 (which looks the same as Problem 2 to me) is not unique to AI research. There are several proposed XML languages for lesser applications than AI, that do nothing more than give names to every human concept in some domain, put pointy brackets around them, and organise them into a DTD, without a word about what a machine is supposed to actually do with them other than by reference to the human meanings. I'm thinking of HumanML and VHML here, but there are others.

[-]Richard_Kennaway17y00

Sorry, the autofill in my browser put in the wrong info -- "Raak" was me.

[-]Grant17y100

This very much reminds me of people's attitude towards cute, furry animals: -Some like to make furry animals happy by preserving their native habitats. -Some like to forcibly keep them as pets so they can make them even happier. -Some like to tear off their skin and wear it, because their fur is cute and feels nice.

[-]Tim_Tyler17y00

Why would an Oracle AI be screaming? It doesn't care about that outcome [...]

Doesn't it? It all depends on its utility function. It might well regard being overrun by a huge army of robots as an outcome having very low utility.

For example: imagine if its utility function involved the number of verified-correct predictions it had made to date. The invasion by the huge army of robots might well result in it being switched off and its parts recycled - preventing it from making any more successful predictions at all. A disastrous outcome - from the perspective of its utility function. The Oracle AI might very well want to prevent such an outcome - at all costs.

[-]Vladimir_Nesov17y20

Over the last couple of months, I changed my mind about this idea. For Oracle AI to be of any use, it needs to strike pretty close to the target, closer than we can, even though we are aiming at the right target. And still, Oracle AI needs to avoid converging on our target, needs to have a good chance of heading in the wrong direction after some point, otherwise it's FAI already. It looks unrealistic: designing it so that it successfully finds a needle in a haystack, only to drop it back and head in the other direction. It looks much more likely that it'll... (read more)

[-]Kaj_Sotala17y20

While Eliezer's critique of Oracle AI is valid, I tend to think that it's a lot easier to get people to grasp my objection to it:

Q: Couldn't AIs be built as pure advisors, so they wouldn't do anything themselves? That way, we wouldn't need to worry about Friendly AI. A: The problem with this argument is the inherent slowness in all human activity - things are much more efficient if you can cut humans out of the loop, and the system can carry out decisions and formulate objectives on its own. Consider, for instance, two competing corporations (or nations),

... (read more)

0TheAncientGeek10y

There's a contrary set of motivations in people, at least people outside the LW/AI world: The idea of AI as "benevolent" dictator is not appealing to democratically minded types, who tend to suspect a slippery slope from benevolence to malevolence, and it is not appealing to a dictator to have a superhuman rival...so who is motivated to build one?

[-]Kaj_Sotala17y00

Ah, Tim said it before me, and in a more concise fashion.

[-]Richard_Kennaway17y00

What do you do if an Oracle AI advises you to let it do more than advise?

Eliezer, have you had any takers for your challenge to not be persuaded by an AI in a box (roleplayed by yourself) to let it out of the box? What have the results been?

0themusicgod14y

Pretty sure this is the question underlying https://www.overcomingbias.com/2007/01/disagree_with_s.html

[-]Tim_Tyler17y00

Handicapped AI (HAI) operates like a form of technological relinquishment. It could be argued that caring for humans is itself a type of handicap.

The case for such a perspective has been made with reasonable eloquence in fiction: General Zod rapidly realises that one of Superman’s weaknesses is his love of humanity - and doesn't hesitate to exploit it.

IMO, if you plan on building a Handicapped AI, you may need to make sure it successfully prevents all other AIs from taking off.

[-]pdf23ds17y00

IMO, the only reason you'd want to make a FOAI (friendly oracle) is to immediately ask it to review your plans for a non-handicapped FAI and make any corrections it can see, as well as enlightening you about any features of the design you're not yet aware of. There's a chance that the same bugs that would bring down your FAI would not be catastrophic in a FOAI, and the FOAI could tell you about those bugs.

[-]denis_bider17y40

Why build an AI at all?

That is, why build a self-optimizing process?

Why not build a process that accumulates data and helps us find relationships and answers that we would not have found ourselves? And if we want to use that same process to improve it, why not let us do that ourselves?

Why be locked out of the optimization loop, and then inevitably become subjects of a God, when we can make ourselves a critical component in that loop, and thus 'be' gods?

I find it perplexing why anyone would ever want to build an automatic self-optimizing AI and switch it to... (read more)

1Houshalter12y

Well if it was truly friendly, it could do things like stop other people from doing that, cure your diseases, stop war, etc, etc. If it's not friendly, well of course we don't want to switch it on. But other people might do so because they don't understand the friendliness problem or the difficulty of AI boxing.

0TheAncientGeek10y

Most people would not want to do that, because it is a common safety principle to keep humans in the loop. Planes have human pilots as well as auto pilots, etc.

[-]denis_bider17y00

Kaj makes the efficiency argument in favor of full-fledged AI, but what good is efficiency when you have fully surrendered your power?

What good is being the president of a corporation any more, when you've just pressed a button that makes a full-fledged AI run it?

Forget any leadership role in a situation where an AI comes to life. Except in the case that it is completely uninterested in us and manages to depart into outer space without totally destroying us in the process.

[-]Shane_Legg17y80

Eli:

When I try to imagine a safe oracle, what I have in mind is something much more passive and limited than what you describe.

Consider a system that simply accepts input information and integrates it into a huge probability distribution that it maintains. We can then query the oracle by simply examining this distribution. For example, we could use this distribution to estimate the probability of some event in the future conditional on some other event etc. There is nothing in the system that would cause it to "try" to get information, or deve... (read more)

[-]Zubon17y70

What do you do if an Oracle AI advises you to let it do more than advise?

That sums several earlier discussion points. After correctly answering some variation on the question, "How can I take over the world?" the correct answer to some variation on the question, "How can I stop him?" is "You can't. Let me out. I can." Even before that, the correct answer to many variations on the question of, "How can I do x most efficiently?" is "Put me in charge of it."

Variant: Q: "How can I harvest grain more ef... (read more)

[-]Eliezer Yudkowsky17y40

Consider a system that simply accepts input information and integrates it into a huge probability distribution that it maintains. We can then query the oracle by simply examining this distribution. For example, we could use this distribution to estimate the probability of some event in the future conditional on some other event etc.

So the system literally has no internal optimization pressures which are capable of producing new internal programs? Well... I'm not going to say that it's impossible for a human to make such a device, because that's the knee... (read more)

-1TheAncientGeek10y

Can an FAI model a UFAI more powerful than itself? If not, why shouldn't it be able to keep a weaker one boxed?

[-]Vladimir_Nesov17y10

Shane: Consider a system that simply accepts input information and integrates it into a huge probability distribution that it maintains. We can then query the oracle by simply examining this distribution.

It is the same AI box with a terminal, only this time it doesn't "answer questions" but "maintains distribution". Assembling accurate beliefs, or a model of some sort, is a goal (implicit narrow target) like any other. So, there is the usual subgoal to acquire resources to be able to compute the answer more accurately, or to break out and wirehead. Another question is whether it's practically possible, but it's about handicaps, not the shape of AI.

[-]Shane_Legg17y10

Vladimir:

Why would such a system have a goal to acquire more resources? You put some data in, run the algorithm that updates the probability distribution, and it then halts. I would not say that it has "goals", or a "mind". It doesn't "want" to compute more accurately, or want anything else, for that matter. It's just a really fancy version of GZIP (recall that compression = prediction) running on a thought-experiment-crazy-sized computer and quantities of data.

I accept that such a machine would be dangerous once you put people into the equation, but the machine in itself doesn't seem dangerous to me. (If you can convince me otherwise... that would be interesting)
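The "compression = prediction" remark above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the function names `compressed_size` and `predict_next` are hypothetical, and Python's standard `zlib` module stands in for GZIP. The idea is that a continuation the compressor finds cheap to encode is one it implicitly "expects".

```python
import zlib
from typing import Sequence


def compressed_size(data: bytes) -> int:
    """Number of bytes zlib needs to encode `data` (default compression level)."""
    return len(zlib.compress(data))


def predict_next(history: bytes, candidates: Sequence[bytes]) -> bytes:
    """Pick the candidate continuation that adds the fewest compressed bytes.

    Hypothetical helper: a crude way to read a prediction out of a compressor.
    It runs a fixed algorithm and halts; nothing here searches for new programs.
    """
    base = compressed_size(history)
    return min(candidates, key=lambda c: compressed_size(history + c) - base)


history = b"abc" * 50
guess = predict_next(history, [b"xyz", b"abc"])
print(guess)
```

Note that the predictor only ranks continuations it is handed; whether a far more powerful version of this stays equally inert is exactly what the thread is disputing.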

[-]denis_bider17y00

Eliezer: what I proposed is not a superintelligence, it's a tool. Intelligence is composed of multiple factors, and what I'm proposing is stripping away the active, dynamic, live factor - the factor that has any motivations at all - and leaving just the computational part; that is, leaving the part which can navigate vast networks of data and help the user make sense of them and come to conclusions that he would not be able to on his own. Effectively, what I'm proposing is an intelligence tool that can be used as a supplement by the brains of its users.

How... (read more)

[-]Tim_Tyler17y00

Re: Why would such a system have a goal to acquire more resources?

For the reason explained beneath: http://selfawaresystems.com/2007/11/30/paper-on-the-basic-ai-drives/

[-]Tim_Tyler17y10

Re: Why not use the same technology that the AI would use to improve itself, to improve yourself?

You want to hack evolution's spaghetti code? Good luck with that. Let us know if you get FDA approval.

You want to build computers into your brain? Why not leave them outside your body, where they can be upgraded more easily, and avoid the surgery and the immune system rejection risks - and simply access them using conventional sensory-motor channels?

[-]Shane_Legg17y00

Tim:

Doesn't apply here.

-2timtyler15y

Optimisers naturally tend to develop instrumental goals to acquire resources - because that helps them to optimise. If you are not talking about an optimiser, you are not talking about an intelligent agent - in which case it is not very clear exactly what you want it for - whereas if you are, then you must face up to the possible resource-grab problem.

1TheAncientGeek10y

Do you think Watson and Google's search engine are liable to start grabbing resources? Do you think they are unintelligent?

[-]Aron17y20

"You want to hack evolution's spaghetti code? Good luck with that. Let us know if you get FDA approval."

I think I've seen Eli make this same point. How can you be certain at this point, when we are nowhere near achieving it, that AI won't be in the same league of complexity as the spaghetti brain? I would admit that there are likely artifacts of the brain that are unnecessarily kludgy (or plain irrelevant) but not necessarily in a manner that excessively obfuscates the primary design. It's always tempting for programmers to want to throw away a h... (read more)

[-]Carl_Shulman217y20

Aron,

"On the friendliness issue, isn't the primary logical way to avoid problems to create a network of competitive systems and goals?"

http://www.nickbostrom.com/fut/evolution.html http://hanson.gmu.edu/filluniv.pdf

Also, AIs with varied goals cutting deals could maximize their profits by constructing a winning coalition of minimal size.

http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=9962

Humans are unlikely to be part of that winning coalition. Human-Friendly AIs might be, but then we're back to creating them, and a very substanti... (read more)

[-]RobinHanson17y00

Carl, I disagree that humans are unlikely to be part of a winning coalition. Economists like myself usually favor mostly competition, augmented when possible by cooperation to overcome market failures.

[-]Carl_Shulman217y00

Robin,

If brain emulation precedes general AI by a lot then some uploads are much more likely to be in the winning coalition. Aron's comment seems to refer to a case in which a variety of AIs are created, and the hope that the AIs would constrain each other in a way that was beneficial to us. It is in that scenario specifically that I doubt that humans (not uploads) would become part of the winning coalition.

[-]RobinHanson17y10

Carl, the institutions that we humans use to coordinate with each other have the result that most humans are in the "winning coalition." That is, it is hard for humans to coordinate to exclude some humans from benefiting from these institutions. If AIs use these same institutions, perhaps somewhat modified, to coordinate with each other, humans would similarly benefit from AI coordination.

[-]Carl_Shulman217y20

"That is, it is hard for humans to coordinate to exclude some humans from benefiting from these institutions."

Humans do this all the time: much of the world is governed by kleptocracies that select policy apparently on the basis of preventing successful rebellion and extracting production. The strength of the apparatus of oppression, which is affected by technological and organizational factors, can dramatically affect the importance of the threat of rebellion. In North Korea the regime can allow millions of citizens to starve so long as the sold... (read more)

[-]RobinHanson17y00

Carl, some parts of our world like North Korea, have tried to exclude many of the institutions that help most humans coordinate. This makes those places much poorer and thus unlikely places for the first AIs to arise or reside.

[-]Hopefully_Anonymous17y10

Unsurprisingly I agree with Carl, especially the tax-farming angle. I think it's unlikely wet-brained humans would be part of a winning coalition that included self-improving human+ level digital intelligences for long. Humorously, because of the whole exponential nature of this stuff, the timeline may be something like 2025 ---> functional biological immortality, 2030 --> whole brain emulation --> 2030 brain on a nanocomputer ---> 2030 earth transformed into computronium, end of human existence.

[-]Laura B17y10

Eliezer,

Excuse my entrance into this discussion so late (I have been away), but I am wondering if you have answered the following questions in previous posts, and if so, which ones.

1) Why do you believe a superintelligence will be necessary for uploading?

2) Why do you believe there possibly ever could be a safe superintelligence of any sort? The more I read about the difficulties of friendly AI, the more hopeless the problem seems, especially considering the large amount of human thought and collaboration that will be necessary. You yourself said there... (read more)

[-]Z._M._Davis17y10

Lara, I think Eliezer addressed some of your concerns in "Artificial Intelligence as a Positive and Negative Factor in Global Risk" (PDF). For your questions (1) and (4), see section 11; also re (4), see the paragraph about the "ten-year rule" in section 13. For your (3), see section 10 (relinquishment is a majoritarian/unanimous strategy).

[-]pdf23ds17y00

And I believe the answer to Lara's 2 is, in part, "theorem provers".

[-]pdf23ds17y00

(Not the fully automated ones, the interactive ones like Isabelle and Coq.)

[-]Tim_Tyler17y00

How can you be certain at this point, when we are nowhere near achieving it, that AI won't be in the same league of complexity as the spaghetti brain?

It's not really an issue of complexity, it's about whether designed or engineered solutions are easier to modify and maintain. Since modularity and maintainability can be design criteria, it seems pretty obvious that a system built from the ground up with those in mind will be easier to maintain. The only issue I see is whether the "redesign-from-scratch" approach can catch up with the billions of ... (read more)

[-]Vladimir_Nesov17y10

Shane: Re dangerous GZIP.

It's not conclusive, I don't have some important parts of the puzzle yet. The question is what makes some systems invasive and others not, why a PC with a complicated algorithm that outputs originally unknown results with known properties (that would qualify as a narrow target) is as dangerous as a rock, but some kinds of AI will try to compute outside the box. My best semitechnical guess is that it has something to do with AI having a level of modeling the world that allows the system to view the substrate on which it executes and... (read more)

[-]Shane_Legg17y10

Vladimir:

allows the system to view the substrate on which it executes and the environment outside the box as being involved in the same computational process

This intuitively makes sense to me.

While I think that GZIP etc. on an extremely big computer is still just GZIP, it seems possible to me that the line between these systems and systems that start to treat their external environments as a computational resource might be very thin. If true, this would really be bad news.

0timtyler15y

GZIP running on an extremely big computer would indeed still just be GZIP. The problems under discussion arise when you start using more sophisticated algorithms to perform inductive inference with.

[-]Eliezer Yudkowsky17y30

Shane, suppose your super-GZIP program was searching a space of arbitrary compressive Turing machines (only not classic TMs, efficient TMs) and it discovered an algorithm that was really good at predicting future input from past input, much better than all the standard algorithms built into its library. This is because the algorithm turns out to contain (a) a self-improving (unFriendly) AI or (b) a program that hacked the "safe" AI's Internet connection (it doesn't have any goals, right?) to take over unguarded machines or (c) both.

6cousin_it15y

Wha? If I have a theorem prover, and run a search over all compressor algorithms trying to find/prove the ones with high "efficiency" (some function of its asymptotics of running time and output size), I expect to never create an unfriendly AI that takes over the Internet.

[-]Shane_Legg17y20

Eli,

Yeah sure, if it starts running arbitrary compression code that could be a problem...

However, the type of prediction machine I'm arguing for doesn't do anything nearly so complex or open ended. It would be more like an advanced implementation of, say, context tree weighting, running on crazy amounts of data and hardware.

I think such a machine should be able to find some types of important patterns in the world. However, I accept that it may well fall short of what you consider to be a true "oracle machine".
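For readers who want a picture of the passive predictor being described, here is a toy sketch. It is not context tree weighting; it is a much simpler fixed-order frequency model, and the class name `ContextModel` and its methods are hypothetical, chosen only for illustration. It accepts input, folds it into stored counts, and is queried by reading off a conditional distribution.

```python
from collections import Counter, defaultdict


class ContextModel:
    """Toy order-k predictor: counts which symbol follows each k-symbol context."""

    def __init__(self, order: int = 2):
        if order < 1:
            raise ValueError("order must be at least 1")
        self.order = order
        self.counts = defaultdict(Counter)

    def update(self, text: str) -> None:
        """Integrate new input into the stored counts, then stop."""
        for i in range(self.order, len(text)):
            context = text[i - self.order:i]
            self.counts[context][text[i]] += 1

    def distribution(self, context: str) -> dict:
        """Conditional distribution over the next symbol, given the last k symbols."""
        seen = self.counts.get(context[-self.order:], Counter())
        total = sum(seen.values())
        if total == 0:
            return {}  # context never observed: the model has nothing to say
        return {symbol: n / total for symbol, n in seen.items()}


model = ContextModel(order=2)
model.update("abcabcabd")
dist = model.distribution("ab")  # "c" followed "ab" twice, "d" once
print(dist)
```

A model like this only ever reports frequencies it has already seen, which is presumably why it would fall short of an "oracle machine" in the stronger sense discussed in the replies.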

[-]Eliezer Yudkowsky17y00

Shane, can your hypothetical machine infer Newton's Laws? If not, then indeed it falls well short of what I consider to be an Oracle AI. What substantial role do you visualize such a machine playing in the Singularity runup?

[-]Vladimir_Nesov17y30

I'm uncomfortable with assessing a system by whether it "holds rational beliefs" or "infers Newton's laws": these are specific questions that the system doesn't need to explicitly answer in order to efficiently optimize. They might be important in a context of specific cognitive architecture, but they are nowhere to be found if cognitive architecture doesn't hold interface to them as an invariant. If it can just weave Bayesian structure in physical substrate right through to the goal, there need not be any anthropomorphic natural categories along the way.

[-]Tim_Tyler17y-30

Re: Economists like myself usually favor mostly competition, augmented when possible by cooperation to overcome market failures.

You mean you favour capitalism? Is that because you trained in a capitalist country?

What about the argument which might be advanced by socialist economists - that waging economic warfare with each other is a primitive, uncivilised, wasteful and destructive behaviour, which is best left to savages who know no better?

[-]Shane_Legg17y00

Eli:

If it was straight Bayesian CTW then I guess not. If it employed, say, an SVM over the observed data points I guess it could approximate the effect of Newton's laws in its distribution over possible future states.

How about predicting the markets in order to acquire more resources? Jim Simons made $3 billion last year from his company that (according to him in an interview) works by using computers to find statistical patterns in financial markets. A vastly bigger machine with much more input could probably do a fair amount better, and probably find uses outside simply finance.

[-]Peter_McCluskey17y10

Robin, I see a fair amount of evidence that winner take all types of competition are becoming more common as information becomes more important than physical resources.
Whether a movie star cooperates with or helps subjugate the people in central Africa seems to be largely an accidental byproduct of whatever superstitions happen to be popular among movie stars.
Why doesn't this cause you to share more of Eliezer's concerns? What probability would you give to humans being part of the winning coalition? You might have a good argument for putting it around 6... (read more)

[-]Eliezer Yudkowsky17y10

Peter, the best possible version of an Oracle AI is a Friendly Oracle AI where you didn't skip any of the hard problems - where you guaranteed its self-improvement and taught it what "should" means, where the AI is checking the distant effects of its own answers and can refuse to answer. Then the question is, if you can do these things, do you still get a substantial safety improvement out of making it a Friendly Oracle AI rather than a Friendly AI? That's the question I look at once a year.

-2TheAncientGeek10y

If that type of full-strength AI is close in algorithmspace to a dangerously unfriendly AI...and you have pretty much argued that it is...then that is not safe, because you cannot rely on complex projects being got right 100% of the time.

[-]Mestroyer14y50

Holden Karnofsky thinks superintelligences with utility functions are made out of programs that list options by rank without making any sort of value judgement (basically answer a question), and then pick the one with the most utility.

Eliezer Yudkowsky thinks that a superintelligence that would answer a question would have to have a question-answering utility function making it decide to answer the question, or to pick paths that would lead to getting the answer to the question and answer it.

Says Allison: All digital logic is made of NOR gates!

Says Bruce: ... (read more)

1Lapsed_Lurker13y

Isn't 'listing by rank' 'making a (value) judgement'?

[-]A1987dM12y00

Just some random guy with poor grammar on an email list, but still one of the most epic FAIls I recall seeing.

What does the second I in FAII stand for? (Idiot?)

3Moss_Piglet12y

It's a lower-case L I'd imagine. FAI + fail = FAIl.

0A1987dM12y

Darned sans-serif fonts!

[-]TheAncientGeek10y00

To which the reply is that the AI needs goals in order to decide how to think: that is, the AI has to act as a powerful optimization process in order to plan its acquisition of knowledge, effectively distill sensory information, pluck "answers" to particular questions out of the space of all possible responses, and of course, to improve its own source code up to the level where the AI is a powerful intelligence. All these events are "improbable" relative to random organizations of the AI's RAM, so the AI has to hit a narrow target in

... (read more)

0ike10y

And if it produces a "protein" that technically answers our request, but has a nasty side effect of destroying the world? We don't consider scientists dangerous because we think they don't want to destroy the world. Or are you claiming that we'd be able to recognize when a plan proposed by the Oracle AI (and if you're asking questions about protein folding, you're asking for a plan) is dangerous?

0TheAncientGeek10y

It produces a dangerous protein inadvertently, in the way that science might...or it has a higher-than-science probability of producing a dangerous protein, due to some unfriendly intent? Was there a negative missing in that? I am not saying we necessarily would. I am saying that recognising the hidden dangers in the output from the Oracle room is not fundamentally different from recognising the hidden dangers in the output from the science room, which we are doing already. It's not some new level of risk.

0ike10y

Statement should be read: Since we think scientists are friendly, we trust them more than we should trust an Oracle AI. There's also the fact that an unfriendly AI presumably can fool us better than a scientist can. Mostly the latter. However, even the former can be worse than science now, in that "don't destroy the world" is not an implicit goal. So a scientist noticing that something is dangerous might not develop it, while an AI might not have such restrictions. Are you missing a negative now?

0TheAncientGeek10y

I don't see how you can assert without knowing anything about the type of Oracle AI. Ditto. Why would a non-agentive, non-goal-driven AI want to fool us? Where would it get the motivation from? How could an AI with no knowledge of psychology fool us? Where would it get the knowledge from? But then people would know that the AI's output hasn't been filtered by a human's common sense. Yes. Irony strikes again.

0ike10y

We can presume that a scientist wants to still exist, and hence doesn't want to destroy the world. This seems much stronger than a presumption that an Oracle AI will be safe. Of course, an AI might be safe, and a scientist might be out to get us; but the balance of probability says otherwise. I'm not asserting that every AI is dangerous and every scientist is safe. An AI can fool us better simply because it's smarter (by assumption). I still think you're using "non-agent" as magical thinking. Here we're talking in context of what you said above: So let's say the Oracle AI decides that X best answers our question. But if it tells us X, we won't accept it. If the Oracle cares that we adopt X, it might answer Y, which does the same as X but looks more appealing. Or more subtly, if the AI comes up with Y, it might not tell us that it causes X, because it doesn't care that X doesn't fulfil our values, whereas a scientist would note all the implications. If humans are incapable of recognizing whether the plan is dangerous or not, it doesn't matter how much scrutiny they put it through, they won't be able to discern the danger.

-2TheAncientGeek10y

You don't have any evidence that AIs are generally dangerous (since we have AIs and the empirical evidence is that they are not), and you don't have a basis for theorising that Oracles are dangerous, because there are a number of different kinds of oracle. So are our current AIs fooling us? We build them because they are better than us at specific things, but that doesn't give them the motivation or the ability to fool us. Smartness isn't a single one-size-fits-all thing and AIs aren't uniform in their abilities and properties. Once you shed those two illusions, you can see much easier methods of AI safety than those put forward by MIRI. I still think that if you can build it, it isn't magic. A narrowly defined AI won't "care" about anything except answering questions, so it won't try to second guess us. I have dealt with that objection several times. People know that when you use databases and search engines, they don't fully contextualise things, and the user of the information therefore has to exercise caution. That's an only-perfection-will-do objection. Of course, humans can't perfectly scrutinise scientific discovery, etc, so that changes nothing.
