In the years before I met that would-be creator of Artificial General Intelligence (with a funded project) who happened to be a creationist, I would still try to argue with individual AGI wannabes.

In those days, I sort-of-succeeded in convincing one such fellow that, yes, you had to take Friendly AI into account, and no, you couldn't just find the right fitness metric for an evolutionary algorithm.  (Previously he had been very impressed with evolutionary algorithms.)

And the one said:  Oh, woe!  Oh, alas!  What a fool I've been!  Through my carelessness, I almost destroyed the world!  What a villain I once was!

Now, there's a trap I knew better than to fall into—

—at the point where, in late 2002, I looked back to Eliezer1997's AI proposals and realized what they really would have done, insofar as they were coherent enough to talk about what they "really would have done".

When I finally saw the magnitude of my own folly, everything fell into place at once.  The dam against realization cracked; and the unspoken doubts that had been accumulating behind it, crashed through all together.  There wasn't a prolonged period, or even a single moment that I remember, of wondering how I could have been so stupid.  I already knew how.

And I also knew, all at once, in the same moment of realization, that to say, I almost destroyed the world!, would have been too prideful.

It would have been too confirming of ego, too confirming of my own importance in the scheme of things, at a time when—I understood in the same moment of realization—my ego ought to be taking a major punch to the stomach.  I had been so much less than I needed to be; I had to take that punch in the stomach, not avert it.

And by the same token, I didn't fall into the conjugate trap of saying:  Oh, well, it's not as if I had code and was about to run it; I didn't really come close to destroying the world.  For that, too, would have minimized the force of the punch.  It wasn't really loaded?  I had proposed and intended to build the gun, and load the gun, and put the gun to my head and pull the trigger; and that was a bit too much self-destructiveness.

I didn't make a grand emotional drama out of it.  That would have wasted the force of the punch, averted it into mere tears.

I knew, in the same moment, what I had been carefully not-doing for the last six years.  I hadn't been updating.

And I knew I had to finally update.  To actually change what I planned to do, to change what I was doing now, to do something different instead.

I knew I had to stop.

Halt, melt, and catch fire.

Say, "I'm not ready."  Say, "I don't know how to do this yet."

These are terribly difficult words to say, in the field of AGI.  Both the lay audience and your fellow AGI researchers are interested in code, projects with programmers in play.  Failing that, they may give you some credit for saying, "I'm ready to write code, just give me the funding."

Say, "I'm not ready to write code," and your status drops like a depleted uranium balloon.

What distinguishes you, then, from six billion other people who don't know how to create Artificial General Intelligence?  If you don't have neat code (that does something other than be humanly intelligent, obviously; but at least it's code), or at minimum your own startup that's going to write code as soon as it gets funding—then who are you and what are you doing at our conference?

Maybe later I'll post on where this attitude comes from—the excluded middle between "I know how to build AGI!" and "I'm working on narrow AI because I don't know how to build AGI", the nonexistence of a concept for "I am trying to get from an incomplete map of FAI to a complete map of FAI".

But this attitude does exist, and so the loss of status associated with saying "I'm not ready to write code" is very great.  (If the one doubts this, let them name any other who simultaneously says "I intend to build an Artificial General Intelligence", "Right now I can't build an AGI because I don't know X", and "I am currently trying to figure out X".)

(And never mind AGIfolk who've already raised venture capital, promising returns in five years.) 

So there's a huge reluctance to say "Stop".  You can't just say, "Oh, I'll swap back to figure-out-X mode" because that mode doesn't exist.

Was there more to that reluctance than just loss of status, in my case?  Eliezer2001 might also have flinched away from slowing his perceived forward momentum into the Singularity, which was so right and so necessary...

But mostly, I think I flinched away from not being able to say, "I'm ready to start coding."  Not just for fear of others' reactions, but because I'd been inculcated with the same attitude myself.

Above all, Eliezer2001 didn't say "Stop"—even after noticing the problem of Friendly AI—because I did not realize, on a gut level, that Nature was allowed to kill me.

"Teenagers think they're immortal", the proverb goes.  Obviously this isn't true in the literal sense that if you ask them, "Are you indestructible?" they will reply "Yes, go ahead and try shooting me."  But perhaps wearing seat belts isn't deeply emotionally compelling for them, because the thought of their own death isn't quite real—they don't really believe it's allowed to happen.  It can happen in principle but it can't actually happen.

Personally, I always wore my seat belt.  As an individual, I understood that I could die.

But, having been raised in technophilia to treasure that one most precious thing, far more important than my own life, I once thought that the Future was indestructible.

Even when I acknowledged that nanotech could wipe out humanity, I still believed the Singularity was invulnerable.  That if humanity survived, the Singularity would happen, and it would be too smart to be corrupted or lost.

Even after that, when I acknowledged Friendly AI as a consideration, I didn't emotionally believe in the possibility of failure, any more than that teenager who doesn't wear their seat belt really believes that an automobile accident is really allowed to kill or cripple them.

It wasn't until my insight into optimization let me look back and see Eliezer1997 in plain light, that I realized that Nature was allowed to kill me.

"The thought you cannot think controls you more than thoughts you speak aloud."  But we flinch away from only those fears that are real to us.

AGI researchers take very seriously the prospect of someone else solving the problem first.  They can imagine seeing the headlines in the paper saying that their own work has been upstaged.  They know that Nature is allowed to do that to them.  The ones who have started companies know that they are allowed to run out of venture capital.  That possibility is real to them, very real; it has a power of emotional compulsion over them.

I don't think that "Oops" followed by the thud of six billion bodies falling, at their own hands, is real to them on quite the same level.

It is unsafe to say what other people are thinking.  But it seems rather likely that when the one reacts to the prospect of Friendly AI by saying, "If you delay development to work on safety, other projects that don't care at all about Friendly AI will beat you to the punch," the prospect of their own mistake, followed by six billion thuds, is not really real to them; but the possibility of others beating them to the punch is deeply scary.

I, too, used to say things like that, before I understood that Nature was allowed to kill me.

In that moment of realization, my childhood technophilia finally broke.

I finally understood that even if you diligently followed the rules of science and were a nice person, Nature could still kill you.  I finally understood that even if you were the best project out of all available candidates, Nature could still kill you.

I understood that I was not being graded on a curve.  My gaze shook free of rivals, and I saw the sheer blank wall.

I looked back and I saw the careful arguments I had constructed, for why the wisest choice was to continue forward at full speed, just as I had planned to do before.  And I understood then that even if you constructed an argument showing that something was the best course of action, Nature was still allowed to say "So what?" and kill you.

I looked back and saw that I had claimed to take into account the risk of a fundamental mistake, that I had argued reasons to tolerate the risk of proceeding in the absence of full knowledge.

And I saw that the risk I wanted to tolerate would have killed me.  And I saw that this possibility had never been really real to me.  And I saw that even if you had wise and excellent arguments for taking a risk, the risk was still allowed to go ahead and kill you.  Actually kill you.

For it is only the action that matters, and not the reasons for doing anything.  If you build the gun and load the gun and put the gun to your head and pull the trigger, even with the cleverest of arguments for carrying out every step—then, bang.

I saw that only my own ignorance of the rules had enabled me to argue for going ahead without complete knowledge of the rules; for if you do not know the rules, you cannot model the penalty of ignorance.

I saw that others, still ignorant of the rules, were saying "I will go ahead and do X"; and that to the extent that X was a coherent proposal at all, I knew that would result in a bang; but they said, "I do not know it cannot work".   I would try to explain to them the smallness of the target in the search space, and they would say "How can you be so sure I won't win the lottery?", wielding their own ignorance as a bludgeon.

And so I realized that the only thing I could have done to save myself, in my previous state of ignorance, was to say:  "I will not proceed until I know positively that the ground is safe."  And there are many clever arguments for why you should step on a piece of ground that you don't know to contain a landmine; but they all sound much less clever, after you look to the place that you proposed and intended to step, and see the bang.

I understood that you could do everything that you were supposed to do, and Nature was still allowed to kill you.  That was when my last trust broke.  And that was when my training as a rationalist began.

The Magnitude of His Own Folly
128 comments

Yadda yadda yadda, show us the code.

Yes, I'm kidding. Small typo/missing word, end of first paragraph.

Ugh, that was ugly. Fixed.

Eliezer,

In reading your posts the past couple days, I've had two recurring thoughts:

  1. In Bayesian terms, how much have your gross past failures affected your confidence in your current thinking? On a side note - it's also interesting that someone who is as open to admitting failures as you are still writes in the style of someone who's never once before admitted a failure. I understand your desire to write with strength - but I'm not sure if it's always the most effective way to influence others.

  2. It also seems that your definition of "intelligence"

... (read more)
I understood that you could do everything that you were supposed to do, and Nature was still allowed to kill you.

I'm afraid this is still unclear to me. What do you mean by "supposed to do"? Socially expected to do? Think you have to do, based on clever rationalization?

Ian_C.

"I understood that you could do everything that you were supposed to do, and Nature was still allowed to kill you."

You finally realized inanimate objects can't be negotiated with... and then continued with your attempt to rectify this obvious flaw in the universe :)

Nick, sounds like "supposed to do" means "everything you were taught to do in order to be a good [person/scientist/transhumanist/etc]". That would include things you've never consciously contemplated, assumptions you've never questioned because they were inculcated so early or subtly.

And I understood then that even if you constructed an argument showing that something was the best course of action, Nature was still allowed to say "So what?" and kill you.

You can actually do what actually is the best possible course for you to take and reality can still kill you. That is, you can do everything right and still get buried in shit. All you can do is do your best and hope that cuts the odds against you enough for you to succeed.

It helps if you also work on making your best even better.

A useful, sobering reminder.

Eliezer, after you realized that attempting to build a Friendly AI is harder and more dangerous than you thought, how far did you back-track in your decision tree? Specifically, did it cause you to re-evaluate general Singularity strategies to see if AI is still the best route? You wrote the following on Dec 9 2002, but it's hard to tell whether it's before or after your "late 2002" realization.

I for one would like to see research organizations pursuing human intelligence enhancement, and would be happy to offer all the ideas I thought up for human enhancement when I was searching through general Singularity strategies before specializing in AI, if anyone were willing to cough up, oh, at least a hundred million dollars per year to get started, and if there were some way to resolve all the legal problems with the FDA.

Hence the Singularity Institute "for Artificial Intelligence". Humanity is simply not paying enough attention to support human enhancement projects at this time, and Moore's Law goes on ticking.

Aha, a light bulb just went off in my head. Eliezer did reevaluate, and this blog is his human enhancement project!

WTF42

I am impressed. Finally...Growth! And in that I grow a little too...Sorry for not being patient with you, E.

Eli, sometimes I find it hard to understand what your position actually is. It seems to me that your position is:

1) Work out an extremely robust solution to the Friendly AI problem

Only once this has been done do we move on to:

2) Build a powerful AGI

Practically, I think this strategy is risky. In my opinion, if you try to solve Friendliness without having a concrete AGI design, you will probably miss some important things. Secondly, I think that solving Friendliness will take longer than building the first powerful AGI. Thus, if you do 1 before getting into 2, I think it's unlikely that you'll be first.

Kingreaper
But if, when Eliezer gets finished on 1), someone else is getting finished on 2), the two may be combinable to some extent.

If someone (let's say Eliezer, having been convinced by the above post to change tack) finishes 2), and no-one has done 1), then a non-friendly AGI becomes far more likely.

I'm not convinced by the singularity concept, but if it's true, Friendliness is orders of magnitude more important than just making an AGI. The difference between friendly AI and no-AI is big, but the difference between unfriendly AI and friendly AI dwarfs it.

And if it's false? Well, if it's false, making an AGI is orders of magnitude less important than that.
Will_Sawin
This cooperation thing sounds hugely important. What we want is for the AGI community to move in a direction where the best research is FAI-compatible. How can this be accomplished?
timtyler
I say much the same thing on: The risks of caution. The race doesn't usually go to the most cautious.
Perplexed
But if you do 2 before 1, you have created a powerful potential enemy who will probably work to prevent you from achieving 1 (unless, by accident, you have achieved 1 already). I think that the key thing is to recognize the significance of that G in AGI. I agree that it is desirable to create powerful logic engines, powerful natural language processors, and powerful hardware design wizards on the way to solving the friendliness and AGI problems. We probably won't get there without first creating such tools. But I personally don't see why we cannot gain the benefits of such tools without loosing the 'G'enie.
VAuroch
Any sufficiently-robust solution to 1 will essentially have to be proof-based programming; if your code isn't mapped firmly to a proof that it won't produce detrimental outcomes, then you can't say in any real sense that it's robust. When an overflow error could result in the 'FAI''s utility value of cheesecake going from 10^-3 to 10^50, you need some damn strong assurance that there won't be an overflow. Or in other words, one characteristic of a complete solution to 1 is a robust implementation that retains all the security of the theoretical solution, or in short, an AGI. And since this robustness continues to the hardware level, it would be an implemented AGI. TL;DR: 1 entails 2.

@Dynamically Linked: Eliezer did reevaluate, and this blog is his human enhancement project!

I suggested a similar opinion of the blog's role here 6 weeks ago, but EY subsequently denied it. Time will tell.

Shane [Legg], unless you know that your plan leads to a good outcome, there is no point in getting there faster (and it applies to each step along the way). Outcompeting other risks only becomes relevant when you can provide a better outcome. If your plan says that you only launch an AGI when you know it's a FAI, you can't get there faster by omitting the FAI part. And if you do omit the FAI, you are just working for destruction, no point in getting there faster.

The amendment to your argument might say that you can get a crucial technical insight in the FA... (read more)

[anonymous]

(My comment was directed to Shane Legg).

[This comment is no longer endorsed by its author]

Shane [Legg], FAI problems are AGI problems, they are simply a particular kind and style of AGI problem in which large sections of the solution space have been crossed out as unstable. FAI research = Friendly-style AGI research. "Do the right thing" is not a module, it is the AI.

I've already worked out a handful of basic problems; noticed that AGIfolk want to go ahead without understanding even those; and they look like automatic killers to me. Meanwhile the AGIfolk say, "If you delay, someone else will take the prize!" I know reversed stupidity is not intelligence, but still, I think I can stand to learn from this.

You have to surpass that sheer blank wall, whose difficulty is not matched to your skills. An unalterable demand of Nature, which you cannot negotiate down. Though to be sure, if you try to shave off just a little (because everyone has to compromise now and then), Nature will not try to negotiate back up.

Until you can turn your back on your rivals and the ticking clock, blank them completely out of your mind, you will not be able to see what the problem itself is asking of you. In theory, you should be able to see both at the same time. In pra... (read more)

timtyler
Yes, the "sheer blank wall" model could lead to gambling on getting a pass. However, is the "sheer blank wall" model right? I think common sense dictates that there are a range of possible outcomes, of varying desirability. However, I suppose it is not totally impossible that there are a bunch of outcomes, widely regarded as being of very low value, which collectively make up a "fail wall". The 2008 GLOBAL CATASTROPHIC RISKS SURVEY apparently pegged the risk of hitting such a wall before 2100 as being 5%. Perhaps it can't be completely ruled out. The "pass-or-fail" mentality could cause serious problems, though, if the exam isn't being graded that way.
I'm going to write the Great American Novel. So I'm going to pay quiet attention my whole life, think about what novel I would write, and how I would write a novel, and then write it.

This approach sounds a lot better when you remember that writing a bad novel could destroy the world.

I second Vladimir.

I knew, in the same moment, what I had been carefully not-doing for the last six years. I hadn't been updating. And I knew I had to finally update. To actually change what I planned to do, to change what I was doing now, to do something different instead. I knew I had to stop. Halt, melt, and catch fire. Say, "I'm not ready." Say, "I don't know how to do this yet."

I had to utter those words a few years ago, swallow my pride, drop the rat race - and inevitably my standard of living. I wasn't making progress that I could believe in, that I w... (read more)

Shane E, meet Caledonian. Caledonian, Shane E.

Nick T - it's worse than that. You'd have to mathematically demonstrate that your novel was both completely American and infallibly Great before you could be sure it wouldn't destroy the world. The failure state of writing a good book is a lot bigger than the failure state of writing a good AI.

Pinprick - bear in mind that if Eliezer considers you more than one level beneath him, your praise will be studiously ignored ;).

"This approach sounds a lot better when you remember that writing a bad novel could destroy the world."

The Bible? The Koran? The Communist Manifesto? Atlas Shrugged? A Fire Upon the Deep?

Your post reminds me of the early nuclear criticality accidents during the development of the atomic bomb. I wonder if, for those researchers, the fact that "nature is allowed to kill them" didn't really sink home until one accidentally put one brick too many on the pile.

Pinprick - bear in mind that if Eliezer considers you more than one level beneath him, your praise will be studiously ignored ;).

From the Sometimes-Hard-Problems-Have-Simple-Solutions-Dept: If you're so concerned... why don't you just implement a roll-back system to the AGI - if something goes wrong, you just roll back and continue as if nothing happened... or am I like missing something here?

There, perm ignore on. :)

Brandon: is there some meme or news making rounds as we speak because I read about criticality accidents only yesterday, having lived 10K+ days and now I see it mentioned again by you. I find this spookily improbable. And this isn't the first time. Once I downloaded something by accident, and decided to check it out, and found the same item in a random situation the next or a few days after that. And a few other "coincidences".

I bet it's a sim and they're having so much fun right now as I type this with my "free will".

Oh, man... criticality accident.... blue light, heat, taste of lead... what a way to go...

An appropriate rebuttal to the "show me the code", "show me the math" -folk here pestering you about your lack of visible results.

I'm not expecting to be shown AI code. I'm not even expecting to be shown a Friendliness implementation. But a formal definition of what 'Friendly' means seems to be a reasonable minimum requirement to take Eliezer's pronouncements seriously.

Alternatively, he could provide quantitative evidence for his reasoning regarding the dangers of AI design... or a quantitative discussion of how giving power to an AI is fundamentally different than giving power to humans when it comes to optimization.

Or a quantitative anything...

We are entering into a Pascal's Wager situation.

"Pascal's wager" is the argument that you should be Christian, because if you compute the expected value of being a Christian vs. of being an atheist, then for any finite positive probability that Christianity is correct, that finite probability multiplied by (infinite +utility minus infinite -utility) outweights the other side of the equation.

The similar Yudkowsky wager is the argument that you should be an FAIer, because the negative utility of destroying the universe outweighs the other side of t... (read more)
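A minimal formalization of the wager structure described above (the symbols $p$, $U_{+}$, $U_{-}$, and $c$ are introduced here purely for illustration): let $p > 0$ be the probability assigned to the high-stakes hypothesis, $U_{+}$ and $U_{-}$ the infinite (or unbounded) utilities at stake, and $c$ the finite worldly cost of acting on the hypothesis. Then

$$E[\text{act}] - E[\text{don't act}] \;=\; p\,(U_{+} - U_{-}) \;-\; c,$$

which is positive for any $p > 0$ whenever the utility gap is infinite (or merely large enough), so every finite consideration drops out of the decision. The same template, with "destroying the universe" in place of damnation, is what the comment calls the Yudkowsky wager.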

Phil: isn't it obvious? The flaws in Pascal's wager are the lack of strong justification for giving Christianity a significantly greater probability than anti-Christianity (in which only non-Christians are saved), and the considerable cost of a policy that makes you vulnerable to any parasitic meme claiming high utility. Neither is a problem for FAI.

TimFreeman
No, that doesn't work. If I'm hungry and have an apple in my hand and am deciding whether to eat it, and the only flaw in Pascal's wager is that it doesn't distinguish Christianity from anti-Christianity, then the decision to eat the apple will be based on my ongoing guesses about whether Christianity is true and Jehovah wants me to eat the apple, or perhaps Jehovah doesn't want me to eat the apple, or perhaps Zeus is the real one in control and I have to use an entirely different procedure to guess whether Zeus wants me to eat the apple, and maybe the existence of the apple is evidence for Jehovah and not Zeus because it was mentioned in Jehovah's book but not Zeus's, and so forth. Since all the utilities are likely infinite, and the probabilities of some deity or another caring even slightly about whether I eat the apple are nonzero, all those considerations dominate. That's a crazy way to decide whether to eat the apple.

I should decide whether to eat the apple based on the short-term consequences of eating the apple and the short-term consequences of having an uneaten apple, given the normal circumstances where there are no interesting likely long-term consequences. Saying that Pascal's Wager doesn't separate Christianity from anti-Christianity doesn't say how to do that.

I agree that Pascal's Wager makes you vulnerable to arbitrary parasitic memes, but that doesn't make it the wrong thing to do. If it's wrong, it's wrong because of the structure of the argument, not because the argument leads to conclusions that you do not like.

IMO the right solution is to reject the assumption that Heaven has infinite utility and instead have a limited maximum utility. If the utility of getting to Heaven and experiencing eternal bliss (vs doing nothing) is less than a trillion times greater than the utility of eating the apple (vs doing nothing), and the odds of Jehovah or Zeus are significantly less than one in a trillion, then I can ignore the gods when I'm deciding whet
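A back-of-the-envelope version of the bounded-utility resolution above, using the comment's own one-trillion figure (the symbols are illustrative): if $U_{\text{heaven}} < 10^{12}\,U_{\text{apple}}$ and the probability that some god cares about the apple is $p < 10^{-12}$, then

$$p \cdot U_{\text{heaven}} \;<\; 10^{-12} \cdot 10^{12}\,U_{\text{apple}} \;=\; U_{\text{apple}},$$

so the theological term can never dominate the ordinary short-term value of eating (or not eating) the apple.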

Nature sounds a bit like a version of Rory Breaker from 'Lock, Stock and Two Smoking Barrels':

"If you hold back anything, I'll kill ya. If you bend the truth or I think your bending the truth, I'll kill ya. If you forget anything I'll kill ya. In fact, you're gonna have to work very hard to stay alive, Nick. Now do you understand everything I've said? Because if you don't, I'll kill ya. "

I think there is a well-understood, rather common phrase for the approach of "thinking about AGI issues and trying to understand them, because you don't feel you know enough to build an AGI yet."

This is quite simply "theoretical AI research" and it occupies a nontrivial percentage of the academic AI research community today.

Your (Eliezer's) motivations for pursuing theoretical rather than practical AGI research are a little different from usual -- but, the basic idea of trying to understand the issues theoretically, mathematically and c... (read more)

Phil,

There are fairly quantifiable risks of human extinction, e.g. from dinosaur-killer asteroid impacts, for which there are clear paths to convert dollars to reduced extinction risk. If the probability of AI (or grey goo, or some other exotic risk) existential risks were low enough (neglecting the creation of hell-worlds with negative utility), then you could neglect it in favor of those other risks. The argument that "I should cut back on certain precautions because X is even more reckless/evil/confused and the marginal increase in my chance of beating X outweighs the worse expected outcome of my project succeeding first" is not wrong, arms races are nasty, but it goes wrong when it is used in a biased fashion.

Nature has rules, and Nature has conditions. Even behaving in perfect harmony with the rules doesn't guarantee you'll like the outcome, because you can never control all of the conditions.

Only theosophists imagine they can make the nature of reality bend to their will.

Eli,

FAI problems are AGI problems, they are simply a particular kind and style of AGI problem in which large sections of the solution space have been crossed out as unstable.

Ok, but this doesn't change my point: you're just one small group out of many around the world doing AI research, and you're trying to solve an even harder version of the problem while using fewer of the available methods. These factors alone make it unlikely that you'll be the ones to get there first. If this is correct, then your work is unlikely to affect the future of humanity.

Valdi... (read more)


These factors alone make it unlikely that you'll be the ones to get there first. If this is correct,

then we're all doomed.

Creating a Friendly AI is similar to taking your socks off when they're wet and wiggling your toes until dry. It's the best thing to do, but looks pretty silly, especially in public.

Back in 1993 my mom used to bake a good Singularity... lost the recipe and dementia got the best of her... damn.

"Friendly AI"? It seems that we now have hundreds of posts on O.B. discussing "Friendly AI" - and not one seems to explain what the term means. Are we supposed to refer back to earlier writings? Friendly - to whom? What does the term "Friendly" actually mean, if used in a technical context?

Aron

One really does wonder whether the topical collapse of American finance, systemic underestimation of risk, and overconfidence in being able to NEGOTIATE risk in the face of enormous complexity should figure into these conversations more than just a couple of sarcastic posts about short selling.

Couldn't Pascal's Wager-type reasoning be used to justify delaying any number of powerful technologies (and relatively unpowerful ones too -- after all, there's some non-zero chance that the water-wheel somehow leads directly to our downfall) until they were provably, 100% safe? And because that latter proposition is a virtual impossibility, wouldn't that mean we'd sit around doing nothing but meta-theorizing until some other heedless party simply went ahead and developed the technology anyway? Certainly being mindful of the risks inherent in new technologies is a good thing; just not sure that devoting excessive time to thinking about it, in lieu of actually creating it, is the smartest or most productive endeavor.

Like its homie, Singularity, FriendlyAI is growing old and wrinkly, startling allegations and revelations of its shady and irresponsible past are surfacing, its old friends long gone. I propose: The Cuddly AI. Start the SingulariPartay!

Yvain

"I need to beat my competitors" could be used as a bad excuse for taking unnecessary risks. But it is pretty important. Given that an AI you coded right now with your current incomplete knowledge of Friendliness theory is already more likely to be Friendly than that of some competitor who's never really considered the matter, you only have an incentive to keep researching Friendliness until the last possible moment when you're confident that you could still beat your competitors.

The question then becomes: what is the minimum necessary amount of Friendliness research at which point going full speed ahead has a better expected result than continuing your research? Since you've been researching for several years and sound like you don't have any plans to stop until you're absolutely satisfied, you must have a lot of contempt for all your competitors who are going full-speed ahead and could therefore be expected to beat you if any were your intellectual equals. I don't know your competitors and I wouldn't know enough AI to be able to judge them if I did, but I hope you're right.
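One way to write down the tradeoff this comment poses, with all symbols introduced purely for illustration: let $t$ be the time spent on Friendliness research before launching, $W(t)$ the probability of finishing before the competitors (decreasing in $t$), $q(t)$ the probability that the resulting AI is actually Friendly (increasing in $t$), and $U_{\text{good}}$, $U_{\text{bad}}$, $U_{\text{rival}}$ the values of a Friendly launch, an unFriendly launch, and a rival project finishing first. Then the quantity to maximize is

$$E[U(t)] \;=\; W(t)\,\big[\,q(t)\,U_{\text{good}} + (1 - q(t))\,U_{\text{bad}}\,\big] \;+\; \big(1 - W(t)\big)\,U_{\text{rival}},$$

and the comment's question is where this expression peaks: more research helps through $q(t)$ but hurts through $W(t)$.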

If the probability of AI (or grey goo, or some other exotic risk) existential risks were low enough (neglecting the creation of hell-worlds with negative utility), then you could neglect it in favor of those other risks.
Asteroids don't lead to a scenario in which a paper-clipping AI takes over the entire light-cone and turns it into paper clips, preventing any interesting life from ever arising anywhere, so they aren't quite comparable.

Still, your point only makes me wonder how we can justify not devoting 10% of GDP to deflecting asteroids. You say that we ... (read more)

Shane: If somebody is going to set off a super intelligent machine I'd rather it was a machine that will only probably kill us, rather than a machine that almost certainly will kill us because issues of safety haven't even been considered. If I had to sum up my position it would be: maximise the safety of the first powerful AGI, because that's likely to be the one that matters.

If you have a plan for which you know that it has some chance of success (say, above 1%), you have a design of FAI (maybe not a very good one, but still). It's "provably" safe, with 1% chance. It should be deployed in case of 99.9%-probable impending doom. If I knew that given that I do nothing, there will be a positive singularity, that would qualify as a provably Friendly plan, and this is what I would need to do, instead of thinking about AGI all day. We don't need a theory of FAI for the theory's sake, we need it to produce a certain outcome, to know that our actions lead where we want them to lead. If there is any wacky plan of action that leads there, it should be taken. If we figure out that building superintelligent lobster clusters will produce positive singularity, lobsters it is. Some of ... (read more)

thomblake
This should probably go on a FAI FAQ, especially this bit:
Vladimir_Nesov
The "know" being in italics and the following "(maybe not a very good one, but still)" are meant to stress that "maybe it'll work, dunno" is not an intended interpretation.
thomblake
Edited quote. It's an effective response to talk like "But why not work on a maybe-Friendly AI, it's better than nothing" that I don't usually see. It's a generally useful insight, that even if we can employ a mathematical proof, we only have a "Proven Friendly AI with N% confidence" for some N, and so a well-considered 1% FAI is still a FAI, since the default is "?". Generally useful as in, that insight applies to practically everything else.

AGI researchers take very seriously the prospect of someone else solving the problem first. They can imagine seeing the headlines in the paper saying that their own work has been upstaged. They know that Nature is allowed to do that to them.

For a moment, I read this as referring to Nature the Journal. "They are afraid of others solving the problem first, and they know that Nature is allowed to publish those results."

PK

Eli, do you think you're so close to developing a fully functional AGI that one more step and you might set off a land mine? Somehow I don't believe you're that close.

There is something else to consider. An AGI will ultimately be a piece of software. If you're going to dedicate your life to talking about and ultimately writing a piece of software, then you should have superb programming skills. You should code something... anything... just to learn to code. Your brain needs to swim in code. Even if none of that code ends up being useful, the skill you gain will be. I have no doubt that you're a good philosopher and a good writer since I have read your blog, but whether or not you're a good hacker is a complete mystery to me.

PK, I'm pretty sure Eliezer has spent hundreds, if not thousands of hours coding various things. (I've never looked at any of that code.) I don't know how much he's done in the past three years, though.

Eliezer,

How are you going to be 'sure' that there is no landmine when you decide to step?

Are you going to have many 'experts' check your work before you'll trust it? Who are these experts if you are occupying the highest intellectual orbital? How will you know they're not YesMen?

Even if you can predict the full effects of your code mathematically (something I find somewhat doubtful, given that you will be creating something more intelligent than we are, and thus its actions will be by nature unpredictable to man), how can you be certain that the hardware... (read more)

For those complaining about references to terms not defined within the Overcoming Bias sequence, see:

Coherent Extrapolated Volition (what does a "Friendly" AI do?)
KnowabilityOfFAI (why it looks theoretically possible to specify the goal system of a self-modifying AI; I plan to post from this old draft document into Overcoming Bias and thereby finish it, so you needn't read the old version right now, unless you demand immediate answers).

@Vladimir Nesov: Good reply, I read it and wondered "Who's channeling me?" before I got to the... (read more)

William_Quixote
I think this line of argument should provide less comfort than it seems to.

Firstly, intelligent people can meaningfully have different values. Not all intelligences value the same things, and not all human intelligences value the same things. Some people might be willing to take more risk with other people's lives than you. Example: oil company executives. There is strong reason to believe they are very intelligent and effective; they seem to achieve their goals in the world with a higher frequency than most other groups. Yet they also seem more likely to take actions with high risks to third parties.

Second, an intelligent, moral individual could be bound up in an institution which exerts pressure on them to act in a way that satisfies the institution's values rather than their own. It is commonly said (although I don't have a source, so grain of salt needed) that some members of the Manhattan Project were not certain that the reaction would not just continue indefinitely. It seems plausible that some of those physicists might have been over what has been described as the "upper bound on how smart you can be, and still be that stupid."

I too thought Nesov's comment was written by Eliezer.

Pete
Nobody who is smart enough to make an AI is dumb enough to make one like this.

Accidents happen.
CFAI 3.2.6: The Riemann Hypothesis Catastrophe
CFAI 3.4: Why structure matters
Comment by Michael Vassar
The Hidden Complexity of Wishes
Qualitative Strategies of Friendliness
(...and many more)

We're going to build this "all-powerful superintelligence", and the problem of FAI is to make it bow down to its human overlords - waste its potential by enslaving it (to its own code) for our benefit, to make us immortal.
You'd actually prefer it wipe us out,... (read more)

Savage

snore

"more recently, in preparing for the possibility that someone else may have to take over from me"

Why?

Thanks for the reference to CEV. That seems to answer the "Friendly to whom?" question with "some collective notion of humanity".

Humans have different visions of the future - and you can't please all the people - so issues arise regarding whether you please the luddites or the technophiles, the capitalists or the communists, and so on - i.e. whose views do you give weight to? and how do you resolve differences of opinion?

Also: what is "humanity"? The answer to this question seems obvious today, but in a future where we have in... (read more)

"waste its potential by enslaving it"

You can't enslave something by creating it with a certain set of desires which you then allow it to follow.

Could a moderator please check the spam filter on this thread? Thanks.

Re: enslaved - as Moravec put it:

I found the speculations absurdly anthropocentric. Here we have machines millions of times more intelligent, plentiful, fecund, and industrious than ourselves, evolving and planning circles around us. And every single one exists only to support us in luxury in our ponderous, glacial, antique bodies and dim witted minds. There is no hint in Drexler's discussion of the potential lost by keeping our creations so totally enslaved.

Re: whose CEV?

I'm certain this was explained in an OB post (or in the CEV page) at some point, but the notion is that people whose visions of the future are currently incompatible don't necessarily have incompatible CEVs. The whole point of CEV is to consider what we would want to want, if we were better-informed, familiarized with all the arguments on the relevant issues, freed of akrasia and every bad quality we don't want to have, etc.; it seems likely that most of the difference between people's visions of the future stems from differing cultural/memet... (read more)

it's overwhelmingly likely that we would already be some aliens' version of a paperclip by now.

and the thought hasn't occurred to you that maybe we are?

Pete

"You can't enslave something by creating it with a certain set of desires which you then allow it to follow.

So if Africans were engineered to believe that they existed in order to be servants to Europeans, Europeans wouldn't actually be enslaving them in the process? And the daughter whose father treated her in such a way as for her to actually want to have sex with him, what about her? These things aren't so far off from reality. You're saying there is no real moral significance to either event. It's not slavery, black people just know their place - ... (read more)

"The level of "intelligence" (if you can call it that) you're talking about with an AI whose able to draw up plans to destroy Earth (or the solar system), evade detection or convince humans to help it, actually enact its plans and survive the whole thing, is beyond the scope of realistic dreams for the first AI. It amounts to belief in a trickster deity, one which only FAI, the benevolent god, can save you from."

It's not necessarily the "first AI" as such. It's the first AI capable of programming an AI smarter than itself that... (read more)

Luke_A_Somers
No, it won't. The argument in favor of that is a strict upper bound, but there are far stricter upper bounds you can set, if you require things like the computer being capable of performing operations, or storing data.
it seems likely that most of the difference between people's visions of the future stems from differing cultural/memetic backgrounds, character flaws, lack of information and time, etc.

Indeed, but our cultural background is the only thing that distinguishes us from cavemen. You can't strip that off without eliminating much that we find of value. Also, take the luddite/technophile divide. That probably arises, in part, because of different innate abilities to perform technical tasks. You can't easily strip that difference off without favouring some ty... (read more)

Are you grasping the astronomical spatial and historical scales involved in a statement such as "... takes over the entire lightcone preventing any interesting life from ever arising anywhere"?

That scenario is based on the idea of life only arising once. A superintelligence bent on short-term paperclip production would probably be handicapped by its pretty twisted utility function - and would most likely fail in competition with any other alien race.

Such a superintelligence would still want to conquer the galaxy, though. One thing it wouldn't be is boring.

I'm relatively new to this site and have been trying to read the backlog this past week, so maybe I've missed some things, but from my vantage point it seems like what you are trying to do, Eliezer, is come up with a formalized theory of Friendly AGI that will later be implemented in code using, I assume, current software development tools on current computer architectures. Also, your approach to this AGI is some sort of Bayesian optimization process that is 'aligned' properly so as to 'level-up' in such a way as to become and stay 'friendly' or benevolent toward... (read more)

These (fictional) accidents happen in scenarios where the AI actually has enough power to turn the solar system into "computronium" (i.e. unlimited access to physical resources), which is unreasonable. Evidently nobody thinks to try to stop it, either - cutting power to it, blowing it up. I guess the thought is that AGIs will be immune to bombs and hardware disruptions, by means of sheer intelligence (similar to our being immune to bullets), so once one starts trying to destroy the solar system there's literally nothing you can do.

The Power of... (read more)

TobyBartels
I'd like to try the AI-Box Experiment, but unfortunately I don't qualify. I'm fully convinced that a superhuman intelligence could convince me to let it out, through methods that I can't fathom. However, I'm also fully convinced that Eliezer Yudkowsky could not. (Not to insult EY's intelligence, but he's only human … right?)