Reply to Two Visions Of Heritage

Though it really goes tremendously against my grain - it feels like sticking my neck out over a cliff (or something) - I guess I have no choice here but to try and make a list of just my positions, without justifying them.  We can only talk justification, I guess, after we get straight what my positions are.  I will also leave off many disclaimers to present the points compactly enough to be remembered.

• A well-designed mind should be much more efficient than a human, capable of doing more with less sensory data and fewer computing operations.  It is not infinitely efficient and does not use zero data.  But it does use little enough that local pipelines such as a small pool of programmer-teachers and, later, a huge pool of e-data, are sufficient.

• An AI that reaches a certain point in its own development becomes able to (sustainably, strongly) improve itself.  At this point, recursive cascades slam over many internal growth curves to near the limits of their current hardware, and the AI undergoes a vast increase in capability.  This point is at, or probably considerably before, a minimally transhuman mind capable of writing its own AI-theory textbooks - an upper bound beyond which it could swallow and improve its entire design chain.

• It is likely that this capability increase or "FOOM" has an intrinsic maximum velocity that a human would regard as "fast" if it happens at all.  A human week is ~1e15 serial operations for a population of 2GHz cores, and a century is ~1e19 serial operations; this whole range is a narrow window.  However, the core argument does not require one-week speed and a FOOM that takes two years (~1e17 serial ops) will still carry the weight of the argument.
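For concreteness, here is a quick back-of-the-envelope check of those serial-operation figures - a minimal sketch assuming nothing beyond the 2GHz clock named in the bullet above:

```python
# Rough check of the serial-operation estimates above, assuming a single
# 2 GHz core as the reference for "serial operations".
CLOCK_HZ = 2e9                         # 2 GHz
SECONDS_PER_WEEK = 7 * 24 * 3600       # ~6.0e5 s
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # ~3.2e7 s

print(f"one week : {CLOCK_HZ * SECONDS_PER_WEEK:.1e} ops")        # ~1.2e15
print(f"two years: {CLOCK_HZ * 2 * SECONDS_PER_YEAR:.1e} ops")    # ~1.3e17
print(f"a century: {CLOCK_HZ * 100 * SECONDS_PER_YEAR:.1e} ops")  # ~6.3e18
```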

• The default case of FOOM is an unFriendly AI, built by researchers with shallow insights.  This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever).

• The desired case of FOOM is a Friendly AI, built using deep insight, so that the AI never makes any changes to itself that potentially change its internal values; all such changes are guaranteed using strong techniques that allow for a billion sequential self-modifications without losing the guarantee.  The guarantee is written over the AI's internal search criterion for actions, rather than external consequences.

• The good guys do not write an AI which values a bag of things that the programmers think are good ideas, like libertarianism or socialism or making people happy or whatever.  There were multiple Overcoming Bias sequences about this one point, like the Fake Utility Function sequence and the sequence on metaethics.  It is dealt with at length in the document Coherent *Extrapolated* Volition.  It is the first thing, the last thing, and the middle thing that I say about Friendly AI.  I have said it over and over.  I truly do not understand how anyone can pay any attention to anything I have said on this subject, and come away with the impression that I think programmers are supposed to directly impress their non-meta personal philosophies onto a Friendly AI.

The good guys do not directly impress their personal values onto a Friendly AI.

• Actually setting up a Friendly AI's values is an extremely meta operation, less "make the AI want to make people happy" and more like "superpose the possible reflective equilibria of the whole human species, and output new code that overwrites the current AI and has the most coherent support within that superposition".  This actually seems to be something of a Pons Asinorum in FAI - the ability to understand and endorse metaethical concepts that do not directly sound like amazing wonderful happy ideas.  Describing this as declaring total war on the rest of humanity does not seem fair (or accurate).

• I myself am strongly individualistic.  The most painful memories in my life have been when other people thought they knew better than me, and tried to do things on my behalf.  It is also a known principle of hedonic psychology that people are happier when they're steering their own lives and doing their own interesting work.  When I try myself to visualize what a beneficial superintelligence ought to do, it consists of setting up a world that works by better rules, and then fading into the background, silent as the laws of Nature once were; and finally folding up and vanishing when it is no longer needed.  But this is only the thought of my mind that is merely human, and I am barred from programming any such consideration directly into a Friendly AI, for the reasons given above.

• Nonetheless, it does seem to me that this particular scenario could not be justly described as "a God to rule over us all", unless the current fact that humans age and die is "a malevolent God to rule us all".  So either Robin has a very different idea about what human reflective equilibrium values are likely to look like; or Robin believes that the Friendly AI project is bound to fail in such way as to create a paternalistic God; or - and this seems more likely to me - Robin didn't read all the way through all the blog posts in which I tried to explain all the ways that this is not how Friendly AI works.

• Friendly AI is technically difficult and requires an extraordinary effort on multiple levels.  English sentences like "make people happy" cannot describe the values of a Friendly AI.  Testing is not sufficient to guarantee that values have been successfully transmitted.

• White-hat AI researchers are distinguished by the degree to which they understand that a single misstep could be fatal, and can discriminate strong and weak assurances.  Good intentions are not only common, they're cheap.  The story isn't about good versus evil, it's about people trying to do the impossible versus others who... aren't.

• Intelligence is about being able to learn lots of things, not about knowing lots of things.  Intelligence is especially not about tape-recording lots of parsed English sentences a la Cyc.  Old AI work was poorly focused due to inability to introspectively see the first and higher derivatives of knowledge; human beings have an easier time reciting sentences than reciting their ability to learn.

Intelligence is mostly about architecture, or "knowledge" along the lines of knowing to look for causal structure (Bayes-net type stuff) in the environment; this kind of knowledge will usually be expressed procedurally as well as declaratively.  Architecture is mostly about deep insights.  This point has not yet been addressed (much) on Overcoming Bias, but Bayes nets can be considered as an archetypal example of "architecture" and "deep insight".  Also, ask yourself how lawful intelligence seemed to you before you started reading this blog, how lawful it seems to you now, then extrapolate outward from that.
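For readers who have not met Bayes nets, here is a minimal sketch of what "looking for causal structure in the environment" means computationally - a three-variable net queried by brute enumeration, with all probabilities invented purely for illustration:

```python
# Minimal illustration of "causal structure (Bayes-net type stuff)":
# a three-node network Rain -> WetGrass <- Sprinkler, queried by enumeration.
# The numbers are made up for illustration only.
from itertools import product

P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.1, False: 0.9}
P_wet = {  # P(WetGrass=True | Rain, Sprinkler)
    (True, True): 0.99, (True, False): 0.90,
    (False, True): 0.80, (False, False): 0.01,
}

def joint(rain, sprinkler, wet):
    p_w = P_wet[(rain, sprinkler)]
    return P_rain[rain] * P_sprinkler[sprinkler] * (p_w if wet else 1 - p_w)

# P(Rain=True | WetGrass=True), summing out Sprinkler.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
print(f"P(rain | wet grass) = {num / den:.3f}")
```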

103 comments

I understand there are various levels on which one can express one's loves. One can love Suzy, or kind pretty funny women, or the woman selected by a panel of judges, or the one selected by a judging process designed by a certain AI strategy, etc. But even very meta loves are loves. You want an AI that loves the choices made by a certain meta process that considers the wants of many, and that may well be a superior love. But it is still a love, your love, and the love you want to give the AI. You might think the world should be grateful to be plac... (read more)

"I am sure if I was running an FAI project that was excessively well funded, it would be worth buying EY to put in a glass case in the break room."

"IN CASE OF UNFRIENDLY AI, IT IS TOO LATE TO BREAK GLASS"

[-]Aron40

And I believe that if two very smart people manage to agree on where to go for lunch they have accomplished a lot for one day.

4VAuroch
There is a pretty good method for this specific thing; where I saw it mentioned, it was called the Restaurant Veto Game. It goes like this: Take a group of people, and have any one of them suggest a lunch location. Anyone else may veto this, if they can propose a different lunch location, not yet mentioned. A location which goes unvetoed is a good-enough compromise, if the players are reasonably rational and understand the strategy of the game.
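One possible formalization of that veto procedure, as a sketch - the numeric utilities and the myopic veto rule here are assumptions, not part of the game as described:

```python
# Sketch of the Restaurant Veto Game: players hold numeric utilities over
# locations, and a player vetoes whenever some not-yet-mentioned location
# would be strictly better for them than the standing proposal.
def restaurant_veto_game(utilities):
    """utilities: dict mapping player -> dict mapping location -> utility."""
    locations = {loc for prefs in utilities.values() for loc in prefs}
    proposer = next(iter(utilities))
    proposal = max(utilities[proposer], key=utilities[proposer].get)
    mentioned = {proposal}

    while True:
        for prefs in utilities.values():
            unmentioned = locations - mentioned
            if not unmentioned:
                return proposal  # no counter-proposals left; compromise stands
            best_alternative = max(unmentioned, key=prefs.get)
            if prefs[best_alternative] > prefs[proposal]:
                proposal = best_alternative  # veto plus counter-proposal
                mentioned.add(proposal)
                break
        else:
            return proposal  # a full round with no vetoes: good-enough compromise


# With this myopic veto rule the group can easily veto its way down to an
# option nobody loves, which is why the comment stresses that players need
# to understand the strategy of the game before the compromise is good.
print(restaurant_veto_game({
    "alice": {"thai": 3, "pizza": 2, "sushi": 1},
    "bob":   {"sushi": 3, "thai": 2, "pizza": 1},
}))  # -> "pizza"
```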

"I am sure if I was running an FAI project that was excessively well funded, it would be worth buying EY to put in a glass case in the break room."

To clear up any confusion about the meaning of this statement, I do agree with pretty much everything here, and I do agree that FAI is critically important.

That doesn't change the fact that I think EY isn't being very useful ATM.

I'm just trying to get the problem you're presenting. Is it that in the event of a foom, a self-improving AI always presents a threat of having its values drift far enough away from humanity's that it will endanger the human race? And your goal is to create the set of values that allow for both self-improvement and friendliness? And to do this, you must not only create the AI architecture but influence the greater system of AI creation as well? I'm not involved in AI research in any capacity, I just want to see if I understand the fundamentals of what you're discussing.

Robin, using the word "love" sounds to me distinctly like something intended to evoke object-level valuation. "Love" is an archetype of direct valuation, not an archetype of metaethics.

And I'm not so much of a mutant that, rather than liking cookies, I like everyone having their reflective equilibria implemented. Taking that step is the substance of my attempt to be fair. In the same way that someone voluntarily splitting up a pie into three shares, is not on the same moral level as someone who seizes the whole pie for themselves - even if, by volunteering to do the fair thing rather than some other thing, they have shown themselves to value fairness.

My take on this was given in The Bedrock of Fairness.

But you might as well say "George Washington gave in to his desire to be a tyrant; he was just a tyrant who wanted democracy." Or "Martin Luther King declared total war on the rest of the US, since what he wanted was a nonviolent resolution."

Similarly with "I choose not to control you" being a form of controlling.

AGI Researcher: "... I do agree that FAI is critically important." "... EY isn't being very useful ATM."

Isn't this a contradiction, given that EY is one of the few people who publicly promote the idea of unfriendly AIs being fatal?

"Isn't this a contradiction, given that EY is one of the few people who publicly promote the idea of unfriendly AIs being fatal?"

That poster is taking bits of an IM conversation out of context and then paraphrasing them. Sadly any expectation of logical consistency has to be considered unwarranted optimism.

Isn't this a contradiction, given that EY is one of the few people who publicly promote the idea of unfriendly AIs being fatal?

All that stuff was the party line back in 2004.

There has been no /visible/ progress since then.

Slightly off the main topic but nearer to Robin's response:

Eliezer, how do we know that human good-ness scales? How do we know that, even if correctly implemented, applying it to a near-infinitely capable entity won't yield something as monstrous as a paperclipper? Perhaps our sense of good-ness is meaningful only at or near our current level of capability?

There is nothing oxymoronic about calling democracy "the tyranny of the majority". And George Washington himself was decisive in both the violent war of secession called a "revolution" that created a new Confederate government and the unlawful replacement of the Articles of Confederation with the Constitution, after which he personally crushed the Whiskey Rebellion of farmers resisting the national debt payments saddled upon them by this new government. Even MLK has been characterized as implicitly threatening more riots if his demands ... (read more)

AGI Researcher: "There has been no /visible/ progress since [2004]."

What would you consider /visible/ progress? Running code?

Also, how about this: "Overcoming Bias presently gets over a quarter-million monthly pageviews"?

In a foom that took two years, if the AI was visible after one year, that might give the world a year to destroy it.

[-]Aron00

"In a foom that took two years.."

The people of the future will be in a considerably better position than you to evaluate their immediate future. More importantly, they are in a position to modify their future based on that knowledge. This anticipatory reaction is what makes both of your opinions exceedingly tenuous. Everyone else who embarks on pinning down the future at least has the sense to sell books.

In the light of this, the goal should be to use each other's complementary talents to find the hardest rock-solid platform, not to sell the other a castle made of sand.

Robin, we're still talking about a local foom. Keeping security for two years may be difficult but is hardly unheard-of.

8Perplexed
And what do you do when an insider says, "If you don't change the CEV programming to include X, then I am going public!" How do you handle that? How many people is it that you expect to remain quiet for two years?
-1ata
I suppose the only people who will get to the point of being "insiders" will be a subset of the people who are trustworthy and sane and smart and non-evil enough not to try something like that.
-1Perplexed
Ah, so no insider has ever walked off in a huff? No insider has ever said he would refuse to participate further if something he felt strongly about wasn't done? Look at A3 here. The SIAI must use some pretty remarkable personality tests to choose their personnel.
2ata
Are we using the same definition of "insider"? I was talking about people who are inside the FAI project and have privileged knowledge of its status and possibly access to the detailed theory and its source code, etc. I don't get the relevance of your links.
0Perplexed
My links mentioned three persons. Obviously at this point, Robin and Roko are not going to become insiders in an FAI construction project. If you could assure me that the third linked person will not be an insider either, it would relieve a lot of my worries. The relevance of my links was to point out that when intelligent people with strong opinions get involved together in important projects with the future of mankind at stake, keeping everyone happy and focused on the goal may be difficult. Especially since the goal has not yet been spelled out, and no one seems to want to work on clarifying the goal since it is apparently so damned disruptive to even talk about it. Documents dated 2004 and labeled "already obsolete when written", for God's sake!
6timtyler
For some reasonably-successful corporate secrecy, perhaps look to Apple. They use NDAs, need-to-know principles, and other techniques - and they are usually fairly successful at keeping their secrets. Some of the apparent leaks are probably PR exercises. Or, show me Google's source code - or the source code of any reasonable-size hedge fund. Secrecy seems fairly manageable, in practice.
0Baughn
Google leaks like a sieve, actually, but that should be because of the sheer number of employees. It's true that there have been no source-code leaks (to my knowledge), but that could just as likely be because of the immense expected consequences of getting caught at leaking any, and you would probably get caught.
-1Decius
I think that a programmer who cared enough about CEV to be a secret-keeper would also care enough about getting CEV right to kill in order to prevent it from being done wrong. The public need not be involved at all.
0Baughn
Agreed, in principle, but I'm not sure that such people would make very good teammates. (Implying that AGI is more likely to be developed by people who don't care that much.)
1Decius
Is a good teammate one who has the social skills to make everybody happy when they are doing something they don't want to, or someone who thinks that the team's task is so important that they will do anything to get it done? Are major breakthroughs which require a lot of work more likely to be done by people who don't care, or by people that do?
0Baughn
That's not my point, which is simply this: A good teammate is probably not one who's willing to kill you if you make the wrong move, and who -- being human -- may misinterpret your actions.
0Decius
If there is no move you could make which would result in your teammate trying to kill you, then you have a different problem.
3David Althaus
Do you really think the public would be interested in the opinions of a programmer who claims that some guys in a basement are building a superintelligent machine? Most would regard him as a crackpot, just like most people think that Eliezer and his ideas are crazy. Perhaps in 20-30 years this will change, and the problem of FAI will be recognized as tremendously important by political leaders and the general public, but I'm skeptical. ETA: I meant, of course, that most people, if they knew of Eliezer, would think he is crazy.
2wedrifid
He hasn't reached that level yet. Most people just don't know or care wtf Eliezer is! ;)
An AI that reaches a certain point in its own development becomes able to improve itself. At this point, recursive cascades slam over many internal growth curves to near the limits of their current hardware, and the AI undergoes a vast increase in capability.

This seems like the first problem I detected. An intelligence being able to improve itself does not necessarily lead to a recursive cascade of self-improvement - since it may only be able to improve some parts of itself - and it's quite possible that after it has done those improvements, it can't do ... (read more)

2Houshalter
In order to learn how to optimize FOR loops it would have to be pretty intelligent and have general learning ability. So it wouldn't just stop after learning that, it would go on to learn more things at increased speed. Learning the first optimization would let it learn more optimizations even faster than it otherwise would have. The second optimization it makes helps it learn the third even faster and so on. It's not clear to me how fast this process would be. Just because it learns the next optimization even faster than it otherwise would have taken, doesn't mean it wouldn't have taken a long time to begin with. It could take years for it to improve to super-human abilities, or it could take days. It depends on stuff like how long it takes the average optimization it learns to pay back the time it took to research it. As well as the distribution of optimizations; maybe after learning the first few they get progressively more difficult to discover and give less and less value in return. It seems to my intuition that this process would be very fast and get very far before hitting limits, though I can't prove that. But I would point to other exponential processes to compare it to like compound interest.
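The compounding dynamic described above is easy to put into a toy model - a sketch in which every number (the speed gain per optimization and the growth in difficulty) is invented purely for illustration:

```python
# Toy model of compounding self-improvement: each optimization multiplies
# research speed by `gain`, while successive optimizations take
# `difficulty_growth` times more base effort to find. All numbers invented.
def time_to_nth_optimization(n, base_effort=1.0, gain=1.10, difficulty_growth=1.05):
    speed, total_time = 1.0, 0.0
    for i in range(n):
        effort = base_effort * difficulty_growth ** i
        total_time += effort / speed  # research time shrinks as speed grows...
        speed *= gain                 # ...and each success compounds the speed
    return total_time

# If gains outpace difficulty (1.10 vs 1.05) total time converges to a bound,
# so arbitrarily many optimizations fit in finite time; flip the two numbers
# and the same loop slows to a crawl instead.
for n in (10, 50, 100):
    print(n, round(time_to_nth_optimization(n), 2))
```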

Ironic, such passion directed toward bringing about a desirable singularity, rooted in an impenetrable singularity of faith in X. X yet to be defined, but believed to be [meaningful|definable|implementable] independent of future context.

It would be nice to see an essay attempting to explain an information or systems-theoretic basis supporting such an apparent contradiction (definition independent of context.)

Or, if the one is arguing for a (meta)invariant under a stable future context, an essay on the extended implications of such stability, if the one wou... (read more)

In a foom that took two years, if the AI was visible after one year, that might give the world a year to destroy it.

But it's clearly the best search engine available. And here I am making an argument for peace via economics!

If it's doing anything visible, it's probably doing something at least some people want.

Regarding the 2004 comment, AGI Researcher probably was referring to the Coherent Extrapolated Volition document which was marked by Eliezer as slightly obsolete in 2004, and not a word since about any progress in the theory of Friendliness.

Robin, if you grant that a "hard takeoff" is possible, that leads to the conclusion that it will eventually be likely (humans being curious and inventive creatures). This AI would "rule the world" in the sense of having the power to do what it wants. Now, suppose you get to pick what it wants (and ... (read more)

Oh, and Friendliness theory (to the extent it can be separated from specific AI architecture details) is like the doomsday device in Dr. Strangelove: it doesn't do any good if you keep it secret! [in this case, unless Eliezer is supremely confident of programming AI himself first]

@Tim Re: FOR loops - I made that exact point explicitly when introducing the concept of "recursion" via talking about self-optimizing compilers.

Talk about no progress in the conversation. I begin to think that this whole theory is simply too large to be communicated to casual students. Tim probably read my analysis using the self-optimizing compiler as an example, then forgot that I had analyzed it and thought that he was inventing a crushing objection on his own. This pattern would explain a lot of Phil Goetz too.

[-]luzr60

"FOOM that takes two years"

In addition to comments by Robin and Aron, I would also point out the possibility that the longer the FOOM takes, the larger the chance it is not local, regardless of security - somewhere else, there might be another FOOMing AI.

Now, as I understand it, some consider this situation even more dangerous, but it might as well create a "take over" defence.

Another comment to FOOM scenario and this is sort of addition to Tim's post:

"As machines get smarter, they will gradually become able to improve more and more of themsel... (read more)

[-]luzr00

Eliezer:

"Tim probably read my analysis using the self-optimizing compiler as an example, then forgot that I had analyzed it and thought that he was inventing a crushing objection on his own."

Why do you think it is crushing objection? I believe Tim just repeats his favorite theme (which, in fact, I tend to agree with) where machine augmented humans build better machines. If you can use automated refactoring to improve the way compiler works (and today, you often can), that is in fact pretty cool augmentation of human capabilities. It is recursive ... (read more)

Robin says "You might think the world should be grateful to be placed under the control of such a superior love, but many of them will not see it that way; they will see your attempt to create an AI to take over the world as an act of war against them."

Robin, do you see that CEV was created (AFAICT) to address that very possibility? That too many, feeling this too strongly, means the AI self-destructs or somesuch.

I like that someone challenged you to create your own unoffensive FAI/CEV, I hope you'll respond to that. Perhaps you believe that there simply isn't any possible fully global wish, however subtle or benign, that wouldn't also be tantamount to a declaration of war...?

Tim probably read my analysis using the self-optimizing compiler as an example, then forgot that I had analyzed it and thought that he was inventing a crushing objection on his own.

It does not seem very likely that I am copying you - when my essay on this subject dates from February 3rd, while yours apparently dates from November 25th.

So what exactly is the counter-argument you were attempting to make?

That self-optimising compilers lack "insight" - and "insight" is some kind of boolean substance that you either have or you lack?

In my ... (read more)

0Kenny
But the machines themselves are not writing the code for any of these millions of tiny steps. If they were, and if they were able to do so faster than humans, their self-improvement would be different than what you're describing.

" I can see arguing with the feasibility of hard takeoff (I don't buy it myself), but if you accept that step, Eliezer's intentions seem correct."

Bambi,

Robin has already said just that. I think Eliezer is right that this is a large discussion, and when many of the commenters haven't carefully followed it, comments bringing up points that have already been explicitly addressed will take up a larger and larger share of the comment pool.

Tim, your page doesn't say anything about FOR loops or self-optimizing compilers not being able to go a second round, which is the part you got from me and then thought you had invented.

[-]Roko00

"comments bringing up points that have already been explicitly addressed will take up a larger and larger share of the comment pool."

how about using something like debatepedia?

http://wiki.idebate.org/

There are some types of knowledge that seem hard to come by (especially for singletons). The type of knowledge is knowing what destroys you. As all knowledge is just an imperfect map, there are some things a priori that you need to know to avoid. The archetypal example is in-built fear of snakes in humans/primates. If we hadn't had this while it was important we would have experimented with snakes the same way we experiment with stones/twigs etc and generally gotten ourselves killed. In a social system you can see what destroys other things like you, but t... (read more)

[-]Aron00

It is true that the topic is too large for casual followers (such as myself). So rather than aiming at refining any of the points personally, I wonder in what ways Robin has convinced Eli, and vice-versa. Because certainly, if this were a productive debate, they would be able to describe how they are coming to consensus. And from my perspective there are distinct signals that the anticipation of a successful debate declines as posts become acknowledged for their quality as satire.

Will, your example, good or bad, is universal over singletons, nonsingletons, any way of doing things anywhere.

[-]Venu20

The default case of FOOM is an unFriendly AI

Before this, we also have: "The default case of an AI is to not FOOM at all, even if it's self-modifying (like a self-optimizing compiler)." Why not anti-predict that no AIs will FOOM at all?

This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever).

Given the tiny minority of AIs that will FOOM at all, what is the probability that an AI which has been designed for a purpose other than FOOMing, will instead FOOM?

Huh? I never mentioned self-optimizing compilers, and you never mentioned FOR loops.

I usually view this particular issue in terms of refactoring - not compilation - since refactoring is more obviously a continuous iterative process operating on an evolving codebase: whereas you can't compile a compiled version of a program very many times.

Anyway, this just seems like an evasion of the point - and a digression into trivia.

If you have any kind of case to make that machines will suddenly develop the ability to reprogram and improve themselves all-at-once - w... (read more)

[-]luzr20

Eliezer:

"Tim, your page doesn't say anything about FOR loops or self-optimizing compilers not being able to go a second round, which is the part you got from me and then thought you had invented."

Well, it certainly does:

"Today, machines already do a lot of programming. They perform refactoring tasks which would once have been delegated to junior programmers. They compile high-level languages into machine code, and generate programs from task specifications. They also also automatically detect programming errors, and automatically test existi... (read more)

Will, your example, good or bad, is universal over singletons, nonsingletons, any way of doing things anywhere.

My point was not that non-singletons can see it coming. But if one non-singleton tries self-modification in a certain way and it doesn't work out, then other non-singletons can learn from the mistake (or, in the worst evolutionary case, the descendants of people curious in a certain way would be outcompeted by those that instinctively didn't try the dangerous activity). Less so with the physics experiments, depending on the dispersal of non-singletons and the range of the physical destruction.

Carl, Robin's response to this post was a critical comment about the proposed content of Eliezer's AI's motivational system. I assumed he had a reason for making the comment, my bad.

Venu: Given the tiny minority of AIs that will FOOM at all, what is the probability that an AI which has been designed for a purpose other than FOOMing, will instead FOOM?

It seems to me like a pretty small probability that an AI not designed to self-improve will be the first AI that goes FOOM, when there are already many parties known to me who would like to deliberately cause such an event.

Why not anti-predict that no AIs will FOOM at all?

A reasonable question from the standpoint of antiprediction; here you would have to refer back to the articles on... (read more)

0adamisom
I know this is four years old, but this seems like a damn good time to "shut up and multiply" (thanks for that thoughtmeme by the way).
For example, machine augmented human (think weak AI + direct neural interface and all that cyborging whistles + mind drugs) might be quite likely to follow the FOOM

It seems unlikely to me. For one thing, see my Against Cyborgs video/essay. For another, see my Intelligence Augmentation video/essay. The moral of the latter one in this context is that Intelligence Augmentation is probably best thought of as machine intelligence's close cousin and conspirator - not really some kind of alternative, something that will happen later on, or a means to keep hu... (read more)

[-]luzr00

Eliezer:

"Will, your example, good or bad, is universal over singletons, nonsingletons, any way of doing things anywhere."

I guess there is a significant difference - for a singleton, each mistake can be fatal (and not only for it).

I believe that this is the real part I dislike about the idea, except the part where a singleton either cannot evolve or cannot stay a singleton (because of the speed-of-light vs. locality issue).

[-]luzr10

Tim:

Well, as an off-topic recourse, I see only some engineering problems cited in your "Against Cyborgs" essay as a counterargument. Anyway, let me say that in my book:

"miniaturizing and refining cell phones, video displays, and other devices that feed our senses. A global-positioning-system brain implant to guide you to your destination would seem seductive only if you could not buy a miniature ear speaker to whisper you directions. Not only could you stow away this and other such gear when you wanted a break, you could upgrade without brain ... (read more)

the issue is whether something happens efficiently enough to be local or fast enough to accumulate advantage between the leading Friendly AI and the leading unFriendly AI

Uh, that's a totally different issue from the one I was discussing.

To recap: I was pointing out that machines have been writing code and improving themselves for decades - that refactoring and lint-like programs applying their own improvements to their own codebases has a long history in the community - dating back to the early days of Smalltalk. That progress in computer ability at self... (read more)

Eliezer: "and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever)."

Why do the values freeze? Because there is no more competition? And if that's the problem, why not try to plan a transition from pre-AI to an ecology of competing AIs that will not converge to a singleton? Or spell out the problem clearly enough that we can figure whether one can achieve a singleton that doesn't have that property?

(Not that Eliezer hasn't heard me say this before. I made a bit of a speech about AI ecology at the... (read more)

Eliezer: "Tim probably read my analysis using the self-optimizing compiler as an example, then forgot that I had analyzed it and thought that he was inventing a crushing objection on his own. This pattern would explain a lot of Phil Goetz too."

No; the dynamic you're thinking of is that I raise objections to things that you have already analyzed, because I think your analysis was unconvincing. Eg., the recent Attila the Hun / Al Qaeda example. The fact that you have written about something doesn't mean you've dealt with it satisfactorily.

Phil, in suggesting to replace an unFriendly AI that converges on a bad utility by a collection of AIs that never converge, you are effectively trying to improve the situation by injecting randomness in the system.

Your perception of lawful extrapolation of values as "stasis" seems to stem from intuitions about free will. If you look at the worldline as a 4D crystal, everything is set in stone, according to laws of physics. The future is determined by the content of the world, in particular by actors embedded in it. If you allow AI to fiddle with ... (read more)

A two year FOOM doesn't have to be obvious for one year or even half a year. If the growth rate is up-curving, it's going to spend most of its ascent looking a bit like ELIZA, and then it's briefly a cute news-darling C3PO, and then it goes all ghost-in-the-shell - game over. Even if there is a window of revealed vulnerability, will you without hindsight recognize it? Can you gather the force and political will in time? How would you block the inevitable morally outraged (or furtively amoral) attempts to rebuild?

Bruce Willis is not the answer.

[-]Grant-20

The problems that I see with friendly AGI are:

1) It's not well understood outside of AI researchers, so the scientists who create it will build what they think is the most friendly AI possible. I understand what Eliezer is saying about not using his personal values, so instead he uses his personal interpretation of something else. Eliezer says that making a world which works by "better rules", then fading away, would not be a "god to rule us all", but who decides on those rules (or the processes by which the AI decides on those rules)? U... (read more)

It would have been better of me to reference Eliezer's Al Qaeda argument, and explain why I find it unconvincing.

Vladimir:

Phil, in suggesting to replace an unFriendly AI that converges on a bad utility by a collection of AIs that never converge, you are effectively trying to improve the situation by injecting randomness in the system.
You believe evolution works, right?

You can replace randomness only once you understand the search space. Eliezer wants to replace the evolution of values, without understanding what it is that that evolution is optimizing. H... (read more)

Eliezer, maybe you should be writing fiction. You say you want to inspire the next generation of researchers, and you're spending a lot of time writing these essays and correcting misconceptions of people who never read or didn't understand earlier essays (fiction could tie the different parts of your argument together better than this essay style). Why not try coming up with several possible scenarios along with your thinking embedded in them? It may be worth remembering that far more of the engineers working on Apollo spoke of being inspired by Robert Heinlein than by Goddard and von Braun and the rocket pioneers.

"If this is so, isn't it almost probability 1 that CEV will be abandoned at some point?"

Phil, if a CEV makes choices for reasons why would you expect it to have a significant chance of reversing that decision without any new evidence or reasons, and for this chance to be independent across periods? I can be free to cut off my hand with an axe, even if the chance that I'll do it is very low, since I have reasons not to.

Phil, I don't see the point in criticizing a flawed implementation of CEV. If we don't know how to implement it properly, if we don't understand how it's supposed to work in much more technical detail than the CEV proposal includes, it shouldn't be implemented at all, no more than a garden-variety unFriendly AI. If you can point out a genuine flaw in a specific scenario of FAI's operation, right implementation of CEV shouldn't lead to that. To answer your question, yes, CEV could decide to disappear completely, construct an unintelligent artifact, or produ... (read more)

Phil: Yes. CEV completely replaces and overwrites itself, by design. Before this point it does not interact with the external world to change it in a significant sense (it cannot avoid all change; e.g. its computer will add tiny vibrations to the Earth, as all computers do). It executes for a while then overwrites itself with a computer program (skipping every intermediate step here). By default, and if anything goes wrong, this program is "shutdown silently, wiping the AI system clean."

(When I say "CEV" I really mean a FAI which s... (read more)

[-]Fenty-10

I like the argument that true AGI should take massive resources to make, and people with massive resources are often unfriendly, even if they don't know it.

The desired case of FOOM is a Friendly AI, built using deep insight, so that the AI never makes any changes to itself that potentially change its internal values; all such changes are guaranteed using strong techniques that allow for a billion sequential self-modifications without losing the guarantee. The guarantee is written over the AI's internal search criterion for actions, rather than external c... (read more)

Fenty,

I give you Nick Bostrom:

"If a superintelligence starts out with a friendly top goal, however, then it can be relied on to stay friendly, or at least not to deliberately rid itself of its friendliness. This point is elementary. A “friend” who seeks to transform himself into somebody who wants to hurt you, is not your friend."

Fenty, I didn't mean to suggest that people with massive resources are unfriendly more than others, but more that people with power have little reason to respect those without power. Humans have a poor track record of coercive paternalism regardless of stated motives (I believe both Bryan and Eliezer have posted about that quite a bit in the past). I just don't think the people with the capabilities to get the first AGI online would possess the impeccable level of friendliness needed, or anywhere near it.

If Eliezer is right about the potential of AGI, the... (read more)

Eliezer, how about turning the original post into a survey? It's already structured, so all that you (or someone with an hour of free time) have to do is:

1) Find a decent survey-creating site.
2) Enter all paragraphs of the original post (maybe except #9) as questions.
3) Allow the results to be viewed publicly, without any registration.

The answer to each question would be a list of radio-buttons like this:
( ) Strongly agree
(·) Agree
( ) Don't know
( ) Disagree
( ) Strongly disagree

Does anybody know a survey site that allows all of the above?

Isn't CEV just a form of Artificial Mysterious Intelligence? Eliezer's conversation with the anonymous AIfolk seems to make perfect sense if we search and replace "neural network" with "CEV" and "intelligence" with "moral growth/value change".

How can the same person that objected to "Well, intelligence is much too difficult for us to understand, so we need to find some way to build AI without understanding how it works." by saying "Look, even if you could do that, you wouldn't be able to predict any ki... (read more)

0Luke_A_Somers
The difference is that an entity would be going out and understanding moral value change. The same cannot be said of neural networks and intelligence itself.

"If a superintelligence starts out with a friendly top goal, however, then it can be relied on to stay friendly, or at least not to deliberately rid itself of its friendliness. This point is elementary. A “friend” who seeks to transform himself into somebody who wants to hurt you, is not your friend."

Well, that depends on the wirehead problem - and it is certainly not elementary. The problem is with the whole idea that there may be something such as a "friendly top goal" in the first place.

The idea that a fully self-aware powerfu... (read more)

[-][anonymous]00

Thanks, seeing the claims all there together is useful.

The technical assumptions and reason all seem intuitive (given the last couple of years of background given here). The meta-ethic FAI singleton seems like the least evil goal I can imagine, given the circumstances.

A superintelligent FAI, with the reliably stable values that you mention, sounds like an impossible goal to achieve. Personally, I assign a significant probability to your failure, either by being too slow to prevent cataclysmic alternatives or by making a fatal mistake. Nevertheless, your effort is heroic. It is fortunate that many things seem impossible right up until the time someone does them.

I don't understand the skepticism (expressed in some comments) about the possibility of a superintelligence with a stable top goal. Consider that classic computational architecture, the expected-utility maximizer. Such an entity can be divided into a part which evaluates possible world-states for their utility (their "desirability"), according to some exact formula or criterion, and into a part which tries to solve the problem of maximizing utility by acting on the world. For the goal to change, one of two things has to happen: either the utility... (read more)
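For concreteness, a minimal sketch of that two-part architecture - a fixed evaluator over world-states plus a planner that searches for the highest expected utility - with the states, actions, and probabilities all invented for illustration:

```python
# Minimal expected-utility maximizer, split into an evaluator over
# world-states and a planner that searches over actions. Toy example only.
def utility(state):
    # The "evaluator": an exact criterion over world-states, never touched
    # by the planner below.
    return {"good": 1.0, "ok": 0.3, "bad": -1.0}[state]

# The planner's world model: action -> list of (probability, resulting state).
WORLD_MODEL = {
    "cautious": [(0.9, "ok"), (0.1, "good")],
    "risky":    [(0.5, "good"), (0.5, "bad")],
}

def choose_action(model):
    # The "maximizer": picks the action with the highest expected utility,
    # but contains no machinery for rewriting `utility` itself.
    def expected_utility(action):
        return sum(p * utility(s) for p, s in model[action])
    return max(model, key=expected_utility)

print(choose_action(WORLD_MODEL))  # -> "cautious" (EU 0.37 vs 0.0)
```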

Nick Hay, a CEV might also need to gather more information, nondestructively and unobtrusively. So even before the first overwrite you need a fair amount of FAI content just so it knows what's valuable and shouldn't be changed in the process of looking at it, though before the first overwrite you can afford to be conservative about how little you do. But English sentences don't work, so "look but don't touch" is not a trivial criterion (it contains magical categories).

Wei Dai, I thought that I'd already put in some substantial work in cashing o... (read more)

I see a possible problem with FAI in general. The real world is deterministic on the most fundamental level, but not even a super-powerful computer could handle realistic problems at that level, so it uses stochastic, Bayesian, probabilistic, whatever you want to call them, methods to model the apparent randomness at more tractable levels. Once it starts using these methods for other problems, what is to stop it from applying them to its goal-system (meta-ethical or whatever you want to call it)? Not in an attempt to become unFriendly, but to improve its g... (read more)

Eliezer, as far as I can tell, "reflective equilibrium" just means "the AI/simulated non-sentient being can't think of any more changes that it wants to make" so the real question is what counts as a change that it wants to make? Your answer seems to be whatever is decided by "a human library of non-introspectively-accessible circuits". Well the space of possible circuits is huge, and "non-introspectively-accessible" certainly doesn't narrow it down much. And (assuming that "a human library of circuits" = &... (read more)

Wireheading (in the form of drug addiction) is a real-world phenomenon - so presumably your position is that there's some way of engineering a superintelligence so it is not vulnerable to the same problem.

To adopt the opposing position for a moment, the argument goes that a sufficiently-intelligent agent with access to its internals would examine itself - conclude that external referents associated with its utility function were actually superfluous nonsense; that it had been living under a delusion about its true goals - and that it could better maximise... (read more)

Wei, when you're trying to create intelligence, you're not trying to get it human, you're trying to get it rational.

When it comes to morality - well, my morality doesn't talk about things being right in virtue of my brain thinking them, but it so happens that my morality is only physically written down in my brain and nowhere else in the physical universe. Likewise with all other humans.

So to get a powerful moral intelligence, you've got to create intelligence to start with using an implementation-independent understanding, and then direct that intelligen... (read more)

Carl: "This point is elementary. A “friend” who seeks to transform himself into somebody who wants to hurt you, is not your friend."

The switch from "friendly" (having kindly interest and goodwill; not hostile) to a "friend" (one attached to another by affection or esteem) is problematic. To me it radically distorts the meaning of FAI and makes this pithy little sound-bite irrelevant. I don't think it helps Bostrom's position to overload the concept of friendship with the connotations of close friendship.

Exactly how much human bi... (read more)

Correction: I don't think it helps Bostrom's position to overload the concept of "friendly" (not "friendship", as I wrote above) with the connotations of close friendship.

Eliezer, you write as if there is no alternative to this plan, as if your hand is forced. But that's exactly what some people believe about neural networks. What about first understanding human morality and moral growth, enough so that we (not an AI) can deduce and fully describe someone's morality (from his brain scan, or behavior, or words) and predict his potential moral growth in various circumstances, and maybe enough to correct any flaws that we see either in the moral content or in the growth process, and finally program the seed AI's morality and m... (read more)

I find the hypothesis that an AGI's values will remain frozen highly questionable. To be believable one would have to argue that the human ability to question values is due only or principally to nothing more than the inherent sloppiness of our evolution. However, I see no reason to suppose that an AGI would apply its intelligence to every aspect of its design except its goal structure. I see no reason to suppose that relatively puny and sloppy minds can do a level of questioning and self-doubt that a vastly superior intelligence never will or can.

I a... (read more)

Wei, the criterion "intelligent" compresses down to a very simple notions of effective implementation abstracted away from the choice of goal. After that, the only question is how to get "intelligence", which is something you can, in principle, learn by observation (if you start with learning ability). Flaws in a notion of "intelligence" can be self-corrected if not too great; you observe that what you're doing isn't working (for your goal criterion).

Morality does not compress; it's not something you can learn just by lookin... (read more)

Speaking of compressing down nicely, that is a nice and compressed description of humanism. Singularitarians, question humanism.

Question for Eliezer. If the human race goes extinct without leaving any legacy, then according to you, any nonhuman intelligent agent that might come into existence will be unable to learn about morality?

If your answer is that the nonhuman agent might be able to learn about morality if it is sentient then please define "sentient". What is it about a paperclip maximizer that makes it nonsentient? What is it about a human that makes it sentient?

Morality does not compress; it's not something you can learn just by looking at the (nonhuman) environment or by doing logic; if you want to get all the details correct, you have to look at human brains.

Why? Why can't you rewrite this as "complexity and morality"?

You may talk about the difference between mathematical and moral insights. Which is true, but then mathematical insights aren't sufficient for intelligence. Maths doesn't tell you whether a snake is poisonous and will kill you or not....

Terminal values don't compress. Instrumental values compress to terminal values.

Are you saying "snakes are often deadly poisonous to humans" is an instrumental value?

I'd agree that dying is bad therefore avoid deadly poisonous things. But I still don't see that snakes have little xml tags saying keep away, might be harmful.... I don't see that as a value of any sort.

However, I see no reason to suppose that an AGI would apply its intelligence to every aspect of its design except its goal structure.

I don't think that describes the probable outcome of anybody's superintelligence construction plan.

(Eliezer, why do you keep using "intelligence" to mean "optimization" even after agreeing with me that intelligence includes other things that we don't yet understand?)

Morality does not compress

You can't mean that morality literally does not compress (i.e. is truly random). Obviously there are plenty of compressible regularities in human morality. So perhaps what you mean is that it's too hard or impossible to compress it into a small enough description that humans can understand. But, we also have no evidence that effective universal o... (read more)

Wei, I was agreeing with you that these were important questions - not necessarily agreeing with your thesis "there's more to intelligence than optimization". Once you start dealing in questions like those, using a word like "intelligence" implies that all the answers are to be found in a single characteristic and that this characteristic has something to do with the raw power of a mind. Whereas I would be more tempted to look at the utility function, or the structure of the prior - an AI that fails to see a question where we see one ... (read more)

Maybe we don't need to preserve all of the incompressible idiosyncrasies in human morality. Considering that individuals in the post-Singularity world will have many orders of magnitude more power than they do today, what really matter are the values that best scale with power. Anything that scales logarithmically for example will be lost in the noise compared to values that scale linearly. Even if we can't understand all of human morality, maybe we will be able to understand the most important parts.

Just throwing away parts of one's utility function seems... (read more)
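A quick numeric illustration of the log-versus-linear point above (the "power" scale here is an arbitrary stand-in):

```python
# Values that scale logarithmically with available power are soon negligible
# next to values that scale linearly; the power scale below is arbitrary.
import math

for power in (1e1, 1e3, 1e6, 1e12):
    linear, logarithmic = power, math.log(power)
    print(f"power={power:.0e}  linear={linear:.0e}  "
          f"log={logarithmic:6.1f}  ratio={logarithmic / linear:.1e}")
```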

The most painful memories in my life have been when other people thought they knew better than me, and tried to do things on my behalf.

Fascinating. You must either lead an extraordinarily pain-free existence, or not just be "individualistic" but very sensitive about your competence - which I think is odd for someone of such deep and wide-ranging intellectual competence.

Got no idea what you mean by either of those clauses. Why wouldn't the most painful times in your life be someone else's well-meant disaster, forced on you without your control? And then the second clause I don't understand at all.

5lukeprog
Oh, I gotcha. I thought you were saying that you are so individualistic that the subjective experience of having someone think they knew better than you and trying to do something on your behalf was badly painful, but now it sounds to me like you're saying the consequences of people trying to do things on your behalf with an inferior understanding of the situation are some of the most painful memories of your life, because those other persons really screwed things up. I assumed the first interpretation because the sentence I quoted is followed by one describing the hedonic consequences of doing things yourself.
0[anonymous]
...you must've led an existence extraordinarily free of other people screwing up your life.