I invite your feedback on this snippet from the forthcoming Friendly AI FAQ. This one is an answer to the question "What is the history of the Friendly AI concept?"

_____

 

Late in the Industrial Revolution, Samuel Butler (1863) worried about what might happen when machines become more capable than the humans who designed them:

...we are ourselves creating our own successors; we are daily adding to the beauty and delicacy of their physical organisation; we are daily giving them greater power and supplying by all sorts of ingenious contrivances that self-regulating, self-acting power which will be to them what intellect has been to the human race. In the course of ages we shall find ourselves the inferior race.

...the time will come when the machines will hold the real supremacy over the world and its inhabitants...

This basic idea was picked up by science fiction authors, for example in John W. Campbell’s (1932) short story The Last Evolution. In the story, humans live lives of leisure because machines are smart enough to do all the work. One day, aliens invade:

Then came the Outsiders. Whence they came, neither machine nor man ever learned, save only that they came from beyond the outermost planet, from some other sun. Sirius—Alpha Centauri—perhaps! First a thin scoutline of a hundred great ships, mighty torpedoes of the void [3.5 miles] in length, they came.

Earth’s machines, protecting humans, defeat the alien invaders, but the aliens’ machines survive long enough to render humanity extinct before they, too, are destroyed. Earth’s machines inherit the solar system, eventually moving to run on substrates of pure “Force.”

The concerns of machine ethics are most popularly identified with Isaac Asimov’s Three Laws of Robotics, introduced in his short story Runaround. Asimov used his stories, including those collected in the popular I, Robot book, to illustrate all the ways in which such simple rules for governing robot behavior could go wrong.

In the year of I, Robot’s release, mathematician Alan Turing (1950) noted that machines will one day be capable of genuine thought:

I believe that at the end of the century... one will be able to speak of machines thinking without expecting to be contradicted.

Turing (1951/2004) concluded:

...it seems probable that once the machine thinking method has started, it would not take long to outstrip our feeble powers... At some stage therefore we should have to expect the machines to take control...

Bayesian statistician I.J. Good (1965), who had worked with Turing to crack Nazi codes in World War II, made the crucial leap to the ‘intelligence explosion’ concept:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion”, and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make...

Futurist Arthur C. Clarke (1968) agreed:

Though we have to live and work with (and against) today's mechanical morons, their deficiencies should not blind us to the future. In particular, it should be realized that as soon as the borders of electronic intelligence are passed, there will be a kind of chain reaction, because the machines will rapidly improve themselves... there will be a mental explosion; the merely intelligent machine will swiftly give way to the ultraintelligent machine....

Perhaps our role on this planet is not to worship God but to create Him.

Julius Lukasiewicz (1974) noted that human intelligence may be unable to predict what a superintelligent machine would do:

The survival of man may depend on the early construction of an ultraintelligent machine - or the ultraintelligent machine may take over and render the human race redundant or develop another form of life. The prospect that a merely intelligent man could ever attempt to predict the impact of an ultraintelligent device is of course unlikely but the temptation to speculate seems irresistible.

Even critics of AI like Jack Schwartz (1987) saw the implications:

If artificial intelligences can be created at all, there is little reason to believe that initial successes could not lead swiftly to the construction of artificial superintelligences able to explore significant mathematical, scientific, or engineering alternatives at a rate far exceeding human ability, or to generate plans and take action on them with equally overwhelming speed. Since man's near-monopoly of all higher forms of intelligence has been one of the most basic facts of human existence throughout the past history of this planet, such developments would clearly create a new economics, a new sociology, and a new history.

Novelist Vernor Vinge (1981) called this 'event horizon' in our ability to predict the future a 'singularity':

Here I had tried a straightforward extrapolation of technology, and found myself precipitated over an abyss. It's a problem we face every time we consider the creation of intelligences greater than our own. When this happens, human history will have reached a kind of singularity - a place where extrapolation breaks down and new models must be applied - and the world will pass beyond our understanding.

Eliezer Yudkowsky (1996) used the term 'singularity' to refer instead to Good's 'intelligence explosion', and began work on the task of figuring out how to build a self-improving AI that had a positive rather than negative effect on the world (Yudkowsky 2000) — a project he eventually called 'Friendly AI' (Yudkowsky 2001).

Meanwhile, philosophers and AI researchers were considering whether machines could have moral value, and how to ensure ethical behavior from less powerful machines or 'narrow AIs', a field of inquiry variously known as 'artificial morality' (Danielson 1992; Floridi & Sanders 2004; Allen et al. 2000), 'machine ethics' (Hall 2000; McLaren 2005; Anderson & Anderson 2006), 'computational ethics' (Allen 2002), 'computational metaethics' (Lokhorst 2011), and 'robo-ethics' or 'robot ethics' (Capurro et al. 2006; Sawyer 2007). This vein of research — what we'll call the 'machine ethics' literature — was recently summarized in two books: Wallach & Allen (2009) and Anderson & Anderson (2011).

Leading philosopher of mind David Chalmers brought the concepts of intelligence explosion and Friendly AI to mainstream academic attention with his 2010 paper, ‘The Singularity: A Philosophical Analysis’, published in Journal of Consciousness Studies. That journal’s January 2012 issue will be devoted to responses to Chalmers’ article, as will an edited volume from Springer (Eden et al. 2012).

Friendly AI researchers do not regularly cite the machine ethics literature (see, e.g., Bostrom & Yudkowsky 2011). These researchers have put forward preliminary proposals for ensuring ethical behavior in superintelligent or self-improving machines, for example 'Coherent Extrapolated Volition' (Yudkowsky 2004).

13 comments

to illustrate all the ways in which such simple rules for governing robot behavior could go wrong.

to illustrate all the ways in which such well meaning and seemingly comprehensive rules for governing robot behavior could go wrong.

noted that machines will one day be capable of genuine thought:

noted that there is no good reason to believe machines will not one day be capable of everything currently only achievable with human intelligence:

Perhaps our role on this planet is not to worship God but to create Him.

Ewwww.

a field of inquiry variously known as

I think the citations here clutter the text and should be footnotes.

Leading philosopher

Prominent philosopher

Friendly AI researchers do not regularly cite the machine ethics literature

...because (fill in the blank)

...because Friendly AI researchers seek to make ethics a dynamic part of the AI system, while the machine ethics literature mostly concerns attaching a separate algorithmic module to a functioning, non-Friendly AI.

Thanks. Will incorporate most of these comments.

The FAI FAQ will use author-date citations rather than footnotes.

to illustrate all the ways

to illustrate hundreds of ways

Is it hundreds?

Good catch. Will fix.

I like editing.

Maybe treat The Last Evolution more like you treated Runaround. Just condense the relevant points.

Arthur C. Clarke is better known as an author, so I'd prefer to see him listed as "futurist and author." The last sentence of Clarke's quote is just going to feed the dreaded fourth definition of the singularity, and should probably be dropped.

The Vinge quote seems unnecessary, since you've quoted Lukasiewicz with a much more directly relevant quote about unpredictability.

I then want to see a little more logical structure, more than just saying "FAI is AI that has a positive impact." Maybe frame FAI in response to Lukasiewicz's quote, in terms of being rigorously able to predict that some AI will have a positive impact.

Was FAI or machine ethics mentioned in Chalmers' paper? Will these topics be discussed in the follow-up issue? If so, say so; if not, say less, or say why this is still important for the Friendly AI concept.

The last paragraph then suddenly jumps. Maybe start with a "despite their parallel yada yada." Does the machine ethics literature cite friendly AI literature?

Because CEV predates the stuff you were talking about just above, I'd rather see a short mention of it at the end of the "Eliezer Yudkowsky paragraph." Maybe just call it (Yudkowsky 2004) - the important part isn't the idea of CEV, it's that one of the prongs of FAI is goal systems that can be predicted to have positive impact.

Thanks! Agree with all this except I'll keep the Vinge quote.

Stanislaw Lem anticipates Friendly AI in one of his tales about star explorer Ijon Tichy, in Star Diaries, Voyage 24 (this particular story was originally published in 1953). The citizens of the planet Tichy visits in this voyage have decided to entrust their fate to a machine of their own creation, one more intelligent than they are. They try to safeguard its conduct with axioms, but do not get Friendly AI right:

"...Great danger threatens our state, for rebellious, criminal ideas are arising among the masses of Drudgelings. They strive to abolish our splendid freedoms and the law of Civic Initiative! We must make every effort to defend our liberty. After careful consideration of the whole problem, we have reached the conclusion that we are unequal to the task. Even the most virtuous, capable, and model Phool can be swayed by feelings, and is often vacillating, biased, and fallible, and thus unfit to reach a decision in so complicated and important a matter. Therefore, within six months you are to build us a purely rational, strictly logical, and completely objective Governing Machine that does not know the hesitation, emotion, and fear that befuddle living minds. Let this machine be as impartial as the light of the Sun and stars. When you have built and activated it, we shall hand over to it the burden of power, which grows too heavy for our weary shoulders."

" 'So be it,' said the constructor, 'but what is to be the machine's basic motivation?'

" 'Obviously, the freedom of Civic Initiative. The machine must not command or forbid the citizens anything; it may, of course, change the conditions of our existence, but it must do so always in the form of a proposal, leaving us alternatives between which we can freely choose.' "

'So be it,' replied the constructor, 'but this injunction concerns mainly the mode of operation. What of the ultimate goal? What is this machine's purpose?' "

'Our state is threatened by chaos; disorder and disregard for the law are spreading. Let the Machine bring supreme harmony to the planet, let it institute, consolidate, and establish perfect and absolute order.'

" 'Let it be as you have said!' replied the constructor. 'Within six months I shall build the Voluntary Universalizer of Absolute Order. With this task ahead of me, I bid you farewell. . .'

The machine so created ends up turning all the citizens into pleasant geometrical figures - triangles, rectangles and so forth - and arranging them in an aesthetically pleasing manner on lawns throughout the land.

Of course, there are many places where Lem considers possible consequences of letting constructed entities run the world.

In the "Observation on the Spot", he describes "ethicosphere" and its coming to be. It is stated that eticosphere is non-person because they wanted it just to enforce the laws as given. The whole premise is that it was done as well as anyone could hope. Still, its ability to maintain life in the body after brain death (either because eticosphere was still too weak or because it had hard limits on interventions into brain) and apparent practice of uncontrollable embryoselection (among other things) make the creating race nervous.

Also, where the term "Robot" came from: http://en.wikipedia.org/wiki/R.U.R.

Oh yeah. Will add.

I don't remember whether the short story in which robots decide that they are the true humans was included in "I, Robot"; I do remember that the novel in which one robot goes into deadlock right after harming most of the currently existing humans to benefit future humanity was written later than "I, Robot".

An interesting point to note: Asimov started writing about robots with the idea of going between "Robots as Menace" and "Robots as Pathos" - he ended up having a single robot determine the key events in human history for a few thousand years...