BOOK DRAFT: 'Ethics and Superintelligence' (part 1, revised)

by lukeprog, 22nd Feb 2011


As previously announced, I plan to post the first draft of the book, Ethics and Superintelligence, in tiny parts, to the Less Wrong discussion area. Your comments and constructive criticisms are much appreciated.

This is not a book for a mainstream audience. Its style is that of contemporary Anglophone philosophy. Compare to, for example, Chalmers' survey article on the singularity.

Bibliographic references are provided here.

This "part 1" section is probably the only part for which I will post a revision to Less Wrong. Revisions of further parts of the book will probably not appear publicly until the book is published.

Revised part 1 below...




1. The technological singularity is coming soon.


Every year, computers surpass human abilities in new ways. A program written in 1956 was able to prove mathematical theorems, and found a more elegant proof for one of them than Russell and Whitehead had given in Principia Mathematica (MacKenzie 1995). By the late 1990s, “expert systems” had surpassed human ability in a wide range of tasks.[i] In 1997, IBM’s Deep Blue defeated the reigning World Chess Champion Garry Kasparov (Campbell et al. 2002). In 2011, IBM’s Watson beat the best human players at a much more complicated game: Jeopardy! (Someone, 2011). Recently, a robot scientist was programmed with our scientific knowledge about yeast, then posed its own hypotheses, tested them, and assessed the results. It answered a question about yeast that had baffled human scientists for 150 years (King 2011).

Many experts think that human-level general intelligence may be created within this century.[ii] This raises an important question. What will happen when an artificial intelligence (AI) surpasses human ability at designing artificial intelligences?

I.J. Good (1965) speculated that such an AI would be able to improve its own intelligence, leading to a positive feedback loop of improving intelligence – an “intelligence explosion.” Such a machine would rapidly become intelligent enough to take control of the internet, use robots to build itself new hardware, do science on a massive scale, invent new computing technology and energy sources, or achieve similar dominating goals. As such, it could be humanity’s last invention (Bostrom 2003).

Humans would be powerless to stop such a “superintelligence” (Bostrom 1998) from accomplishing its goals. Thus, if such a scenario is at all plausible, then it is critically important to program the goal system of this superintelligence such that it does not cause human extinction when it comes to power.

Success in that project could mean the difference between a utopian solar system of unprecedented harmony and happiness, and a solar system in which all available matter (including human flesh) has been converted into parts for a planet-sized computer built to solve difficult mathematical problems.[iii]

The technical challenges of designing the goal system of such a superintelligence are daunting.[iv] But even if we can solve those problems, the question of which goal system to give the superintelligence remains. It is at least partly a question of philosophy – a question of ethics.


In this chapter I argue that a single, powerful superintelligence – one variety of what Bostrom (2006) calls a “singleton” – is likely to arrive within the next 200 years unless a worldwide catastrophe drastically impedes scientific progress.

The singleton will produce very different future worlds depending on which normative theory is used to design its goal system. In chapter two, I survey many popular normative theories and conclude that none of them offers an attractive basis for designing the motivational system of a machine superintelligence.

Chapter three reformulates and strengthens what is perhaps the most developed plan for the design of the singleton’s goal system – Eliezer Yudkowsky’s (2004) “Coherent Extrapolated Volition.” Chapter four considers some outstanding worries about this plan.

In chapter five I argue that we cannot decide how to design the singleton’s goal system without considering meta-ethics, because normative theory depends on meta-ethics. The next chapter argues that we should invest little effort in meta-ethical theories that do not fit well with our emerging reductionist picture of the world, just as we quickly abandon scientific theories that don’t fit the available scientific data. I also identify several meta-ethical positions that I think are good candidates for abandonment.

But the looming problem of the technological singularity requires us to have a positive theory, too. Chapter seven proposes some meta-ethical claims on which I think naturalists should come to agree. In the final chapter, I consider the implications of these meta-ethical claims for the design of the singleton’s motivational system.


[i] For a detailed history of achievements and milestones in artificial intelligence, see Nilsson (2009).

[ii] Bainbridge (2005), Baum et al. (2010), Chalmers (2010), Legg (2008), Vinge (1993), Nielsen (2011), Yudkowsky (2008).

[iii] This particular nightmare scenario is given in Yudkowsky (2001), who believes Marvin Minsky may have been the first to suggest it.

[iv] These technical challenges are discussed in the literature on artificial agents in general and Artificial General Intelligence (AGI) in particular. Russell and Norvig (2009) provide a good overview of the challenges involved in the design of artificial agents. Goertzel and Pennachin (2010) provide a collection of recent papers on the challenges of AGI. Yudkowsky (2010) proposes a new extension of causal decision theory to suit the needs of a self-modifying AI. Yudkowsky (2001) discusses other technical (and philosophical) problems related to designing the goal system of a superintelligence.