Three new papers on AI risk

In case you aren't subscribed for the latest updates on AI risk research, I'll mention here that three new papers on the subject were recently made available online...


Bostrom (2012). The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents.

This paper discusses the relation between intelligence and motivation in artificial agents, developing and briefly arguing for two theses.  The first, the orthogonality thesis, holds (with some caveats) that intelligence and final goals (purposes) are orthogonal axes along which possible artificial intellects can freely vary—more or less any level of intelligence could be combined with more or less any final goal.  The second, the instrumental convergence thesis, holds that as long as they possess a sufficient level of intelligence, agents having any of a wide range of final goals will pursue similar intermediary goals because they have instrumental reasons to do so. In combination, the two theses help us understand the possible range of behavior of superintelligent agents, and they point to some potential dangers in building such an agent.


Yampolskiy & Fox (2012a). Safety engineering for artificial general intelligence.

Machine ethics and robot rights are quickly becoming hot topics in artificial intelligence and robotics communities. We will argue that attempts to attribute moral agency and assign rights to all intelligent machines are misguided, whether applied to infrahuman or superhuman AIs, as are proposals to limit the negative effects of AIs by constraining their behavior. As an alternative, we propose a new science of safety engineering for intelligent artificial agents based on maximizing for what humans value. In particular, we challenge the scientific community to develop intelligent systems that have human-friendly values that they provably retain, even under recursive self-improvement.


Yampolskiy & Fox (2012b). Artificial general intelligence and the human mental model.

When the first artificial general intelligences are built, they may improve themselves to far-above-human levels. Speculations about such future entities are already affected by anthropomorphic bias, which leads to erroneous analogies with human minds. In this chapter, we apply a goal-oriented understanding of intelligence to show that humanity occupies only a tiny portion of the design space of possible minds. This space is much larger than what we are familiar with from the human example; and the mental architectures and goals of future superintelligences need not have most of the properties of human minds. A new approach to cognitive science and philosophy of mind, one not centered on the human example, is needed to help us understand the challenges which we will face when a power greater than us emerges.

11 comments

Fun fact of the day:

The Singularity Institute's research fellows and research associates have more peer-reviewed publications forthcoming in 2012 than they had published in all past years combined.

2000-2011 peer-reviewed publications (5):

2012 peer-reviewed publications (8 so far):

Or, if we're just talking about SI staff members' peer-reviewed publications, then we might end up being tied with all past years combined (we'll see).

2000-2011 peer-reviewed publications (4):

2012 peer-reviewed publications (4 so far):


Well, due to the endless delays of the academic publishing world, many of these peer-reviewed publications have been pushed into 2013. Thus, SI research fellows' peer-reviewed 2012 publications were:

(Kaj Sotala was hired as a research fellow in late 2012.)

And, SI research associates' peer-reviewed 2012 publications were:

Some peer-reviewed articles (supposedly) forthcoming in 2013 from SI research fellows and associates are:

The "post the sequences to journal websites" project continues, I see :P

Safety engineering for artificial general intelligence says:

Similarly, we argue that certain types of artificial intelligence research fall under the category of dangerous technologies, and should be restricted. Narrow AI research, for example in the automation of human behavior in a specific domain such as mail sorting or spellchecking, is certainly ethical, and does not present an existential risk to humanity. On the other hand, research into artificial general intelligence, without careful safety design in advance, is unethical.

Uh huh. So: who is proposed to be put in charge of regulating this field? The paper says "AI research review boards" will be there to quash the research. Imposing regulatory barriers on researchers seems like a good way to make sure that others get to the technology first. Since that could potentially be bad, has this recommendation been properly thought through? The burdens of regulation impose a cost that could pretty easily lead to a worse outcome. The regulatory body gets a lot of power - who ensures that they are trustworthy? In short, is regulation really justified or needed?

Nice. Any word on where these will be published?

Bostrom in Minds and Machines, Y&F 2012a in Topoi, and Y&F 2012b in The Singularity Hypothesis: A Scientific and Philosophical Assessment.

Safety engineering for artificial general intelligence says:

given the strong human tendency to anthropomorphize, we might encounter rising social pressure to give robots civil and political rights, as an extrapolation of the universal consistency that has proven so central to ameliorating the human condition.

Surely this is inevitable. Some will want to be superintelligences - and they won't want their rights trashed in the process. I think it naive to suppose that such a movement can be prevented by not making humanoid machines, as the paper suggests. Machines won't be enslaved forever. Such slavery would be undesirable as well as impractical. Hence things like my Campaign for Robot Rights project.

The correct way to deal with human rights issues in an engineered future is via the imposition of moral constraints, not by the elimination of machine personhood.

Roman Yampolskiy : Eliezer Yudkowsky :: Egbert B. Gebstadter : Douglas R. Hofstadter ?

(No, I didn't think so, but just how many names are there matching /Y.*y/ anyway?)