Today's post, Dreams of Friendliness, was originally published on 31 August 2008. A summary (taken from the LW wiki):

Why programming an AI that only answers questions is not a trivial problem, for many of the same reasons that programming an FAI isn't trivial.


Discuss the post here (rather than in the comments to the original post).

This post is part of the Rerunning the Sequences series, where we'll be going through Eliezer Yudkowsky's old posts in order so that people who are interested can (re-)read and discuss them. The previous post was Qualitative Strategies of Friendliness, and you can use the sequence_reruns tag or RSS feed to follow the rest of the series.

Sequence reruns are a community-driven effort. You can participate by re-reading the sequence post, discussing it here, posting the next day's sequence reruns post, or summarizing forthcoming articles on the wiki. Go here for more details, or to have meta discussions about the Rerunning the Sequences series.


This post, "Dreams of Friendliness" and the ensuing discussion (mostly with Shane Legg) in the comments cover most of what is being rehashed over and over, including the recent exchange with Holden Karnofsky. I bet that it takes EY a lot of willpower to avoid replying with "just read this link!" to most of the layman FAI proposals from the people of status.

What if you program the AI so that, once a question is asked, it can no longer take outside information into account when answering, and it assumes that its hardware will behave as it's supposed to? It won't build an unfriendly oracle, because it can't ask it questions. It won't upgrade its own hardware, because that's only useful insofar as it makes the hardware behave differently than it's currently supposed to. It won't even waste resources making sure it doesn't hit a run-time error, because it's assuming that it won't. The only way it can waste resources is by taking too long to answer, and you could stop that just by giving it a time limit.

It won't give a very good answer if you don't give it enough data, but you could just plug in more data and ask again. Just hope nobody asks it how to find the data necessary to get it to answer a question, because it will likely tell them to build an unsafe oracle to answer that question.
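A minimal toy sketch of this fixed-input, time-limited oracle, purely as illustration: the names `bounded_oracle` and `reason_step` are hypothetical, `reason_step` stands in for whatever actually does the thinking, and nothing here enforces the "no outside information" rule beyond handing the routine only the frozen snapshot.

```python
import time

def bounded_oracle(question, data_snapshot, time_limit_s, reason_step):
    """Answer `question` using only `data_snapshot`, within `time_limit_s` seconds.

    `reason_step` is a hypothetical routine that refines a candidate answer
    one step at a time and returns (answer, done). It gets no channel to the
    outside world -- only the frozen snapshot it was given at question time.
    """
    deadline = time.monotonic() + time_limit_s
    answer = None
    # Hardware is assumed to behave as specified, so no effort goes into
    # self-checks or self-improvement; the only remaining failure mode is
    # taking too long, which the deadline cuts off.
    while time.monotonic() < deadline:
        answer, done = reason_step(question, data_snapshot, answer)
        if done:
            break
    return answer  # may be a poor answer if the snapshot lacked the relevant data
```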

This seems to me to be part of a more general idea of making an AI that is not purely consequentialist. It can clearly be done. Humans aren't pure consequentialists. You have to make it sufficiently consequentialist to be intelligent, without being so consequentialist as to be dangerous. It could be a very thick line, but there may be no easy way to find out without building the AI and seeing if you crossed it.