Suppose you have been recruited as the main developer on an AI project. The previous developer died in a car crash and left behind an unfinished AI. It consists of:

A. A thoroughly documented scripting language specification that appears to be capable of representing any real-life program as a network diagram so long as you can provide the following:

 A.1. A node within the network whose value you want to maximize or minimize.

 A.2. Conversion modules that transform data about the real-world phenomena your network represents into a form that the program can read.

B. Source code from which a program can be compiled that will read scripts in the above language. The program outputs a set of node values that optimizes the target node (you can optionally specify which nodes can and cannot be directly altered, and the granularity with which they can be altered).

It gives remarkably accurate answers for well-formulated questions. Where there is a theoretical limit to the accuracy of an answer to a particular type of question, its answer usually comes close to that limit, plus or minus some tiny rounding error.
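A toy sketch of the kind of script and solver being described might look like the following Python. All of the names here (the node dictionary, the revenue function, solve) are invented purely for illustration; the actual scripting language exists only in the dead developer's documentation.

```python
# Illustrative only: a tiny "network of nodes" with one target node to
# maximize, plus a brute-force search standing in for program B.
from itertools import product

# Directly alterable nodes, each with the granularity of allowed values.
alterable = {
    "ad_budget": [0, 10, 20, 30],
    "price": [5, 10, 15],
}

# A derived node computed from the alterable ones; this is the node
# whose value we want to maximize (A.1).
def revenue(ad_budget, price):
    demand = 100 + 3 * ad_budget - 4 * price
    return demand * price - ad_budget

# "Program B": search the alterable nodes for the assignment that
# maximizes the target node.
def solve(target, alterable):
    best = max(product(*alterable.values()),
               key=lambda vals: target(**dict(zip(alterable, vals))))
    return dict(zip(alterable, best))

print(solve(revenue, alterable))  # -> {'ad_budget': 30, 'price': 15}
```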

 

Given that, what is the minimum set of additional features you believe would absolutely have to be implemented before this program can be enlisted to save the world and make everyone live happily forever? Try to be as specific as possible.


Can someone please let me know why this is the most down-voted I have ever been since de-lurking on this site? I'm not whining, I genuinely want to know what intellectual standards I'm not meeting or what social rules I'm violating by posting this.

My goal in posting this was to identify possible dangling units within the friendly AI concept.

Readers don't know what your post is about. Your comment explains "My goal ..." but that should be the start of the post, orienting the reader.

How does your hypothetical help identify possible dangling units? You've worked it out in your head. That should be the second part of the post, working through the logic: here is my goal, here is the obstacle, here is how I get around it.

Also, Tool AI is a conclusion, possibly a wrong one (still reading the Tool AI post), of a more general point I was trying to make, which I have also not found on this site:

Which technical problems traditionally associated with AI do NOT need to be solved to achieve the effects of friendly AI?

It's allowed to ask its programmers an arbitrary number of stupid questions and simply halt, refusing to do anything, if they are unavailable to answer them.

It's allowed to just Google things, ask about them on Stack Exchange, or enlist Mechanical Turk rather than coming up with its own solutions to every problem.

It doesn't need to work in real time; it's OK if you have to run it for five minutes to answer a question a human could answer instantly, as long as the quality is just as good.

It doesn't need to pass the aspects of the Turing test that have to do with mimicking the flaws of humans convincingly.

It can run on cloud servers the team can only afford to rent for just as long as it takes to FOOM. Specifically, once you have an AI that could FOOM on your current computer in decades, you can rent cloud servers that do the same calculations in minutes.

I don't want to influence people with my opinion before they had a chance to express theirs.

I am starting to explain it, though, in the comments.

But here too, I'm interested in finding holes in my reasoning, not spreading an opinion that I don't yet have a sufficient reason to believe is right.

Generally, people have a lot more patience with someone saying "I have an answer to this complicated question, but I would rather you guess first" when that person has built up a lot of credit first.

Why don't you read what's been said on this site and elsewhere about Holden Karnofsky's Tool AI, since that is what you are apparently trying to describe.

Thanks, that's helpful, I'll read it.

Why don't you read what's been said on this site and elsewhere

Because this is a vast site, and I don't know where to look for what's been said already in this case. It reminds me of Googling for a computer problem and turning up page after page of forum posts saying "google it you n00b".

So again, thank you for the link. But what would be even more helpful is knowing what kinds of search strategies you would pursue if you were struck by an idea that was new to you so you didn't know what keywords to query (or if there even are any keywords for it yet).

I did not mean my reply to be condescending, sorry if it came across this way.

But what would be even more helpful is knowing what kinds of search strategies you would pursue if you were struck by an idea that was new to you so you didn't know what keywords to query (or if there even are any keywords for it yet).

Well, constructing a fruitful search query is not a trivial task (I wish there were an AI helping with that!). Until then, rewriting your post as a request for information is likely to reduce the downvote, if you are worried about that. BTW, I suspect that the OP's karma will go back up into positive territory after a little while, assuming it generates an interesting discussion.

In general, it is safe to assume that any new idea you are "struck by" is only new to you, given that there are some smart regulars around here, who spent a lot of time thinking about the same issues. Asking a question in the open thread might be a good way to start.

Heh heh. I'm struck by new ideas that my colleagues have never had on a fairly regular basis, unfortunately. I forget that LW-ers are probably smarter than my colleagues. Maybe I should ask you guys a question about my job sometime.

[This comment is no longer endorsed by its author]

Ask away.


It gives remarkably accurate answers for well-formulated questions. Where there is a theoretical limit to the accuracy of an answer to a particular type of question, its answer usually comes close to that limit, plus or minus some tiny rounding error.

Note: This is rather long. Sorry about that, if it's too long, but I was making notes on a comparable question recently, and you did say to be specific.

Well, the last time I was thinking of features in an AI, it was one that could ask questions that potentially cleave the answer space for any poorly formulated request (because, as is well known, far more processing time and programmer effort is spent on poorly formulated requests than on well-formulated ones), and that could show a video demonstration of some likely behaviors, based on what it considered the most plausible potential answers, before you answered.

1: Incredibly simplistic example: you request the AI: "Make me a Line graph of these numbers."

The AI responds: "Do you want a Line Graph, with number set one as the X axis, which would look like this, or do you want a Line Graph with number set two as the X axis, which would look like this, or are you thinking of something else?"

Without that feature, a program might take that and just go "ERROR: Insufficient parameters." Or, worse, it will give you the WRONG line graph and not tell you. Both are annoying.

2: More complicated request example: you request the AI: "Bring a drink inside the fridge to me."

The AI may respond: "Do you want me to bring an entire two liter bottle of soda, which would look like this, or do you want me to bring a can of soda, which would look like this, or do you want me to drag the entire fridge to your position, which would look like this, or are you thinking of something else?"

This is where the video demonstration comes in handy. Relative to you, the AI doesn't have common sense. It doesn't necessarily know that you probably don't want it to drag the entire fridge to your position to the same level of confidence you do. It just puts together some plausible answers and lets you pick one.

(Note: There is nothing particularly special about video, other than that it shows that the AI shouldn't start actually doing things until you've narrowed the request down, and that some requests have mistakes too complicated to be picked up by text only answers.)
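Here is a rough Python sketch of the loop being proposed: generate candidate interpretations, preview them, and act only after the user picks one. The hardcoded interpretations function and the console prompt are purely illustrative stand-ins, not part of any real system.

```python
# Illustrative sketch of "show plausible readings, let the user choose,
# and only then act".

def interpretations(request):
    # A real system would generate these from the request; they are
    # hardcoded here for the fridge example.
    if "drink" in request:
        return ["bring the whole two-liter bottle",
                "bring a single can",
                "drag the entire fridge over"]
    return []

def handle(request):
    candidates = interpretations(request)
    for i, plan in enumerate(candidates, 1):
        print(f"{i}: {plan}  (a video preview would be shown here)")
    choice = input("Pick a number, or 0 for 'something else': ")
    if choice.isdigit() and 1 <= int(choice) <= len(candidates):
        return candidates[int(choice) - 1]   # only now does the AI act
    return None                              # halt and ask for clarification

# handle("Bring a drink inside the fridge to me")
```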

Of course, in the previous case, the AI was mostly on the money. It's also possible that the AI doesn't have the slightest idea what you're talking about, in a dangerous way.

3: You request the AI: "Subdue the crazed man inside the building of innocents."

The AI may respond: "Do you want me to level the building with TNT, which would look like this, or do you want me to gas the building with Chemicals, which would look like this, or do you want me to concoct an Contagious All purpose, Anti-Human Virus and infect him, which would look like this, or are you thinking of something else?"

Whoops. You forgot to tell the AI to subdue the crazed man WITHOUT hurting the innocents. Thankfully it clearly demonstrated what plans it had put together before it implemented any of them, because that would've been bad. You may have also not noted that you wanted the man subdued non-lethally.

This also helps guard against (though of course does not entirely prevent) the AI just doing something completely inexplicable relative to you.

4: "What's 2+2?"

"Is the answer 4, like this, or do you want me to build a solid sphere of Computronium advancing at lightspeed and continue calculating, like this, or are you thinking of something else?"

If you wanted the AI to be better, you could network it to a cloud database so it could learn: "In 1,000 cases where people asked for a drink inside the fridge, they never meant 'drag the fridge to them,' so I should stop offering that as a likely answer, and I should add 'Would you like me to pour your drink into a glass?', which comes up quite a bit as an answer selected under 'something else.'"
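A minimal sketch of that kind of feedback loop, assuming (purely for illustration) that selections are pooled in simple counters rather than in any particular cloud database:

```python
# Illustrative only: drop interpretations that users never pick.
# Write-in answers chosen under "something else" would be promoted to
# regular candidates the same way once their pick rate rises.
from collections import Counter

offer_counts = Counter()   # how often each interpretation was offered
pick_counts = Counter()    # how often it was actually chosen

def record(offered, chosen):
    for plan in offered:
        offer_counts[plan] += 1
    pick_counts[chosen] += 1

def worth_offering(plan, min_rate=0.01, min_offers=1000):
    if offer_counts[plan] < min_offers:
        return True   # not enough data yet to rule it out
    return pick_counts[plan] / offer_counts[plan] >= min_rate
```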

Since danger goes up as capability does, I'd probably want to train a dumber AI on a great deal of 1 and 2 before ever getting anywhere near 3, let alone something as existential as 4.

Of course, if the AI is substantially good enough at ANY request, it can respond to a request of:

5: "Increase your computing power."

With "Do you want me to increase my computing power by building more cores like this, or do you want me to increase my computing power by switching out my existing cores with this design that I came up, like this, or are you thinking of something else?"

In which case you have a Self-Improving AI that, after a great deal of training, gets what humanity wants when they tell it things, so it can be enlisted to save the world and make everyone live happily forever.

Except that what this DIDN'T do is actually follow the procedure I laid out above: give an alternate option under the assumption that this isn't what you asked for (and that it might be very dangerous), and possibly ask if you just wanted something else entirely.

I want to note that those steps should be here, but they have mostly been cut for space, except as rough mentions.

Obvious problem 1: the video output or descriptions can contain basilisks, including ones that cause problem 2.

Obvious problem 2: Someone could make a request and then verify the answer in a way that produces a full UFAI, without realizing it.

Obvious problem 3: UFAI could arise inside the simulation used to produce the hypotheticals, and either hack its way out directly or cause problem 1 followed by problem 2.

And the most obvious problem of all: without being able to repeatedly take the highly dangerous and active step of modifying its own source code, it'll never get smart enough to be useful on 95% of queries.


Fair point. In that case, given an unknown, partially complete AI, if the first action you take is "Let me just start reading the contents of these files without running it, to see what it even does," then someone could say, "A UFAI put a basilisk in the source code and used it to kill all of humanity; you lose."

That isn't even entirely without precedent, using this as an example: http://boingboing.net/2012/07/10/dropped-infected-usb-in-the-co.html Sometimes malicious code really is literally left physically lying around, waiting for someone to pop it into a computer out of curiosity.

So, in the general case, something that will take a natural language request, turn it into a family of optimizable models, identify the most promising ones, ask the user to choose, and then return an optimized answer?

Notice that it doesn't actually have to do anything itself -- only give answers. This makes it much easier to build and creates an extra safeguard for free.

But is there anything more we can pare away? For example, a provably correct natural language parser is impossible because natural language is ambiguous and inconsistent; humans certainly don't always parse it correctly. On the other hand, it's easy for a human to learn a machine language, and huge numbers of them have already done so.

So in the chain of events below, the AI's responsibility would be limited to the words in all caps, and humans would do the rest.

[1 articulate a need] -> [2 formulate an unambiguous query] -> [3 FIND CANDIDATE MODELS] -> [4 user chooses a model or revises step 2] -> [5 RETURN OPTIMAL MANIPULATIONS TO THE MODEL] -> [6 user implements manipulation or revises step 2]

Continued from above, to reduce TLDR-ness...

We have generic algorithms that do step 5. They don't always scale well, but that's an engineering problem that a lot of people in fields outside AI are already working to solve. We have domain-specific algorithms, some of which can do a decent job of step 3: spam filters, recommendation engines, autocorrectors.
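As a concrete illustration of that split, here is a toy Python sketch in which step 3 is reduced to a hardcoded stand-in and step 5 is handed to an off-the-shelf numerical optimizer from scipy; the function names and the example model are invented for this sketch.

```python
import numpy as np
from scipy.optimize import minimize

def find_candidate_models(query):
    # Step 3 (illustrative): a real system might use recommendation or
    # classification techniques here; this stub returns one made-up model,
    # "minimize (x - 3)^2 + (y + 1)^2".
    return [lambda v: (v[0] - 3) ** 2 + (v[1] + 1) ** 2]

def optimal_manipulations(model, x0=(0.0, 0.0)):
    # Step 5: hand the chosen model to a generic optimizer.
    return minimize(model, np.asarray(x0)).x   # approximately [3, -1]

model = find_candidate_models("minimize my loss")[0]   # step 4: the user picks
print(optimal_manipulations(model))
```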

So, does this mean that what's really missing is a generic problem-representer?

Well, that and friendliness, but if we can articulate a coherent, unambiguous code of morality, we will still need a generic problem-representer to actually incorporate it into the optimization procedure.


This sounds like almost nothing; it reminds me of the person who wrote a command-line interpreter and a language interpretation/synthesis library, leaving only the problem of figuring out how to write code that comes up with intelligent responses to questions. Frankly, what is described here sounds like something that guy in my robotics class could write in a week in Python. In fact, it sounds suspiciously like an assignment from an elementary programming class for non-computer-science majors that I took.

(This assumes that A.1 and A.2 are NOT provided!)

You still need to write some drivers and interpretation scripts, on top of which go basic perception, on top of which go basic human thoughts, on top of which go culture and morality, on top of which go CEV or whatever other major human morality system you want to use to make the AI friendly.

It also... sounds like this thing doesn't even search the hypothesis space. Which makes it much safer and much, much less useful.

Edit: Actually, I realize that this is more substantial, and I wish to apologize for the condescension. But the OP still sounds like the job is not being sliced even slightly in the middle, and like it would take a lot of time, work, and additional stuff to make even something simple and useless like a chatterbox.

The scripts (A) are like utility functions, and the program (B) is a general problem solver that can maximize/satisfice any utility function. So B must be powerful.

It sounds... lower level than that, more like some kind of numeric optimization thingie that needs you to code the world before you even get to utility functions.

You're right, in the sense that there's nothing here about how to generate accurate representations of the world. According to A.2, the user provides the representations. But even if the program is just a numerical optimizer, it's a powerful one, because it's supposed to be able to optimize an arbitrary function (arbitrary network of nodes, as represented in the script).

So it's as if the unfinished AI project already has the part of the code that will do the heavy lifting when problems are solved, and what remains to be done - which is still both important and difficult - is everything that involves transmitting intentions correctly to this AI core, and ensuring that all that raw power isn't used in the service of the wrong goals.

You still need to write some drivers and interpretation scripts, on top of which go basic perception,

What is the distinction between these and A.2?

on top of which go basic human thoughts,

What are those, what is the minimum set of capabilities within that space that are needed for our goals, and why are they needed?

on top of which go culture

What is it, and why is it needed?

and morality, on top of which go CEV

Is there any distinction, for the purposes of writing a world-saving AI?

If there is, it implies that the two will sometimes give conflicting answers. Is that something we would want to happen?

I'm mostly just rambling about stuff that is totally missing. Basically, I'm respectively referring to 'Don't explode the gas main to blow the people out of the burning building', 'Don't wirehead' and 'How do you utilitarianism?'.

I understand. And if/when we crack those philosophical problems in a sufficiently general way, we will still be left with the technical problem of "how do we represent the relevant parts of reality and what we want out of it in a computable form so the AI can find the optimum"?