by [anonymous]

An agent is composed of two components: a predictive model of the world, and a utility function for evaluating world states.

I would say that the 'intelligence' of an agent corresponds to the sophistication and accuracy of its world model. The 'friendliness' of an agent depends on how closely its utility function aligns with human values.
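To make this decomposition concrete, here is a minimal sketch; the names (Agent, world_model, utility, choose_action) are my own illustrations rather than anything from the post, and a real agent would of course need much more (search, pruning, learning):

```python
# Minimal sketch of the two-component picture above: an agent is a
# predictive world model plus a utility function over world states.
from typing import Any, Callable, Iterable

class Agent:
    def __init__(self,
                 world_model: Callable[[Any, Any], Any],
                 utility: Callable[[Any], float]):
        self.world_model = world_model  # 'intelligence': predictive accuracy
        self.utility = utility          # 'friendliness': alignment with human values

    def choose_action(self, state: Any, actions: Iterable[Any]) -> Any:
        # Predict the state each action leads to, score it, pick the best.
        return max(actions,
                   key=lambda a: self.utility(self.world_model(state, a)))
```

On this picture, a paperclip maximizer and a friendly AI could share the same world_model and differ only in utility.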

What is baffling to me is the vague idea that developing the theory of friendliness has any significant synergy with developing the theory of intelligence. (Such an idea has surfaced in discussions of the plausibility of SIAI developing friendly AI before unfriendly AI is developed by others.)

One argument for such a synergy (and the only one I can think of) is that:

a) A perfect world model is not possible: one must make the best tradeoff given limited resources

b) The more relevant a certain feature of the world is to the utility function, the more accurately we want to model it in the world model
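As a toy illustration of (a) and (b), imagine a fixed modeling budget split across features of the world in proportion to how sensitive the utility function is to each. The feature names and numbers below are invented purely for illustration:

```python
# Toy sketch: spend more modeling effort on features the utility
# function cares more about, given a limited total budget.
def allocate_modeling_budget(utility_sensitivity: dict, budget: float) -> dict:
    total = sum(utility_sensitivity.values())
    return {feature: budget * weight / total
            for feature, weight in utility_sensitivity.items()}

# Both hypothetical agents end up spending most of their budget on
# modeling humans, who dominate the environment either one acts in.
paperclipper = allocate_modeling_budget(
    {"human behavior": 0.7, "steel supply chains": 0.3}, budget=100.0)
friendly_ai = allocate_modeling_budget(
    {"human behavior": 0.6, "human values in detail": 0.3, "economy": 0.1},
    budget=100.0)
```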

However, for the most part, the world model would differ very little between a paperclip maximizer and a friendly AI. While the friendly AI certainly has to keep track of more things that are irrelevant to the paperclip maximizer, both AIs would need world models capable of modeling human behavior in order to be effective, and one would expect that to account for the bulk of the world model's complexity in the first place.

13 comments

An agent is composed of two components: a predictive model of the world, and a utility function for evaluating world states.

...and a tree-pruner - at the very least!

However, for the most part, the world model would differ very little between a paperclip maximizer and a friendly AI. While the friendly AI certainly has to keep track of more things that are irrelevant to the paperclip maximizer, both AIs would need world models capable of modeling human behavior in order to be effective, and one would expect that to account for the bulk of the world model's complexity in the first place.

This is, as I understand it, the crux of the argument. Perhaps it takes an AI of complexity 10 to model the world well enough to interact with it and pursue simple values, but an AI of complexity 11 to model the world well enough to understand and preserve human values. If fooming is possible, that means any AIs of complexity 10 will take over the world and not preserve human values, and the only way to get a friendly AI is for no one to make an AI of complexity 10 and the first AI to be complexity 11 (and human-friendly).

What is baffling to me is the vague idea that developing the theory of friendliness has any significant synergy with developing the theory of intelligence.

It is not clear what discussions you are referring to - but there is a kind of economic synergy the other way around - machine intelligence builders will need to give humans what they want initially, or their products won't sell.

So, for example, there are economic incentives for automobile makers to figure out if humans prefer to be puked through windscreens or have airbags exploded in their faces.

To give a more machine-intelligence-oriented example, Android face recognition faces some privacy-related issues before it can be marketed - because it goes close to the "creepy line". Without a keen appreciation of the values-related issues, it can't be deployed or marketed.

It is not clear what discussions you are referring to

Yeah, though it seems I've seen statements by some people here that "Friendliness IS the AI"; I couldn't take them at face value, due to the same obvious question the OP raises.

"What is baffling to me is the vague idea that developing the theory of friendliness has any significant synergy with developing the theory of intelligence. "

Makes perfect sense to me. 'Friendliness' requires being able to specify a very precise utility function to an optimiser. 99% of the job of figuring that out is learning how to specify any utility function precisely. That's a job that will have to be done by any AI developer.

Most of the arguments I've seen that SIAI will be effective at developing AI focus more on identifying common failings of other AI research (and a bit on "look at me I'm really smart" :P ). Maybe an argument like "if you haven't figured out that friendliness is important you probably haven't put in enough thought to make a self-improving AI" inspired this post, rather than the idea that figuring out friendliness would have some causal benefit?

Tried to edit the article for clarity: last paragraph should be "However, for the large part, the world model would differ, in totality, very little between a paperclip maximizer and a friendly AI. It is true the Friendly AI certainly has to keep track of more things which are irrelevant to the paperclip maximizer. But this extra complexity is nothing compared to the fact that for either AI to be effective, their world models would have to be able to model human behavior on a global scale."

Tried to edit the article for clarity: last paragraph should be [...]

Given that the current state of the last paragraph of the post doesn't match what you write in this comment, do you mean that you couldn't actually edit it for some reason? What's the purpose of the comment?

Yes: this site is quite buggy for me for some reason.

The obvious answer I can think of is that having a utility function that closely corresponds to a human's values is going to help an AI predict humans. This is perhaps analogous to mirror neurons in humans.

Probably not so much. You have to figure out what agents in your environment want if you are going to try to understand them and deal with them - but you don't really have to want the same things as them to be able to do that.

It's of course not necessary. But we humans model other humans by putting ourselves in someone else's shoes and asking what we would do in that situation. I don't necessarily agree with the argument that it is necessary for an AI to have the same utility function as a human in order to predict humans. But if you did write an AI with an identical utility function, that would give it an easy way to make some predictions about humans (although you'd have problems with things like biases that prevent humans from achieving their goals, etc.).

Some truth - but when you put yourself in someone else's shoes, "goal substitution" often takes place, to take account of the fact that they want different things from you.

Machines may use the same trick, but again, they seem likely to be able to imagine quite a range of different intentional agents with different goals.

The good news is that they will probably at least try and understand and represent human goals.
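A rough sketch of the shoes-swapping-with-goal-substitution idea discussed above (the function and its names are hypothetical, purely for illustration):

```python
# Predict another agent by reusing your own predictive machinery,
# but with that agent's (estimated) utility function swapped in.
from typing import Any, Callable, Iterable

def predict_action(world_model: Callable[[Any, Any], Any],
                   their_utility: Callable[[Any], float],
                   state: Any,
                   actions: Iterable[Any]) -> Any:
    # Same planning loop the predictor would use for itself,
    # evaluated with the other agent's goals (goal substitution).
    return max(actions,
               key=lambda a: their_utility(world_model(state, a)))
```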