I put "trivial" in quotes because there are obviously some exceptionally large technical achievements that would still need to occur to get here. But suppose we had an AI with a utilitarian utility function of maximizing subjective human well-being (meaning that well-being is not something as simple as the physical sensation of "pleasure" and depends on the mental facts of each person), and let us also assume the AI can model this "well" (let's say at least as well as the best of us can deduce what another person values for their well-being). Finally, we will also assume that the AI does not possess the ability to manually rewire the human brain to change what a human values. In other words, the ability for the AI to manipulate another person's values is limited by what we as humans are capable of today. Given all this, is there any concern we should have about making this AI; would it succeed in being a friendly AI?

One argument I can imagine for why this fails to be friendly is that the AI would wire people up to virtual reality machines. However, I don't think that works very well, because a person (except Cypher from The Matrix) wouldn't appreciate being wired into a virtual reality machine and having their autonomy forcefully removed. This means the action would not succeed in maximizing their well-being.

But I am curious to hear what arguments exist for why such an AI might still fail as a friendly AI.


Finally, we will also assume that the AI does not possess the ability to manually rewire the human brain to change what a human values. In other words, the ability for the AI to manipulate another person's values is limited by what we as humans are capable of today. Given all this, is there any concern we should have about making this AI; would it succeed in being a friendly AI?

AAaiiiieee!

Can you please think about the emphasised phrase by yourself for 5 minutes (as measured by a physical clock)?

That was approximately my reaction. Considering how easy it is to influence humans, it would be a nontrivial task to avoid unintentionally changing those values. I suspect that a significantly subhuman AI could change human values. One might argue that by increasing and redistributing wealth and improving medical care, our expert systems today are already changing human values unintentionally. Google ads, which are automated and "learn" what the user likes, are specifically tailored to influence us, and they do a remarkable job considering the relative simplicity of the algorithms used.

What is wrong with the statement? The idea I'm trying to convey is that I, as a person now, cannot go and forcefully rewire another person's values. The only ability I have to try to affect them is to be persuasive in argument, or perhaps to be deceptive about certain things to try to move them to a different position (e.g., consider the state of politics).

In contrast, one of the concerns for the future is that an AI may have the technological ability to manipulate a person more directly. So the question I'm asking is: is the future technology at the disposal of an AI the only reason it could behave "badly" under such a utility function?

Also, please avoid such comments. I am interested in having this discussion, but alluding to something wrong in what I have posted without saying what you think it is, is profoundly unhelpful and useless to the discussion.

Consider that humans have modified human values toward results as different as Nazism and Jainism.

Consider that every human who has ever existed was shaped purely by environment + genes.

Consider how much humans have achieved merely by controlling the environment: converting people to insane religions which they are willing to die and kill for, making torturers, "the banality of evil", and so on.

Now imagine what an entity could achieve with that plus 1) complete understanding of how the brain is shaped by the environment and/or 2) complete control of the environment (via VR, smart dust, whatever) for a human from age 0 onwards.

I think the conservative assumption is that any mind we would recognize as human, and many we wouldn't, could be produced by such an optimization process. You're not limiting your AI at all.

suppose we had an AI with a utilitarian utility function of maximizing subjective human well-being (meaning, well-being is not something as simple as physical sensation of "pleasure" and depends on the mental facts of each person) and let us also assume the AI can model this "well" (let's say at least as well as the best of us can deduce the values of another person for their well-being)

You've crammed all the difficulty of FAI into this sentence. An additional limit on how much it can manipulate us does little if anything to make this part easier, and adds the additional complication of how strict this limitation should be. The question of how much FAI would manipulate us is an interesting one, but either it's a small part of the problem or it's something that will be subsumed in the main question of "what do we want?". By the latter I mean that we may decide that the best way to decide how much FAI should change our values is to have it calculate our CEV, the same way that the FAI will decide what economic system to implement.

This is not meant to be a resolution to FAI, since you can't stop technology. It's meant to highlight whether the bad behavior of AI ends up being due to future technology that can change humans more directly. I'm asking the question because the answer may give insight into how to tackle the problem.

I'd suggest reading Failed Utopia #4-2.

One problem is that if it can create new people, any rules about changing people would be pointless. If it cannot create new people, then it ends up with a Utopia for 6 billion people, which is nothing compared to what could have been.

This could be fixed by letting it rewire human brains, but limiting it to doing what humans would be okay with, if it didn't rewire their brains. This is better, but it still runs into problems in that people wouldn't fully understand what's going on. What you need to do is program it so that it does what people would like if they were smarter, faster, and more the people they wish they were. In other words, use CEV.

Also, it's very hard to define what exactly constitutes "rewiring a human brain". If you make it too general, the AI can't do anything, because that would affect human brains. If you make it too specific, the AI would have some slight limitations on how exactly it messes with people's minds.

Thanks for the link, I'll give it a read.

Creating new people is potentially a problem, but I'm not entirely convinced. Let me elaborate. When you say:

What you need to do is program it so that it does what people would like if they were smarter, faster, and more the people they wish they were. In other words, use CEV.

Isn't this just a restatement, in different words, of the idea that it models human well-being and tries to maximize it? I imagine that, phrased this way, such an AI wouldn't create new people who are easier to maximize, because that isn't what humans would want. And if that's not what humans would want, doesn't that just mean it counts negatively in their well-being, so my original definition suffices? Assuming humans don't want the AI to make new people who are simply easier to maximize, if it created a new person, all people on Earth would view this negatively and their well-being would drop. In fact, it may lead to humans shutting the AI down, so the AI deduces that it cannot create new people who are easier to maximize. The only possible hole I see in that is if the AI could suddenly create an enormous number of people at once.

Also, it's very hard to define what exactly constitutes "rewiring a human brain". If you make it too general, the AI can't do anything, because that would affect human brains. If you make it too specific, the AI would have some slight limitations on how exactly it messes with people's minds.

Indeed, it's difficult to say precisely; that's why I used what we can do now as an analogy. I can't really rewire a person's values at all except through persuasion or other such methods. Even our best neuroscientists can't do that, unless I'm ignorant of some profound advances. The most we can really do is tweak pleasure centers (which, as I stated, isn't the metric for well-being) or effectively break the brain so the person is non-operational, but I'd argue that non-operational humans have effectively zero measure of well-being anyway (for similar reasons to why I'd say a bug has a lower scale of well-being than a human does).

Assuming humans don't want the AI to make new people who are simply easier to maximize, if it created a new person, all people on Earth would view this negatively and their well-being would drop.

I'm not sure how common it is, but I at least consider total well-being to be important. The more people the better. The easier to make these people happy, the better.

Indeed, it's difficult to say precisely; that's why I used what we can do now as an analogy. I can't really rewire a person's values at all except through persuasion or other such methods.

An AI is much better at persuasion than you are. It would pretty much be able to convince you of whatever it wants.

Even our best neuroscientists can't do that, unless I'm ignorant of some profound advances.

Our best neuroscientists are still mere mortals. Also, even among mere mortals, making small changes to someone's values is not difficult, and I don't think significant changes are impossible. For example, the consumer diamond industry would be virtually nonexistent if De Beers hadn't convinced people to want diamonds.

The more people the better.

The more people in what? At any particular moment in time? Over the complete timeline of any given Everett branch? In the whole multiverse?

Between an Everett branch of 10 billion people, and ten Everett branches of 1 billion people each, which do you prefer?

Between 10 billion people that live in the same century, and one billion people per century over a span of ten centuries, which do you prefer?

The whole multiverse.

I'm not sure how common it is, but I at least consider total well-being to be important. The more people the better. The easier to make these people happy, the better.

You must also consider that well-being need not be defined as a positive function. Even if it were, if the gain from adding a person were less than the drop in well-being of others, adding that person wouldn't be beneficial unless the AI were able to create many more such people without being prevented.
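To make the arithmetic in this comment concrete, here is a minimal sketch assuming a simple additive (total-utilitarian) model: each new person contributes a fixed gain g, and each of the N existing people suffers a fixed drop d. The function names, the fixed-drop assumption, and the numbers are illustrative assumptions, not anything specified in the thread.

```python
import math


def net_change(new_people, gain_per_new, existing_people, drop_per_existing):
    """Net change in total well-being under an assumed additive model.

    Each newly created person adds `gain_per_new`; each existing person
    loses `drop_per_existing` (assumed fixed, however many are created).
    """
    return new_people * gain_per_new - existing_people * drop_per_existing


def break_even_new_people(gain_per_new, existing_people, drop_per_existing):
    """Smallest number of new people for which the net change is positive.

    Solves k * g > N * d for the smallest integer k. Returns None when a
    new person adds no well-being, since then no number of them ever helps.
    """
    if gain_per_new <= 0:
        return None
    threshold = existing_people * drop_per_existing / gain_per_new
    return math.floor(threshold) + 1


# Illustrative numbers only: 6 billion existing people each lose 1 unit,
# and each new person gains 100 units.
print(break_even_new_people(100, 6_000_000_000, 1))
```

Under these assumptions, creating a few new people is net-negative, but the total turns positive once k exceeds N·d/g. That is exactly the scenario the comment flags: an AI able to create many more such people.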

An AI is much better at persuasion than you are. It would pretty much be able to convince you of whatever it wants.

I'm sure it'd be better than me (unless I'm also heavily augmented by technology, but we can avoid that issue for now). On what grounds can you say that it'd be able to persuade me of anything it wants? Intelligence doesn't mean you can do anything, and I think this claim needs to be justified.

Our best neuroscientists are still mere mortals. Also, even among mere mortals, making small changes to someone's values is not difficult, and I don't think significant changes are impossible. For example, the consumer diamond industry would be virtually nonexistent if De Beers hadn't convinced people to want diamonds.

I know they're mere mortals. We're operating under the assumption that the AI's methods of value manipulation are limited to what we can do ourselves, in which case rewiring is not something we can do to any great effect. The point of the assumption is to ask what the AI could do without more direct manipulation. To that end, only persuasion has been offered, and as I've stated, I'm not seeing a compelling argument for why an AI could persuade anyone of anything.

Even if it were, if the gain from adding a person were less than the drop in well-being of others, adding that person wouldn't be beneficial unless the AI were able to create many more such people without being prevented.

Do you honestly think a universe the size of ours can only support six billion people before reaching the point of diminishing returns?

We're operating under the assumption that the AI's methods of value manipulation are limited to what we can do ourselves, in which case rewiring is not something we can do to any great effect.

If you allow it to use the same tools but better, it will be enough. If you don't, it's likely to only try to do things humans would do, on the basis that they're not smart enough to do what they really want done.

Do you honestly think a universe the size of ours can only support six billion people before reaching the point of diminishing returns?

That's not my point. The point is that people aren't going to be happy if an AI starts making people who are easier to maximize for the sole reason that they're easier to maximize. We will regard this as a problem by the very fact that we are discussing hypotheticals in which doing so is considered a problem by us.

If you allow it to use the same tools but better, it will be enough. If you don't, it's likely to only try to do things humans would do, on the basis that they're not smart enough to do what they really want done.

You seem to be trying to break the hypothetical assumption on the basis that I have not specified a complete set of criteria that would prevent an AI from rewiring the human brain. I'm not interested in trying to find a set of rules that would prevent an AI from rewiring human brains (and I never tried to provide any; that's why it's called an assumption), because I'm not posing that as a solution to the problem. I've made this assumption to try to generate discussion of all the other ways it could break down, since discussion typically seems to stop at "it will rewire us". Asserting "yeah, but it would rewire us because you haven't strongly specified how it couldn't" really isn't relevant to what I'm asking, since I'm trying to get specifically at what it could do besides that.

"Finally, we will also assume that the AI does not possess the ability to manually rewire the human brain to change what a human values. In other words, the ability for the AI to manipulate another person's values is limited by what we as humans are capable of today."

I argue that we as humans are capable of a lot of that, and the AI may be able to think faster and draw upon a larger store of knowledge of human interaction.

Furthermore, what justifies this assumption? If we assume a limit that the AI won't manipulate me any more than Bob across the street will manipulate me, then yes, the AI is safe, but that limit seems very theoretical. A higher limit, that the AI won't manipulate me more than the most manipulative person in the world, isn't very reassuring either.

Can you give examples of what you think humans' capabilities to rewire one another's values are?

As for what justifies the assumption? Nothing. I'm not making it because I think AIs won't have this ability; I'm making it so we can identify where the real problem lies. That is, I'm curious whether the real problem of bad AI behavior is entirely specific to advances in biological technology that future AIs will have access to but we do not have today. If we can conclude this is the case, it might help us understand how to tackle the problem. Another way to think of my question: take such an AI robot and drop it into today's society. Will it start behaving badly immediately, or will it have to develop technology we don't have today before it can behave badly?

Can you give examples of what you think humans' capabilities to rewire one another's values are?

As plenty of religious figures have shown over the years, this capability is virtually unlimited. An AI would just have to start a new religion, or take over an existing one and adapt it to its liking.

I am saddened by the amount of downvotes on Alerus's well written and provocative posts. He made a positive contribution to the discussion, and should not be discouraged, IMO.

Thanks, I appreciate that. I have no problem with people disagreeing with me, as confronting disagreement is how people (myself included) grow. However, I was taken aback by the amount of downvoting I received merely for disagreeing with people here. It concerned me even more that simply responding to people's arguments would effectively guarantee further downvotes, in a system where votes determine how much you can participate in the community. At least on the discussion board side of the site, I expected downvoting to be reserved for posts that derail topics, flame, ignore the arguments presented to them, etc., not for posts one disagrees with. As someone who does academic research in AI, I thought this could be a fun, lively online community in which to discuss it, but having had my discussion board posting privileges removed because people did not agree with things I said (and the main post didn't even assert anything; it asked for feedback), I've reconsidered. I'm glad to see that not everyone here thinks this was an appropriate use of downvoting, but I feel the community at large has spoken on how it uses downvotes, and when this thread ends I'll probably be moving on.

Thanks for your support, though; I do appreciate it.