Alerus — LessWrong

Is friendly AI "trivial" if the AI cannot rewire human values?

Thanks, I appreciate that. I have no problem with people disagreeing with me, as confronting disagreement is how people (myself included) grow. However, I was taken aback by the amount of downvoting I received merely for disagreeing with people here. The fact that merely choosing to respond to people's arguments would effectively guarantee even more downvotes, in a system where votes are tied to how much you can participate in the community, made it more concerning to me. At least on the discussion board side of the site, I expected downvoting to be reserved for posts that were derailing topics, flaming, ignoring arguments presented to them, etc., not for posts with which one disagreed. As someone who does academic research in AI, I thought this could be a fun, lively online community in which to discuss it. But having had my discussion board posting privileges removed because people did not agree with things I said (and the main post didn't even assert anything; it asked for feedback), I've reconsidered that. I'm glad to see that not everyone here thinks this was an appropriate use of downvoting, but I feel the community at large has spoken with regard to how it uses downvotes, and when this thread ends I'll probably be moving on.

Thanks for your support though, I do appreciate that.

Is friendly AI "trivial" if the AI cannot rewire human values?

Alerus14y00

I appreciate that sentiment and I'll also add that I appreciate that even in your prior post you made an effort to suggest what you thought I was driving at.

Is friendly AI "trivial" if the AI cannot rewire human values?

Alerus14y20

When you think of a nation conquering another, the US and Japan is really what comes to your mind? Are you honestly having trouble grasping the distinction I was making? Because personally, I'm really not interested in continuing an irrelevant semantics debate.

Is friendly AI "trivial" if the AI cannot rewire human values?

Alerus14y00

Yes. I find it odd that this argument is derailed into demanding a discussion on the finer points of the semantics for "conquer."

Is friendly AI "trivial" if the AI cannot rewire human values?

Alerus14y00

Conquer is typically used to mean that you take over the government and run the country, not just win a war.

Is friendly AI "trivial" if the AI cannot rewire human values?

Alerus14y00

You're missing the point of talking about opposition. Unlike the Nazis, the AI doesn't want the outcome of opposition, because that has terrible effects on the well-being it's trying to maximize. This isn't about winning the war; it's about the consequences of war for the measured well-being of the people involved and of everyone else living in a society where an AI would kill people for what amounts to thought-crime.

And if the machine thinks that's the best way to make people happy (for whatever horrible reason--perhaps it is convinced by the Repugnant Conclusion and wants to maximize utility by wiping out all the immiserated Russians), we're still in trouble.

This specifically violates the assumption that the AI has a good model of how any given human measures their well-being.

However, if you're trying to describe an AI that is set to maximize human value, understands the complexities of the human mind, and won't make such mistakes, then you are describing friendly AI.

The assumption is that it models human well-being at least as well as the best a human can model the well-being function of another. However, this constraint by itself does not solve friendly AI, because in a less constrained problem than the one I outlined, the most common response is that an AI trying to maximize what humans value will change and rewire what humans value into something easier to maximize. The entire purpose of this post is to question whether it could achieve this without the ability to manually rewire human values (e.g., could this be done through persuasion?). In other words, you're claiming friendly AI is solved more easily than the constrained question I posed in the post.

Is friendly AI "trivial" if the AI cannot rewire human values?

Alerus14y00

And as for the others? Or are you saying the AI trying to maximize well-being will try, and succeed, in effectively wiping out everyone and then condition future generations to have the desired, easily maximized values? If so, this behavior is conditioned on the idea that the AI could be very confident in its ability to do so, because otherwise the chance of failing and the cost of war to human well-being would massively reduce the expected value. I think you should also make clear what you think these more easily maximized values, toward which it would try to change human values, might end up being.

Is friendly AI "trivial" if the AI cannot rewire human values?

Alerus14y00

We also didn't conquer Japan, we won the war. Those are two different things.

Is friendly AI "trivial" if the AI cannot rewire human values?

Alerus14y00

Considering there were many people in Germany who vehemently disliked the Nazis too (even ignoring Jews), it seems like a pretty safe bet that after being conquered we wouldn't have suddenly viewed the Nazis as great people. Why do you think otherwise?

Is friendly AI "trivial" if the AI cannot rewire human values?

Alerus14y-10

Let's lose the silly straw man arguments. I've already explicitly commented on how I don't believe the universe is fair, and I think from that it should be obvious that I don't think really bad things can't happen. As far as moral progress goes, I think it happens insofar as it's functional. Morals that lead to more successful societies win the competition and stick around. This often moves societies (not necessarily all people in the society) toward greater tolerance of peoples and less violence, because oppressing people and allowing for more violence tends to have bad effects internally in the society.

If we were weaker, the Nazis could have won. That's not even the central point though. For kicks, let's assume the Nazis would have won the war. What does that mean though? It still means that other humans were in huge opposition and went to war over it, causing innumerable deaths. After the Nazis won, there would also surely be people wildly unhappy with the situation. This presents a serious problem for the AI trying to maximize well-being. It would not want to do things that led to mass outrage and opposition because that fails its own metrics.
