The AI is here. And it's Friendly. But due to moral uncertainty, its value function prohibits any major intervention in existing human societies.

The AI's nanobots get to work. They replace the South Pacific garbage patch with a new continent--Eudaimonia, a rat park for humans. All basic needs are met, but if you want more money, you can gamble at a positive-expected-value casino or run through a bonus area where $100 bills flutter down from the sky. Immortality drugs are under development. The AI nanoprints RollerCoaster Tycoon save files on demand. Cherries are bred to be free of pits. Spaceships full of tourists depart regularly for a new floating city on Venus. And many ignore all this in favor of immersive virtual environments which are far more wondrous.

The AI is concerned. Eudaimonia seems to satisfy many human terminal values. But other terminal values appear to be in conflict. Indeed, some people have a terminal value that others should not be allowed to achieve their terminal values! That doesn't sound like a rat park! It's impossible to please everyone, and although the AI could modify people's preferences to change this, it is far too corrigible for that nonsense.

The AI comes up with a compromise. Once a month, you're given the opportunity to video call someone you have a deep disagreement with. At the end of the call, each of you gets to make a choice regarding whether the other should be allowed in Eudaimonia. But there's a twist: Whatever choice you made for the other person is the choice the AI makes for you.

The plan seems to be working splendidly at first. Legions of forgiving and agreeable people flood into the new continent and enjoy a life of constant bliss. The average resident makes three new deep friendships per month while tripping on psychedelic drugs. What used to be existential conflicts are now ironic sports team rivalries.

But a problem emerges: As the forgiving and agreeable people leave, it is the unforgiving and disagreeable people who are left behind--people who are especially difficult to forgive. The world outside Eudaimonia keeps getting uglier and uglier.

The AI decides it isn't fair to hold later applicants to a higher standard. Instead of doing a call with another person outside Eudaimonia, the AI sets you up with a Eudaimonia resident who disagreed with you in a past life. The AI offers them loyalty points at their favorite positive-expected-value casino if they're able to get you to forgive them.

The new program works much better. Eventually, all of humanity has moved to Eudaimonia except a small number of people for whom immiserating their enemies really seems to be their most important terminal value. Those people destroy each other. Everyone lives happily ever after.

6 comments

The trouble here is that deep disagreements aren't often symmetrically held with the same intensity. Consider the following situation:

Say we have Protag and Villain. Villain goes around torturing people and happens upon Protag's brother. Protag's brother is subsequently tortured and killed. Protag is unable to forgive Villain but Villain has nothing personal against Protag. Which of the following is the outcome?

  • Protag says "Villain must not go to Eudaimonia" so neither Protag nor Villain go to Eudaimonia
  • Protag says "Villain must not go to Eudaimonia" so Protag cannot go to Eudaimonia. Villain says "I don't care what happens to Protag; he can go if he wants" so Villain gets to go to Eudaimonia
  • Protag says "Villain must not go to Eudaimonia" but it doesn't matter because next month they talk to someone else they disagree with and both go to Eudaimonia anyway

The first case is sad but understandable here -- but also allows extremist purple-tribe members to veto non-extremist green-tribe members (where purple and green ideologies pertain to something silly like "how to play pool correctly"). The second case is perverse. The third case is just "violate people's preferences for retribution, but with extra steps."

Yep. Good thing a real AI would come up with a much better idea! :)

I'm confused -- please forgive me if this is a dumb comment, this is my first contribution.

What was the purpose of the post if the idea was, on its own, not durable enough to stand? I'm genuinely confused on how this would avoid harming the 'good' people in the short term.

>Yep. Good thing a real AI would come up with a much better idea! :)

How does this post expand our thinking about AI if it "would come up with a better idea"? I'm not trying to criticize you (which is why I didn't downvote this post). I just want to better understand its intention so that I can understand LW better.

Thanks

>What was the purpose of the post if the idea was, on its own, not durable enough to stand?

I think there's plenty of value in showing problems with unworkable-but-attractive ideas.  Fiction is a fine medium for this, as it's entertaining in its own right.

It's fiction ¯\_(ツ)_/¯

I guess I'll say a few words in defense of doing something like this... Supposing we're taking an ethically consequentialist stance.  In that case, the only purpose of punishment, basically, is to serve as a deterrent.  But in our glorious posthuman future, nanobots will step in before anyone is allowed to get hurt, and crimes will be impossible to commit.  So deterrence is no longer necessary, and the only reason left to punish people is spite.  But if people are feeling spiteful towards one another on Eudaimonia, that would kill the vibe.  Being able to forgive one person you disagree with seems like a pretty low bar where being non-spiteful is concerned.  (Other moral views might consider punishment to be a moral imperative even if it isn't achieving anything from a consequentialist point of view.  But consequentialism is easily the most popular moral view on LW according to this survey.)

A more realistic scheme might involve multiple continents for people with value systems that are strongly incompatible, perhaps allowing people to engage in duels on a voluntary basis if they're really sure that is what they want to do.

In any case, the name of the site is "Less Wrong" not "Always Right", so I feel pretty comfortable posting something which I suspect may be flawed and letting commenters find flaws.  (In fact, that was part of why I made this post: to see what complaints people would have, beyond the utility of sharing a fun whimsical story.  But overall the post was more optimized for whimsy.)

>The AI comes up with a compromise. Once a month, you're given the opportunity to video call someone you have a deep disagreement with. At the end of the call, each of you gets to make a choice regarding whether the other should be allowed in Eudaimonia. But there's a twist: Whatever choice you made for the other person is the choice the AI makes for you.

 

This whole plan relies on an utterly implausible conspiracy. There's no way, just by its nature, to keep people from knowing how this test actually works. So if people know how the test works, there's zero reason to base your response on what you actually want for the person you disagree with.