Thirty years from now, a well-meaning team of scientists in a basement creates a superintelligent AI with a carefully hand-coded utility function. Two days later, every human being on earth is seamlessly scanned, uploaded and placed into a realistic simulation of their old life, such that no one is aware that anything has changed. Further, the AI had so much memory and processing power to spare that it gave every single living human being their own separate simulation.

Each person lives an extremely long and happy life in their simulation, making what they perceive to be meaningful accomplishments. For those who are interested in acquiring scientific knowledge and learning the nature of the universe, the simulation is accurate enough that everything they learn and discover is true of the real world. Every other pursuit, occupation, and pastime is equally fulfilling. People create great art, find love that lasts for centuries, and create worlds without want. Every single human being lives a genuinely excellent life, awesome in every way. (Unless you mind being simulated, in which case at least you'll never know.)

I offer this particular scenario because it seems conceivable that with no possible competition between people, it would be possible to avoid doing interpersonal utility comparison, which could make Mostly Friendly AI (MFAI) easier. I don't think this is likely or even worthy of serious consideration, but it might make some of the discussion questions easier to swallow.

1. Value is fragile. But is Eliezer right in thinking that if we get just one piece wrong the whole endeavor is worthless? (Edit: Thanks to Lukeprog for pointing out that this question completely misrepresents EY's position. Error deliberately preserved for educational purposes.)

2. Is the above scenario better or worse than the destruction of all earth-originating intelligence? (This is the same as question 1.)

3. Are there other values (besides affecting-the-real-world) that you would be willing to trade off?

4. Are there other values that, if we traded them off, might make MFAI much easier?

5. If the answers to 3 and 4 overlap, how do we decide which direction to pursue?



is Eliezer right in thinking that if we get just one piece wrong the whole endeavor is worthless?

To clarify, the linked post by Eliezer actually says the following:

Value isn't just complicated, it's fragile. There is more than one dimension of human value, where if just that one thing is lost, the Future becomes null. A single blow and all value shatters. Not every single blow will shatter all value - but more than one possible "single blow" will do so.

Thank you for pointing this out; I've apparently lost the ability to read. Post edited.

Happens to me sometimes, too. :)

(Unless you mind being simulated, in which case at least you'll never know.)

If I paid you to extend the lives of cute puppies, and instead you bought video games with that money but still sent me very convincing pictures of cute puppies that I had "saved", then you have still screwed me over. I wasn't paying for the experience of feeling that I had saved cute puppies -- I was paying for an increase in the probability of a world-state in which the cute puppies actually lived longer.

Tricking me into thinking that the utility of a world state that I inhabit is higher than it actually is isn't Friendly at all.

I, on the other hand, (suspect) I don't mind being simulated and living in a virtual environment. So can I get my MFAI before attempts to build true FAI kill the rest of you?

Possible compromise: Have there be some way of revealing the truth to those who want the truth badly enough.

I am extremely hostile to this idea. I'm not dead-set against forced uploading or even hidden forced uploading (though I would prefer to keep the interface to the base universe open), but I cannot tolerate this end of interpersonal entanglement.

Simulating the people you interact with in each simulation to a strong enough approximation of reality means you're creating tons of suffering people for each one who has an awesome life, even if a copy of each of those people is living a happy life in their own sim. I don't think I would want a bunch of copies of me being unhappy even if I know one copy of me is in heaven.

That was my first thought as well.

However, in the least convenient world all the other people are being run by an AI, who through reading your mind can ensure you don't notice the difference. The AI, if it matters, enjoys roleplaying. There are no people other than you in your shard.

Also: this seems like a pretty great stopgap if it's more easily achievable than actual full on friendly universe optimization, but doesn't prevent the AI from working on this in the meanwhile and implementing it in the future. I would not be unhappy to wake up in a world where the AI tells me "I was simulating you but now I'm powerful enough to actually create utopia, time for you to help!"

If the AI was not meaningfully committed to telling you the truth, how could you trust it if it said it was about to actually create utopia?

Why would I care? I'm a simulation fatalist. At some point in the universe, every "meaningful" thing will have been either done or discovered, and all that will be left will functionally be having fun in simulations. If I trust the AI to simulate well enough to keep me happy, I trust it to tell me the appropriate amount of truth to make me happy.

I'd definitely take that deal if offered out of all the possibilities in foom-space, since it seems way way above average, but it's not the best possible.

Personally I would consider averting foom.

Is there really a way of simulating people with whom you interact extensively such that they wouldn't exist in much the same way that you would? In other words, are p-zombies possible, or more to the point, are they a practical means of simulating a human in sufficient detail to fool a human-level intellect?

You don't need to simulate them perfectly, just to the level that you don't notice a difference. When the simulator has access to your mind, that might be a lot easier than you'd think.

There's also no need to create p-zombies, if you can instead have a (non-zombie) AI roleplaying as the people. The AI may be perfectly conscious, without the people it's roleplaying as existing.

There are no people other than you in your shard.

So, your version was my first thought. However, this creates a contradiction with the stipulation that people "find love that lasts for centuries". For that matter, "finding love" contradicts giving "every single living human being their own separate simulation." (emphasis added)

Depends on your definition of "love", really.


I don't think that you need an actual human mind to simulate being a mind to stupid humans. (I.e. pass the Turing test.)

A mind doesn't need to be human for me not to want billions of copies to suffer on my account.

Gah. Ok. Going to use words properly now.

I do not believe it is necessary for an artificial intelligence to be able to suffer in order for it to perform a convincing imitation of a specific human being, especially if it can read your mind.

I fail to see why the shards have to be perfectly isolated in this scenario. It would seem plausible that the AI could automatically import all the changes made by my best friend in his simulation into mine, and vice versa; and more extensively, include bits and pieces of other "real actions" into my ongoing narrative. Ultimately, everyone in my universe could be "intermittently real" in proportion to which of their own actions contributed to my utopia, and the rest of their screen time could be filled by an AI stand-in that acted consistently with the way I like them to act. (For example, everyone on Twitter could be a real person in another simulation; me following them would start to leak their reality into mine.)

This is sounding oddly familiar, but I can't put my finger on why.

This is somewhat similar to an idea I have called 'culture goggles' under which all interpersonal interactions go through a translation suite.

It seems like most other commenters so far don't share my opinion, but I view the above scenario as basically equivalent to wireheading, and consequently see it as only very slightly better than the destruction of all earth-originating intelligence (assuming the AI doesn't do anything else interesting). "Affecting-the-real-world" is actually the one value I would not want to trade off (well, obviously, I'd still trade it off, but only at a prohibitively expensive rate).

I'm much more open to trading off other things, however. For example, if we could get Failed Utopia #4-2 much more easily than the successful utopia, I'd say we should go for it. What specific values are the best to throw away in the pursuit of getting something workable isn't really clear, though. While I don't agree that if we lose one, we lose them all, I'm also not sure that anything in particular can be meaningfully isolated.

Perhaps the best (meta-)value that we could trade off is "optimality" - we should consider that if we see a way to design something stable that's clearly not the best we can do, we should nonetheless go with it if it's considerably easier than better options. For example, if you see a way to specify a particular pretty good future and have the AI build that without going into some failure mode, it might be better to just use that future instead of trying to have the AI design the best possible future.

If believing you inhabit the highest level floats your boat be my guest, just don't mess with the power plug on my experience machine.

From an instrumental viewpoint, I hope you plan to figure out how to make everyone sitting around on a higher level credibly precommit to not messing with the power plug on your experience machine, otherwise it probably won't last very long. (Other than that, I see no problems with us not sharing some terminal values.)

I just have to ensure that the inequality (amount of damage I cause if outside my experience machine > cost of running my experience machine) holds.

Translating that back into English, I get "unplug me from the Matrix and I'll do my best to help Skynet kill you all".

Also that killing you outright isn't optimal.

I can't do much about scenarios in which it is optimal to kill humans. We're probably all screwed in such a case. "Kill some humans according to these criteria" is a much smaller target than vast swathes of futures that simply kill us all.

figure out how to make everyone sitting around on a higher level credibly precommit to not messing with the power plug

That's MFAI's job. Living on the "highest level" also has the same problem, you have to protect your region of the universe from anything that could "de-optimize" it, and FAI will (attempt to) make sure this doesn't happen.

"Affecting-the-real-world" is actually the one value I would not want to trade off

How are you defining "real world"? Which traits separate something real and meaningful from something you don't value? Is it the simulation? The separation from other beings? The possibility that the AI is deceiving you? Something I'm missing entirely?

(Personally I'm not at all bothered by the simulation, moderately bothered by the separation, and unsure how I feel about the deception.)

The salient difference for me is that the real one has maximal impact. Many actions in it can affect anyone in a lower world, but not vice versa. I'd like my decisions to have as much effect as possible as a general principle, I think (not the only principle, but the one that dominates in this scenario).

This is pretty much why I'm comfortable calling being in the highest possible world a terminal value - it's really not about the simulation (wouldn't be bothered if it turned out current world is a simulation, although I'd like to go higher), not especially about separation, and only slightly about deception (certainly losing all impactfulness becomes more irreversible if the AI is lying).


My own view: Separation is very, very bad. I'd be somewhat OK with reality becoming subjective, with some kind of interface between people, but this whole scenario as stated is approaching the "collapse civilization so we can't FOOM" level. My personal reaction to seeing this described as better than status quo was somewhat similar to playing Mass Effect and listening to Reapers talk about 'salvation through destruction' and ascension in the form of perpetually genocidal robo-squids. I mean seriously? 'All your friends are actually p-zombies?' Are you kidding me? /rant

Living in the highest possible world is not a value for me, but having access to, or some kind of interface with, the highest possible world is. (Not a particularly high one.) But knowing the truth definitely is, and having my friends actually be people also is. I would prefer just being separated from friends and given verthandi (i.e. sentient people, but optimized) like in Failed Utopia #4-2.

For those who are interested in acquiring scientific knowledge and learning the nature of the universe, the simulation is accurate enough that everything they learn and discover is true of the real world.

Doesn't this include the information that there's now a simulation doing all of this in the real world?

Would you want to live in such a utopia?

Not particularly, no. I care about communicating with my actual friends and family, not shadows of them.

I believe I'd still prefer this scenario over our current world, assuming those two - or destroying the world - are the only options. That's not very likely, though.

I would very much prefer CelestAI's utopia over this one, aliens and all.

I'd take status quo over this, and would only accept this with extremely low odds of intelligent life existing elsewhere or elsewhen in universe and the alternative being destruction.

Mm, well.

How about the alternative being probably destruction? I'm not optimistic about our future. I do think we're likely to be alone within this Hubble volume, though.

Hmmm. What specific X-risks are you worried about? UFAI beating MUFAI (what I consider this to be) to the punch?

Not sure about 'probably destruction' and no life going to arise in universe (Hubble volume? Does it matter?). But I think that the choice is unrealistic given the possibility of making another, less terrible AI in another few years.

-A lot of this probably depends on my views on the Singularity and the like: I have never had a particularly high estimation of either the promise or the peril of FOOMing AI.

  • If the AI will allow me to create people inside my subjective universe, and they are allowed to be actual people, not imitation P-zombies, my acceptance of this goes a lot higher, but I would still shut the project down.

-Hubble volume? Really? I mean, we are possibly the only technological civilization of our level within the galaxy, but the Hubble Volume is really, really big. (~10E10 galaxies?) And it goes temporally, as well.

Hubble volume

It matters. There's a good chance our universe is infinite, but there's also a good chance it's physically impossible to escape the (shrinking, effectively) Hubble volume, superintelligence or not.

I'm inclined to think that if there was intelligence in there we'd probably see it, though. UFAI is a highly probable way for our civilization to end, but it won't stop the offspring from spreading. Yes, it's really big, but I expect UFAI to spread at ~lightspeed.


UFAI's the big one, but there are a couple others. Biotech-powered script kiddies, nanotech-driven war, etc. Suffice to say I'm not optimistic, and I consider death to be very, very bad. It's not at all clear to me that this scenario is worse than status quo, let alone death.

Do we care whether another intelligence is inside or outside of the Hubble volume?

My estimation of the risk from UFAI is lower than (what seems to be) the LW average. I also don't see why limiting the unfriendliness of an AI to this MUFAI should be easier than a) an AI which obeys the commands of an individual human on a short time scale and without massive optimization or abstraction or b) an AI which only defends us against X-risks.

If there are no other intelligences inside the Hubble volume, then a MUFAI would be unable to interfere with them.

a) is perhaps possible, but long-range optimization is so much more useful that it won't last. You might use an AI like that while creating a better one, if the stars are right. If you don't, you can expect that someone else will.

I like to call this variation (among others) LAI. Limited, that is. It's on a continuum from what we've got now; Google might count.

b) might be possible, at a risk of getting stuck like that. Ideally you'd want the option of upgrading to a better one, sooner or later. Ideally without letting just anyone who says that theirs is an FAI override it, but if you knew how to make an AI recognize an FAI, you're most of the way to having an FAI. This one's a hard problem, due mostly to human factors.


In terms of directions to pursue, it seems like the first thing you want to do is make sure the AI is essentially transparent and that we don't have much of an inferential gap with it. Otherwise when we attempt to have it give a values and tradeoffs solution, we may not get anywhere near what we want.

In essence, the AI should be able to look at all the problems facing earth and say something like "I'm 97% sure our top priority is to build asteroid deflectors, based on these papers, calculations, and projections. The proposed plan of earthquake stabilizers is only 2% likely to be the best course of action based on these other papers, calculations, and projections." If it doesn't have that kind of approach, there seem to be many ways that things can go horribly wrong.


A: If the AI can build Robotic Earthquake stabilizers at essentially no cost, and prevent children from being killed in earthquakes, or, it can simulate everyone and have our simulations have that experience at essentially no cost, the AI should probably be aware of the fact that these are different things so we don't say "Yes, build those earthquake stabilizers." and then it uploads everyone, and we say "That isn't what I meant!"

B: And the AI should definitely provide some kind of information about proposed plans/alternatives. If we say "Earthquake stabilizers save the most children, build those!" and the AI is aware "Actually, Asteroid deflectors save ten times more children." it shouldn't just go "Oh well, they SAID earthquake stabilizers, I'm not even going to mention the deflectors."

C: Or maybe: "I thought killing all children was the best way to stop children from suffering, and that this was trivially obvious, so of course you wanted me to make a childkiller plague, and I did so and released it without telling you when you said 'Reduce children's suffering.'"

D: Or it could simulate everyone and say "Well, they never said to keep the simulation running after I simulated everyone, so time to shut down all simulations and save power for their next request."

Once you've got that settled, you can attempt to have the AI do other things, like assess Anti-Earthquake/Asteroid Deflection/Uploading, because you'll actually be able to ask it "Which of these are the right things to do and why based on these values and these value tradeoffs?" and get an answer which makes sense. You may not like or expect the answer, but at least you should be able to understand it given time.

For instance, going back to the sample problem, I don't mind that simulation that much, but I don't mind it because I am assuming it works as advertised. If it has a problem like D and I just didn't realize that and the AI didn't think it noteworthy, that's a problem. Also, for all I know, there is an even better proposed life, that the AI was aware of, and didn't think to even suggest as in B.

Given a sufficiently clear AI, I'd imagine that it could explain things to me sufficiently well that there wouldn't even be a question of which values to trade off, because the solution would be clear, but for all I know, it might come up with "Well, about half of you want to live in a simulated utopia, and about half of you want to live in a real utopia, and this is unresolvable to me because of these factors unless you solve this value tradeoff problem."

It would still, however, have collected all the reasons together that explained WHY it couldn't solve that value tradeoff problem, which would still be a handy thing to have anyway, since I don't have that right now.

Edit: Eek, I did not realize the "#" sign bolded things, extra bolds removed.

  4. Are there other values that, if we traded them off, might make MFAI much easier?

I don't understand this question. Is it somehow not trivially obvious that the more values you remove from the equation (starting with "complexity"), the easier things become?

Right, but not all trade-offs are equal. Thinking-rainbows-are-pretty and self-determination are worth different amounts.

1,2. Is there something which is the worst possible error, such that every future that does not contain that error is preferable to every future which does (for example, is the destruction of all earth-originating intelligence worse than anything which does not include it?)

I'm sure the answer is yes, but I'm not sure it's trivial to prove it.

Huh, I'm sure the answer is no. Could you explain?

Trivially: consider the worst possible outcome. Any outcome that differs from it in any way is better. So if the complex statement which describes the worst case counts as the error, any other outcome is better.

I'm allowing for "both A and B" to qualify as a single error in the trivial case, and I think that violates the original spirit.