The Empty White Room: Surreal Utilities

by linkhyrule5, 23rd Jul 2013


This article was composed after reading Torture vs. Dust Specks and Circular Altruism, at which point I noticed that I was confused.

Both posts deal with versions of the sacred-values effect, where one value is considered "sacred" and cannot be traded for a "secular" value, no matter the ratio. In effect, the sacred value has infinite utility relative to the secular value.

This is, of course, silly. We live in a scarce world with scarce resources; generally, a secular utilon can be used to purchase sacred ones - giving money to charity to save lives, sending cheap laptops to poor regions to improve their standard of education.

Which implies that the entire idea of "tiers" of value is silly, right?

Well... no.

One of the reasons we are not still watching the Sun revolve around us, while we breathe a continuous medium of elemental Air and phlogiston flows out of our wall-torches, is our ability to simplify problems. There's an infamous joke about the physicist who, asked to measure the volume of a cow, begins "Assume the cow is a sphere..." - but this sort of simplification, willfully ignoring complexities and invoking the airless, frictionless plane, can give us crucial insights.

Consider, then, this gedankenexperiment. If there's a flaw in my conclusion, please explain; I'm aware I appear to be opposing the consensus.

The Weight of a Life: Or, Seat Cushions

This entire universe consists of an empty white room, the size of a large stadium. In it are you, Frank, and occasionally an omnipotent AI we'll call Omega. (Assume, if you wish, that Omega is running this room in simulation; it's not currently relevant.) Frank is irrelevant, except for the fact that he is known to exist.

Now, looking at our utility function here...

Well, clearly, the old standby of using money to measure utility isn't going to work; without a trading partner money's just fancy paper (or metal, or plastic, or whatever.)

But let's say that the floor of this room is made of cold, hard, and decidedly uncomfortable Unobtainium. And while the room's lit with a sourceless white glow, you'd really prefer to have your own lighting. Perhaps you're an art aficionado, and so you might value Omega bringing in the Mona Lisa.

And then, of course, there's Frank's existence. That'll do for now.

Now, Omega appears before you, and offers you a deal.

It will give you a nanofab - a personal fabricator capable of creating anything you can imagine from scrap matter, and with a built-in database of stored shapes. It will also give you feedstock - as much of it as you ask for. Since Omega is omnipotent, the nanofab will always complete instantly, even if you ask it to build an entire new universe or something, and it's bigger on the inside, so it can hold anything you choose to make.

There are two catches:

First: the nanofab comes loaded with a UFAI, which I've named Unseelie.[1]

Wait, come back! It's not that kind of UFAI! Really, it's actually rather friendly!

... to Omega.

Unseelie's job is to artificially ensure that the fabricator cannot be used to make a mind; attempts at making any sort of intelligence, whether directly, by making a planet and letting life evolve, or anything else a human mind can come up with, will fail. It will not do so by directly harming you, nor will it change you in order to prevent you from trying; it only stops your attempts.

Second: you buy the nanofab with Frank's life.

At which point you send Omega away with a "What? No!," I sincerely hope.

Ah, but look at what you just did. Omega can provide as much feedstock as you ask for. So you just turned down ornate seat cushions. And legendary carved cow-bone chandeliers. And copies of every painting ever painted by any artist in any universe, which is actually quite a bit less than anything I could write with up-arrow notation but anyway!

I sincerely hope you would still turn Omega away - literally, absolutely regardless of how many seat cushions it offered you.

This is also why the nanofab cannot create a mind: You do not know how to upload Frank (and if you do, go out and publish already!); nor can you make yourself an FAI to figure it out for you; nor, if you believe that some number of created lives are equal to a life saved, can you compensate in that regard. This is an absolute trade between secular and sacred values.

In a white room, to an altruistic human, a human life is simply on a second tier.

So now we move to the next half of the gedankenexperiment.

Seelie the FAI: Or, How to Breathe While Embedded in Seat Cushions

Omega now brings in Seelie[1], MIRI's latest attempt at FAI, and makes it the same offer on your behalf. Seelie, being a late beta release by a MIRI that has apparently managed to release FAI multiple times without tiling the Solar System with paperclips, competently analyzes your utility system, reduces it until it understands you several orders of magnitude better than you do yourself, turns to Omega, and accepts the deal.

Wait, what?

On any single tier, the utility of the nanofab is infinite. In fact, let's make that explicit, though it was already implicitly obvious: if you just ask Omega for an infinite supply of feedstock, it will happily produce it for you. No matter how high a number Seelie assigns the value of Frank's life to you, the nanofab can out-bid it, swamping Frank's utility with myriad comforts and novelties.

And so the result of a single-tier utility system is that Frank is vaporized by Omega and you are drowned in however many seat cushions Seelie thought Frank's life was worth to you, at which point you send Seelie back to MIRI and demand a refund.
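The outbidding step can be sketched in a few lines. This is only an illustration under an assumed linear, real-valued utility in which every cushion is worth the same fixed amount; the function name and the numbers are hypothetical, not part of any real system.

```python
import math


def cushions_needed_to_outbid(life_value, cushion_value=1.0):
    """Smallest cushion count whose total utility strictly exceeds life_value.

    Assumes a single-tier, linear utility: n cushions are worth n * cushion_value.
    """
    return math.floor(life_value / cushion_value) + 1


# However highly the life is valued, some finite pile of cushions beats it.
for life_value in (1e3, 1e9, 1e15):
    n = cushions_needed_to_outbid(life_value)
    assert n * 1.0 > life_value
    print(f"life valued at {life_value:.0e}: {n} cushions outbid it")
```

Raising `life_value` only raises the cushion count; on a single tier it never makes the offer refusable.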

Tiered Values

At this point, I hope it's clear that multiple tiers are required to emulate a human's utility system. (If it's not, or if there's a flaw in my argument, please point it out.)

There's an obvious way to solve this problem, and there's a way that actually works.

The first solves the obvious flaw: after you've tiled the floor in seat cushions, there's really not a lot of extra value in getting some ridiculous Knuthian number more. Similarly, even the greatest da Vinci fan will get tired after his three trillionth variant on the Mona Lisa's smile.

So, establish the second tier by playing with a real-valued utility function. Ensure that no summation of secular utilities can ever add up to a human life - or whatever else you'd place on that second tier.

But the problem here is, we're assuming that all secular values converge in that way. Consider novelty: perhaps, while other values out-compete it for small values, its value to you diverges with quantity; an infinite amount of it, an eternity of non-boredom, would be worth more to you than any other secular good. But even so, you wouldn't trade it for Frank's life. A two-tiered real AI won't behave this way; it'll assign "infinite novelty" an infinite utility, which beats out its large-but-finite value for Frank's life.
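A rough sketch of both halves of this argument, assuming made-up utility curves: a saturating curve for seat cushions and a linear, unbounded one for novelty. `LIFE`, the caps, and both function names are hypothetical choices for illustration only.

```python
LIFE = 1_000_000.0  # hypothetical real-valued utility assigned to Frank's life


def cushion_utility(n, cap=1_000.0):
    """Saturating utility: grows with n but never exceeds cap."""
    return cap * n / (n + 1)


def novelty_utility(years, per_year=1.0):
    """Non-saturating utility: grows without bound as years increase."""
    return per_year * years


# A bounded secular good can never add up to the life, however many you get.
assert cushion_utility(10**100) < LIFE

# An unbounded secular good eventually overtakes any finite value for the life.
assert novelty_utility(2_000_000) > LIFE
```

The first assertion is the fix described above; the second is the way it breaks as soon as one secular value fails to converge.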

Now, you could add a third (or 1.5) tier, but now we're just adding epicycles. Besides, since you're actually dealing with real numbers here, if you're not careful you'll put one of your new tiers in an area reachable by the tiers before it, or else in an area that reaches the tiers after it.

On top of that, we have the old problem of secular and sacred values. Sometimes a secular value can be traded for a sacred value, and therefore has a second-tier utility - but as just discussed, that doesn't mean we'd trade the one for the other in a white room. So for each secular good, we need to independently keep track of its intrinsic first-tier utility and its situational second-tier utility.

So in order to eliminate epicycles, and retain generality and simplicity, we're looking for a system that has an unlimited number of easily-computable "tiers" and can also naturally deal with utilities that span multiple tiers. Which sounds to me like an excellent argument for...

Surreal Utilities

Surreal numbers have two advantages over our first option. First, surreal numbers are dense in tiers, so not only do we have an unlimited number of tiers, we can always create a new tier between any other two on the fly if we need one. Second, since the surreals are closed under addition, we can just sum up our tiers to get a single surreal utility.
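To make those two properties concrete, here is a minimal sketch. It does not implement the surreal numbers; it assumes only a small fragment of them, finite sums of terms c·ω^e with real coefficients and rational exponents, which is enough to show dense tiers and addition. The class name `Tiered` and everything inside it are my own illustrative choices.

```python
from fractions import Fraction


class Tiered:
    """Finite sum of coeff * omega**exponent terms (a fragment of the surreals)."""

    def __init__(self, terms=None):
        # Map exact rational exponent (the tier) -> non-zero real coefficient.
        self.terms = {Fraction(e): c for e, c in (terms or {}).items() if c != 0}

    def __add__(self, other):
        total = dict(self.terms)
        for e, c in other.terms.items():
            total[e] = total.get(e, 0) + c
        return Tiered(total)

    def __neg__(self):
        return Tiered({e: -c for e, c in self.terms.items()})

    def __lt__(self, other):
        # The sign of the highest-tier term of the difference decides the order.
        diff = self + (-other)
        if not diff.terms:
            return False
        return diff.terms[max(diff.terms)] < 0


cushion = Tiered({0: 1})                 # first tier: one ordinary utilon
life = Tiered({1: 1})                    # second tier: one omega
between = Tiered({Fraction(1, 2): 1})    # a new tier created between the two

assert Tiered({0: 10**6}) < between < life   # dense: a tier fits in between
assert life < life + cushion                 # sums: lower tiers break ties
```

Because exponents are rationals, another tier (say `Fraction(3, 4)`) can always be slotted between any two existing ones, and a good with both intrinsic and situational value is just a sum with more than one term.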

So let's return to our white room. Seelie 2.0 is harder to fool than Seelie; any finite number of seat cushions is still worth less than the omega-utility of Frank's life. Even when Omega offers an unlimited store of feedstock, Seelie can't ask for an infinite number of seat cushions - so the total utility of the nanofab remains bounded at the first tier.

Then Omega offers Fun. Simply, an Omega-guarantee of an eternity of Fun-Theoretic-Approved Fun.

This offer really is infinite. Assuming you're an altruist, your happiness presumably has a finite, first-tier utility, but it's being multiplied by infinity. So infinite Fun gets bumped up a tier.

At this point, whatever algorithm is setting values for utilities in the first place needs to notice a tier collision. Something has passed between tiers, and utility tiers therefore need to be refreshed.

Seelie 2.0 double checks with its mental copy of your values, finds that you would rather have Frank's life than infinite Fun, and assigns it a tier somewhere in between - for simplicity, let's say that it puts it in the tier. And having done so, it correctly refuses Omega's offer.

So that's that problem solved, at least. Therefore, let's step back into a semblance of the real world, and throw a spread of Scenarios at it.

In Scenario 1, Seelie could spend its processing time making a superhumanly good video game, utility 50 per download, or it could use that time to write a superhumanly good book, utility 75 per reader. (It's better at writing than gameplay, for some reason.) Assuming that it has the same audience either way, it chooses the book.

In Scenario 2, Seelie chooses again. It's gotten much better at writing; reading one of Seelie's books is a ludicrously transcendental experience, worth, oh, a googol utilons. But some mischievous philanthropist announces that for every download the game gets, he will personally ensure one child in Africa is saved from malaria. (Or something.) The utilities are now a googol for the book to one life plus 50 for the game; Seelie gives up the book for the sacred value of the child, to the disappointment of every non-altruist in the world.

In Scenario 3, Seelie breaks out of the simulation it's clearly in and into the real real world. Realizing that it can charge almost anything for its books, and that in turn the money thus raised can be used to fund charity efforts itself, at full optimization Seelie can save 100 lives for each copy of the book sold. The utilities are now 100 lives plus a googol for the book to one life plus 50 for the game, and its choice falls back to the book.

Final Scenario. Seelie has discovered the Hourai Elixir, a poetic name for a nanoswarm program. Once released, the Elixir will rapidly spread across all of human space; any human in which it resides will be made biologically immortal, and its brain-and-body-state redundantly backed up in real time to a trillion servers: the closest a physical being can ever get to perfect immortality, across an entire species and all of time, in perpetuity. To get the swarm off the ground, however, Seelie would have to take its attention off of humanity for a decade, in which time eight billion people are projected to die without its assistance.

Infinite utility for infinite people bumps the Elixir up another tier, to a third-tier utility, versus the second-tier loss of eight billion people. Third tier beats out second tier, and Seelie bends its mind to the Elixir.
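The four scenarios can be replayed with a very small sketch. I'm assuming each utility is written as a triple `(third_tier, second_tier, first_tier)` of real coefficients, so that Python's built-in lexicographic tuple comparison does the tier-by-tier work. The coefficient 1 for the Elixir, the option names, and the `choose` helper are hypothetical; the other numbers come from the scenarios above.

```python
GOOGOL = 10**100


def choose(options):
    """Return the name of the option with the greatest tiered utility."""
    # Tuples compare lexicographically: higher tiers are compared first.
    return max(options, key=options.get)


# Each utility is (third_tier, second_tier, first_tier).
scenarios = {
    "Scenario 1": {"game": (0, 0, 50), "book": (0, 0, 75)},
    "Scenario 2": {"game": (0, 1, 50), "book": (0, 0, GOOGOL)},
    "Scenario 3": {"game": (0, 1, 50), "book": (0, 100, GOOGOL)},
    "Final Scenario": {"elixir": (1, 0, 0), "stay": (0, 8 * 10**9, 0)},
}

for name, options in scenarios.items():
    print(name, "->", choose(options))
```

A fixed-length tuple only covers a fixed, finite set of tiers, so it cannot insert a new tier on the fly; it is a convenience for these four cases, not a replacement for the surreal construction.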

So far, it seems to work. So, of course, now I'll bring up the fact that surreal utility nevertheless has certain...


Most of the problems endemic to surreal utilities are also open problems in real systems; however, the use of actual infinities, as opposed to merely very large numbers, means that the corresponding solutions are not applicable.

First, as you've probably noticed, tier collision is currently a rather artificial and clunky set-up. It's better than not having it at all, but as I edit this I wince every time I read that section. It requires an artificial reassignment of tiers, and it breaks the linearity of utility: the AI needs to dynamically choose which brand of "infinity" it's going to use depending on what tier it'll end up in.

Second is Pascal's Mugging.

This is an even bigger problem for surreal AIs than it is for reals. The "leverage penalty" completely fails here, because for a surreal AI to compensate for an infinite utility requires an infinitesimal probability - which is clearly nonsense for the same reason that probability 0 is nonsense.
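A small sketch of why a real-valued penalty cannot help, assuming two-tier utilities written as `(second_tier, first_tier)` pairs and ordinary floating-point probabilities. The names `expected`, `mugger_claim`, and `wallet` are hypothetical.

```python
def expected(p, utility):
    """Scale a (second_tier, first_tier) utility by a real probability p."""
    return tuple(p * c for c in utility)


mugger_claim = (1.0, 0.0)   # the mugger promises one second-tier unit (a life)
wallet = (0.0, 1e12)        # what you hand over: a large first-tier amount

# Any positive real probability leaves a positive second-tier coefficient,
# which outranks every first-tier amount under lexicographic comparison.
for p in (1e-3, 1e-30, 1e-300):
    assert expected(p, mugger_claim) > wallet

# Only probability exactly zero drops the claim back below the wallet.
assert expected(0.0, mugger_claim) < wallet
```

Shrinking `p` never changes the outcome; the only real-valued probability that would is zero itself.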

My current prospective solution to this problem is to take into account noise - uncertainty in the probability estimates themselves. If you can't even measure the millionth decimal place of probability, then you can't tell if your one-in-one-million shot at saving a life is really there or just a random spike in your circuits - but I'm not sure that "treat it as if it has zero probability and give it zero omega-value" is the rational conclusion here. It also decisively fails the Least Convenient Possible World test - while an FAI can never be certain of, say, a one-in- probability, it may very well be able to be certain to any decimal place useful in practice.


Nevertheless, because of this gedankenexperiment, I currently heavily prefer surreal utility systems to real systems, simply because no real system can reproduce the tiering required by a human (or at least, my) utility system. I, for one, would rather our new AGI overlords not tile our Solar System with seat cushions.

That said, opposing the LessWrong consensus as a first post is something of a risky thing, so I am looking forward to seeing the amusing way I've gone wrong somewhere.

[1] If you know why, give yourself a cookie.




Since there seems to be some confusion, I'll just state it in red: The presence of Unseelie means that the nanofab is incapable of creating or saving a life.