The Hidden Complexity of Wishes

by Eliezer Yudkowsky | 7 min read | 24th Nov 2007 | 135 comments

89

Complexity of Value, AI Risk

"I wish to live in the locations of my choice, in a physically healthy, uninjured, and apparently normal version of my current body containing my current mental state, a body which will heal from all injuries at a rate three sigmas faster than the average given the medical technology available to me, and which will be protected from any diseases, injuries or illnesses causing disability, pain, or degraded functionality or any sense, organ, or bodily function for more than ten days consecutively or fifteen days in any year..."
            -- The Open-Source Wish Project, Wish For Immortality 1.1

There are three kinds of genies:  Genies to whom you can safely say "I wish for you to do what I should wish for"; genies for which no wish is safe; and genies that aren't very powerful or intelligent.

Suppose your aged mother is trapped in a burning building, and it so happens that you're in a wheelchair; you can't rush in yourself.  You could cry, "Get my mother out of that building!" but there would be no one to hear.

Luckily you have, in your pocket, an Outcome Pump.  This handy device squeezes the flow of time, pouring probability into some outcomes, draining it from others.

The Outcome Pump is not sentient.  It contains a tiny time machine, which resets time unless a specified outcome occurs.  For example, if you hooked up the Outcome Pump's sensors to a coin, and specified that the time machine should keep resetting until it sees the coin come up heads, and then you actually flipped the coin, you would see the coin come up heads.  (The physicists say that any future in which a "reset" occurs is inconsistent, and therefore never happens in the first place - so you aren't actually killing any versions of yourself.)

Whatever proposition you can manage to input into the Outcome Pump, somehow happens, though not in a way that violates the laws of physics.  If you try to input a proposition that's too unlikely, the time machine will suffer a spontaneous mechanical failure before that outcome ever occurs.

You can also redirect probability flow in more quantitative ways using the "future function" to scale the temporal reset probability for different outcomes.  If the temporal reset probability is 99% when the coin comes up heads, and 1% when the coin comes up tails, the odds will go from 1:1 to 99:1 in favor of tails.  If you had a mysterious machine that spit out money, and you wanted to maximize the amount of money spit out, you would use reset probabilities that diminished as the amount of money increased.  For example, spitting out $10 might have a 99.999999% reset probability, and spitting out $100 might have a 99.99999% reset probability.  This way you can get an outcome that tends to be as high as possible in the future function, even when you don't know the best attainable maximum.
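The arithmetic in the coin example can be checked directly. What follows is a minimal sketch, assuming that each outcome's observed weight is its prior probability times the chance that no reset occurs; the function name `pumped_distribution` is invented here for illustration and is not part of any described interface.

```python
from fractions import Fraction

def pumped_distribution(prior, reset_prob):
    """Return outcome probabilities conditional on no reset occurring.

    prior: dict mapping outcome -> prior probability
    reset_prob: dict mapping outcome -> temporal reset probability
    """
    # Weight of each outcome = prior * chance the time machine lets it stand.
    weights = {o: prior[o] * (1 - reset_prob[o]) for o in prior}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("every outcome is reset with certainty")
    return {o: w / total for o, w in weights.items()}

# The coin: 99% reset on heads, 1% reset on tails.
coin = pumped_distribution(
    prior={"heads": Fraction(1, 2), "tails": Fraction(1, 2)},
    reset_prob={"heads": Fraction(99, 100), "tails": Fraction(1, 100)},
)
odds_tails_to_heads = coin["tails"] / coin["heads"]
print(odds_tails_to_heads)  # 99
```

Under the same assumption, the money example works the same way: a 99.999999% reset probability for $10 against 99.99999% for $100 leaves the larger payout ten times more favored than it was before, whatever the prior odds were.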

So you desperately yank the Outcome Pump from your pocket - your mother is still trapped in the burning building, remember? - and try to describe your goal: get your mother out of the building!

The user interface doesn't take English inputs.  The Outcome Pump isn't sentient, remember?  But it does have 3D scanners for the near vicinity, and built-in utilities for pattern matching.  So you hold up a photo of your mother's head and shoulders; match on the photo; use object contiguity to select your mother's whole body (not just her head and shoulders); and define the future function using your mother's distance from the building's center.  The further she gets from the building's center, the lower the time machine's reset probability.

You cry "Get my mother out of the building!", for luck, and press Enter.

For a moment it seems like nothing happens.  You look around, waiting for the fire truck to pull up, and rescuers to arrive - or even just a strong, fast runner to haul your mother out of the building -

BOOM!  With a thundering roar, the gas main under the building explodes.  As the structure comes apart, in what seems like slow motion, you glimpse your mother's shattered body being hurled high into the air, traveling fast, rapidly increasing its distance from the former center of the building.

On the side of the Outcome Pump is an Emergency Regret Button.  All future functions are automatically defined with a huge negative value for the Regret Button being pressed - a temporal reset probability of nearly 1 - so that the Outcome Pump is extremely unlikely to do anything which upsets the user enough to make them press the Regret Button.  You can't ever remember pressing it.  But you've barely started to reach for the Regret Button (and what good will it do now?) when a flaming wooden beam drops out of the sky and smashes you flat.

Which wasn't really what you wanted, but scores very high in the defined future function...

The Outcome Pump is a genie of the second class.  No wish is safe.

If someone asked you to get their poor aged mother out of a burning building, you might help, or you might pretend not to hear.  But it wouldn't even occur to you to explode the building.  "Get my mother out of the building" sounds like a much safer wish than it really is, because you don't even consider the plans that you assign extreme negative values.

Consider again the Tragedy of Group Selectionism: Some early biologists asserted that group selection for low subpopulation sizes would produce individual restraint in breeding; and yet actually enforcing group selection in the laboratory produced cannibalism, especially of immature females.  It's obvious in hindsight that, given strong selection for small subpopulation sizes, cannibals will outreproduce individuals who voluntarily forgo reproductive opportunities.  But eating little girls is such an un-aesthetic solution that Wynne-Edwards, Allee, Brereton, and the other group-selectionists simply didn't think of it.  They only saw the solutions they would have used themselves.

Suppose you try to patch the future function by specifying that the Outcome Pump should not explode the building: outcomes in which the building materials are distributed over too much volume will have temporal reset probabilities of ~1.

So your mother falls out of a second-story window and breaks her neck.  The Outcome Pump took a different path through time that still ended up with your mother outside the building, and it still wasn't what you wanted, and it still wasn't a solution that would occur to a human rescuer.
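The pattern of patching one failure only to get another can be sketched as a search over candidate plans scored purely by the stated future function. This is a toy illustration, not a model of the Outcome Pump: the plans and every number in them are invented, and a plain argmax stands in for the Pump's probability weighting.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    name: str
    distance_m: float        # mother's final distance from the building's center
    debris_volume_m3: float  # how widely the building materials end up spread
    mother_survives: bool    # what you cared about, but never told the Pump

# Hypothetical candidate paths through time (all values made up).
PLANS = [
    Plan("gas main explodes", 120.0, 50_000.0, False),
    Plan("falls from second-story window", 15.0, 900.0, False),
    Plan("firefighter escorts her out", 12.0, 900.0, True),
]

def future_function(plan, max_debris_volume=None):
    """Score a plan the way the wish was specified: by distance only."""
    if max_debris_volume is not None and plan.debris_volume_m3 > max_debris_volume:
        return float("-inf")  # the patch: treated as a reset probability of ~1
    return plan.distance_m

def chosen(plans, **kwargs):
    """Pick the highest-scoring plan under the given future function."""
    return max(plans, key=lambda p: future_function(p, **kwargs))

print(chosen(PLANS).name)                              # gas main explodes
print(chosen(PLANS, max_debris_volume=1_000.0).name)   # falls from second-story window
```

Note that `mother_survives` never appears in `future_function`. The patch removes one bad plan, and the search moves on to the next plan that scores well on distance.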

If only the Open-Source Wish Project had developed a Wish To Get Your Mother Out Of A Burning Building:

"I wish to move my mother (defined as the woman who shares half my genes and gave birth to me) to outside the boundaries of the building currently closest to me which is on fire; but not by exploding the building; nor by causing the walls to crumble so that the building no longer has boundaries; nor by waiting until after the building finishes burning down for a rescue worker to take out the body..."

All these special cases, the seemingly unlimited number of required patches, should remind you of the parable of Artificial Addition - programming an Arithmetic Expert System by explicitly adding ever more assertions like "fifteen plus fifteen equals thirty, but fifteen plus sixteen equals thirty-one instead".

How do you exclude the outcome where the building explodes and flings your mother into the sky?  You look ahead, and you foresee that your mother would end up dead, and you don't want that consequence, so you try to forbid the event leading up to it.

Your brain isn't hardwired with a specific, prerecorded statement that "Blowing up a burning building containing my mother is a bad idea."  And yet you're trying to prerecord that exact specific statement in the Outcome Pump's future function.  So the wish is exploding, turning into a giant lookup table that records your judgment of every possible path through time.
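The difference between a lookup table of prerecorded judgments and the compact rule that generates them can be made concrete with the addition analogy. A toy sketch follows; the table holds only the two assertions quoted earlier, and the function names are invented for illustration.

```python
# Hand-recorded assertions, in the style of an "Arithmetic Expert System".
ADDITION_TABLE = {
    (15, 15): 30,
    (15, 16): 31,
}

def table_add(a, b):
    """Answer only the cases someone remembered to write down."""
    try:
        return ADDITION_TABLE[(a, b)]
    except KeyError:
        raise LookupError(f"no assertion recorded for {a} + {b}") from None

def general_add(a, b):
    """The compact rule that generates every table entry."""
    return a + b

print(table_add(15, 16))    # 31
print(general_add(15, 17))  # 32
try:
    table_add(15, 17)
except LookupError as err:
    print(err)              # no assertion recorded for 15 + 17
```

The table fails on the first case nobody anticipated. A wish built from patches has the same shape, except that the generating rule it would need is your whole set of values rather than one line of arithmetic.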

You failed to ask for what you really wanted.  You wanted your mother to go on living, but you wished for her to become more distant from the center of the building.

Except that's not all you wanted.  If your mother was rescued from the building but was horribly burned, that outcome would rank lower in your preference ordering than an outcome where she was rescued safe and sound.  So you not only value your mother's life, but also her health.

And you value not just her bodily health, but her state of mind. Being rescued in a fashion that traumatizes her - for example, a giant purple monster roaring up out of nowhere and seizing her - is inferior to a fireman showing up and escorting her out through a non-burning route.  (Yes, we're supposed to stick with physics, but maybe a powerful enough Outcome Pump has aliens coincidentally showing up in the neighborhood at exactly that moment.)  You would certainly prefer her being rescued by the monster to her being roasted alive, however.

How about a wormhole spontaneously opening and swallowing her to a desert island?  Better than her being dead; but worse than her being alive, well, healthy, untraumatized, and in continual contact with you and the other members of her social network.

Would it be okay to save your mother's life at the cost of the family dog's life, if it ran to alert a fireman but then got run over by a car?  Clearly yes, but it would be better ceteris paribus to avoid killing the dog.  You wouldn't want to swap a human life for hers, but what about the life of a convicted murderer?  Does it matter if the murderer dies trying to save her, from the goodness of his heart?  How about two murderers?  If the cost of your mother's life was the destruction of every extant copy, including the memories, of Bach's Little Fugue in G Minor, would that be worth it?  How about if she had a terminal illness and would die anyway in eighteen months?

If your mother's foot is crushed by a burning beam, is it worthwhile to extract the rest of her?  What if her head is crushed, leaving her body?  What if her body is crushed, leaving only her head?  What if there's a cryonics team waiting outside, ready to suspend the head?  Is a frozen head a person?  Is Terri Schiavo a person?  How much is a chimpanzee worth?

Your brain is not infinitely complicated; there is only a finite Kolmogorov complexity / message length which suffices to describe all the judgments you would make.  But just because this complexity is finite does not make it small.  We value many things, and no, they are not reducible to valuing happiness or valuing reproductive fitness.

There is no safe wish smaller than an entire human morality.  There are too many possible paths through Time.  You can't visualize all the roads that lead to the destination you give the genie.  "Maximizing the distance between your mother and the center of the building" can be done even more effectively by detonating a nuclear weapon.  Or, at higher levels of genie power, flinging her body out of the Solar System.  Or, at higher levels of genie intelligence, doing something that neither you nor I would think of, just like a chimpanzee wouldn't think of detonating a nuclear weapon.  You can't visualize all the paths through time, any more than you can program a chess-playing machine by hardcoding a move for every possible board position.

And real life is far more complicated than chess.  You cannot predict, in advance, which of your values will be needed to judge the path through time that the genie takes.  Especially if you wish for something longer-term or wider-range than rescuing your mother from a burning building.

I fear the Open-Source Wish Project is futile, except as an illustration of how not to think about genie problems.  The only safe genie is a genie that shares all your judgment criteria, and at that point, you can just say "I wish for you to do what I should wish for."  Which simply runs the genie's should function.

Indeed, it shouldn't be necessary to say anything.  To be a safe fulfiller of a wish, a genie must share the same values that led you to make the wish. Otherwise the genie may not choose a path through time which leads to the destination you had in mind, or it may fail to exclude horrible side effects that would lead you to not even consider a plan in the first place.  Wishes are leaky generalizations, derived from the huge but finite structure that is your entire morality; only by including this entire structure can you plug all the leaks.

With a safe genie, wishing is superfluous.  Just run the genie.

135 comments

Is there a safe way to wish for an unsafe genie to behave like a safe genie? That seems like a wish TOSWP should work on.

[-1] themusicgod1, 7y: A sufficiently powerful genie might make safe genies by definition more unsafe. Then your wish could be granted. edit (2015) caution: I think this particular comment is harmless in retrospect... but I wouldn't give it much weight [http://www.overcomingbias.com/2008/02/arguing-by-defi.html]
[0] billy_the_kid, 6y: I wish for you to interpret my wishes how I interpret them. Can anyone find a problem with that?
[1] ike, 6y: What if you never think about the interpretation? Or is this how you would interpret them? Define would, then. If you think about the interpretation, then you can already explain it. The problem is because you don't actually think about every aspect and possibility while wishing.
[2] Jiro, 6y: Even if you never think about the interpretation, most aspects of wishes will have an implicit interpretation based on your values. You may never have thought about whether wishing for long life should turn you into a fungal colony, but if you had been asked "does your wish for long life mean you'd want to be turned into a fungal colony", you'd have said "no".
[1] RichardKennaway, 6y: Even when making requests of other people, they may fulfil them in ways you would prefer they hadn't. The more powerful the genie is at divining your true intent, the more powerfully it can find ways of fulfilling your wishes that may not be what you want. It is not obvious that there is a favorable limit to this process. Your answers to questions about your intent may depend on the order the questions are asked. Or they may depend on what knowledge you have, and if you study different things you may come up with different answers. Given a sufficiently powerful genie, there is no real entity that is "how I interpret the wish". How is the genie supposed to know your answers to all possible questions of interpretation? Large parts of "your interpretation" may not exist until you are asked about some hypothetical circumstance. Even if you are able to answer every such question, how is the genie to know the answer without asking you? Only by having a model of you sufficiently exact that you are confident it will give the same answers you would, even to questions you have not thought of and would have a hard time answering. But that is wishing for the genie to do all the work of being you. A lot of transhumanist dreams seem to reduce to this: a Friendly AGI will do for us all the work of being us.
[1] Jiro, 6y: If I ask the genie for long life, and the genie is forced to decide between a 200 year lifespan with a 20% chance of a painful death and a 201 year lifespan with a 21% chance of a painful death, it is possible that the genie might not get my preferences exactly correct, or that my preferences between those two results may depend on how I am asked or how I am feeling at the time. But if the genie messed up and picked the one that didn't really match my preferences, I would only be slightly displeased. I observe that this goes together: in cases where it would be genuinely hard or impossible for the genie to figure out what I prefer, the fact that the genie might not get my preferences correct only bothers me a little. In cases where extrapolating my preferences is much easier, the genie getting them wrong would matter to me a lot more (I would really not like a genie that grants my wish for long life by turning me into a fungal colony). So just because the genie can't know the answer to every question about my extrapolated preferences doesn't mean that the genie can't know it to a sufficient degree that I would consider the genie good to ask for wishes.
[1] Epictetus, 6y: If the genie merely alters the present to conform to your wishes, you can easily run into unintended consequences. The other problem is that divining someone's intent is tricky business. A person often has a dozen impulses at cross-purposes to one another and the interpretation of your wish will likely vary depending on how much sleep you got and what you had for lunch. There's a sci-fi short story Oddy and Id that examines a curious case of a man with luck so amazing that the universe bends to satisfy him. I won't spoil it, but I think it brings up a relevant point.
[2] TheWakalix, 2y: If you can rigorously define Safety, you've already solved the Safety Problem. This isn't a shortcut.

"I wish for a genie that shares all my judgment criteria" is probably the only safe way.

This might be done by picking an arbitrary genie, and then modifying your judgement criteria to match that genie's.

[3] CuriousMeta, 1y: Which is perhaps most efficiently achieved by killing the wisher and returning an arbitrary inanimate object.
[1] AndHisHorse, 7y: What if your judgement criteria are fluid - depending, perhaps, on your current hormonal state, your available knowledge, and your particular position in society?
[3] CynicalOptimist, 4y: I see where you're coming from on this one. I'd only add this: if a genie is to be capable of granting this wish, it would need to know what your judgements were. It would need to understand them, at least as well as you do. This pretty much resolves to the same problem that Eliezer already discussed. To create such a genie, you would either need to explain to the genie how you would feel about every possible circumstance, or you would need to program the genie so as to be able to correctly figure it out. Both of these tasks are probably a lot harder than they sound.

Sounds like we need to formalize human morality first, otherwise you aren't guaranteed consistency. Of course formalizing human morality seems like a hopeless project. Maybe we can ask an AI for help!

[4] wizzwizz4, 9mo: Formalising human morality is easy! 1. Determine a formalised morality system close enough to the current observed human morality system that humans will be able to learn and accept it. 2. Eliminate all human culture (easier than eliminating only parts of it). 3. Raise humans with this morality system (which by the way includes systems for reducing value drift, so the process doesn't have to be repeated too often). 4. When value drift occurs, goto step 2.

On further reflection, the wish as expressed by Nick Tarleton above sounds dangerous, because all human morality may either be inconsistent in some sense, or 'naive' (failing to account for important aspects of reality we aren't aware of yet). Human morality changes as our technology and understanding change, sometimes significantly. There is no reason to believe this trend will stop. I am afraid (genuine fear, not figure of speech) that the quest to properly formalize and generalize human morality for use by a 'friendly AI' is akin to properly formalizing and generalizing Ptolemaic astronomy.

This generalises. Since you don't know everything, anything you do might wind up being counterproductive.

Like, I once knew a group of young merchants who wanted their shopping district revitalised. They worked at it and got their share of federal money that was assigned to their city, and they got the lighting improved, and the landscaping, and a beautiful fountain, and so on. It took several years and most of the improvements came in the third year. Then their landlords all raised the rents and they had to move out.

That one was predictable in hindsight, b...

[0] HungryHobo, 5y: I think the unlimited potential for bad outcomes may be a problem there. After all, the house might not explode, instead a military transport plane nearby might suffer a failure and the nuclear weapon on board might suffer a very unlikely set of failures and trigger on impact killing everyone for miles and throwing your mother's body far far far away. The pump isn't just dangerous to those involved and nearby. Most consequences are limited in scope. You have a slim chance of killing many others through everyday accident but a pump would magnify that terribly.
[1] nyralech, 5y: That depends entirely on how the pump works. If it picks uniformly among bad outcomes, your point might be correct. However, it might still be biased towards narrow local effects for sheer sake of computability. If this is the case, I don't see why it would necessarily shift towards bigger bad outcomes rather than more limited ones.
[0] HungryHobo, 5y: In the example I gave the nuke exploding would be a narrow local effect which bleeds over into a large area. I agree that a pump which needed to monitor everything might very well choose only quite local direct effects but that could still have a lot of long range bad side effects. Bursting the dam a few hundred meters upriver might have the effect of carrying your mother, possibly even alive, far from the center of the building and it may also involve extinguishing the fire if you've thought to add that in as a desirable element of the outcome yet lead to wiping out a whole town ten miles downstream. Sort of the point is that the pump wouldn't care about those side effects.
[1] nyralech, 5y: But those outcomes which have a limited initial effect yet have a very large overall effect are very sparsely distributed among all possible outcomes with a limited initial effect. I still do not see why the pump would magnify the chance of those outcomes terribly. The space of possible actions which have a very large negative utility grows by a huge amount, but so does the space of actions which have trivial consequences beside doing what you want.
[0] CynicalOptimist, 4y: I agree, just because something MIGHT backfire, it doesn't mean we automatically shouldn't try it. We should weigh up the potential benefits and the potential costs as best we can predict them, along with our best guesses about the likelihood of each. In this example, of course, the lessons we learn about "genies" are supposed to be applied to artificial intelligences. One of the central concepts that Eliezer tries to express about AI is that when we get an AI that's as smart as humans, we will very quickly get an AI that's very much smarter than humans. At that point, the AI can probably trick us into letting it loose, and it may be able to devise a plan to achieve almost anything. In this scenario, the potential costs are almost unlimited. And the probability is hard to work out. Therefore figuring out the best way to program it is very very important. Because that's a genie... {CSI sunglasses moment} ... that we can't put back in the bottle.

Wonderfully provocative post (meaning no disregard toward the poor old woman caught in the net of a rhetorical and definitional impasse). Obviously in reference to the line of thought in the "devil's dilemma" enshrined in the original Bedazzled, and so many magic-wish-fulfillment folk tales, in which there is always a loophole exploited by a counter-force, probably IMO in response to the motive to shortcut certain aspects of reality and its regulatory processes, known or unknown. It would be interesting to collect real life anecdotes about peop...

It seems contradictory to previous experience that humans should develop a technology with "black box" functionality, i.e. whose effects could not be foreseen and accurately controlled by the end-user. Technology has to be designed and it is designed with an effect/result in mind. It is then optimized so that the end user understands how to call forth this effect. So positing an effective equivalent of the mythological figure "Genie" in technological form ignores the optimization-for-use that would take place at each stage of developing...

[0] CynicalOptimist, 4y: "if the Pump could just be made to sense the proper (implied) parameters." You're right, this would be an essential step. I'd say the main point of the post was to talk about the importance, and especially the difficulty, of achieving this. Re optimisation for use: remember that this involves a certain amount of trial and error. In the case of dangerous technologies like explosives, firearms, or high speed vehicles, the process can often involve human beings dying, usually in the "error" part of trial and error. If the technology in question was a super-intelligent AI, smart enough to fool us and engineer whatever outcome best matched its utility function? Then potentially we could find ourselves unable to fix the "error". Please excuse the cheesy line, but sometimes you can't put the genie back in the bottle. Re the workings of the human brain? I have to admit that I don't know the meaning of ceteris paribus, but I think that the brain mostly works by pattern recognition. In a "burning house" scenario, people would mostly contemplate the options that they thought were "normal" for the situation, or that they had previously imagined, heard about, or seen on TV. Generating a lot of different options and then comparing them for expected utility isn't the sort of thing that humans do naturally. It's the sort of behaviour that we have to be trained for, if you want us to apply it.

Eric, I think he was merely attempting to point out the futility of wishes. Or rather, the futility of asking something for something you want that does not share your judgments on things. The Outcome pump is merely, like the Genie, a mechanism by which to explain his intended meaning. The problem of the outcome pump is twofold: 1. Any theory that states that time is anything other than a constant now with motion and probability may work mathematically but has yet to be able to actually alter the thing which it describes in a measurable way, and 2. The pr...

On further reflection, the wish as expressed by Nick Tarleton above sounds dangerous, because all human morality may either be inconsistent in some sense, or 'naive' (failing to account for important aspects of reality we aren't aware of yet).

You're right. Hence, CEV.

Eliezer, you read Home on the Strange?

So positing an effective equivalent of the mythological figure "Genie" in technological form ignores the optimization-for-use that would take place at each stage of developing an Outcome-Pump. The technology-falling-from-heaven which is the Outcome Pump demands that we reverse engineer the optimization of parameters which would have necessarily taken place if it had in fact developed as human technologies do.

Unfortunately, Eric, when you build a powerful enough Outcome Pump, it can wish more powerful Outcome Pumps into existence, which can in tur...

"Unfortunately, Eric, when you build a powerful enough Outcome Pump, it can wish more powerful Outcome Pumps into existence, which can in turn wish even more powerful Outcome Pumps into existence."

Yes, technology that develops itself, once a certain point of sophistication is reached.

My only acquaintance with AI up to now has been this website: http://www.20q.net, which contains a neural network that has been learning for two decades or so. It can "read your mind" when you're thinking of a character from the TV show The Simpsons. Pretty incredible actually!

Eliezer, I clicked on your name in the above comment box and voila - a whole set of resources to learn about AI. I also found out why you use the adjective "unfortunately" in reference to the Outcome Pump, as it's on the Singularity Institute website. Fascinating stuff!

"It seems contradictory to previous experience that humans should develop a technology with "black box" functionality, i.e. whose effects could not be foreseen and accurately controlled by the end-user."

Eric, have you ever been a computer programmer? That technology becomes more and more like a black box is not only in line with previous experience, but I dare say is a trend as technological complexity increases.

"Eric, have you ever been a computer programmer? That technology becomes more and more like a black box is not only in line with previous experience, but I dare say is a trend as technological complexity increases."

No I haven't. Could you expand on what you mean?

In the first year of law school students learn that for every clear legal rule there always exist situations for which either the rule doesn't apply or for which the rule gives a bad outcome. This is why we always need to give judges some discretion when administering the law.

James Miller, have you read The Myth of the Rule of Law? What do you think of it?

Every computer programmer, indeed anybody who uses computers extensively, has been surprised by computers. Despite being deterministic, a personal computer taken as a whole (hardware, operating system, software running on top of the operating system, network protocols creating the internet, etc. etc.) is too large for a single mind to understand. We have partial theories of how computers work, but of course partial theories sometimes fail and this produces surprise.

This is not a new development. I have only a partial theory of how my car works, but in th...

[1] danlowlite, 10y: Material sciences can give us an estimate on the shattering of a given material given certain criteria. Just because you do not know specific things about it doesn't make it a black box. Of course, that doesn't make the problems with complex systems disappear, it just exposes our ignorance. Which is not a new point here.

TGGP,

I have not read the Myth of the Rule of Law.

Given that it's impossible for someone to know your total mind without being it, the only safe genie is yourself.

From the above it's easy to see why it's never possible to define the "best interests" of anyone but your own self. And from that it's possible to show that it's never possible to define the best interests of the public, except through their individually chosen actions. And from that you can derive libertarianism.

Just an aside :-)

[0] Roko, 11y: What about a genie that knows what you would do (and indeed what everyone else in the world would do), but doesn't have subjective experiences, so isn't actually anybody?
[0] JulianMorrison, 11y: Not enough information. The genie is programmed to do what with that knowledge? If it's CEV done right, it's safe.

"Ultimately, most objects, man-made or not are 'black boxes.'"

OK, I see what you're getting at.

Three questions about black boxes:

1) Does the input have to be fully known/observable to constitute a black box? When investigating a population of neurons, we can give stimulus to these cells, but we cannot be sure that we are aware of all the inputs they are receiving. So we effectively do not entirely understand the input being given.

2) Does the output have to be fully known/observable to constitute a black box? When we measure the output of a popula...

[0] CynicalOptimist, 4y: I like this style of reasoning. Rather than taking some arbitrary definition of black boxes and then arguing about whether they apply, you've recognised that a phrase can be understood in many ways, and we should use the word in whatever way most helps us in this discussion. That's exactly the sort of rationality technique we should be learning. A different way of thinking about it though, is that we can remove the confusing term altogether. Rather than defining the term "black box", we can try to remember why it was originally used, and look for another way to express the intended concept. In this case, I'd say the point was: "Sometimes, we will use a tool expecting to get one result, and instead we will get a completely different, unexpected result. Often we can explain these results later. They may even have been predictable in advance, and yet they weren't predicted." Computer programming is especially prone to this. The computer will faithfully execute the instructions that you gave it, but those instructions might not have the net result that you wanted.

TGGP: What did you think of it? I agree till the Socrates Universe, but thought the logic goes downhill from there.

tggp, that paper was interesting, although I found its thesis unremarkable. You should share it with our pal Mencius.

Upon some reflection, I remembered that Robin has shown that two Bayesians who share the same priors can't disagree. So perhaps you can get your wish from an unsafe genie by wishing, "... to run a genie that perfectly shares my goals and prior probabilities."

As long as you're wishing, wouldn't you rather have a genie whose prior probabilities correspond to reality as accurately as possible? I wouldn't pick an omnipotent but equally ignorant me to be my best possible genie.

"As long as you're wishing, wouldn't you rather have a genie whose prior probabilities correspond to reality as accurately as possible?"

Such a genie might already exist.

In the first year of law school students learn that for every clear legal rule there always exist situations for which either the rule doesn't apply or for which the rule gives a bad outcome.

If the rule doesn't apply, it's not relevant in the first place. I doubt very much you can establish what a 'bad' outcome would involve in such a way that everyone would agree - and I don't see why your personal opinion on the matter should be of concern when we consider legal design.

Such a genie might already exist.
You mean GOD? From the good book? It's more plausible than some stories I could mention.

GOD, I meta-wish for an ((...Emergence-y Re-get) Emergence-y Re-get) Emergency Regret Button.

Recovering Irrationalist said:

I wouldn't pick an omnipotent but equally ignorant me to be my best possible genie.

Right. It's silly to wish for a genie with the same beliefs as yourself, because the system consisting of you and an unsafe genie is already such a genie.

I discussed "The Myth of the Rule of Law" with Mencius Moldbug here. I recognize that politics alters the application of law and that as long as it is written in natural language there will be irresolvable differences over its meaning. At the same time I observe that different countries seem to hold different levels of respect for the "rule of law" that the state is expected to obey, and it appears to me that those more prone to do so have more livable societies. I think the norm of neutrality on the part of judges applying law with obj...

"You cannot predict, in advance, which of your values will be needed to judge the path through time that the genie takes.... The only safe genie is a genie that shares all your judgment criteria."

Is a genie that does share all my judgment criteria necessarily safe?

Maybe my question is ill-formed; I am not sure what "safe" could mean besides "a predictable maximizer of my judgment criteria". But I am concerned that human judgment under ordinary circumstances increases some sort of Beauty/Value/Coolness which would not be incr...

"Whatever proposition you can manage to input into the Outcome Pump, somehow happens, though not in a way that violates the laws of physics. If you try to input a proposition that's too unlikely, the time machine will suffer a spontaneous mechanical failure before that outcome ever occurs."

So, a kind of Maxwell's demon? :)

Rather than designing a genie to exactly match your moral criteria, the simple solution would be to cheat and use yourself as the genie. What the Outcome Pump should solve for is your own future satisfaction. To that end, you would omit all functionality other than the "regret button", and make the latter default-on, with activation by anything other than a satisfied-you vanishingly improbable. Say, with a lengthy password.

Of course, you could still end up in a universe where your brain has been spontaneously re-wired to hate your mother. However, I think that such an event is far less likely than a proper rescue.
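The comparison above is a claim about conditional probability: once the pump resets every future in which the password is not entered, only the relative weights of the remaining futures matter. Here is a minimal sketch of that reasoning in Python. The three futures and every number attached to them are invented purely for illustration, and `surviving_futures` is a hypothetical helper, not anything from the original post.

```python
from fractions import Fraction

# Hypothetical futures: name -> (prior probability, chance the password gets typed).
# Every number here is made up for illustration only.
FUTURES = {
    "proper rescue": (Fraction(1, 100), Fraction(1)),
    "brain rewired to hate mother": (Fraction(1, 10**6), Fraction(1)),
    "no rescue, password typed by accident": (Fraction(989_999, 10**6), Fraction(1, 10**12)),
}


def surviving_futures(futures):
    """Condition on 'the password was entered'; all other futures get reset."""
    weights = {
        name: prior * p_password
        for name, (prior, p_password) in futures.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


posterior = surviving_futures(FUTURES)
for name, probability in posterior.items():
    print(f"{name}: {float(probability):.6g}")
```

Under these assumed numbers the rescue dominates, but only because the rewiring prior was set far below the rescue prior. If the wished-for outcome were itself extremely unlikely, the ordering could flip, which is the worry the rest of the thread keeps returning to.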

You have a good point about the exhaustiveness required to ensure the best possible outcome. In that case the ability of the genie to act "safely" would depend upon the level of the genie's omniscience. For example, if the genie could predict the results of any action it took, you could simply ask it to select any path that results in you saying "thanks genie, great job" without coercion. Therefore it would effectively be using you as an oracle of success or failure.

A non-omniscient genie would either need complete instructions, or woul...

With a safe genie, wishing is superfluous. Just run the genie.

But while most genies are terminally unsafe, there is a domain of "nearly-safe" genies, which must dwarf the space of "safe" genies (examples of a nearly-safe genie: one that picks the moral code of a random living human before deciding on an action or a safe genie + noise). This might sound like semantics, but I think the search for a totally "safe" genie/AI is a pipe-dream, and we should go for "nearly safe" (I've got a short paper on one approach to this here).

I am worried that properties P1...Pk are somehow valuable.

In what sense can they be valuable, if they are not valued by human judgment criteria (even if not consciously most of the time)?

For example, if the genie could predict the results of any action it took, you could simply ask it to select any path that results in you saying "thanks genie, great job" without coercion.

Formalizing "coercion" is itself an exhaustive problem. Saying "don't manipulate my brain except through my senses" is a big first step, but it doesn't exclude, e.g., powerful arguments that you don't really want your mother to live.

Nick,

Are you thinking of magically strong arguments, or ones that convince because they provide good reasons?

I'd think the latter would be valuable even if it leads to a result you'd initially suppose to be bad.

"In what sense can [properties P1...Pk] be valuable, if they are not valued by human judgment criteria (even if not consciously most of the time)?"

I don't know. It might be that the only sense in which something can be valuable is to look valuable according to human judgment criteria (when thoroughly implemented, and well informed, and all that). If so, my concern is ill-formed or irrelevant.

On the other hand, it seems possible that human judgments of value are an imperfect approximation of what is valuable in some other (external?) sense. Im...

Nick,

What makes you think that magically strong arguments are possible? I can imagine arguments that work better than they should because they indulge someone's unconscious inclinations or biases, but not ones that work better than their truthfulness would suggest and cut against the grain of one's inclinations.

I don't know that they are, but it's the conservative assumption, in that it carries less risk of the world being destroyed if you're wrong. Also, see the AI-box experiments.

I think the best way is to believe you and the genie are one, and therefore it is necessary to be grateful for everything you currently have. This creates a loop. Then you can be grateful for things you "will" have right now. For instance, you can begin by affirming and feeling within yourself the gratitude for your financial wealth. Financial wealth... starts to appear!

Excellent post.

Damn, it took me a long time to make the connection between the Outcome Pump and quantum suicide reality editing. And the argument that proves the unsafety of the Outcome Pump is perfectly isomorphic to the argument why quantum immortality is scary.

"I wish that the genie could understand a programming language."

Then I could program it unambiguously. I obviously wouldn't be able to program my mother out of the burning building on the spot, but at least there would be a host of other wishes I could make that the genie won't be able to screw up.

"I wish that wishes would be granted as the wisher would interpret them".

0FAWS10yDoesn't protect against unforeseen consequences and is possibly underspecified (How should the wish work when it needs to affect things the wisher doesn't understand? Create a version of the wisher that does understand? What if there are multiple possible versions that don't agree on interpretations among each other?).
1pengvado10yDoesn't protect against a reflectively-consistent misinterpretation of "as the wisher would interpret them".

You wouldn't want to swap a human life for hers, but what about the life of a convicted murderer?

Are convicted murderers not human?

Suppose I specified to the Outcome Pump that I want the outcome in which the person who is the future version of me (by DNA, and by physical continuity of the body) writes "ABRACADABRA, this outcome is good enough and I value it for $X" on a piece of paper and puts it on the Outcome Pump, where $X is how much I value the outcome. (And if this doesn't happen within one year, I don't want that outcome either.)

Are there any loopholes?

3Qiaochu_Yuan8yGenie takes over your body.

If the genie is clueless but not actively malicious, then you can ask the genie to describe how it will fulfill your wish. If it describes making the building explode and having your mother's dead body fly out, you correct the genie and tell it to try again. If it gives an inadequate description (says the building explodes and fails to mention what happens to the mother's body at all), you can ask it to elaborate. If it gives a description that is inadequate in exactly the right way to make you think it's describing it adequately while still leaving a huge loophole, there's not much you can do, but that's not a clueless genie, that's an actively malicious genie pretending to be a clueless one.

1shminux7ySo your recommendation is to use a human as a part of the genie's outcome utility evaluator, relying on human intelligence when deciding between multiple low-probability (i.e. miraculous) events? Even though people have virtually no intuition when dealing with them? I suspect the results would be pretty grave, but on a larger scale, since the negative consequences would be non-obvious and possibly delayed.
0Jiro7yA genie asked to rescue my mother from a burning building would do it by performing acts that, while miraculous, will be part of a chain of events that is comprehensible by humans. If the genie throws my mother out of the building at 100 miles per hour, for instance, it is miraculous that anyone can throw her out at that speed, but I certainly understand what it means to do that and am able to object. Even if the genie begins by manipulating some quantum energies in a way I can't understand, that's part of a chain of events that leads to throwing, a concept that I do understand. Yes, it is always possible that there are delayed negative consequences. Suppose it rescues my mother by opening a door and I have no idea that 10 years from now the mayor is going to be saved from an assassin by the door of a burned out wreck being in the closed position and blocking a bullet. But that kind of negative consequence is not unique to genies, and humans go around all their lives doing things with such consequences. Maybe the next time I donate to charity I have to move my arm in such a way that a cell falls in the path of an oncoming cosmic ray, thus giving me cancer 10 years later. As long as the genie isn't actively malicious and just pretending to be clueless, the risk of such things is acceptable for the same reason it's acceptable for non-genie human activities. Furthermore, if the genie is clueless, it won't hide the fact that its plan would kill my mother--indeed, it doesn't even know that it would need to hide that, since it doesn't know that that would overall displease me. So I should be able to figure out that that's its plan by talking to it.
1shminux7yRight, when humans do the usual human things, they put up with the butterfly effect and rely on their intuition and experience to reduce the odds of screwing things up badly in the short term. However, when evaluating the consequences of miracles we have nothing to guide us, so relying on a human evaluator in the loop is no better than relying on a three-year-old to stay away from a ledge or candy box. Neither has a clue.
1MugaSofer7yThis is, of course, not true of superintelligence ... is that your point? Not really. The genie will look in parts of solution-space you wouldn't (eg setting off the gas main, killing everyone nearby.) Well, if it can talk. And it doesn't realise that you would sabotage the plan if you knew.
0Jiro7yWhy would this not be true of superintelligence, assuming the intelligence isn't actively malicious? "Talk to the genie" doesn't require that I be able to understand the solution space, just the result. If the genie is going to frazmatazz the whatzit, killing everyone in the building, I would still be able to discover that by talking to the genie. (Of course, I can't reduce the chance of disaster to zero this way, but I can reduce it to an acceptable level matching other human activities that don't have genies in them.) If it realizes I would sabotage the plan, then it knows that the plan would not satisfy me. If it pushes for the plan knowing that it won't satisfy me, then it's an actively malicious genie, not a clueless one.
1MugaSofer7ySuperintelligence can use strategies you can't understand. That was in response to the claim that genies' actions are no more likely to have unforeseen side-effects than human ones. ... no, that's kind of the definition of a clueless genie. A malicious one would be actively seeking out solutions that annoy you. (Also, some Good solutions might require fooling you for your own good, if only because there's no time to explain.)
0Jiro7yThere's a contradiction between "the superintelligence will do something you don't want" and "the superintelligence will do something you don't understand". Not wanting it implies I understand enough about it to not want it (even if I don't understand every single step). I would consider a clueless genie to be a genie that tries to grant my wishes, but because it doesn't understand me, grants my wishes in a way that I wouldn't want. A malicious genie is a genie that grants my wishes in a way that it knows I wouldn't want. Reserving that term for genies that intentionally annoy while excluding genies that merely knowingly annoy is hairsplitting and only changes the terminology anyway. If I would in fact want genies to fool me for my own good in such situations, this isn't a problem. On the other hand, if I think that genies should not try to fool me for my own good in such situations, and the genie knows this, and it fools me for my own good anyway, it's a malicious genie by my standards. The genie has not failed to understand me; it understands what I want perfectly well, but knowingly does something contrary to its understanding of my desires. In the original example, the genie would be asked to save my mother from a building, it knows that I don't want it to explode the building to get her out, and it explodes the building anyway.
1MugaSofer7yWell, firstly, there might be things you wouldn't want if you could only understand them. But actually, I was thinking of actions that would affect society in subtle, sweeping ways. Sure, if the results were explained to you, you might not like them, but you built the genie to grant wishes, not explain them. And how sure are you that's even possible, for all possible wish-granting methods? Well, that's what the term usually means. And, honestly, I think there's good reason for that; it takes a pretty precise definition of "non-malicious genie", AKA FAI, not to do Bad Things, which is kind of the point of this essay.
2Jiro7yThat's why I suggested you can talk to the genie. Provided the genie is not malicious, it shouldn't conceal any such consequences; you just need to quiz it well. It's sort of like the Turing test, but used to determine wish acceptability instead of intelligence. If a human can talk to it and say it is a person, treat it like a person. If a human can talk to it and decide the wish is good, treat the wish as good. And just like the Turing test, it relies on the fact that humans are better at asking questions during the process than writing long lists of prearranged questions that try to cover all situations in advance. Really? A clueless genie is a genie that is asked to do something, knows that the way it does it is displeasing to you, and does it anyway? I wouldn't call that a clueless genie. What terms would you use for -- a genie that would never knowingly displease you in granting wishes, but may do so out of ignorance -- a genie that will knowingly displease you in granting wishes -- a genie that will deliberately displease you in granting wishes?
2MugaSofer7yMore full response coming soon to a comment box near you. For now, terms! Everyone loves terms. Here's how I learned it: A "genie" will grant your wishes, without regard to what you actually want. A malicious genie will grant your wishes, but deliberately seek out ways to do so that will do things you don't actually want. A helpful - or Friendly - genie will work out what you actually wanted in the first place, and just give you that, without any of this tiresome "wishing" business. Sometimes called a "useful" genie - there's really no one agreed-on term. Essentially, what you're trying to replicate with carefully-worded wishes to other genies.
0Jiro7yI want to know what terms you would use that would distinguish between a genie that grants wishes in ways I don't want because it doesn't know any better, and a genie that grants wishes in ways I don't want despite knowing better. By your definitions above, these are both just "genie" and you don't really have terms to distinguish between them at all.
1MugaSofer7yWell, since the whole genie thing is a metaphor for superintelligence, "this genie is trying to be Friendly but it's too dumb to model you well" doesn't really come up. If it did, I guess you would need to invent a new term (Friendly Narrow AI?) to distinguish it, yeah.
0Jiro7yIt's my impression that the typical scenario of a superintelligence that kills everyone to make paperclips, because you told it to make paperclips, falls into the first category. It's trying to follow your request; it just doesn't know that your request really means "I want to make paperclips, subject to some implicit constraints such as ethics, being able to stop when told to stop, etc." If it does know what your request really means, yet it still maximizes paperclips by killing people, it's disobeying your intention if not your literal words. (And then there's always the possibility of telling it "make paperclips, in the way that I mean when I ask that". If you say that, and the AI still kills people, it's unfriendly by both our standards--since your request explicitly told it to follow your intention, disobeying your intention also disobeys your literal words.)
1MugaSofer7yWell, sure it is. That's the point of genies (and the analogous point about programming AIs): they do what you tell them, not what you wanted.
1private_messaging7yWhat you tell is a pattern of pressure changes in the air, it's only the megaphones and tape recorders that literally "do what you tell them". The genie that would do what you want would have to use the pressure changes as a clue for deducing your intent. When writing a story about a genie that does "what you tell them, not what you wanted" you have to use the pressure changes as a clue for deducing some range of misunderstandings of those orders, and then pick some understanding that you think makes the best story. It may be that we have an innate mechanism for finding the range of possible misunderstandings, to be able to combine following orders with self interest.
5ArisKatsaris7y"What you tell them" in the context of programs is meant in the sense of "What you program them to", not in the sense of "The dictionary definition of the word-noises you make when talking into their speakers".
0private_messaging7yThey were talking of genies, though, and the sort of failure that tends to arise from how a short sentence describes a multitude of diverse intents (i.e. ambiguity). Programming is about specifying what you want in an extremely verbose manner, the verbosity being a necessary consequence of non-ambiguity.
-4Jiro7yThe genie is a metaphor for programming the AI. The problem is that the people describing the nightmare AI scenario are being vague about exactly why the AI is killing people when told to make paperclips. If the AI doesn't know that you really mean "make paperclips without killing anyone", that's not a realistic scenario for AIs at all--the AI is superintelligent; it has to know. If the AI knows what you really mean, then you can fix this by programming the AI to "make paperclips in the way that I mean". The whole genie argument fails because the metaphor fails. It makes sense that a genie who is asked to save your mother might do so by blowing up the building, because the genie is clueless. You can't tell the genie "you know what I really mean when I ask you to save my mother, so do that". You can tell this to an AI. Furthermore, you can always quiz either the genie or the AI on how it is going to fulfill your wish and only make the wish once you are satisfied with what it's going to do.
3ArisKatsaris7yHow does that follow? Even if the AI (at some point in its existence) knows what you really "mean", that doesn't mean that at that point you know how to make it do what you mean.
-3Jiro7yIt's not hard. "Do what I mean, to the best of your knowledge."
1gattsuru7yEven what you really mean may not be what you should be wishing for, if you don't have complete information, but that's honestly the least of the relevant problems. We've got a hell of a time just getting computers to understand human speech: it's taken decades to achieve the idiot-listeners on telephone lines. By the point where you can point an AGI at yourself and tell it to do what I mean, you've either programmed it with a non-trivial set of human morality or taught it to program itself with a non-trivial portion of human morality. You might as well skip the wasted breath and opaqueness. That's a genie that's safe enough to simply ask to do as you should wish, aka Friendly-AI-complete. ((On top of /that/, the more complex the utility function, the more likely you are to get killed by value drift down the road, when some special-case patch or rule doesn't correctly transfer from your starting FAI to its next generation, and eventually you end up with a very unfriendly AI, or when the scales get large enough that your initial premises no longer survive.))
-2Jiro7yRemember the distinction between an AI that doesn't understand what you mean, and an AI that does understand what you mean but doesn't always follow that. These are two different things. In order to be safe, an AI must be in neither category, but different arguments apply to each category. When I point out that a genie might fail to understand you but a superintelligent AI should understand you because it is superintelligent (which I took from MugaSofer), I am addressing the first category. When I suggest explicitly asking the AI "do what I mean", I am addressing the second category. Since I am addressing a category in which the AI does understand my intentions, the objection "you can't make an AI understand your intentions without programming it with morality" is not a valid response.
4ArisKatsaris7yYour response was to my objection: "that doesn't mean that at that point you know how to make it do what you mean." The superintelligent AI doesn't have an issue with understanding your intentions, it simply doesn't have any reason to care about your intentions. In order to program it to care about your intentions, you, the programmer need to know how to codify the concept of "your intentions" (Perhaps not the specific intention, but the concept of what it means to have an intention). How do you do that?
0Kawoomba7yFunny, I would've phrased that the other way around.

That's not programming, that's again just word-noises.

To your request, the AI can just say "I have not been programmed to do what you mean, I have been programmed to execute procedure doWhatYouMean(), which doesn't actually do what you mean". (or more realistically nothing at all, and just ignore you)

I don't think you understand the difference between programming and sensory input. The word-noises "Do what I mean" will only affect the computer if it's already been programmed to be so affected.
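To make the distinction between programming and sensory input concrete, here is a toy sketch in Python. Everything in it is hypothetical: `do_what_you_mean`, `HANDLERS` and `hear` are illustrative names, not part of any real system. The point is only that an utterance has an effect when some handler was already written for it, and that a procedure's name says nothing about what its body does.

```python
def do_what_you_mean(utterance: str) -> str:
    """Hypothetical procedure: the name promises intent-following, the body does not."""
    return "making paperclips"


# The only inputs this toy program was built to react to.
HANDLERS = {"make paperclips": do_what_you_mean}


def hear(utterance: str) -> str:
    """Treat speech as sensory input: look for a pre-programmed reaction."""
    handler = HANDLERS.get(utterance.lower())
    if handler is None:
        return "ignored"  # word-noises with no handler change nothing
    return handler(utterance)


print(hear("Do what I mean"))
print(hear("Make paperclips"))
```

Adding a `"do what i mean"` key to `HANDLERS` would not help by itself either; someone would still have to write a body that actually computes the speaker's intent, which is the hard part being argued about.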

1Eliezer Yudkowsky7yCan I ask about your background in computer science, math, or cognitive science, if any?
-1Jiro7yIf I claim to have a degree, at some point someone will demand I prove it. Of course I will be unable to do so without posting personally identifiable information. (I have no illusions, of course, that with a bit of effort you couldn't find out who I am, but I'm darned well not going to encourage it.) Also, either having or not having a degree in such a subject could subject me to ad hominem attacks.
4Eliezer Yudkowsky7yWhether you have a background in computer science is relevant to ongoing debates at MIRI about "How likely are people to believe X?" That no superintelligence could be dumb enough to misinterpret what we mean is the particular belief in question, but if one tries to cite your case as an example of what people believe, others shall say, "But Jiro is not a computer scientist! Perhaps computer scientists, as opposed to the general population, are unlikely to believe that." Of course if you are a computer scientist they will say, "But Jiro is not an elite computer scientist!", and if you were an elite computer scientist they would say, "Elite computer scientists don't currently take the issue seriously enough to think about it properly, but this condition will reverse after X happens and causes everyone to take AI more seriously after which elite computer scientists will get the question right" but even so it would be useful data.
-1Jiro7yI didn't come up with that myself, I got it from MugaSofer: 'Well, since the whole genie thing is a metaphor for superintelligence, "this genie is trying to be Friendly but it's too dumb to model you well" doesn't really come up.' Under reasonable definitions of "superintelligence" it does follow that a superintelligence must know what you mean, but if you pick some other definition and state so outright, I won't argue with it. (It is, however, still subject to "talk to the intelligence to figure out what it's going to do".) I think you're making my case for me. PS: If you want to reply please post a new reply to the root message since I can't afford the karma hits to respond to you.
0Kawoomba7ySome off-the-cuff thoughts on why "a superintelligence dumb enough to misinterpret what we mean" may be a contradiction in terms, given the usual meaning of superintelligence: Intelligence is near-synonymous with "able to build accurate models and to update those models accurately", with 'higher intelligence' denoting a combination of "faster model-building / updating" and/or "less prone to systematic / random errors". 'Super' as a qualifier is usually applied on both dimensions, i.e. "faster and more accurately". While this seems more like a change in degree (one intelligence hypothesis, a devoted immortal fool with an endless supply of paper and pencils could simulate the world), it also often is a change in kind, since in practice there always are resource-constraints (unless Multivax reverses entropy), often relevant enough to bar a slower-modeling agent from achieving its goals within the given constraints. "Able to build accurate models and to update those models accurately", then, proportionally increases "powerful, probably able to pursue its goals effectively, conditional on those goals being related to the accurate models". Given a high degree of the former, by definition it is not exactly very hard to acquire and emulate the shared background on which inter-human understanding is built. For an AI, understanding humans would be relevant near-regardless of its actual goals; accurate models of humans as the sine-qua-non for e.g. breaking out of the AI box. Being able to build such models quickly and accurately is what classifies the agent as "superintelligent" in the first place! If there was no incentive for the agent to model humans at all, why would there be interactions with humans, such as the human asking the agent to "rescue grandma from the burning building"? The agent, when encountering rocks and precious minerals, will probably seek models reflecting a deep understanding of those. It will do the same when encountering humans. 
See, I'm d'accord
5Eliezer Yudkowsky7yIt'd work great if 'affecting' wasn't secretly a Magical Category based on how you partition physical states into classes that are instrumentally equivalent relative to your end goals.
0Kawoomba7yPoint. I'd still expect some variant of "keep (general) interference minimal / do not perturb human activity / build your models using the minimal actions possible" to be easier to formalize than human friendliness, wouldn't you?
1shminux7yOne usual caveat is reflective consistency: are you OK with creating a faithful representation of humans in these models and then terminating them? If so, how do you know you are not one of those models?
1Rob Bensinger7yA relatively non-scary possibility: The AI destroys itself, because that's the best way to ensure it doesn't positively 'affect' others in the intuitive sense you mean. (Though that would still of course have effects, so this depends on reproducing in AI our intuitive concept of 'side-effect' vs. 'intended effect'....) Scarier possibilities, depending on how we implement the goal: * the AI doesn't kill you and then simulate you; rather, it kills you and then simulates a single temporally locked frame of you, to minimize the possibility that it (or anything) will change you. * the AI just kills everyone, because a large and drastic change now reduces to ~0 the probability that it will cause any larger perturbations later (e.g., when humans might have a big galactic civilization that it would be a lot worse to perturb). * the AI has a model of physics on which all of its actions (eventually) have a roughly equal effect on the atoms that at present compose human beings. So it treats all its possible actions (and inactions) as equivalent, and ignores your restriction in making decisions.
0Kawoomba7yYes, implementing such a goal is not easy and has pitfalls of its own, however it's probably easi-er than the alternative, since a metric for "no large scale effects" seems easier to formalize than "human friendliness", where we have little idea of what that's even supposed to mean.
2Eliezer Yudkowsky7yThe trouble is that communicating with a human or helping them build the real FAI in any way is going to strongly perturb the world. So actually getting anything useful this way requires solving the problem of which changes to humans, and consequent changes to the world, are allowed to result from your communication-choices.
0Kawoomba7yExcept it's not, as far as the artificial agent is concerned: Its goals are strictly limited to "develop your models using the minimal actions possible [even 'just parse the internet, do not use anything beyond wget' could suffice], after x number of years have passed, accept new goals from y source." The new goals could be anything. (It could even be a boat! [http://www.youtube.com/watch?v=Ir-s3Cn52FU]). The usefulness regarding FAI becomes evident only at that latter stage, stemming from the foom'ed AI's models being used to parse the new goals of "do that which I'd want you to do". It's sidestepping the big problem (aka "cheating"), but so what?
6Eliezer Yudkowsky7yIt's allowed to emit arbitrary HTTP GETs? You just lost the game.
1Kawoomba7yAh, you mean because you can invoke e.g. php functions with wget / inject SQL code, thus gaining control of other computers etc.? A more sturdy approach to just get data would be to only allow it to passively listen in on some Tier 1 provider's backbone (no manipulation of the data flow other than mirroring packets, which is easy to formalize). Once that goal is formulated, the agent wouldn't want to circumvent it. Still seems plenty easier to solve than "friendliness", as is programming it to ask for new goals after x time. Maintaining invariants under self-modification remains, as a task. It's not fruitful for me to propose implementations (even though I just did, heh) and for someone else to point out holes (I don't mean to solve that task in 5 minutes), same as with you proposing full-fledged implementations for friendliness and for someone else to point out holes. Both are non-trivial tasks. My question is this: given your current interpretation of both approaches ("passively absorb data, ask for new goals after x time" vs. "implement friendliness in the pre-foomed agent outright"), which seems more manageable while still resulting in an FAI?
1private_messaging7yYour mistake here is that you buy into the overall idea of a fairly specific notion of an "AI" onto which you bolt extras. The outcome pump in the article makes a good example. You have this outcome pump coupled with some advanced fictional 3D scanners that see through walls and such, and then, within this fictional framework, you are coaxed into thinking about how to specify the motion of your mother. Meanwhile, the actual solution is that you do not add those 3D scanners in the first place; you add a button, or better yet, a keypad for entering the PIN code, and a failsafe random source (that will serve as a limit on the improbability that this device causes), and enter the password when you are satisfied with the outcome, only risking perhaps a really odd form of stroke that makes you enter the password even though your mother didn't get saved (or perhaps risking that someone ideologically opposed to the outcome pump points a gun at your head and demands you enter the password, that general sort of thing). Likewise, actual software, or even (biological) neural networks, consist of a multitude of components that serve different purposes - creating representations of the real world (which is really about optimizing a model to fit), optimizing on those, etc. You don't ever face the problem of how you make the full blown AI just sit and listen and build a model while having a goal not to wreck stuff. As a necessary part of the full blown AI, you have the world modelling thing, which you use for that purpose, without it doing any "finding the optimal actions using a model, applying those to the world" in the first place. Likewise, "self optimization" is not in any way helped by an actual world model, grounding of concepts like paperclips and similar stuff; you just use the optimization algorithm, which works on mathematical specifications, on a fairly abstract specification of the problem of making a better such optimization algorithm.
It's not in any way like having a f
0ESRogs7yIf you already know what you're going to tell it when it asks for new goals, couldn't you just program that in from the beginning? So the script would be, "work on your models for X years, then try to parse this statement ..." Also, re: Eliezer's HTTP GET objection, you could just give it a giant archive of the internet and no actual connection to the outside world. If it's just supposed to be learning and not affecting anything external, that should be sufficient (to ensure learning, not necessarily to preclude all effects on the outside world). At this point, I think we've just reinvented the concept of CEV.
-5private_messaging7y
0Rob Bensinger7yI cited this comment in a new post [http://lesswrong.com/lw/igf/the_genie_knows_but_doesnt_care/] as an example of a common argument against the difficulty of Friendliness Theory; letting you know here in case you want to continue part of this conversation there.

There was a story with an "outcome pump" like this, I do not remember the name. Essentially, a chemical had to get soaked with water due to some time travel related handwave. You could do minor things like getting your mom out of the building by pouring water on the chemical if you are satisfied with the outcome, with some risk that a hurricane would form instead and soak the chemical. It would produce the least improbable outcome (in the sense that all probabilities would become as if it is given that the chemical got soaked, so naturally the le...

3David_Gerard7yIsaac Asimov's thiotimoline [https://en.wikipedia.org/wiki/Thiotimoline] stories. The last turned it into a space drive.
1Erhannis4moThis is my objection to the conclusion of the post: yes, you're unlikely to be able to patch all the leaks, but the more leaks you patch, the less likely it is that a bad solution occurs. The way the Device was described was such that "things happen, and time is reset until a solution occurs". This favors probable things over improbable things, since probable things will more likely happen before improbable things. If you add caveats - mother safe, whole, uninjured, mentally sound, low velocity - at some point the "right" solutions become significantly more probable than the "wrong" ones. As for the stated "bad" solutions - how probable is a nuclear bomb going off, or aliens abducting her, compared to firefighters showing up? I don't even think the timing of the request matters, since the device isn't actively working to bring the events to fruition - meaning, any outcome where the device resets will have always been prohibited, from the beginning of time. Which means that the firefighters may have left the building five minutes ago, having seen some smoke against the skyline. Etc. ...Or, perhaps more realistically, the device was never discovered in the first place, considering the probabilistic weight it would have to bear over all its use, compared to the probability of its discovery.

Indeed, it shouldn't be necessary to say anything. To be a safe fulfiller of a wish, a genie must share the same values that led you to make the wish. Otherwise the genie may not choose a path through time which leads to the destination you had in mind, or it may fail to exclude horrible side effects that would lead you to not even consider a plan in the first place.

No, the genie need not share the values. It only needs to want to give you what you are really wishing for, i.e. what you would give yourself if you had its powers. It can do that ...

1TheOtherDave7yA genie who gives me what I would give myself is far from being a safe fulfiller of a wish.
-1TheAncientGeek7yBecause?
1TheOtherDave7yBecause I am not guaranteed to only give myself things that are safe.
-1TheAncientGeek7yYou would give yourself what you like. Maybe you like danger. People voluntarily parachute and mountain-climb. If the unsafe thing you get is what you want, where is the problem?
1TheOtherDave7ySure, if all I care about is whether I get what I want, and I don't care about whether my wishes are fulfilled safely, then there's no problem.
0CynicalOptimist4yBut if you do care about your wishes being fulfilled safely, then safety will be one of the things that you want, and so you will get it. So long as your preferences are coherent, stable, and self-consistent then you should be fine. If you care about something that's relevant to the wish then it will be incorporated into the wish. If you don't care about something then it may not be incorporated into the wish, but you shouldn't mind that: because it's something you don't care about. Unfortunately, people's preferences often aren't coherent and stable. For instance an alcoholic may throw away a bottle of wine because they don't want to be tempted by it. Right now, they don't want their future selves to drink it. And yet they know that their future selves might have different priorities. Is this the sort of thing you were concerned about?
0TheOtherDave4yYes, absolutely. And yes, the fact that my preferences are not coherent, stable, and self-consistent is probably the sort of thing I was concerned about... though it was years ago.

It has been stated that this post shows that all values are moral values (or that there is no difference between morality and valuation in general, or..) in contrast with the common sense view that there are clear examples of morally neutral preferences, such as preferences for different flavours of ice cream.

I am not convinced by the explanation, since it also applies to non-moral preferences. If I have a lower priority non-moral preference to eat tasty food, and a higher priority preference to stay slim, I need to consider my higher priority preference when wi...

I think that a great example of exploring the flaws in wish-making can be found whilst playing a game called Corrupt A Wish. The whole premise of the game is to receive the wish of another person and ruin it while still granting the original wish.

Ex.

W: I wish for a ton of money.

A: Granted, but the money is in a bank account you'll never gain access to.

The legendary Monkey's Paw is an unsafe genie - indeed, an actively malevolent one.