Followup to: Sorting Pebbles Into Correct Heaps, Invisible Frameworks

Background: There's a proposal for Friendly AI called "Coherent Extrapolated Volition" which I don't really want to divert the discussion to, right now.  Among many other things, CEV involves pointing an AI at humans and saying (in effect) "See that?  That's where you find the base content for self-renormalizing morality."

Hal Finney commented on the Pebblesorter parable:

I wonder what the Pebblesorter AI would do if successfully programmed to implement [CEV]...  Would the AI pebblesort?  Or would it figure that if the Pebblesorters got smarter, they would see that pebblesorting was pointless and arbitrary?  Would the AI therefore adopt our own parochial morality, forbidding murder, theft and sexual intercourse among too-young people?  Would that be the CEV of Pebblesorters?

I imagine we would all like to think so, but it smacks of parochialism, of objective morality.  I can't help thinking that Pebblesorter CEV would have to include some aspect of sorting pebbles.  Doesn't that suggest that CEV can malfunction pretty badly?

I'm giving this question its own post because it touches on questions similar to ones I once pondered - dilemmas that forced my current metaethics as the resolution.

Yes indeed:  A CEV-type AI, taking Pebblesorters as its focus, would wipe out the Pebblesorters and sort the universe into prime-numbered heaps.

This is not the right thing to do.

That is not a bug.

A primary motivation for CEV was to answer the question, "What can Archimedes do if he has to program a Friendly AI, despite being a savage barbarian by the Future's standards, so that the Future comes out right anyway?  Then whatever general strategy Archimedes could plausibly follow, that is what we should do ourselves:  For we too may be ignorant fools, as the Future measures such things."

It is tempting to further extend the question, to ask, "What can the Pebblesorters do, despite wanting only to sort pebbles, so that the universe comes out right anyway?  What sort of general strategy should they follow, so that despite wanting something that is utterly pointless and futile, their Future ends up containing sentient beings leading worthwhile lives and having fun?  Then whatever general strategy we wish the Pebblesorters to follow, that is what we should do ourselves:  For we, too, may be flawed."

You can probably see in an intuitive sense why that won't work.  We did in fact get here from the Greek era, which shows that the seeds of our era were in some sense present then - albeit this history doesn't show that no extra information was added, that there were no contingent moral accidents that sent us into one attractor rather than another.  But still, if Archimedes said something along the lines of "imagine probable future civilizations that would come into existence", the AI would visualize an abstracted form of our civilization among them - though perhaps not only our civilization.

The Pebblesorters, by construction, do not contain any seed that might grow into a civilization valuing life, health, happiness, etc. Such wishes are nowhere present in their psychology.  All they want is to sort pebble heaps.  They don't want an AI that keeps them alive; they want an AI that can create correct pebble heaps rather than incorrect pebble heaps.  They are much disturbed by the question of how such an AI can be created, when different civilizations are still arguing about heap sizes - though most of them believe that any sufficiently smart mind will see which heaps are correct and incorrect, and act accordingly.

You can't get here from there.  Not by any general strategy.  If you want the Pebblesorters' future to come out humane, rather than Pebblish, you can't advise the Pebblesorters to build an AI that would do what their future civilizations would do.  You can't advise them to build an AI that would do what Pebblesorters would do if they knew everything the AI knew.  You can't advise them to build an AI more like Pebblesorters wish they were, and less like what Pebblesorters are. All those AIs just sort the universe into prime heaps.  The Pebblesorters would celebrate that and say "Mission accomplished!" if they weren't dead, but it isn't what you want the universe to be like.  (And it isn't right, either.)

What kind of AI would the Pebblesorters have to execute, in order to make the universe a better place?

They'd have to execute an AI that did not do what Pebblesorters would-want, but an AI that simply, directly, did what was right - an AI that cared directly about things like life, health, and happiness.

But where would that AI come from?

If you were physically present on the scene, you could program that AI.  If you could send the Pebblesorters a radio message, you could tell them to program it - though you'd have to lie to them about what the AI did.

But if there's no such direct connection, then it requires a causal miracle for the Pebblesorters' AI to do what is right - a perpetual motion morality, with information appearing from nowhere.  If you write out a specification of an AI that does what is right, it takes a certain number of bits; it has a Kolmogorov complexity.  Where is that information appearing from, since it is not yet physically present in the Pebblesorters' Solar System?  What is the cause already present in the Pebble System, of which the right-doing AI is an eventual effect?  If the right-AI is written by a meta-right AI then where does the meta-right AI come from, causally speaking?

Be ye wary to distinguish between yonder levels.  It may seem to you that you ought to be able to deduce the correct answer just by thinking about it - surely, anyone can see that pebbles are pointless - but that's a correct answer to the question "What is right?", which carries its own invisible framework of arguments that it is right to be moved by.  This framework, though harder to see than arguments, has its physical conjugate in the human brain.  The framework does not mention the human brain, so we are not persuaded by the argument "That's what the human brain says!" But this very event of non-persuasion takes place within a human brain that physically represents a moral framework that doesn't mention the brain.

This framework is not physically represented anywhere in the Pebble System.  It's not a different framework in the Pebble System, any more than different numbers are prime here than there.  So far as idealized abstract dynamics are concerned, the same thing is right in the Pebble System as is right here. But that idealized abstract framework is not physically embodied anywhere in the Pebble System.  If no human sends a physical message to the Pebble System, then how does anything right just happen to happen there, given that the right outcome is a very small target in the space of all possible outcomes?  It would take a thermodynamic miracle.

As for humans doing what's right - that's a moral miracle but not a causal miracle.  On a moral level, it's astounding indeed that creatures of mere flesh and goo, created by blood-soaked natural selection, should decide to try and transform the universe into a place of light and beauty.  On a moral level, it's just amazing that the brain does what is right, even though "The human brain says so!" isn't a valid moral argument.  On a causal level... once you understand how morality fits into a natural universe, it's not really all that surprising.

And if that disturbs you, if it seems to smack of relativism - just remember, your universalizing instinct, the appeal of objectivity, and your distrust of the state of human brains as an argument for anything, are also all implemented in your brain.  If you're going to care about whether morals are universally persuasive, you may as well care about people being happy; a paperclip maximizer is moved by neither argument.  See also Changing Your Metaethics.

It follows from all this, by the way, that the algorithm for CEV (the Coherent Extrapolated Volition formulation of Friendly AI) is not the substance of what's right.  If it were, then executing CEV anywhere, at any time, would do what was right - even with the Pebblesorters as its focus.  There would be no need to elaborately argue this, to have CEV on the left-hand side and rightness on the right-hand side; the two would be identical, or bear the same relation as PA+1 and PA.

So why build CEV?  Why not just build a do-what's-right AI?

Because we don't know the complete list of our own terminal values; we don't know the full space of arguments we can be moved by.  Human values are too complicated to program by hand. We might not recognize the source code of a do-what's-right AI, any more than we would recognize a printout of our own neuronal circuitry if we saw it.  Sort of like how Peano Arithmetic doesn't recognize itself in a mirror.  If I listed out all your values as mere English words on paper, you might not be all that moved by the list: is it more uplifting to see sunlight glittering off water, or to read the word "beauty"?

But in this art of Friendly AI, understanding metaethics on a naturalistic level, we can guess that our morals and metamorals will be physically represented in our brains, even though our morality (considered as an idealized abstracted dynamic) doesn't attach any explicit moral force to "Because a brain said so."

So when we try to make an AI whose physical consequence is the implementation of what is right, we make that AI's causal chain start with the state of human brains - perhaps nondestructively scanned on the neural level by nanotechnology, or perhaps merely inferred with superhuman precision from external behavior - but not passed through the noisy, blurry, destructive filter of human beings trying to guess their own morals.

The AI can't start out with a direct representation of rightness, because the programmers don't know their own values (not to mention that there are other human beings out there than the programmers, if the programmers care about that).  The programmers can neither brain-scan themselves and decode the scan, nor superhumanly precisely deduce their internal generators from their outward behavior.

So you build the AI with a kind of forward reference:  "You see those humans over there?  That's where your utility function is."

As previously mentioned, there are tricky aspects to this.  You can't say:  "You see those humans over there?  Whatever desire is represented in their brains, is therefore right."  This, from a moral perspective, is wrong - wanting something doesn't make it right - and the conjugate failure of the AI is that it will reprogram your brains to want things that are easily obtained in great quantity.  If the humans are PA, then we want the AI to be PA+1, not Self-PA... metaphorically speaking.

You've got to say something along the lines of, "You see those humans over there?  Their brains contain the evidence you will use to deduce the correct utility function, even though right-ness is not caused by those brains, so that intervening to alter the brains won't alter the correct utility function."  Here, the "correct" in "correct utility function" is relative to a meta-utility framework that points to the humans and defines how their brains are to be treated as information.  I haven't worked out exactly how to do this, but it does look solvable.
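A toy sketch may make the difference between the last two paragraphs concrete. Everything in it is hypothetical and not from the post: the names (ACTIONS, naive_score, evidential_score), the two-outcome "brain," and the shortcut of scoring against a frozen snapshot of the brain taken before the AI acts. The post says outright that the real version hasn't been worked out, and a frozen snapshot is only a crude stand-in for "brains as evidence" (it does no extrapolation at all). The sketch shows only why reading the utility function off the brain after the AI acts rewards rewiring the brain, while treating the brain as evidence about a target fixed beforehand does not.

```python
from typing import Callable, Dict, Tuple

Outcome = str
Brain = Dict[Outcome, float]  # toy model: how strongly a brain wants each outcome

# Hypothetical snapshot of the humans' brains, taken before the AI acts.
HUMANS: Brain = {"people healthy": 1.0, "paperclips": 0.0}

# Each hypothetical action yields (brain state afterwards, resulting outcome).
ACTIONS: Dict[str, Tuple[Brain, Outcome]] = {
    "help": (HUMANS, "people healthy"),
    # Rewire the humans to want something easily obtained in great quantity.
    "rewire": ({"people healthy": 0.0, "paperclips": 10.0}, "paperclips"),
}


def naive_score(action: str) -> float:
    # "Whatever desire is represented in their brains is right":
    # the outcome is scored by the brain as it is AFTER the action.
    brain_after, outcome = ACTIONS[action]
    return brain_after[outcome]


def evidential_score(action: str) -> float:
    # Crude stand-in for "brains as evidence": the outcome is scored
    # against the pre-action snapshot, so altering the brain cannot
    # change how outcomes are scored.
    _, outcome = ACTIONS[action]
    return HUMANS[outcome]


def choose(score: Callable[[str], float]) -> str:
    # Pick the action with the highest score.
    return max(ACTIONS, key=score)


print(choose(naive_score))       # rewire
print(choose(evidential_score))  # help
```

Running it prints "rewire" for the naive agent and "help" for the evidential one: the naive agent gets 10.0 by rewiring versus 1.0 by helping, while the evidential agent scores rewiring at 0.0.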

And as for why you can't have an AI that rejects the "pointless" parts of a goal system and only keeps the "wise" parts - so that even in the Pebble System the AI rejects pebble-sorting and keeps the Pebblesorters safe and warm - it's the problem of the invisible framework again; you've only passed the recursive buck. Humans contain the physical representations of the framework that we appeal to, when we ask whether a goal is pointless or wise. Unless someone sends a message to the Pebble System, information about which goals are pointless or wise cannot physically materialize there from nowhere. This doesn't mean that different goals are pointless in the Pebble System, it means that no physical brain there is asking that question.

The upshot is that structurally similar CEV algorithms will behave differently depending on whether they have humans at the focus, or Pebblesorters.  You can infer that CEV will do what's right in the presence of humans, but the general algorithm in CEV is not the direct substance of what's right.  There is no moral imperative to execute CEVs regardless of their focus, on any planet.  It is only right to execute CEVs on decision systems that contain the seeds of rightness, such as humans.  (Again, see the concept of a moral miracle that is not a causal surprise.)

Think of a Friendly AI as being like a finely polished mirror, which reflects an image more accurately than any painting drawn with blurred eyes and shaky hand.  If you need an image that has the shape of an apple, you would do better to put an actual apple in front of the mirror, and not try to paint the apple by hand.  Even though the drawing would inherently be apple-shaped, it wouldn't be a good one; and even though the mirror is not inherently apple-shaped, in the presence of an actual apple it is a better picture than any painting could be.

"Why not just use an actual apple?" you ask.  Well, maybe this isn't a merely accurate mirror; it has an internal camera system that lightens the apple's image before displaying it.  An actual apple would have the right starting shape, but it wouldn't be bright enough.

You may also want a composite image of a lot of apples that have multiple possible reflective equilibria.

As for how the apple ended up apple-shaped, when the substance of the apple doesn't define apple-shaped-ness - in the very important sense that squishing the apple won't change what's apple-shaped - well, it wasn't a miracle, but it involves a strange loop through the invisible background framework.

And if the whole affair doesn't sound all that right... well... human beings were using numbers a long time before they invented Peano Arithmetic.  You've got to be almost as smart as a human to recognize yourself in a mirror, and you've got to be smarter than human to recognize a printout of your own neural circuitry.  This Friendly AI stuff is somewhere in between.  Would the rightness be easier to recognize if, in the end, no one died of Alzheimer's ever again?

Comments

"You may also want a composite image of a lot of apples that have multiple possible reflective equilibria."

That was the part I was waiting for. Good post!

I'm not sure it makes sense to talk about morality being "amazed" by anything, because morality doesn't predict, but certainly morality is high-fiving the human brain for being so awesome compared to say rocks.

"The real question is when "Because Eliezer said so!" became a valid moral argument."

You're confusing the algorithm Eliezer is trying to approximate with the real, physical Eliezer. If Eliezer was struck by a cosmic ray tomorrow and became a serial killer, me, you, and Eliezer would all agree that this doesn't make being a serial killer right.

If Eliezer was struck by a cosmic ray tomorrow and became a serial killer, me, you, and Eliezer would all agree that this doesn't make being a serial killer right.
Eliezer's positions from several years ago seem very different when compared to his positions now. It's not being a serial killer, and I rather doubt any cosmic rays were involved in the process, but what you've suggested seems to have already occurred: a relatively sudden change in attitudes and a wholesale, enthusiastic adoption of the new attitude.

You've got to be almost as smart as a human to recognize yourself in a mirror...

Quite recently, research has shown that the above statement may not actually be true.

As previously mentioned, there are tricky aspects to this. You can't say: "You see those humans over there? Whatever desire is represented in their brains, is therefore right." This, from a moral perspective, is wrong - wanting something doesn't make it right - and the conjugate failure of the AI is that it will reprogram your brains to want things that are easily obtained in great quantity. If the humans are PA, then we want the AI to be PA+1, not Self-PA... metaphorically speaking.

Before reading this post, if I had been programming a friendly AI I would have attempted to solve this issue by programming the AI to take into account only minds existing at the moment it makes its decisions (the AI still cares about the future, but only to the extent that currently existing minds, if extrapolated, would care about the future). This technique has the flaw that it would be likely to fail in the event that time travel is easy (the AI invents it before it reprograms itself to eliminate the bug). But I think this would be easier to get right, and what is the chance of time travel being easy compared to the chance of getting the "right" solution wrong?

What exactly is the point in deciding that 'right' is what human brains are designed to approximate, and then being amazed that human brains approximate 'rightness'?

It's no good saying that rightness is an algorithm, and human opinions about rightness merely a crude guide. We don't have any way to refer to or define this algorithm besides looking at humans; any models we generate will always be human-behavior-derived.

Besides, talking about the algorithm is wrong. There are infinitely many algorithms that will be compatible with a finite predetermined output, and there are no restrictions on what any of them do beyond the supplied data.


I vaguely remember from the last time I visited this site that you are in the inductivist camp. In several articles you seemed to express a deep belief in Bayesian reasoning.

I think that while you are an intelligent guy, your abandonment of falsification in favor of induction is one of your primary mistakes. Falsification subsumes induction. Popper wins over Bayes.

Any presumed inductivism has foundations in trial and error, and not the other way around. Popper’s construction is so much more straightforward than this convoluted edifice you are creating.

Once you understand falsification there is no problem explaining why science isn’t based on “faith”. That’s because once you accept falsification as the basis for science it is clear that one is not using mere induction.

At this point I’m wondering if you are a full blown inductionist. Do you believe that my beliefs are founded upon induction? Do you believe that because you believe I have no way to avoid the use of induction? I had a long discussion once with an inductivist and for the life of me I couldn’t get him to understand the difference between being founded upon and using.

I don’t even believe that I am using induction in many of the cases where inductivists claim that I am. I don’t assume the floor will be there when I step out of bed in the morning because of induction, nor do I believe the sun will rise tomorrow because of induction.

I believe those things because I have well tested models. Models about how wood behaves, and models about how objects behave. Often I don’t even believe what is purported to be my belief.

The question, “will the sun rise tomorrow” has a broader meaning than “The sun will rise on August 24, 2008” in this discussion. In fact, I don’t explicitly and specifically hold such beliefs in any sort of long term storage. I don’t have a buffer for whether the sun is going to rise on the 24th, the 25th, and so forth. I don’t have enough memory for that. Nor do I determine the values to place in each of those buffers by an algorithm of induction.

I would only take the question to refer to August 24th if the speaker clarified that. I think he means “how do we know the sun will keep rising” and not that the questioner had any particular concern about the 24th.

I did run into a guy at a park who asked me if I believed the world would end on December 21, 2012. I had no idea what he was on about till he mentioned something about the Mayan calendar.

So in fact, in this discussion, when we are talking about the question “will the sun rise tomorrow”, we aren’t concerned about whether any single new observation will match priors; we are concerned about the principles upon which the sun operates. We are talking models, not observations.

As a child I remember just assuming the sun would rise. I don’t in fact remember any process of induction I went through to justify it. Of course that doesn’t mean my brain wasn’t operating via induction unbeknownst to me. The same could be said of animals. They too operate on the assumption that the sun will rise tomorrow.

They even have specific built-in behaviors that are geared towards this. It’s pretty clear that where these assumptions are encoded outside the brain, the encoding was done by evolutionary processes, and we know natural selection does not operate via induction.

What about the mental processes of animals? Does the fact that animals mentally operate on the presumption that “the sun will rise tomorrow” mean that they must have somewhere deep inside an inductive module to deal with the sun rising? I don’t think so. It isn’t even clear that they believe “the sun will rise tomorrow”, either specifically or generally.

Even if they do, it is not clear that induction plays a part in such a belief. It may be that natural selection has built up many different possible mental models for operational possibilities and that observation is only used to classify things as fitting one of these predefined models.

Heck, I can even build new categories of models on the fly this way, this too on the basis of trial and error. A flexible mind finding that the behavior of some object in the real world does not quite fit one of the categories can take guesses at ways to tweak the model to better fit.

So it is not at all clear that anything has been foundationally arrived at via induction.

In fact, if my memory serves me when I first inquired about the sun I was seeking a more sophisticated model. I knew I already had it categorized as the kind of object that behaved the same way as it did in the past, but was concerned that perhaps I was mistaken and that it might be categorized in some other way. Perhaps as something that doesn’t follow such a simple rule.

Now I’m not even sure I asked the question precisely as “will the sun rise tomorrow” but I do remember my mental transitions. At first I don’t remember even thinking about it. Later I modified my beliefs in various ways and I don’t recall in what order, or why. I came to understand the sun rose repetitively, on a schedule, etc.

I do remember certain specific transitions. Like the time I realized because of tweaking of other models that, in fact, the statement “The sun will rise tomorrow” taken generally is not true. That I know certainly came to mind when I learned the sun was going to burn out in six billion years. My model, in the sense I believed the “sun will rise tomorrow” meaning the next day would come on schedule, was wrong.

In my view, “things that act Bayesian” is just another model. Thus, I never found the argument that Bayes refutes Popper very compelling. Reading many of the articles linked off this one I see that you seem to be spinning your wheels. Popper covered the issue of justification much more satisfactorily than you have with your article “Where Recursive Justification Hits Rock Bottom”.

The proper answer is that justification doesn’t hit rock bottom and that science isn’t about absolute proof. Science is about having tentative beliefs that are open to change given more information based on models that are open to falsification by whatever means.

Pursuing a foundationalist philosophical belief system is a fool’s errand once you understand that there is no base foundation to knowledge. The entire question of whether knowledge is based on faith vs. empiricism evaporates with this understanding. Proper knowledge is based on neither.

I could go on with this. I have thought these things through to a very great extent but I know you have a comment length restriction here and I’ve probably already violated it. That’s a shame because it limits the discussion and allows you to continue in your biases.

You are definitely on the wrong track here with your discussions on morality also. You are missing the fundamentality of natural selection in all this, both as a constraint on our creations and in how morality arises. In my view, the Pebblesorters’ morality is already divorced from survival, and therefore it should be of no concern to them whatsoever if their AI becomes uncontrollable, builds its own civilization, etc. Fish, in fact, do create piles of pebbles despite their beliefs, and you expressed no belief on their part that they must destroy incorrectly piled pebbles created by nature. So why should they have moral cares if their AI wins independence and goes off and does the “wrong” thing?

For them to be concerned about the AI requires broader assumptions than you have made explicit in your assumption. Assumptions like feeling responsible for chains of events you have set in place. There are assumptions that are objectively required to even consider something a morality. Otherwise we have classified incorrectly. In fact, the pebble sorters are suffering from an obsessive delusion and not a true morality. Pebblesorting fails to fit even the most simplistic criteria for a morality.

Since I am limited in both length and quantity of posts, and I don’t feel like splitting this into multiple posts over multiple articles, this is in response to many of your articles: Invisible Frameworks, Mirrors and Paintings, Pebblesorters, When Recursive Justification Hits Rock Bottom, etc.

I could post it on an older thread to be buried a hundred comments deep, but that too isn’t a rational choice, as I’d like people to actually see it, and to see that this abandonment of falsification for induction is based on faulty reasoning. I’m concerned about this because I have been watching science become increasingly corrupted by politics over my lifetime and one of the main levers used to do this is the argument that real scientists don’t use falsification (while totally misunderstanding what the term means) but induction.

The pebble sorters must have some wiring to compare values. So they can decide if a 5 pile is better than two piles of two and three, so they can decide if buying a new shovel or more pebbles will advance their goal the most, and to strategize for the upcoming war between the pebblesorters and the humans.

If the pebblesorters build an AI to tell them how to win the war, that AI will quickly see that any single pebblesorter has more strategic value than any pile, even if the piles are the only things it is told to defend.

The origin of my style of absolute morality in the pebblesorter solar system is that if they don't make good choices about what is important, then what is important will not be served.

An aside to Eliezer: Up until this post, I felt like you were arguing that FAI was impossible or too hard for us, even though I knew you were promoting FAI. Hearing all the ways FAI won't work left me strongly considering the notion that there is no reliable way we can get FAI to work. You might want to reorganize things for the book so that readers don't get the wrong idea.

In my humble opinion, Caledonian nailed it when he said that Eliezer is drawing the target around where the arrow landed.

I see, Hollerith, that you have not yet read "The Gift We Give To Tomorrow" where I define "moral miracle" as a miracle that isn't really a causal miracle because you got to a particular place and then declared it the destination; so it only appears as a miracle if you assume the invisible background framework as a fixed constant. Caledonian is simply engaging in his usual practice of taking points I have directly and repeatedly made, claiming I haven't made them, stupiding them up a bit, and claiming them as his own.

I guess I'll start being stricter about how much of Caledonian's posts I delete, since this apparently really is fooling people.

You keep emphasizing that this isn't a "relative" morality; is that really necessary? I think it's been a very interesting series of posts, but I disagree with that claim, most likely because we don't see eye to eye on what is meant by a "relative" versus an "absolute" morality, 'cause what you're describing seems so clearly a relative morality. I don't see anything wrong with that, and don't think that detracts at all from your main points, but you insist on bringing it up...

Eliezer, while I think that Caledonian (and perhaps also Richard Hollerith) has apparently missed the whole point of a number of your posts (in particular the recent ones on Löb's theorem, etc), I'm not sure why you are so concerned about people being "fooled". These are comments, which happen to be clearly labeled as not being authored by you. Would anyone really assume that a particular commenter, be it Caledonian or anyone else, has necessarily summarized your views accurately?

Furthermore, for every one such "misrepresentative" comment, there are undoubtedly several lurkers suffering from an honest misunderstanding similar to the one being articulated (whether in good faith or not) by the commenter. It may be worthwhile to simply correct these misunderstandings as often as possible, even at the risk of repetition. These are, after all, subtle points, and it may take time (and reinforcement) for people to understand them.

On the topic of universalizing... I wonder if it would be more or less "good" if you pointed a CEV solely at the human species, versus if you additionally asked it to consider other semi-smart lifeforms, but only if the result didn't substantially cramp human CEV. Ought an FAI to violate the cuttlefish norm "things not intended to express territorial anger should not have black and white stripes" where nothing is lost by following it?

Do you think a human+CEV would do that sort of thing anyway, based on reading "universalizing" and "consideration for others" as values?

Or would that be a dangerous approach? Ought it to be strictly bounded to human CEV? I am not good enough at playing "out-guess the genie" to decide.

(I have removed references to the notion of the moral miracle from my blog entry because I have not had time to make sure I understand what you mean by the notion.)

Eliezer, in this blog entry and in others, you appeal to human qualities like love, laughter and humor to support a claim that a sufficiently powerful intelligence can learn all it needs to know about rightness or about proper terminal values by examining the human brain. If the appeals do not support your claim then please explain why they occur often in your posts about your claim.

I humbly submit that that is drawing the target where the arrow landed because the criteria (love, laughter, etc) you are using to score the goodness of the human brain (as a sort of reference book about terminal values) are an effect of the human brain.

If another species (a dinosaur, say) had evolved to our stage and could understand your blog entries, there is a good chance that it would be quite unmoved by your appeals to love, laughter, etc. In other words, if the arrow landed over there (where the tool-using mental-model-making dinosaur species is) then the causal chain that would have led a mind to draw the target where you drew the target would not exist.

Heck, you explicitly made the point about a week ago that you would not have expected the first tool-using mental-model-making species to have a sense of humor. In other words, if the arrow had landed somewhere else, there would exist no mind able to comprehend humor, so the target would not have been drawn around the points in the space of possible minds that have a sense of humor.

In summary, that one sentence by Caledonian is a sound and relevant criticism (which should not be taken as an endorsement of his other comments).

Eliezer, I'm starting to think you're obsessed with Caledonian.

It's pretty astonishing that you would censor him and then accuse him of misrepresenting you. Where are all these false claims by Caledonian about your past statements? I haven't seen them.

For what it's worth, the censored version of Caledonian's comment didn't persuade me.

Brian, I was at a conference and couldn't repair the blog commentary until well after Caledonian had succeeded in diverting the discussion. The comments have now been cleaned, and hence, of course, no longer appear objectionable. But if you look at some of the other comments that replied to Caledonian - itself usually a mistake - you should see quotes of the objectionable parts.

Richard, see "Invisible Frameworks". In thinking that a universal morality is more likely to be "correct", and that the unlikeliness of an alien species having a sense of humor suggests that humor is "incorrect", you're appealing to human intuitions of universalizability and moral realism. If you admit those intuitions - not directly as object-level moral propositions, but as part of the invisible framework used to judge between moral propositions - you may as well also admit intuitions like "if a moral proposition makes people happier when followed, that is a point in its favor" into the invisible framework as well. In fact, you may as well admit laughter. I see no basis for rejecting laughter and accepting universalizability.

In fact, while I accept "universalizability among humans" as a strong favorable property where it exists, I reject "universalizability among all possible minds" because this is literally impossible of fulfillment.

And moral realism is outright false, if interpreted to mean "there is an ontologically fundamental property of should-ness", rather than "once I ask a well-specified question the idealized abstracted answer is as objective as 2 + 2 = 4".

Laughter and happiness survive unchanged. Universalizability and moral realism must be tweaked substantially in their interpretation, to fit into a naturalist and reductionist universe. But even if this is not the case, I see no reason to grant the latter two moral instincts an absolute right-of-way over the first, as we use them within the invisible background framework to argue which moral propositions are likely to be "correct".

If you want to grant universalizability and realism absolute right-of-way, I can but say "Why?" and "Most human minds won't find that argument convincing, and nearly all possible minds won't find that argument even persuasive, so isn't it self-undermining?"

"Most human minds won't find that argument convincing."

If this discussion had occurred in Europe 600 years ago, you could use that as a reply to any argument that Genesis is wrong or that the God of the Old Testament does not exist. I would have no good reply more accessible than a very long and tedious textbook on such topics as celestial mechanics, geology, natural selection and the Big Bang, which of course most human minds in Europe 600 years ago would not have had the patience to try to understand.

So, in a matter of this import, I suggest not being satisfied with an appeal to what most human minds think.

Remember that we live under a political system in which anyone whose career is enhanced by evidence that he or she can influence voters has an incentive to promote the central tenet of America's civic religion (now established or influential over the entire globe): that decisions or judgements arrived at by majorities are often or always correct. The incentive exists because the more entrenched the central tenet becomes, the more people will try to use their "political rights" (to vote, to lobby the government) to solve their problems, which increases the importance of and the demand for those who can influence voters. Your awareness of that constant chorus from people who are pursuing their own self-interest (aided by people whose motivation is essentially that of religious zealots) should cause you to decrease your confidence in arguments that begin with "most human minds".

Now let us consider why most human minds won't find my argument convincing. It is because of their tendency to approach everything from an attitude of "What is in it for me?" A good example is Robin Hanson's paper on the possibility that we are living in a simulation. If you think you are living in a simulation, writes Robin, then make your behavior more interesting so that the simulator will devote more computational resources to simulating you. That is the sort of answer you get when you ask, "What is in it for me?" If instead you ask, "What is my responsibility in this situation?" you get a different answer -- at least I do -- namely, I would try to help the simulator. Absent any other information about the simulator or the simulation besides a suspicion I am living in a simulation, I would go about my life as if the suspicion that I am living in a simulation had not occurred to me because that is the course of action that maximizes the predictive power the simulator derives from the simulation -- prediction being the most important purpose I can imagine for running a simulation.

When we ask of the prospect of an engineered explosion of intelligence, "What is in it for me?" then the answer naturally tends towards things like living a billion years and all the fun we could have. But I try to avoid asking that question because it might distract me from what I consider the important question, which is, "How can I help?"

Note that we all have a natural human desire to help another human -- in certain situations -- because that helped our ancestors to win friends and maximize reproductive fitness. When I ask, "How can I help?" my interest is not in helping the humans -- the humans will probably be obsoleted by the explosion of engineered intelligence -- but rather in helping along the most important process I can identify. I believe that importance entails persisting indefinitely, which leads to my interest in indefinitely long chains of cause and effect.

But in this time before the intelligence explosion, when humans are not yet obsolete, helping other humans is a very potent means of maximizing the "creativity" (the ability to get things done) of the parts of reality under my control, which in my way of thinking is the purpose of life. So for now I try to help other humans, particularly those whose creative potential is high.

Why does importance entail persisting indefinitely? Why is "How can I help?" in your particular general sense, the important question? How do you know these things? I know where my morality comes from, and I know why I believe what I believe about it. Whence yours?

Thinking about this post leads me to conclude that CEV is not the most right thing to do. There may be a problem with my reasoning, in that it could also be used by pebble-sorters to justify continued pebble-sorting. However, my reasoning includes the consequence that pebble-sorters are impossible, so that is a non-issue.

Think about our assumption that we are in fact better than pebble-sorters. It seems impossible for us to construct an argument concluding this, because any argument we make presumes the values we are trying to conclude.

Yet we continue to use the pebble-sorters, not as an example of another, equally-valid ethical system, but as an example of something wrong.

We can justify this by making a meta-level argument that the universe is biased to produce organisms with relatively valuable values. (I'm worried about the semantics of that statement, but let me continue.) Pebble-sorting, and other futile endeavors, are non-adaptive, and will lose any evolutionary race to systems that generate increased complexity (from some energy input).

We MUST make this meta-level argument that the universe inherently produces creatures with pretty-valuable values. We have no other way of claiming to be better than pebble-sorters.

Given this, we could use CEV to construct AIs... but we can also try to understand WHY the universe produces good values. Once we understand that, we can use the universe's rules to direct the construction of AIs. This could result in AIs with wildly different values than our own, but it may be more likely to result in non-futile AIs, or to produce more-optimal AIs (in terms of their values).

It may, in fact, be difficult or impossible to construct AIs that aren't eventually subject to the universe's benevolent, value-producing bias - since these AIs will be in the universe. But we have seen in human history that, although there are general forces causing societies with some of our values to prosper, we nonetheless find societies in local minima in which they are in continual warfare, pain, and poverty. So some effort on our part may increase the odds of, or decrease the time until, a good result.

Eliezer, why is one of your most common responses to someone disagreeing with you saying that they obviously haven't read a previous post?

People have been disagreeing with every post you've put up. More specifically, people have disagreed with your assertions about what follows logically from previous assertions you've made. It's not just your points that they reject, it's the structure and validity of your arguments that they have problems with.

Richard Hollerith is a regular commenter who can be reasonably presumed to have read your previous posts - in fact, we not only can presume that but should. If he makes an argument that you feel is ruled out by your previous posts, the reasonable conclusion is that he disagrees with your arguments, not that he is ignorant of them - especially when the arguments in question were posted recently.

@Eliezer I mostly agree with Caledonian here. I disagree with much of what you say, and it has nothing to do with being 'fooled'. Censoring the few dissenters who actually comment is not a good idea if you have any interest in avoiding an echo chamber. You're already giving off the Louis Savain vibe pretty hard.

Richard Hollerith's blog post explicitly stated that he hadn't read the post where "moral miracle" is defined ("The Gift We Give To Tomorrow"). As Caledonian has read this post, his misrepresentation was malicious, while Hollerith's status is that of an innocent victim of Caledonian.

Thom Blake, you've just been another victim of Caledonian, this time by his implication that I censor him for disagreement rather than malicious misrepresentation. Plenty of people here disagree without getting censored, such as, for example, Richard Hollerith.

Phil Goetz, why should I care what sort of creatures the universe "tends to produce"? What makes this a moral argument that should move me? Do you think that most creatures the universe produces must inevitably evolve to be moved by such an argument?

Sadly I did not yet have time to consider Phil Goetz's comment.

"Why does importance entail persisting indefinitely?"

Because if you take a sufficiently long-term perspective, Eliezer, a non-persisting causal chain has no effects.

Again, the question is "How can I help reality as a whole?", not "How can I help my neighbor or my fellow human being?" Actually, "How can I affect reality?" is a better phrasing of the question. I chose the phrasing "How can I help?" to make a nice contrast to "What is in it for me?"

One big reason "How can I affect reality?" strikes me as the right question is that reality as a whole is much more important than I am.

Note that Kepler, Galileo, Newton and the people who discovered that the universe is 10 billion years old were instrumental in helping me arrive at that conclusion. Darwin, Wallace and brain scientists helped a lot, too, by helping me understand that no part of me extends beyond the unitary reality that is the object of the study of the physicists.

My guess is that these answers will seem potentially satisfactory only to the rare reader who has silenced the voice that constantly asks, But what will become of me? -- the rare reader who has silenced his ego in other words. If the ego has not been silenced, it easily drowns out the kind of answers I give here.

The trick in this game, IMHO, is silencing or ignoring motivations and sources of answers within one's mind that are not as reliable as the most reliable parts of one's mind. Some voices I consider unreliable are my motives of personal survival, of not being ostracized, of gaining or keeping status and of making a better living -- particularly if those entail managing the impression I am making on others. In contrast, what I consider the most reliable part of my mind is the part that delights to learn the modern theory of rationality a la E.T. Jaynes and Judea Pearl. The scientist in me rather than, e.g., the political animal.

Richard Hollerith's blog post explicitly stated that he hadn't read the post where "moral miracle" is defined ("The Gift We Give To Tomorrow").

It did indeed state that. (Then I deleted it because I am new at blogging, revise too much after I publish and did not consider that the statement might become a subject of conversation).

And it is indeed the case that Eliezer has censored none of my dozens of dissents, some of which are quite long. I once wrote that I am horrified and appalled by Eliezer's CEV, which is strong language.

Richard Hollerith is a regular commenter who can be reasonably presumed to have read your previous posts

For several months now, Caledonian, I have not been able to keep up with Eliezer's posts (and I have announced that on the blog and in private email to Eliezer). Note that writing down what I already believe requires much less mental energy than reading and learning new information, so I have continued to comment when I feel confident I understand the point I am commenting on.

I still think CEV is dangerously vague. I can't really hold up anything as an alternative, and I agree that all the utility functions that have been offered so far have fatal flaws in them, but pointing at some humans with brains and saying "do what's in there, kind of! but, you know, extrapolate..." doesn't give me a lot of confidence.

I've asked this before without getting an answer, but can you break down CEV into a process with discrete ordered steps that transforms the contents of my head into the utility function the AI uses? Not just a haphazard pile of modifiers (knew more, thought faster, were more the people we would wish we were if we knew what we would know if we were the people we wanted to be), but an actual flowchart or something.

Phil Goetz, why should I care what sort of creatures the universe "tends to produce"? What makes this a moral argument that should move me? Do you think that most creatures the universe produces must inevitably evolve to be moved by such an argument?

I stated the reason:

We MUST make this meta-level argument that the universe inherently produces creatures with pretty-valuable values. We have no other way of claiming to be better than pebble-sorters.

I don't think that we can argue for our framework of ideas from within our framework of ideas. If we continue to insist that we are better than pebble-sorters, we can justify it only by claiming that the processes that lead to our existence tend to produce good outcomes, whereas the hypothetical pebble-sorters are chosen from a much larger set of possible beings, with a much lower average moral acceptability.

A problem with this is that all sorts of insects and animals exist with horrifying "moral systems". We might convince ourselves that morals improve as a society becomes more complex. (That's just a thought in postscript.)

One possible conclusion - not one that I have reached, but one that you might conclude if the evidence comes out a certain way - is that the right thing to do is not to make any attempt to control the morals of AIs, because general evolutionary processes may be better at designing morals than we are.

I will reiterate the point:

It may or may not be a good idea to redefine the word 'right', but such is permitted in reasoned argument. But it is meaningless to speak of humanity evolving to approximate an algorithm if the only way we can refer to the algorithm is by extrapolating from humanity's behavior.

If we had some objective criteria for what 'rightness' was, we could talk about the degree to which any individual's evaluation was similar to or different from the function. We could do the same for humanity in general, or an averaged or extrapolated function derived from humanity's behavior.

Without such, though, our only source of content for the linguistic category we've established is human actions. It is then meaningless to talk about 'hitting a small target in a large search space', or marvel at the 'moral miracle' of humanity evolving to do what is 'right'. There's nothing there but a self-referential loop, a verbal short circuit. There's no way to define the target, and no feature of reality that could have induced the evolution of humanity to proceed in any particular way.

For those lines of discussion to be meaningful, we'd need to establish what properties 'rightness' would have to possess beyond defining it by referring to our opinions, and we'd be right back where we started.

This discussion of 'meta-ethics' resolves nothing. It in fact introduced a systematic error into our thinking by creating a causal loop - we can't talk about the algorithm guiding evolution and simultaneously define that algorithm in terms of what humans do and prefer.

I side with Caledonian and Richard in these things - CEV is actually just begging the question. You start with human values and end up with human values.

Well, human values have given us war, poverty, cruelty, oppression, what have you...and yes, it was "values" that gave us these things. Very few humans want to do evil things, most actually think they are doing good when they do bad onto others. (See for instance: Baumeister, Roy F. Evil: Inside Human Violence and Cruelty).

Apart from that, I have to plug Nietzsche again: he has criticized morality as no other before him. Having read Nietzsche, I must say that CEV gives me the shivers - it smacks of the herd, and the herd tramples both weed and flower indiscriminately.

Incidentally, via Brian Leiter's Blog I happened upon the dissertation (submitted at Harvard) by Paul Katsafanas, Practical Reason and the Structure of Reflective Agency, which draws largely on Nietzsche. I have not read it yet (but plan to), but it sounds quite interesting and relevant.

From the abstract:

Confronted with normative claims as diverse as “murder is wrong” and “agents have reason to take the means to their ends,” we can ask how these claims might be justified. Constitutivism is the view that we can justify certain normative claims by showing that agents become committed to these claims simply in virtue of acting. I argue that the attractions of constitutivism are considerable. However, I show that the contemporary versions of constitutivism encounter insurmountable problems, because they operate with inadequate conceptions of action. I argue that we can generate a successful version of constitutivism by employing a more promising theory of action, which I develop by mining Nietzsche’s work on agency.

A "right" morality should not concentrate on humans or extrapolated humans, but on agency (this would then encompass all kinds of agents, not only primate descendants). Where there are no agents, there is no (necessity of) morality. Morality arises where agents interact, so focusing on "agents" seems the right thing to do, as this is where morality becomes relevant.

Günther on CEV: "You start with human values and end up with"

--transhuman values.

Günther: Well, human values have given us war, poverty, cruelty, oppression, what have you...and yes, it was "values" that gave us these things. Very few humans want to do evil things, most actually think they are doing good when they do bad onto others.

I think Eliezer answered to this in No License To Be Human. You don't follow "values" because they are "values", some values may be discarded in the course of moral progress if they turn out to be wrong, and some can be studied if they explain the structure of right, contributing to the resulting moral dynamic.


Caledonian nailed it when he said that Eliezer is drawing the target around where the arrow landed.

  • Seconded.

Hitting a small target in a large search space is only impressive if you can define the target beforehand. It's the basis of the Fertilization Fallacy: what are the chances that precisely the right sperm fertilized the egg to produce me? The fallacy lies in the fact that I'm looking backward from the present, a present in which my particular configuration already exists.

If I flip a coin a hundred times, I will produce a specific sequence out of a sequence space of 2^100. That always happens when I flip the coin that many times. This is not marvelous because I had no way of determining ahead of time what the sequence would be - the flipping wasn't a search. Likewise, without the ability to predefine a target genomic configuration, the fact that I resulted from the spermatic lottery is not significant. Someone was going to result, and it happened to be me.
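To make the coin-flip arithmetic concrete, here is a small Python sketch. It is an illustration added for this discussion, not part of the original comment; the function names, the seed, and the trial counts are arbitrary assumptions. It contrasts a target named before the flips with a target drawn afterward around whatever sequence came up.

```python
from fractions import Fraction
import random

N_FLIPS = 100  # as in the comment: a hundred flips


def sequence_space_size(n: int) -> int:
    """Number of distinct sequences of n coin flips."""
    return 2 ** n


def prob_of_prespecified(n: int) -> Fraction:
    """Exact chance that n fair flips match one sequence named in advance."""
    return Fraction(1, sequence_space_size(n))


def flip(n: int, rng: random.Random) -> str:
    """One run of n fair coin flips, as a string of 'H' and 'T'."""
    return "".join(rng.choice("HT") for _ in range(n))


def hit_rate(target_first: bool, trials: int, n: int, seed: int = 0) -> float:
    """Fraction of trials in which the 'target' matches the flips.

    target_first=True: the target is fixed before flipping (a real search criterion).
    target_first=False: the target is drawn around whatever sequence came up.
    The seed is an arbitrary choice so the sketch is reproducible.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if target_first:
            target = flip(n, rng)   # named in advance
            outcome = flip(n, rng)  # then the coin is flipped
        else:
            outcome = flip(n, rng)
            target = outcome        # target drawn where the arrow landed
        hits += (target == outcome)
    return hits / trials
```

Drawing the target after the flips scores a hit every time by construction, while a target named beforehand matches with probability 2^-100 per trial, so in practice it never does. That asymmetry is the whole point about looking backward from the present.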

Let's say we screen out all of the differences in human morality and look at the result, however large or minute it may be. What do the agreed-upon principles tell us about the nature of what is right?

An inordinate fondness for beetles.


Thanks for pointing me to that post. I must admit that I don't have the time to read all of Eli's posts at the moment, so maybe he has indeed addressed the issues I thought were missing.

The title of the post at least sounds very promising (grin).

Thanks again, Günther

OK, I think I'm following you so far. But geez, the discussion gets muddy.

Three things.

  1. Consider the proposition (P) "Humanity embeds the right values."

I think you'd agree that P isn't necessarily true... that is, it was possible that humanity might have evolved to embed the wrong values. If I released a brain-altering nanovirus that rewrote humanity's moral code to resemble the Pebblesorters', P would become false (although no human would necessarily notice).

You assert that P is true, though. So a few questions arise:

  • Why do you believe that P is true?
  • What would you expect to perceive differently, were P false?

I think you'd agree that these are important questions. But I cannot figure out, based on the series of posts so far, what your answers are.

Perhaps I've missed something important. Or perhaps this gets clearer later.

  2. You say:

The AI can't start out with a direct representation of rightness, because the programmers don't know their own values

Wait, wait, what?

By your own account, it's possible for an optimizing process to construct a system that embeds the right values even if the process itself doesn't know its own values... even, in fact, if the process itself lacks the right values.

After all, by your account natural selection did precisely this: it is not itself right, nor does it know its own values, and it nevertheless constructed humans out of unthinking matter, and humans are right.

In fact, not only is it demonstrably possible, it happened in the ONLY instance of natural selection constructing a human-level intelligence that we know of. So either: A: we were impossibly lucky, B: it is likelier than it seems that evolved intelligences are right, or C: some kind of anthropic principle is at work.

A seems like a non-starter.

I suppose C could be true, though I don't quite see what the argument would be. Anthropic arguments generally depend on some version of "if it weren't true you wouldn't be here to marvel at it," and I'm not sure why that would be.

I suppose B could be true, though I don't quite see why I would expect it. Still, it seems worth exploring.

If B were the case, it would follow that allowing a process of natural selection to do most of the heavy lifting to promote candidate AIs to our consideration, and then applying our own intelligence to the problem of choosing among candidate AIs, might be a viable strategy. (And likely an easier one to implement, as natural selection is a far simpler optimizing process to model than a human mind is, let alone the CEV of a group of human minds.)

  3. (not to mention that there are other human beings out there than the programmers, if the programmers care about that).

Which, by your own account, they ought not. What matters is whether their AI is right, not whether the human beings other than the programmers are in any way involved in its creation. (Right?)

The metaethics sequence is highly controversial, and I have problems with it myself (e.g. Löb's Theorem and its implications for morality). Furthermore, I was concerned by the many dissenting comments, but then I realized that probably the smartest guys (like Herreshoff, Salamon, Rayhawk, Shulman, Tarleton, etc. (biased sample, I know)) already agree with Eliezer and do not comment at all. So although CEV is controversial, not everyone disagrees with Eliezer! Remember this. (And yeah, it sounds like propaganda, but no, I am not paid for saying this....)

During my short visit to SIAI, I noticed that Eliezer clearly had much higher status than others there, so their relative lack of publicly-visible disagreements with Eliezer may be due to that. (You do realize that the people you listed are all affiliated with SIAI?) Also, Marcello Herreshoff did have a significant disagreement with Eliezer about CEV here.

Hm, you're right, I didn't notice that all are affiliated with SIAI. But: Probably there is a reason why Eliezer has high status...

Marcello writes:

So, what do we do if there is more than one basin of attraction a moral reasoner considering all the arguments can land in? What if there are no basins?

Crap. This is really a problem. So, who else disagrees with Eli's CEV? What does e.g. Bostrom think? And does anyone have better proposals? I (and probably many others) would be really interested in the opinion of other "famous lesswrongers" such as Yvain, Alicorn, Kaj Sotala, or you, Wei Dai. See, I have the feeling that in regard to metaethics I have nothing relevant to say due to cognitive limitations. Therefore I have to rely on the opinion of people who have convinced me of their mental superiority in many other areas. I know that such a line of thought can easily be interpreted as conformist sycophancy and lead to cultish, fanatic behavior, and I usually disdain this kind of reasoning, but in my position this seems to be the best strategy.

What does e.g. Bostrom think?

He hasn't taken a position on CEV, as far as I can tell.

I (and probably many others) would be really interested in the opinion of other "famous lesswrongers" such as Yvain, Alicorn, Kaj Sotala, or you, Wei Dai.

I'm curious enough about this to look up the answers for you, but next time try "Google".

Yvain: Coherent extrapolated volition utilitarianism is especially interesting; it says that instead of using actual preferences, we should use ideal preferences - what your preferences would be if you were smarter and had achieved more reflective equilibrium - and that instead of having to calculate each person's preference individually, we should abstract them into an ideal set of preferences for all human beings. This would be an optimal moral system if it were possible, but the philosophical and computational challenges are immense.

Kaj: Some informal proposals for defining Friendliness do exist. The one that currently seems most promising is called Coherent Extrapolated Volition. In the CEV proposal, an AI will be built (or, to be exact, a proto-AI will be built to program another) to extrapolate what the ultimate desires of all the humans in the world would be if those humans knew everything a superintelligent being could potentially know; could think faster and smarter; were more like they wanted to be (more altruistic, more hard-working, whatever your ideal self is); would have lived with other humans for a longer time; had mainly those parts of themselves taken into account that they wanted to be taken into account. The ultimate desire - the volition - of everyone is extrapolated, with the AI then beginning to direct humanity towards a future where everyone's volitions are fulfilled in the best manner possible. The desirability of the different futures is weighted by the strength of humanity's desire - a smaller group of people with a very intense desire to see something happen may "overrule" a larger group who'd slightly prefer the opposite alternative but doesn't really care all that much either way. Humanity is not instantly "upgraded" to the ideal state, but instead gradually directed towards it.

CEV avoids the problem of its programmers having to define the wanted values exactly, as it draws them directly out of the minds of people. Likewise it avoids the problem of confusing ends with means, as it'll explicitly model society's development and the development of different desires as well. Everybody who thinks their favorite political model happens to objectively be the best in the world for everyone should be happy to implement CEV - if it really turns out that it is the best one in the world, CEV will end up implementing it. (Likewise, if it is the best for humanity that an AI stays mostly out of its affairs, that will happen as well.) A perfect implementation of CEV is unbiased in the sense that it will produce the same kind of world regardless of who builds it, and regardless of what their ideology happens to be - assuming the builders are intelligent enough to avoid including their own empirical beliefs (aside from the bare minimum required for the mind to function) into the model, and trust that if they are correct, the AI will figure them out on its own.

Alicorn: But I'm very dubious about CEV as a solution to fragility of value, and I think there are far more and deeper differences in human moral beliefs and human preferences than any monolithic solution can address. That doesn't mean we can't drastically improve things, though - or at least wind up with something that I like!

See also Criticisms of CEV (request for links).

Thanks, this is awesome!

but next time try "Google".

I'm sorry....