Taken from some old comments of mine that never did get a satisfactory answer.

1) One of the justifications for CEV was that extrapolating from an American in the 21st century and from Archimedes of Syracuse should give similar results. This seems to assume that change in human values over time is mostly "progress" rather than drift. Do we have any evidence for that, except saying that our modern values are "good" according to themselves, so whatever historical process led to them must have been "progress"?

2) How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition? If Eliezer wants the AI to look at humanity and infer its best wishes for the future, why can't he task it with looking at himself and inferring his best idea to fulfill humanity's wishes? Why must this particular thing be spelled out in a document like CEV and not left to the mysterious magic of "intelligence", and what other such things are there?


Do we have any evidence for that, except saying that our modern values are "good" according to themselves, so whatever historical process led to them must have been "progress"?

Yes, we do. First, we have an understanding of the mechanisms and processes that produced old and modern values, and many of the same mechanisms and processes used for "ought" questions are also used for "is" questions. Our ability to answer "is" questions accurately has improved dramatically, so we know the mechanisms have improved. S... (read more)

Finally, it's hard to make an AGI if the rest of humanity thinks you're a supervillain, and anyone making an AGI based on a value system other than CEV most certainly is, so you're better off being the sort of researcher who would incorporate all humanity's values than the sort of researcher who wouldn't.

If you're openly making a fooming AGI, and if people think you have a realistic chance of success and treat you seriously, then I'm very sure that all major world governments, armies, etc. (including your own) as well as many corporations and individuals will treat you as a supervillain - and it won't matter in the least what your goals might be, CEV or no.

I think you didn't really engage with the questions. This is exactly the kind of reasoning I mocked in the post. All such desiderata get satisfied automatically if your comment was generated by your sincere volition and not something else :-)
No, you mocked finding the values themselves repugnant, not their underlying mechanisms. If we find out that a value only exists because of a historical accident plus status quo bias, and that any society where it wasn't the status quo would reject it when it was explained to them, then we should reject that value. The fact that my volition might just consist of a pointer to CEV does not seem like much of an argument for choosing it over CEV, given that my volition also includes lots of poorly-understood other stuff, which I won't get a chance to inspect if there's no extrapolation, and which is more likely to make things worse than to make them better. Also, consider the worst case scenario: I have a stroke shortly before the AI reads out my volition.
I think your arguments, if they worked, would prove way too much. This standard allows us to throw away all values not directly linked to inclusive genetic fitness, and maybe even those that are. There's no objective morality. This argument works just as well for defending concrete wishes ("volcano lair with catgirls") over CEV.
Huh? We must have a difference of definitions somewhere, because that's not what I think my argument says at all. No, it doesn't. This was a counterargument to the could-be-a-pointer argument, not a root-level argument; and if you expand it out, it actually favors CEV over concrete wishes, not the reverse. The could-be-a-pointer argument is that, since one person's volition might just be the desire to have CEV implemented, that one person's volition is at least as good as CEV. But this is wrong, because that person's volition will also include lots of other stuff, which is substantially random and so at least some of it will be bad. So you need to filter (extrapolate) those desires to get only the good ones. One way we could filter them is by throwing out everything except for a few concrete wishes, but that is not the best possible filter because it will throw out many aspects of volition that are good (and probably also necessary for preventing disastrous misinterpretations of the concrete wishes).
How confident are you that what's left of our values, under that rule, would be enough to be called a volition at all?
This does not mean that people from the old societies which had those values would also find them repugnant if they understood these causal mechanisms. Understanding isn't the problem. Values are often top-level goals and to that extent arbitrary. For instance, many people raised to believe in God #1 have values of worshipping him. They understand that the reason they feel that is because they were taught it as children. They understand that if they, counterfactually, were exchanged as newborns and grew up in a different society, they would worship God #2 instead. This does not cause them to hold God #1's values any less strongly.
My reading of society is that such understanding does move values, at least if the person starts in a universalist religion, like Christianity. But such understanding is extremely rare.
I would rephrase that as "such understanding moves values extremely rarely". I think it's not very rare for the understanding to exist but be compartmentalized.
Good point. But that can only work if your research is transparent. Otherwise, why would one believe you are not just signaling this attitude while secretly pursuing your selfish goals? That is the reason why governments get the complete source code of software products from companies like Microsoft.
In the context of machine intelligence, I reckon that means open-source software. I figure, if you try and keep your source code secret, only fools will trust you. More to the point, competing organisations - who are more willing to actually share their results - are likely to gain mindshare, snowball, and succeed first. Of course, it doesn't always work like that. There's a lot of secret sauce out there - especially server-side. However, for ethical coders, this seems like a no-brainer to me.
Are you claiming that it is intrinsically unethical to have closed-source code?
No. Keeping secrets is not normally considered to be "unethical" - but it is a different goal from trying to do something good.

... change in human values over time is mostly "progress" rather than drift. Do we have any evidence for that?

I'm pretty sure that it is not purely progress, that 'drift' plays a big part. I see (current) human values as having three sources.

  • Our innate moral intuitions - which arose by way of an evolutionary process with a good deal of drift in the mechanism. (To say nothing of a good deal of ecological and climatic contingency in the sequence of environments presented.)
  • Our moral training, which depends on historical facts about the succ
... (read more)
Who said anything about pure progress? A mixture of progress and drift is still a net improvement, unless there's something special about the starting point that makes drift a net negative.
Oh, I agree. But some people worry that if drift is not completely suppressed, over time it can completely reverse your goal structure.

How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition?

For the same reason they voluntarily do anything which doesn't perfectly align with their own personal volition. Because they understand that they can accomplish more of their own desires by joining a coalition and cooperating. Even though that means having to work to fulfill other people's desires to the same extent that you work to fulfill your own.

A mad scientist building an AI in his basement doesn't have to compromise with anyone, ... until he has to go out and get funding, that is.

So he'll get funding for one thing, and then secretly build something else. Or he'll wait and in another 20 years the hardware will be cheap enough that he won't need external funding. Or he'll get funding from a rich individual, which would result in a compromise between a total of 2 people - not a great improvement.
Or, even more likely, some other team of researchers, more tolerant of compromise, will construct a FOOMing AGI first.

On 2: maybe CEV IS EY's own personal volition :)

More seriously, probably game theoretic reasons. Why would anyone want to work with/fund EY if it was his own volition that was being implemented?

*Disclaimer: I didn't read any other comments, so this might just echo what someone else said

If 10 people work together with EY or fund him, they can agree on a combination of their utility functions, and each will get 1/11th of the universe to rule over. That's a far cry from implementing the CEV of all humans who have ever lived.
It's much easier to get donations and avoid potential political problems (like the US government intervening) if EY is implementing the CEV of all humans rather than the CEV of everyone who donates. If EY seems like a mad scientist hellbent on taking over the world for himself and a small group of people, many people will treat him appropriately. Just think about your gut-level reaction to hearing "EY wants to implement CEV for only SIAI volunteers and donors" and to "EY wants to implement CEV for all of humanity." Note: not that there won't be any political problems with CEV for all humans. Rather that pushing for CEV for a small group of people will cause more problems in this arena.
"Just think about your gut-level reaction to hearing "EY wants to implement CEV for only SIAI volunteers and donors" and to "EY wants to implement CEV for all of humanity."" The first actually sounds better to me. I am fairly certain most SIAI-involved people are well-meaning, or at very least would not choose to cause J Random Stranger any harm if they could help it. I'm not so certain about 'all of humanity'.
The relevant comparison isn't what 'all of humanity' would choose, but rather what all of humanity would choose once CEV is done with their preferences.

This has been a source of confusion to me about the theory since I first encountered it, actually.

Given that this hypothetical CEV-extracting process gets results that aren't necessarily anything that any individual actually wants, how do we tell the difference between an actual CEV-extracting process and something that was intended as a CEV-extracting process but that, due to a couple of subtle bugs in its code, is actually producing something other than its target's CEV?

Is the idea that humanity's actual CEV is something that, although we can't necessarily come up with it ourselves, is so obviously the right answer once it's pointed out to us that we'll all nod our heads and go "Of course!" in unison?

Or is there some other testable property that only HACEV has? What property, and how do we test for it?

Because without such a testable property, I really don't see why we believe flipping the switch on the AI that instantiates it is at all safe.

I have visions of someone perusing the resulting CEV assembled by the seed AI and going "Um... wait. If I'm understanding this correctly, the AI you instantiate to implement CEV will cause us all to walk around with watermelons on our feet."

"Yes," replies the seed AI, "that's correct. It appears that humans really would want that, given enough time to think together about their footwear preferences."

"Oh... well, OK," says the peruser. "If you say so..."

Surely I'm missing something?

In light of some later comment-threads on related subjects, and in the absence of any direct explanations, I tentatively (20-40% confidence) conclude that the attitude is that the process that generates the code that extracts the CEV that implements the FAI has to be perfect, in order to ensure that the FAI is perfect, which is important because even an epsilon deviation from perfection multiplied by the potential utility of a perfect FAI represents a huge disutility that might leave us vomiting happily on the sands of Mars. And since testing is not a reliable process for achieving perfection, merely for reducing defects to epsilon, it seems to follow that testing simply isn't relevant. We don't test the CEV-generator, by this view; rather we develop it in such a way that we know it's correct. And once we've done that, we should be more willing to trust the CEV-generator's view of what we really want than our own view (which is demonstrably unreliable). So if it turns out to involve wearing watermelons on our feet (or living gender-segregated lives on different planets, or whatever it turns out to be) we should accept that that really is our extrapolated volition, and be grateful, even if our immediate emotional reaction is confusion, disgust, or dismay. I hasten to add that I'm not supporting this view, just trying to understand it.
Given the choice between (apparently benevolent people's volition) + (unpredictable factor) or (all people's volition) + (random factor) I'd choose the former every time.
Extrapolating volition doesn't make it agree with mine.
It's also entirely plausible that "implement CEV for Americans" will get less US government intervention than "implement CEV for all humanity," assuming that the US gov't takes any official notice of any of this in the first place. It's not entirely clear to me what follows from any of this political speculation, though.
I don't want to live with the wrong CEV for all eternity because it was politically expedient. Personally I would much prefer the former - and I'm not a SIAI volunteer or donor (although I could then become one).
Fair enough, but I was talking about your gut level reaction as a proxy for the rest of humanity's gut level reaction. You may not have the reaction, but most of the rest of the world would see "CEV for only us" and think mad scientist/doomsday cult/etc. because the pattern fits.
Most of the world are going to see the words "AI" and "Singularity", think mad scientist, and send troops. The word "CEV" they're going to ignore, because it's unfamiliar and the media won't tell them what it is.
You really think the public associates "AI" and "Singularity" with mad scientist? That seems like an exaggeration to me.
Depends on your definition of "the public". Most people in the world population have certainly never heard of "the singularity" and while they may have heard about the Hollywood concept of "AI" (which actually portrays UFAI pretty well, except that the Hollywood versions are normally stupider-than-humans) they know nothing about AI as it exists or might exist in reality. More to the point, very few people in the world have thought seriously about either topic, or ever will. I expect that most people will accept a version deriving from something presented in the media. Among the things the media might present, "mad science" ranks high: it's likely they'll call it "science" (or technology/engineering), and they will surely present it as impossible and/or undesirable, which makes it mad. Mad science, even Evil Mad Science, is really not so bad and may be a mark of respect. Contrast it with the popular image of Evil Science, like Nazi scientists doing human experiments. Or Unnatural Science, the Frankenstein meme (which the public image of cryonics barely skirts). The other image the Singularity is tainted with in the public mind is, of course, "the rapture of the nerds": atheist geeks reinventing silly religion and starting cults (like LW). In other words, madness without the science. Mad science would be an upgrade to the Singularity's public image right now. Mad science is something people take a little seriously, because it just might work, or at least leave a really big hole. Test my hypothesis! Try to explain the concept of a fooming AI-driven singularity to anyone who hasn't heard of it in depth, in 5 minutes - more than most people will spend on listening to the media or thinking about the subject before reaching a conclusion. See if you can, even deliberately, make them reach any conclusion other than "mad scientist" or "science-religious cultist" or "just mad".
Explaining it to geeks is easy enough IME. ("There's no reason an AI would be anything like a human or care about anything humans care about, so it might increase its power then kill us all by accident. Friendly AI is the quest to make an AI that actually cares about humans.") With non-geeks, I suspect results like those you describe.
For non-geeks, I would drop the word "intelligence", which carries too much baggage. "Machines that can improve their ability to improve themselves can improve very quickly -- much faster than you might expect if you don't look at the math. And if a machine quickly self-improves to the point where it can change the world in radical ways, those changes might make us really unhappy or even kill us all. So we want self-improving machines to be 'Friendly' -- that is, we want them to be designed in such a way that the changes they make to themselves and their environment are good for humans. The upside is that a Friendly self-improving machine can also make the environment much, much, much better than you might expect... for example, it can develop improved technologies, cures for diseases, more reliable economic models, extend longevity, etc." Come to think of it, that might be better for many geeks as well, who are not immune to the baggage of "intelligence". Though many would likely be offended by my saying so.
Yes - and geeks are not representative of the population at large, and not at all representative of powerful individuals (politicians, government officials, army commanders, rich businessmen). Even with geeks, I expect a success rate well below 100% due to future shock and imperfect updating.
'Accordingly' would seem to be the appropriate word in the context.
Well, surely it would depend on the alternatives. If I believed that EY can build a superhuman AGI in my lifetime that optimizes for the reflectively stable ways EY prefers the world to be, and otherwise believe what I currently do about the world (which includes not believing that better alternatives are likely in my lifetime), I would enthusiastically support (e.g., fund) that effort. Say what I will about EY's preferences, I see no reason to expect them to leave me worse off than the current state of affairs.

I would vote +10 each for those two questions.

How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition?

That's exactly my objection to CEV. No-one acts on anything but their personal desires and values, by definition. Eliezer's personal desire might be to implement CEV of humanity (whatever it turns out to be). I believe, however, that for well over 99% of humans this would not be the best possible outcome they might desire. At best it might be a reasonable compromise, but that would depend entirely on what the CEV actually ended up being.

Eliezer Yudkowsky
I'm not clear on what you could mean by this. Do you mean that you think the process just doesn't work as advertised, so that 99% of human beings end up definitely unhappy and with there existing some compromise state that they would all have preferred to CEV? Or that 99% of human beings all have different maxima so that their superposition is not the maximum of any one of them, but there is no single state that a supermajority prefers to CEV?
Yes. I expect CEV, if it works as advertised, to lead to a state that almost all humans (as they are today, with no major cognitive changes) would see as an acceptable compromise, an improvement over things today, but far worse than their personal desires implemented at the expense of the rest of humankind. Therefore, while working on CEV of humanity might be a good compromise and cooperation, I expect any group working on it to prefer to implement that group's CEV, instead. You say that you (and all people on this project) really prefer to take the CEV of all humanity. Please explain to me why - I honestly don't understand. How did you end up with a rare preference among humans, that says "satisfy all humans even though their desires might be hateful to me"?
"but far worse than their personal desires implemented at the expense of the rest of humankind." uh....i thought this was sort of the point. also, given holodecks (or experience machines of any sort), I disagree. EDIT: never mind, conversational context mismatch.
If that's the point, then why does EY prefer it over implementing the CEV of himself and a small group of other people? As for holodecks (and simulations), as long as people are aware they are in a simulation, I think many would care no less about the state of the external world. (At a minimum they must care somewhat, to make sure their simulation continues to run.)
um I think a miscommunication occurred. I am not commenting on what eliezer wants or why. I am commenting on my understanding of CEV being a (timeless) utilitarian satisfaction of preference.

One of the justifications for CEV was that extrapolating from an American in the 21st century and from Archimedes of Syracuse should give similar results.

I hadn't seen this before, but it strikes me as irredeemably silly. If we're picking a specific person (or set of people) from antiquity to compare, are we doing so randomly? If so, the results will be horrifying. If not, then we're picking them according to some standard- and why don't we just encapsulate that standard directly?

Here's a quote from the CEV paper: If Archimedes (or a random person) could live for a thousand years (together with the rest of humanity), could think a billion times faster, could learn all the FAI knows, etc, etc, then they'd very likely arrive at the same answer as a modern person. So it shouldn't matter which person/people you pick. This is how the extrapolation is supposed to work.
First, this seems implausible (people really do have different desires and personalities), and second, if plausible, the starting location doesn't seem to matter. If you take Archimedes and a brain-damaged murderer and a chimpanzee and end up with similar outputs, what did you need the inputs for? They don't appear to have made much difference. If the answer is that you'll modify the chimp and the murderer to become like Archimedes, then why don't you just copy Archimedes directly instead of doing it by scrubbing out those individuals and then planting Archimedes inside? Until CEV has a plan for dealing with envy, it strikes me as underplanned. Real humans interfere with each other- that's part of what makes life human.
The CEV of chimpanzees would not be the same as the CEV of humans.
What sort of differences are we looking at, here?
The CEV of humans, if it exists, would depend in part on those things all human minds have in common. Chimpanzees have been endowed with a different sort of mind — and, I expect, different values.
So, what all humans have in common is that their most dangerous enemies are other humans. I don't see how that's getting us anywhere good. [edit]I forgot to add- I'm looking for specificity here. What do humans care about that chimps don't? If chimps thought ten thousand times faster, would they care about those things?
I will clarify: My second link, "Thou Art Godshatter", says that our values are causally rooted in evolution, which gave us our brains. CEV is supposed to extrapolate that chain of causality, in a certain way. My first link, "The Psychological Unity Of Humankind", says that human brains are similar to each other. Chimpanzee brains are unlike human brains. There is no reason to expect human universal values to be identical to chimpanzee universal values. There is no reason to expect these species' extrapolated values to be identical. That is what I am saying. Are you skeptical of this argument? Art, for example. The question you want to ask is, Would a CEV-extrapolated chimpanzee care about art? I don't see why it should.
Sure there is- "maximize inclusive genetic fitness." Are you familiar with the theory that the human brain is essentially a peacock's tail, and that language, creative endeavors, and so on were evolved primarily to attract mating partners? It's possible that a chimp that gets a lot smarter won't start having more intellectual tastes, but the basic problem of "who do I have sex with, and how do I get them to agree?" is still around and it seems like they would try very similar methods to what we have.

Sure there is- "maximize inclusive genetic fitness."

That is an anthropomorphic representation of the 'values' of a gene allele. It is not the value of actual humans or chimpanzees.

Ah, it seems we have different ideas about what human values actually are. Organisms are adaptation-executers, not fitness-maximizers. Art is a human terminal value. Maximizing inclusive genetic fitness is not. Even if the causal reason for us having this value were that artsy people get more sex, we would value art for its own sake. I highly recommend the wiki article on the complexity of value and the articles linked from it.
Wait wait wait. Are we talking about human values, or human universal values? Those seem to me to be different concepts. It seems to me that what we have in common is that we're competing with one another and what differs between humans is how we compete and what we seek to maximize. I think the difference between my approach to music and a musician's approach to music is more than a difference of degree, and so am reluctant to say we share the same value. Given the many differing attitudes towards justice, it seems more apt to call it a political ploy about precedents than one objective standard all humans strive towards. I could keep going, but hopefully two examples are sufficient. Human values seem to be individual values, first and foremost. And when we remove some human values- predation, cruelty, rape, domination- from the list because their time has passed, it is not clear to me why the others remain. If we alter ourselves so we no longer understand events as narratives (since that's a motherlode of bias right there), will literature be a casualty of that improvement?
Agreed that individual humans have differing values. However I believe there's more to say about human universal values than "we're competing with one another". Here are two pieces of evidence: (1) Almost all human cultures have what we recognize as art. So far as I know, chimpanzee cultures do not. (Although a quick google search tells me that chimps who live in human cultures do create art.) So art, broadly defined, is universal; but individual humans have more specific tastes. I expect humans and chimpanzees have senses of justice (differing from individual to individual). (2) Individuals can change their values through moral arguments and education. If I try, through moral argument, to convince someone of my idea of justice, it's because I feel that my idea of justice has at least a somewhat universal appeal. And, with sufficient resources, you could become a musician and gain a richer appreciation of music. One could imagine an extrapolation process under which individuals' values converge, at least some of the time. However: I share the concern that I think you're raising here: For all we know, moral progress might not be pointing towards any particular destination. More research needs to be done on the mechanics of morality.
Mostly agreed. This I'm not as sure about. I can become competent at playing an instrument and better at interpreting music, but I'm not sure I can rewrite my talents. If those are determined by a relatively inelastic part of my brain's configuration, then it seems likely that the pathway to becoming a Beethoven simply does not exist for me. A better example might be heroic compassion- people who put themselves at risk without thinking to save others. The consensus opinion is it's probably at least somewhat genetic, and you either are a hero (in that narrow sense) or you aren't. I could be roused to tears by tales of heroic compassion but not have the unthinking impulse to do it, and it's not clear that if I don't have it I could acquire it. There are other things that people do acquire- bodyguards learn how to operate after being shot, and soldiers learn how to stay active in combat situations- so it might be possible. But I think bounded potentials are more realistic than unbounded potentials.
Aumann's agreement theorem seems to imply that individual extrapolated volition assignments must agree on statements of fact: the setup implies that they're simulated as perfect reasoners and share a knowledge pool, and the extrapolation process provides for an unbounded number of Bayesian updates. So we can expect extrapolated volition to cohere exactly to the extent that it's based on common fundamental goals: not immediate desires and not possibly-fallible philosophical results, but the low-level affective assignments that lead us to think of those higher-level results as desirable or undesirable. To what extent do common fundamental goals drive our moral reasoning? I don't know, but individual differences do exist (the existence of masochistic people should prove that), and if they're large enough then CEV may end up looking incomplete or unpleasantly compromise-driven.
But is that relevant to the question that CEV tries to answer? As far as I know, most masochistic people don't also hold a belief that everybody should be masochistic.
Even if individual differences in fundamental goals are not extended to other people as imperatives, they imply that the ability of a coherent extrapolated volition scheme to satisfy individual preferences must be limited. Depending on the size of those differences, this may or may not be a big deal. And we're very likely to have fundamental social goals that do include external imperatives, although masochism isn't one.
That would make them thoroughly non-human in psychology. It's a possibly useful take on CEV but I'm not sure it's a standard one.
Mea culpa; I seem to have overgeneralized the extrapolation process. But unless all its flaws are context-independent and uniformly distributed across humanity, I suspect they'd make human volition less likely to cohere, not more.
CEV claims that it really doesn't matter whom we sample, as long as we sample enough people who are different enough from one another. I agree that this needs very strong proofs for me to believe it.

In questions like this, it's very important to keep in mind the difference between state of knowledge about preference (which corresponds to explicitly endorsed moral principles, such as "slavery bad!"; this clearly changed), and preference itself (which we mostly don't understand, even if our minds define what it is). Since FAI needs to operate according to preference, and not our state of knowledge about preference, any changes in our state of knowledge (moral principles) are irrelevant, except for where they have a chance of reflecting changes ... (read more)

So the idea is that 21st century American and caveman Gork from 40000 BC probably have very similar preference, because they have very similar cognitive architecture

If something like Julian Jaynes' notion of a recent historical origin of consciousness from a prior state of bicameralism is true, we might be in trouble there.

More generally, you need to argue that culture is a negligible part of cognitive architecture; I strongly doubt that is the case.

What do you believe about these immutable, universal preferences?

Here are some potential problems I see with these theorized builtin preferences, since we don't know what they actually are yet:

  • They may conflict with our consciously held morals or desires: e.g., they may not include compassion or altruism for anyone we never met face to face. They may even conflict with both our own morals and with Gork's morals, at the same time. In that case, why shouldn't we privilege our conscious desires?
  • They may not be very interesting: just "want to have food and comfort, sex, social status, children". They wouldn't include many things we consciously want because those things evolved out of subverted button-pushing or as hyperstimuli - such as scientific research. Why should we choose to discard such values just because they aren't embodied in our hardware?
  • They may be susceptible to many cognitive traps and dead-ends (e.g. wireheading) that we can only work around using conscious thought and our consciously held values.
  • They may include values or desires we would consciously prefer to eradicate entirely, such as a drive for fighting or for making war. If you thought that 1) most
...
CEV is supposed to incorporate not only the things you want (or enjoy), but also the things you want to want (or don't want to enjoy, in this case).
Supposed to, based on what evidence? As Vladimir Nesov said, there are builtin preferences (which CEV takes into account), and then there are our conscious desires or "state of knowledge about preference". The two may be in conflict in some cases. How do you know that CEV won't include something that all the humans alive today, on the conscious level, would find hateful?
If you're saying actual human preference is determined by human biology and brain architecture, but mostly independent from brain content, this is a very new claim that I don't remember hearing ever before. You'll need pretty strong arguments to defend it. I'd bet at about 80% odds that Eliezer would disagree with it.
Hmm, I think I've said this many times already. Of course beliefs are bound to change preference to some extent, but shouldn't be allowed to do this too much. On reflection, you wouldn't want the decisions (to obtain certain beliefs) of your stupid human brain with all its biases that you already know not to endorse, to determine what should be done with the universe. Only where such decisions manage to overcome this principle, will there be change, and I can't even think of a specific example of when that should happen. Generally, you can't trust yourself. The fact that you believe that X is better than Y is not in itself a reason to believe that X is better than Y, although you might believe that X is better than Y because it is (because of a valid reason for X being better than Y, which your belief in X being better than Y isn't). So when beliefs do change your preference, it probably won't be in accordance with beliefs about preference.
As opposed to our biology and brain architecture, which were designed by the blind idiot god.
But don't our biological preferences imply pressing pleasure buttons? Isn't it just for our cultural/learnt preferences (brain content) that we assign low utility to drug induced happiness and push-button pleasure?

For 1), the sense I got was that it assumes no progress, and furthermore that if you perform an extrapolation that pleases 21st century Americans but would displease Archimedes or any other random Syracusan, your extrapolation-bearing AGI is going to tile the universe with American flags or episodes of Seinfeld.

For 2), it feels like a No True Scotsman issue. If by some definition of current, personal volition you exclude anything that isn't obviously a current, personal desire by way of deeming it insincere, then you've just made your point tautological. Do yo...

You're right, that would be terrible. They should be Texan flags. I think there are other failure modes that are significant; for example, a world where women are given full moral weight and autonomy would probably be terrifying to someone whose society is centered around women being the most valuable property there is (for both men and women, I imagine; abuse leaves quite the mark on minds).
Exactly. The desired case is where there are no failure modes; the CEV seems like it should logically have no failure modes that it can't avoid, and that any it can't avoid cannot be avoided by any system.

If Archimedes and the American happen to extrapolate to the same volition, why should that be because the American has values that are a progression from those of Archimedes? It's logically possible that both are about the same distance from their shared extrapolated volition, but they share one because they are both human. Archimedes could even have values that are closer than the American's.

The extrapolated volition of a cultural group might, given certain assumptions, resolve to a single set of values. If that were the case, you could express changes in expressed volition in that group over time as either progress toward or regression from that EV, and for various reasons I'd expect them to usually favor the "progress" option. I suspect that's what cousin_it is getting at. I'm not convinced that we gain anything by expressing that in simple terms of progress, though. The expressed volition of modern Western society is probably closer to the CEV of humanity than the cultural norms of 500 BC were, but cultural progression along the metric described above hasn't been monotonic and isn't driven by reference to CEV; it's the result of a stupendous hodgepodge of philosophical speculation, scale effects, technological changes, and random Brownian motion. That might resolve to something resembling capital-P progress, especially since the Enlightenment, but it's not predictively equivalent to what people usually mean by the term. And it certainly can't be expected to apply over all cultural traditions. The check on CEV described in the OP, however, should.

This seems to assume that change in human values over time is mostly "progress" rather than drift. Do we have any evidence for that, except saying that our modern values are "good" according to themselves, so whatever historical process led to them must have been "progress"?

Changes in human values seem to have generally involved expanding the subset of people with moral worth, especially post-enlightenment. This suggests to me that value change isn't random drift, but it's only weak evidence that the changes reflect some inevitable fact of human nature.

Suppose, just for the sake of specificity, that it turns out that the underlying mechanism works like this:
  • there's an impulse (I1) to apply all controllable resources to my own gratification
  • there's an impulse (I2) to extend my own self-gratifying impulses to others
  • I1 is satiable... the more resources are controllable, the weaker it fires
  • I2 is more readily applied to a given other if that other is similar to me
  • The degree to which I consider something as having "moral worth" depends on my willingness to extend my own self-gratifying impulses to it.
(I'm not claiming that humans actually have a network like this, I just find it's easier to think about this stuff with a concrete example.) Given that network, we'd expect humans to "expand the subset of people with moral worth" as available resources increase. That would demonstrably not be random drift: it would be predictably correlated with available resources, and we could manipulate people's intuitions about moral worth by manipulating their perceptions of available resources. And it would demonstrably reflect a fact about human nature... increasingly more refined neuroanatomical analyses would identify the neural substrates that implement that network and observe them firing in various situations. ("Inevitable"? No fact about human nature is inevitable; a properly-placed lesion could presumably disrupt such a network. I assume what's meant here is that it isn't contingent on early environment, or some such thing.) But it's not clear to me what demonstrating those things buys us. It certainly doesn't seem clear to me that I should therefore endorse or repudiate anything in particular, or that I should prefer on this basis that a superintelligence optimize for anything in particular. OTOH, a great deal of the discussion on LW on this topic seems to suggest, and often seems to take for granted, that I should prefer that a superintelligence optimize for some value V if and only if it turns
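The hypothetical I1/I2 network in the comment above can be made concrete with a toy model. This is only a sketch: the functional forms (I1 decaying as 1/(1+resources), I2 as similarity times whatever I1 leaves unclaimed), the threshold of 0.5, and the similarity scores are all assumptions introduced here for illustration, not anything the commenter specified.

```python
def i1_strength(resources: float) -> float:
    """I1: self-gratification impulse. Satiable, so it weakens as resources grow.
    The 1/(1+r) form is an arbitrary illustrative choice."""
    return 1.0 / (1.0 + resources)


def i2_strength(similarity: float, resources: float) -> float:
    """I2: impulse to extend gratification to another agent.
    Assumed to scale with similarity and with the slack left over by I1."""
    return similarity * (1.0 - i1_strength(resources))


def moral_circle(similarities, resources, threshold=0.5):
    """Indices of others whose I2 activation exceeds a hypothetical threshold,
    i.e. those treated as having 'moral worth' in this toy model."""
    return [i for i, s in enumerate(similarities)
            if i2_strength(s, resources) > threshold]


# Hypothetical similarity scores, from most to least similar to "me".
others = [0.95, 0.8, 0.6, 0.52]

for resources in (0.5, 2.0, 10.0, 100.0):
    print(resources, moral_circle(others, resources))
```

Under these assumptions the circle grows from nobody to everyone as resources rise, which is the "predictably correlated with available resources" behaviour the comment describes. Nothing in the model says whether that expansion should be endorsed, which is the comment's point.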
There have been other changes as well, which don't fit this generalization. For instance, we now treat the people who do have moral worth much better, in many ways. Also, there have historically been major regressions along the "percentage of society having moral worth" scale. E.g., Roman Republican society gave women, and all Roman citizens, more rights than the post-Roman Christian world that followed. Finally, "not random drift" isn't the same as "moving towards a global singular goal". A map with fractal attractors isn't random, either.
Agreed on all points.
Are you sure this isn't the Texas sharpshooter fallacy? That is to say, values are complicated enough that if they drifted in a random direction, there would exist a simple-sounding way to describe the direction of drift (neglecting, of course, all the other possible axes of change), and of course this abstraction would sound like an appealing general principle to those with the current endpoint values.
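The sharpshooter worry above can be illustrated with a small simulation. This is a sketch under stated assumptions: "values" are modelled as a point in a 50-dimensional space that takes independent Gaussian steps on every axis, which is a stand-in chosen here for illustration rather than a claim about how values actually change.

```python
import random


def random_drift(dims: int, steps: int, seed: int = 0):
    """Final displacement of a pure random walk: no axis has any built-in trend."""
    rng = random.Random(seed)
    position = [0.0] * dims
    for _ in range(steps):
        for d in range(dims):
            position[d] += rng.gauss(0.0, 1.0)
    return position


def post_hoc_axis(displacement):
    """Choose, after the fact, the axis that happened to move the most."""
    return max(range(len(displacement)), key=lambda d: abs(displacement[d]))


final = random_drift(dims=50, steps=100)
axis = post_hoc_axis(final)

# Median absolute displacement across all axes, for comparison.
typical = sorted(abs(x) for x in final)[len(final) // 2]

print("axis picked after the fact:", axis)
print("its displacement:", abs(final[axis]))
print("median displacement:", typical)
```

The axis selected afterwards always shows at least as much movement as the typical axis, even though the process has no direction. Drawing the target around that axis is the fallacy; a trend only counts as evidence if the axis was named before looking.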

I'm still wondering how you'd calculate a CEV. I'm still wondering how you'd calculate one human's volition. Hands up all those who know their own utility function. ... OK, how do you know you've got it right?

I don't think anyone here would have raised their hand at the first prompt, with the exceptions of Tim Tyler and Clippy.
Omohundro's rationality in a nutshell reads:
1. Have clearly specified goals.
2. In any situation, identify the possible actions.
3. For each action consider the possible consequences.
4. Take the action most likely to meet the goals.
5. Update the world model based on what actually happens.
Step 1 seems to be a fairly basic and fundamental one to me. If you don't know what you are trying to do, it is not easy to know whether you are succeeding at it - or not. I think rational agents should try and figure out what they want. "It's complicated" is a kind of answer - but not a very practical or useful one. I suspect failures at step 1 are mostly to do with signalling. For example, Tom Lehrer once spoke of "a man whose allegiance is ruled by expedience". If you publish your goals, that limits your ability to signal motives that are in harmony with those of your audience - reducing your options for deceptive signalling.
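The five steps quoted above can be sketched as a minimal decision loop. This is an illustration, not Omohundro's formulation: the `Agent` class, the Laplace-smoothed success estimate, and the umbrella example are all assumptions made here, and step 3 ("consider the possible consequences") is collapsed into a single estimated probability that an action meets the goal.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    goal: str                                       # step 1: a clearly specified goal
    successes: dict = field(default_factory=dict)   # world model: observed successes
    trials: dict = field(default_factory=dict)      # world model: observed attempts

    def estimate(self, action: str) -> float:
        """Step 3 (simplified): estimated chance that the action meets the goal.
        Laplace smoothing gives untried actions an estimate of 0.5."""
        s = self.successes.get(action, 0)
        n = self.trials.get(action, 0)
        return (s + 1) / (n + 2)

    def choose(self, actions):
        """Steps 2 and 4: from the listed actions, take the one judged
        most likely to meet the goal."""
        return max(actions, key=self.estimate)

    def update(self, action: str, succeeded: bool) -> None:
        """Step 5: revise the world model from what actually happened."""
        self.trials[action] = self.trials.get(action, 0) + 1
        if succeeded:
            self.successes[action] = self.successes.get(action, 0) + 1


agent = Agent(goal="stay dry")
actions = ["umbrella", "no umbrella"]

agent.update("no umbrella", succeeded=False)
print(agent.choose(actions))
```

Note how much work the one-line `goal` field is doing: the loop only runs because success or failure is handed to `update` from outside. Writing down what counts as success is exactly the part of step 1 that the replies below dispute.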
I doubt this very strongly, and can say with an extremely high level of confidence that I fail at step 1 even when I don't have to tell anyone my answer. What humans mean by "clearly specified" is completely different than what is required to write down some program whose output you would like an omnipotent agent to maximize. There are other reasons to argue that CEV is not a useful concept, but this is really a bad one.
FWIW, I wasn't talking about CEV or superintelligent agents. I was just talking about the task of figuring out what your own goals were. We can't really coherently discuss in detail the difficulties of programming goals into superintelligent agents until we know how to build them. Programming one agent's goals into a different agent looks challenging. Some devotees attempt to fulfill their guru's desires - but that is a trickier problem than fulfilling their own desires - since they don't get direct feedback from the guru's senses. Anyway, these are all complications that I did not even pretend to be going into. What do you actually mean when you say you "fail at step 1"? You have no idea what your own goals are?!? Or just that your knowledge of your own goals is somewhat incomplete?
I wasn't talking about CEV or superintelligent agents either. I mean that I have no idea how to write down my own goals. I am nowhere close to having clearly specified goals for myself, in the sense that I as a mathematician usually mean "clearly specified". The fact that I can't describe my goals well enough that I could tell them to someone else and trust them to do what I want done is just one indication that my own conception of my goals is significantly incomplete.
OK. You do sound as though you don't have very clearly-defined goals - though maybe there is some evasive word-play around the issue of what counts as a "clear" specification. Having goals is not rocket science! In any case, IMO, you would be well advised to start at number 1 on the above list. * http://www.best-self-help-sites.com/goal-setting.html
The simplest known complete representation of my utility function is my brain, combined with some supporting infrastructure and a question-asking procedure. Any utility function that is substantially simpler than my brain has almost certainly left something out. The first step in extracting a human's volition is to develop a complete understanding of the brain, including the ability to simulate it. We are currently stuck there. We have some high-level approximations that work around needing this understanding, but their accuracy is questionable.
Assume you have a working simulation. What next?
The implication is that you let the FAI do it. (The task in building the FAI so that it will do this has somewhat different challenges.)
Yes, but saying "we get the FAI to do it" is just moving the hard bit. The intelligence to calculate the CEV needs to be pre-FOOM. We have general intelligences of pre-FOOM level already. So: what would a pre-FOOM general intelligence actually do?

The intelligence to calculate the CEV needs to be pre-FOOM.

No, a complete question to which CEV is an answer needs to be pre-FOOM. All an AI needs to know about morality before it is superintelligent is (1) how to arrive at a CEV-answer by looking at things and doing calculations and (2) how to look at things without breaking them and do calculations without breaking everything else.

Ah, OK. So do we have any leads on how to ask the question?
I believe the idea is to have the pre-FOOM AI commit to doing the calculation first thing post-FOOM.
Eliezer beat me to the punch. My answer was approximately the same.

This is a great post and some great points are made in discussion too.

Is it possible to make exact models exhibiting some of these intuitive points? For example, there is a debate about whether extrapolated human values would depend strongly on cognitive content or whether they could be inferred just from cognitive architecture. (This could be a case of metamoral relativism, in which the answer simply depends on the method of extrapolation.) Can we come up with simple programs exhibiting this dichotomy, and simple constructive "methods of extrapolati...

2) How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition?

I am honestly not sure what to say to people who ask this question with genuine incredulity, besides (1) "Don't be evil" and (2) "If you think clever arguments exist that would just compel me to be evil, see rule 1."

I don't understand your answer. Let's try again. If "something like CEV" is what you want to implement, then an AI pointed at your volition will derive and implement CEV, so you don't need to specify it in detail beforehand. If CEV isn't what you want to implement, then why are you implementing it? Assume all your altruistic considerations, etc., are already folded into the definition of "you want" - just like a whole lot of other stuff-to-be-inferred is folded into the definition of CEV.

ETA: your "don't be evil" looks like a confusion of levels to me. If you don't want to be evil, there's already a term for that in your volition - no need to add any extra precautions.

If CEV isn't what you want to implement, then why are you implementing it?

The sane answer is that it solves a cooperation problem, i.e., people will not kill you for trying it and may instead donate money. As we can see here, this is not the position that Eliezer seems to take. He goes for the 'signal naive morality via incomprehension' approach.

I do not think this would work. Take the viewpoint of a government. What does CEV do? It does deprive them of some amount of ultimate power. The only chance I see to implement CEV using an AI going FOOM is either secretly or due to the fact that nobody takes you seriously enough. Both routes are rather unlikely. Military analysis of LW seems to be happening right now. And if no huge unforeseeable step towards AGI happens, it will move forward gradually enough for governments (or other groups), who already investigate LW and the SIAI, to notice and take measures to disable anyone trying to implement CEV. The problem is that once CEV becomes feasible, governments will consider anyone working on it as an attempted coup. Regardless of the fact that the people involved might not perceive it to be politics, working on a CEV is indeed a highly political activity. At least this will be the viewpoint of many who do not understand CEV or oppose it for different reasons.
Pardon me. To be more technically precise: "Implementing an AI that extrapolates the volition of something other or broader than yourself may facilitate cooperation. It would reduce the chance that people will kill you for the attempt and increase the chance of receiving support."
Aha, I see. My mistake, ignoring the larger context. Seen this? Anyway, I feel that it is really hard to tackle this topic because of its vagueness. As multifoliaterose implied here, at the moment the task to recognize humans as distinguished beings already seems to me too broad a problem to tackle directly. Talking about implementing CEV indirectly, by derivation from Yudkowsky's mind, versus specifying the details beforehand, seems to be fun to argue but ultimately ineffective at this point. In other words, an organisation that claims to solve some meta problem by means of CEV is only slightly different from one proclaiming to make use of magic. I'd be much more comfortable donating to a decision theory workshop, for example. I digress, but I thought I should clarify some of my intention for always getting into discussions involving the SIAI. It is highly interesting, sociologically I suppose. On the one hand people take this topic very seriously, the most important topic indeed, yet seem to be very relaxed about the only organisation involved in shaping the universe. There is simply no talk about more transparency to prove the effectiveness of the SIAI and its objectives. Further, without transparency you simply cannot conclude that because someone writes a lot of ethically correct articles and papers that that output is reflective of their true goals. Also people don't seem to be worried very much about all the vagueness involved here, as this post proves once again. Where is the progress that would justify further donations? As I said, I digress. Excuse me but this topic is the most fascinating issue for me on LW. Back to your comment, it makes sense. Surely if you tell people to also take care of what they want, they'll be less opposed than if you told them that you'll just do what you want because you want to make them happy. Yet there will be those who don't want you to do it, regardless of wanting to make them happy. There will be those who only want you to im
Are you sure? I imagine there are many people interested in evaluating the effectiveness of the SIAI. At least I am, and from the small number of real discussions I have had about the SIAI's project I extrapolate that uncertainty is the main inhibitor of enthusiasm (although of course if the uncertainty was removed this may create more fundamental problems).
The counterargument I've read in earlier ("unreal") discussions on the subject is, roughly, that people who claim their support for SIAI is contingent on additional facts, analyses, or whatever are simply wrong... that whatever additional data is provided along those lines won't actually convince them, it will merely cause them to ask for different data.
I assume you're referring to Is That Your True Rejection?.
(nods) I think so, yes.
This strikes me as a difficult thing to know, and the motives that lead to assuming it are not particularly pleasant.
While the unpleasant readings are certainly readily available, more neutral readings are available as well. By way of analogy: it's a common relationship trope that suitors who insist on proof of my love and fidelity won't be satisfied with any proofs I can provide. OTOH, it's also a common trope that suitors who insist that I should trust in their love and fidelity without evidence don't have them to offer in the first place. If people who ask me a certain type of question aren't satisfied with the answer I have, I can either look for different answers or for different people; which strategy I pick depends on the specifics of the situation. If I want to infer something about someone else based on their choice of strategy I similarly have to look into the specifics of the situation. IME there is no royal road to the right answer here.
It is a shame that understatement is so common it's hard to be precise quickly; I meant to include neutral readings in "not particularly pleasant."
Huh. Interesting. Yes, absolutely, I read your comment as understatement... but if you meant it literally, I'm curious as to the whole context of your comment. For example, what do you mean to contrast that counterargument with? That is: what's an example of an argument for which the motives for assuming it are actively pleasant? What follows from their pleasantness?
A policy like "assume good faith" strikes me as coming from not unpleasant motives. What follows is that you should attribute a higher probability of good faith to someone who assumes good faith. If someone assumes that other people cannot be convinced by evidence, my knowledge of projection suggests that should increase my probability estimate that they cannot be convinced by evidence. That doesn't entirely answer your question- since I talked about policies and you're talking about motives- but it should suggest an answer. Policies and statements represent a distribution of sets of possible motives, and so while the motives themselves unambiguously tell you how to respond the policies just suggest good guesses. But, in general, pleasantness begets pleasantness and unpleasantness begets unpleasantness.
It strikes me as a tendency that can either be observed as a trend or noted to be absent. This strikes me as a difficult thing to know. And distastefully ironic.
There are a large number of possible motives that could lead to assuming that the people in question are simply wrong. None of them are particularly pleasant (but not all of them are unpleasant). I don't need to know which motivates them in order to make the statement I made. However, the statement as paraphrased by TheOtherDave is much more specific; hence the difficulty. As a more general comment, I strongly approve of people kicking tires, even if they're mine. When I see someone who doesn't have similar feelings, I can't help but wonder why. Like with my earlier comment, not all the reasons are unpleasant. But some are.
Please read this comment. It further explains why I actually believe that transparency is important to prove the effectiveness of the SIAI. I also edited my comment above. I seem to have messed up on correcting some grammatical mistakes. It originally said, there is simply no talk about more transparency....
I didn't intend to write that. I don't know what happened there.
"The only organisation involved in shaping the universe"?!? WTF? These folks have precious little in terms of resources. They apparently haven't even started coding yet. You yourself assign them a minuscule chance of succeeding at their project. How could they possibly be "the only organisation involved in shaping the universe"?!?
Really? Even if they were working on a merely difficult problem, you would expect coding to be the very last step of the project. People don't solve hard algorithmic problems by writing some code and seeing what happens. I wouldn't expect an organization working optimally on AGI to write any code until after making some remarkable progress on the problem. There could easily be no organization at all trying to deliberately control the long-term future of the human race; we'd just get whatever we happened to stumble into. You are certainly correct that there are many, many organizations which are involved in shaping our future; they just rarely think about the really long-term effects (I think this is what XiXiDu meant).
IMO, there's a pretty good chance of an existing organisation being involved with getting there first. The main problem with not having any working products is that it is challenging to accumulate resources - which are needed to hire researchers and programmers - which you need to fuel your self-improvement cycle. Google, hedge funds, and security agencies have their self-improvement cycle already rolling - they are evidently getting better and better as time passes. That results in accumulated resources, which can be used to drive further development. If you were a search company who aimed directly at a human-level search agent, you are now up against a gorilla with an android army who already has most of the pieces of the puzzle. Waiting until you have done all the relevant R+D is just not how software development works. You get up and running as fast as you can - or else someone else does that first - and eats your lunch.
Right - but this seems as though it isn't how things are likely to go down. CEV is a pie-in-the-sky wishlist - not an engineering proposal. Those attempting to directly implement things like it seem practically guaranteed to get to the plate last. For example Ben's related proposal involved "non-invasive" scanning of the human brain. That just isn't technology we will get before we have sophisticated machine intelligence, I figure. So: either the proposals will be adjusted so they are more practical en route - or else, the proponents will just fail. Most likely there will be an extended stage where people tell the machines what to do - much as Asimov suggested. The machines will "extrapolate" in much the same way that Google Instant "extrapolates" - and the human wishes will "cohere" - to the extent that large-scale measures in society encourage cooperation.
FWIW, I mostly gave up on them a while back. As a spectator, I mostly look on, grimacing, while wondering whether there are any salvage opportunities.
Here is the original comment. It wasn't my intention to say that, it originally said there is simply no talk about more transparency.... I must have messed up on correcting some mistakes.
I just copied-and-pasted verbatim. However the current edit does seem to make more sense.
That is more-or-less my own analysis. Notoriously: CEV may get some of the votes from the poor - but offers precious little to the rich. Since those are the folk who are running the whole show, it is hard to see how they will approve it. They won't approve it - there isn't anything in it for them. So, I figure, the plan is probably pretty screwed - the hopeful plan of a bunch of criminal (their machine has no respect for the law!) and terrorist (if they can make it stick!) outlaws - who dream of overthrowing their own government.
Awesome comment, thanks. I'm going to think wishfully and take that as SIAI's answer.
Reciprocal altruism sometimes sends a relatively weak signal - it says that you will cooperate so long as the "shadow of the future" is not too ominous. Invoking "good" and "evil" signals more that you believe in moral absolutes: the forces of good and evil. On the one hand, that is a stronger signalling technique - it attempts to signal that you won't defect - no matter what! On the other hand, it makes you look a bit as though you are crazy, don't understand rationality or game theory - and this can make your behaviour harder to model. As with most signalling, it should be costly to be credible. Alas, practically anyone can rattle on about good and evil. I am not convinced it is very effective overall.
Also, from OP: If what you want is to have something pointed at your volition then you first have to design the AI that points to it rather than something else. This whole CEV stuff was an attempt at answering the "design an AI that points to it" question, and the crucial consideration that led to it was that there is no magically intelligent system that would automatically converge to what we'd prefer. Of course, there remains the question of balance between AI structure determined by what I want and AI structure determined by what the AI thinks I want. The realization of FAI is that you cannot eliminate the first item from the balance and get an acceptable result. It is better to ask "How could I best solve the FAI problem using my brain rather than something else?" than to ask "Could I use something else than my brain to solve the FAI problem?". If a CEV isn't what I want to implement, it is still good to implement CEV because it'll find out what I want to implement - plus more stuff that I would agree to implementing but not think of in the first place.
Eliezer didn't realize that you meant his own personal CEV, rather than his current incoherent, unextrapolated volition.

You have a personal definition for evil, like everyone else. Many people have definitions of good that include things you see as evil; some of your goals are in conflict. Taking that into account, how can you precommit to implementing the CEV of the whole of humanity when you don't even know for sure what that CEV will evaluate to?

To put this another way: why not extrapolate from you, and maybe from a small group of diverse individuals whom you trust, to get the group's CEV? Why take the CEV of all humanity? Inasmuch as these two CEVs differ, why would you not prefer your own CEV, since it more closely reflects your personal definitions of good and evil?

I don't see how this can be consistent unless you start out with "implementing humanity's CEV" as a toplevel goal, and any divergence from that is slightly evil.

One thing you could say that might help is if you were clearer about when you consider it evil to ignore the volition of an intelligence, since it's clear from your writing that sometimes you don't. For example, "don't be evil" clearly isn't enough of an argument to convince you to build an AI that fulfills Babykiller or Pebblesorter or SHFP volition, should we encounter any... although at least some of those would indisputably be intelligences. Given that, it might reassure people to explicitly clarify why "don't be evil" is enough of an argument to convince you to build an AI that fulfills the volition of all humans, rather than (let's say) the most easily-jointly-satisfied 98% of humanity, or some other threshold for inclusion. If this has already been explained somewhere, a pointer would be handy. I have not read the whole site, but thus far everything I've seen to this effect seems to boil down to assuming that there exists a single volition V such that each individual human would prefer V upon reflection to every other possible option, or at least a volition that approximates that state well enough that we can ignore the dis-satisfied minority. If that assumption is true, the answer to the question you quote is "Because they'd prefer the results of doing so," and evil doesn't enter into it. If that assumption is false, I'm not sure how "don't be evil" helps.

This seems to assume that change in human values over time is mostly "progress" rather than drift.

I do not accept the proposition that modern values are superior to ancient values. We're doing better in some regards than the ancients; worse in other regards. To the extent that we've made any progress at all, it's only because the societies that adopted truly terrible moral principles (e.g. communism) failed.

Please clarify: do you think there's some objective external standard or goal, according to which we've been progressing in some areas and regressing in others?

If you're aware of what that goal is, why haven't you adopted it as your personal morals, achieving 100% progression?

If you're not aware of what it is, why do you think it exists and what do you know about it?

To me the word "morality" means the philosophy underlying the social contract. The goal is human well-being. When I say we've regressed in some areas, I mean that we've modified the social contract in ways that are harmful to human well-being. To make the terms concrete, consider the case of communism. Clearly this was a drastic revision of the social contract with disastrous consequences for human well-being. The revision was justified by a certain set of moral values that turned out to be far inferior to the more traditional ones.

What is an objective measure of human well-being? Whatever the answer, it will contradict some people's morals; so you're condemning these morals as wrong. To me that seems to be favoring your own morals. Don't get me wrong - our morals are probably quite compatible. It's just that I think "I prefer my own morals" is both simpler and more honest than "I prefer an objective measure, which just happens to agree with my morals, and which many other people don't accept".

Communism isn't a truly terrible moral principle. It's just a moral principle that is naive and impractical as a political and economic solution given humanity as it is. A couple of elements that we would disparage as communist may actually be necessary in a post-human, FAI-enhanced universe in order to prevent a Hansonian hell.