This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.

Welcome. This week we discuss the twenty-third section in the reading guide: Coherent extrapolated volition.

This post summarizes the section and offers a few relevant notes and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “The need for...” and “Coherent extrapolated volition” from Chapter 13


  1. Problem: we are morally and epistemologically flawed, and we would like to make an AI without locking in our own flaws forever. How can we do this?
  2. Indirect normativity: offload cognitive work to the superintelligence, by specifying our values indirectly and having it transform them into a more usable form.
  3. Principle of epistemic deference: a superintelligence is more likely to be correct than we are on most topics, most of the time. Therefore, we should defer to the superintelligence where feasible.
  4. Coherent extrapolated volition (CEV): a goal of fulfilling what humanity would agree that they want, if given much longer to think about it, in more ideal circumstances. CEV is a popular proposal for what we should design an AI to do.
  5. Virtues of CEV:
    1. It avoids the perils of specification: it is very hard to specify explicitly what we want, without causing unintended and undesirable consequences. CEV specifies the source of our values, instead of what we think they are, which appears to be easier.
    2. It encapsulates moral growth: there are reasons to believe that our current moral beliefs are not the best (by our own lights) and we would revise some of them, if we thought about it. Specifying our values now risks locking in wrong values, whereas CEV effectively gives us longer to think about our values.
    3. It avoids 'hijacking the destiny of humankind': it allows the responsibility for the future of mankind to remain with mankind, instead of perhaps a small group of programmers.
    4. It avoids creating a motive for modern-day humans to fight over the initial dynamic: a commitment to CEV would mean the creators of AI would not have much more influence over the future of the universe than others, reducing the incentive to race or fight. This is even more so because a person who believes that their views are correct should be confident that CEV will come to reflect their views, so they do not even need to split the influence with others.
    5. It keeps humankind 'ultimately in charge of its own destiny': it allows for a wide variety of arrangements in the long run, rather than necessitating paternalistic AI oversight of everything.
  6. CEV as described here is merely a schematic. For instance, it does not specify which people are included in 'humanity'.

Another view

Part of Olle Häggström's extended review of Superintelligence expresses a common concern—that human values can't be faithfully turned into anything coherent:

Human values exhibit, at least on the surface, plenty of incoherence. That much is hardly controversial. But what if the incoherence goes deeper, and is fundamental in such a way that any attempt to untangle it is bound to fail? Perhaps any search for our CEV is bound to lead to more and more glaring contradictions? Of course any value system can be modified into something coherent, but perhaps not all value systems can be so modified without sacrificing some of their most central tenets? And perhaps human values have that property?

Let me offer a candidate for what such a fundamental contradiction might consist in. Imagine a future where all humans are permanently hooked up to life-support machines, lying still in beds with no communication with each other, but with electrodes connected to the pleasure centra of our brains in such a way as to constantly give us the most pleasurable experiences possible (given our brain architectures). I think nearly everyone would attach a low value to such a future, deeming it absurd and unacceptable (thus agreeing with Robert Nozick). The reason we find it unacceptable is that in such a scenario we no longer have anything to strive for, and therefore no meaning in our lives. So we want instead a future where we have something to strive for. Imagine such a future F1. In F1 we have something to strive for, so there must be something missing in our lives. Now let F2 be similar to F1, the only difference being that that something is no longer missing in F2, so almost by definition F2 is better than F1 (because otherwise that something wouldn't be worth striving for). And as long as there is still something worth striving for in F2, there's an even better future F3 that we should prefer. And so on. What if any such procedure quickly takes us to an absurd and meaningless scenario with life-support machines and electrodes, or something along those lines? Then no future will be good enough for our preferences, so not even a superintelligence will have anything to offer us that aligns acceptably with our values.

Now, I don't know how serious this particular problem is. Perhaps there is some way to gently circumvent its contradictions. But even then, there might be some other fundamental inconsistency in our values - one that cannot be circumvented. If that is the case, it will throw a spanner in the works of CEV. And perhaps not only for CEV, but for any serious attempt to set up a long-term future for humanity that aligns with our values, with or without a superintelligence.


1. While we are on the topic of critiques, here is a better list:

  1. Human values may not be coherent (Olle Häggström above, Marcello; Eliezer responds in section 6, question 9)
  2. The values of a collection of humans in combination may be even less coherent. Arrow's impossibility theorem suggests reasonable aggregation is hard, but this only applies if values are ordinal, which is not obvious.
  3. Even if human values are complex, this doesn't mean complex outcomes are required—maybe with some thought we could specify the right outcomes, and don't need an indirect means like CEV (Wei Dai)
  4. The moral 'progress' we see might actually just be moral drift that we should try to avoid. CEV is designed to allow this change, which might be bad. Ideally, the CEV circumstances would be optimized for deliberation and not for other forces that might change values, but perhaps deliberation itself can't proceed without our values being changed (Cousin_it)
  5. Individuals will probably not be a stable unit in the future, so it is unclear how to weight different people's inputs to CEV. Or to be concrete, what if Dr Evil can create trillions of emulated copies of himself to go into the CEV population. (Wei Dai)
  6. It is not clear that extrapolating everyone's volition is better than extrapolating a single person's volition, which may be easier. If you want to take into account others' preferences, then your own volition is fine (it will do that), and if you don't, then why would you be using CEV?
  7. A purported advantage of CEV is that it makes conflict less likely. But if a group is disposed to honor everyone else's wishes, they will not conflict anyway, and if they aren't disposed to honor everyone's wishes, why would they favor CEV? CEV doesn't provide any additional means to commit to cooperative behavior. (Cousin_it)
  8. More in Coherent Extrapolated Volition, section 6, question 9
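The aggregation concern in point 2 above can be made concrete with the Condorcet paradox, the cyclic-majority phenomenon underlying Arrow's theorem. A minimal sketch (the voters and outcomes are hypothetical, chosen only to exhibit the cycle):

```python
# Toy illustration of the aggregation worry: three voters with
# individually reasonable ordinal preferences, yet pairwise majority
# voting produces a cycle (the Condorcet paradox behind Arrow's theorem).
voters = [
    ["A", "B", "C"],  # hypothetical voter 1, best to worst
    ["B", "C", "A"],  # hypothetical voter 2
    ["C", "A", "B"],  # hypothetical voter 3
]

def majority_prefers(x, y, rankings):
    """True if a strict majority of rankings place x above y."""
    wins = sum(r.index(x) < r.index(y) for r in rankings)
    return wins > len(rankings) / 2

# Each pairwise contest is won 2-1, so the "group preference"
# is cyclic: A > B > C > A. No coherent aggregate ranking exists.
results = {(x, y): majority_prefers(x, y, voters)
           for x, y in [("A", "B"), ("B", "C"), ("C", "A")]}
print(results)
```

All three pairwise majorities hold simultaneously, so there is no single ranking an aggregator could extract from these ordinal preferences without overriding at least one majority verdict. As noted, though, this applies only if values are treated as ordinal.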

2. Luke Muehlhauser has written a list of resources you might want to read if you are interested in this topic. It suggests these main sources: 
He also discusses some closely related philosophical conversations:
  • Reflective equilibrium. Yudkowsky's proposed extrapolation works analogously to what philosophers call 'reflective equilibrium.' The most thorough work here is the 1996 book by Daniels, and there have been lots of papers, but this genre is only barely relevant for CEV...
  • Full-information accounts of value and ideal observer theories. This is what philosophers call theories of value that talk about 'what we would want if we were fully informed, etc.' or 'what a perfectly informed agent would want' like CEV does. There's some literature on this, but it's only marginally relevant to CEV...
Muehlhauser later wrote at more length about the relationship of CEV to ideal observer theories, with Chris Williamson.

3. This chapter is concerned with avoiding locking in the wrong values. One might wonder exactly what this 'locking in' is, and why AI will cause values to be 'locked in' while having children for instance does not. Here is my take: there are two issues - the extent to which values change, and the extent to which one can personally control that change. At the moment, values change plenty and we can't control the change. Perhaps in the future, technology will allow the change to be controlled (this is the hope with value loading). Then, if anyone can control values they probably will, because values are valuable to control. In particular, if AI can control its own values, it will avoid having them change. Thus in the future, probably values will be controlled, and will not change. It is not clear that we will lock in values as soon as we have artificial intelligence - perhaps an artificial intelligence will be built whose implicit values randomly change - but if we are successful we will control values, and thus lock them in, and if we are even more successful we will lock in values that are actually desirable for us. Paul Christiano has a post on this topic, which I probably pointed you to before.

4. Paul Christiano has also written about how to concretely implement the extrapolation of a single person's volition, in the indirect normativity scheme described in box 12 (p199-200). You probably saw it then, but I draw it to your attention here because the extrapolation process is closely related to CEV and is concrete. He also has a recent proposal for 'implementing our considered judgment'. 

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Specify a method for instantiating CEV, given some assumptions about available technology.
  2. In practice, to what degree do human values and preferences converge upon learning new facts? To what degree has this happened in history? (Nobody values the will of Zeus anymore, presumably because we all learned the truth of Zeus’ non-existence. But perhaps such examples don’t tell us much.) See also philosophical analyses of the issue, e.g. Sobel (1999).
  3. Are changes in specific human preferences (over a lifetime or many lifetimes) better understood as changes in underlying values, or changes in instrumental ways to achieve those values? (driven by belief change, or additional deliberation)
  4. How might democratic systems deal with new agents being readily created?

If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about more ideas for giving an AI desirable values. To prepare, read “Morality models” and “Do what I mean” from Chapter 13. The discussion will go live at 6pm Pacific time next Monday 23 February. Sign up to be notified here.

Comments

One point worth making is that any society would believe they had made moral progress over time, regardless of their history. If you had two societies, and one started at point A and moved to point B, and the other moved from B to A, both would feel they had made moral progress.

Not necessarily. If A was a Nash equilibrium while B was a Pareto improvement from that but the second society couldn't coordinate to achieve it, then they could gaze wistfully into the past, say they had fallen, and be right to do so.

Yes, necessarily, if A and B are sets of moral values, not the degree to which they are attained. You're interpreting A and B as, say, wealth or power distributions.
Hmmm. Yes. But I don't know that you would actually be able to find examples of A and B in real life.
Presumably you are assuming that societies judge their values by their values, always coming to the answer "we're good". But societies can do better and worse at realising their values. Moreover, societies can judge moral values by non-moral values, for instance by consistency. (Yudkowsky's habit, apparently copied by Bostrom, of refusing to distinguish moral value from non-moral value, causes problems, inasmuch as making the distinction solves problems.) I am not sure putting your values into practice counts as a moral value.
When people talk about moral progress, I think they are rarely talking about better achieving their fixed values.
I think almost everybody makes a distinction between moral and nonmoral value. Stamp collectors value stamps, but don't think societies with a greater supply of stamps are morally better.

This is a little dusty now, and was originally an attempt to collect what others had said was problematic with CEV, without passing judgement over whether I thought that was a good or a bad concern. So it has the advantage of being very comprehensive.

It also contains a summary of CEV for your convenience.

Troubles with CEV

Troubles with CEV, part 2

People talk as if inconsistencies and contradictions in our value systems mean the whole enterprise of emulating human morality is worthless. Of course human value systems are contradictory; you can still implement a contradictory value system if you're willing to accept the occasional mis-calculation.

A deeper problem, in my opinion, is the nature of our behavior. It seems that in a lot of situations people make decisions first then justify them later, often subconsciously. The only way to accurately emulate this is to have a machine that also first makes ...

There are two kinds of inconsistency; are both dealt with in CEV? There is the internal inconsistency of an individual's (each individual's?) morality. Things like pushing the fat guy onto the trolley tracks to save 5 skinny guys. There is also (possibly) inconsistency between individual humans. A smart good friend of mine over the last 40 years has very different politics from mine, suggesting a different set of values. Sure, we agree you shouldn't kill random people in the city, and so on. But it seems we disagree on the kinds of things that justify forced collective action (taxation, laws). As a simple and frustrating example, he would like to see flag-burning made illegal, which is nuts to me. Is there a plan to have CEV handle the differences in values between different humans? And where do we draw the line at human: a sociopath is pretty obviously human, must CEV be consistent with both my values and a sociopath's values? If not, are we just picking a subset of humanity, defining a "we" and a "they", and developing "our" CEV?
Or you could start a project to research whether the morally relevant subset of value is also a non-contradictory subset of value. Just a thought.

What research questions would you pursue if you were committed to researching this area?

One interesting question is, when deciding to CEV people, from which era to extract the people. Saying that all eras would result in the same CEV is equivalent to saying that there is a fundamental correlation between the course of history and coherence which has but one final telos. An unlikely hypothesis, due to all sorts of things from evolutionary drift - as opposed to convergence - to orthogonality in ethics, to reference class tennis. So researching how to distribute the allocation of CEV among individuals and groups would be a fascinating area to delve into.
But if CEV doesn't give the same result when seeded with humans from any time period in history, I think that means it doesn't work, or else that human values aren't coherent enough for it to be worth trying.
Hmm, maybe one could try to test the CEV implementation by running it on historical human values and seeing whether it approaches modern human values (when not run all the way to convergence).
Well, think about a world in which most of it turns out pretty similar, but some, say 2% to 20%, depends on historical circumstance (and where that is cast once CEVed). I think we may live in a world like that.
That seems wrong. As a counterexample, consider a hypothetical morality development model where as history advances, human morality keeps accumulating invariants, in a largely unpredictable (chaotic) fashion. In that case modern morality would have more invariants than that of earlier generations. You could implement a CEV from any time period, but earlier time periods would lead to some consequences that by present standards are very bad, and would predictably remain very bad in the future; nevertheless, a present-humans CEV would still work just fine.
I don't know what you mean by invariants, or why you think they're good, but: If the natural development from this earlier time period, unconstrained by CEV, did better than CEV from that time period would have, that means CEV is worse than doing nothing at all.
I used "invariant" here to mean "moral claim that will hold for all successor moralities". A vastly simplified example: at t=0, morality is completely undefined. At t=1, people decide that death is bad, and lock this in indefinitely. At t=2, people decide that pleasure is good, and lock that in indefinitely. Etc. An agent operating in a society that develops morality like that, looking back, would want to have all the accidents that lead to current morality to be maintained, but looking forward may not particularly care about how the remaining free choices come out. CEV in that kind of environment can work just fine, and someone implementing it in that situation would want to target it specifically at people from their own time period.
Or else, the human values we care about are, say, ours (taken as broadly as possible, but not broader than that).
I think the two most-important decisions are:

1. Build a single AI and give it ultimate power, or build a stable ecosystem / balance of power between AIs?
2. Try to pass on specific values of ours, or try to ensure that life continues operating under parameters that produce some beings that have values something like that?

Each of these decisions suggests research questions.

1a. How can we extend our models of competition to hierarchical agents--agents that are composed of other agents? Is most of the competition at the top level, or at the lower levels? (For starters, is there some natural distribution of number of agents of different sizes / levels / timescales, like there is for cities of different sizes?) The purpose is to ask whether we can maintain useful competition within a singleton.

1b. For some set of competing hierarchical AIs, what circumstances make it more likely for one to conquer and subsume the others? Under what circumstances might a singleton AI split up into multiple AIs? The purpose is to estimate whether it's possible to indefinitely avoid permanent collapse into a singleton.

2a. Try to find a candidate set of human values. Find how each is implemented neurally. The purpose is to see whether such things exist, what sorts of things they are, and whether they're the sort of things that can be implemented in a logic.

2b. List the behaviors of a wide variety of animals. Find values/preferences/behaviors of interest, and for each, find the conditions that tend to lead animals to have / not have those behaviors, as I did for boredom in this comment. The purpose is to see what fraction of the space of behaviors is acceptable to us, and to discover the evolutionary conditions that lead to that fraction of that space. That will give us an idea of how tightly we can constrain future values by controlling the gross parameters of the ecosystem.
Or 3) Don't pass control to AIs at all. Don't even build agent-y AIs. Augment humans instead.
This may be a good way to start, but it eventually leads to the same place.
I think you'll need to explain that because I don't see that at all. We've made life a lot better for most people on this planet by creating power-sharing arrangements that limit any single person's autocratic powers, and expanding the franchise to all. Yet I see many people here advocating basically a return to autocratic rule by our AI overlords, with no vote for the humans left behind. Essentially, "let's build a provably beneficial dictator!" This boggles my mind. The alternative is to decentralize transhumanist technology and push as many people as possible through an augmentation pathway in lockstep, preserving our democratic power structures. This sidesteps the friendly AI problem entirely.
Agreed, though I'm probably boggled for different reasons. Eventually, the software will develop to the point where the human brain will be only a tiny portion of it. Or somebody will create an AI not attached to a human. The body we know will be left behind or marginalized. There's a whole universe out there, the vast majority of it uninhabitable by humans.
"The software"? What software? The "software" is the human, in an augmented human. I'm not sure whatever distinction you're drawing here is relevant.
Presumably 'the software' is the software that was not part of the original human.
Researching CEV as a foregone conclusion, or researching whether it is a good idea?

If human values are not coherent, is that not a problem for any plans we might have for the future, rather than just CEV?

It is a problem if your particular values require you to drag all of humanity along with you into whatever glorious future you perceive. If that is your value, to drag all of whatever you decide qualifies as human with you on your journey, then you will probably be defeated by groups who are not limited by that value.

And the absolute mass of human history shows groups of humans who are more than happy to build a future only for their in-group, with significantly less regard if not total lack of regard for the interests of all the humans not in the in-group. Certainly we have seen the size of in-groups rise through human history. And many of us have extrapolated from this that the in-group is heading towards all humanity, and then made the illogical leap (or succumbed to the sentiment) that therefore the in-group SHOULD be all of humanity.

But at the same time, plenty of us, even those in groups measured in the 100s of millions, are still happy to support the in-group specially, to deport people who were not born in the US but happened to grow up there since childhood, to pass laws giving all the good jobs and government benefits to citizens, to place it as a positive value to value the income of a lazy alcoholic in Ohio over the income of a peasant in Mexico or China who might work three times as hard for a tenth as much to support his family.

One would imagine from the success and power of nations numbering in the 100s of millions that widening the definition of "us" has made humans more productive, made the larger-grouped humans more fit in an evolutionary sense. But just as an elephant's size advantages do not lead to elephants as large as mountains, there is evidence that making the ingroup much larger than a billion, at this point in our evolution and technical expertise, does NOT provide an advantage over groups of 100s of millions. I would expect the future to belong to groups numbered in the 100s of millions, perhaps leading towards a few billion as tec
It might not be a problem if we decide to work on the meta-level, and, rather than trying to optimize the universe according to some extrapolation of human values, tried to make sure the universe kept on having conditions that would produce some things like humans.
I would submit that from the point of view of the ancestor species we displaced, we (homo sapiens) were the equivalent of UAI. We were a superior intelligence which was unconstrained by a set of values that supported our ancestor species. We tiled the planet with copies of ourselves robbing especially our immediate ancestors (who tended to occupy similar niches as us) of resources and making them extinct with our success. So a universe that has the condition to produce some things like humans, that is "the state of nature" from which UAI will arise and, if they are as good as we are afraid they are, supplant humans as the dominant species. I think this is the thinking behind all jokes along the lines of "I, for one, welcome our new robot overlords."
That's the goal. What, you want there to be humans a million years from now?
Is that true, or are you just being cleverly sarcastic? If that is the goal of CEV, could you point me to something written up on CEV where I might see this aspect of it?
I mean, that's the goal of anyone with morals like mine, rather than just nepotism.
That does not sound like much of a win. Present-day humans are really not that impressive, compared to the kind of transhumanity we could develop into. I don't think trying to reproduce entites close to our current mentality is worth doing, in the long run.
By "things like humans" I meant "things that have some of the same values or preferences."
Nah, we can just ignore the evil fraction of humanity's wishes when designing the Friendly AI's utility function.
While that was phrased in a provocative manner, there /is/ an important point here: If one has irreconcilable value differences with other humans, the obvious reaction is to fight about them; in this case, by competing to see who can build an SI implementing theirs first. I very much hope it won't come to that, in particular because that kind of technology race would significantly decrease the chance that the winning design is any kind of FAI. In principle, some kinds of agents could still coordinate to avoid the costs of that kind of outcome. In practice, our species does not seem to be capable of coordination at that level, and it seems unlikely that this will change pre-SI.
The Moslems think the West is evil, or certainly less moral, and essentially vice versa. The atheists think all the religious are less moral, and vice versa. Practically speaking, I think the fraction of humanity that is not particularly involved in building AI will have their wishes ignored, and it would not be many who would define that fraction of humanity as evil.
This atheist thinks that one's position on the existence of a deity is not the be all and end all of one's morality.
Some atheists might, but I know plenty of atheists who wouldn't judge someone's morals by whether or not they are religious.
Do you know any atheists who would consider it moral to teach your children fundamentally incorrect information about how the world works? That you can change the world by talking to an invisible powerful being who is in charge of everything, and rightly so? That it is your obligation to wear certain odd objects in certain ways, to behave in strangely ritualistic ways which differ depending on your genitals? That you should strive to spend as much time as possible studying the fictitious books that describe all of this, to the exclusion of science or math or art or enterprise? I think we all believe we should teach our children the truth. I think what we believe we should do in raising our children has more impact on what the future of the world will be than any other actions we take. I think it is a polite fiction to suggest that atheists do not think they are morally superior when they teach their children how to REALLY think about the world, compared to bible thumpers or daveners.
I know plenty of atheists who do teach their children about Santa Claus. There are plenty of different atheists. People like Richard Dawkins and the New Atheist crowd do think they are morally superior. On the other hand, other people who don't believe in God don't. I live in Europe, where belief in God is less of an issue than it is in the US. It is quite possible that what you are saying is true for the atheists you know but not for most that I know, because I live somewhere else. Plenty of atheists do think that women are different from men. I have no problem discussing the value of boy-to-man initiation rituals without any notion of God or the paranormal being involved. Atheists on average have better bible knowledge than Christians if you measure it by variables such as the number of the Ten Commandments that the person can recount. I don't think that typical Christians spend a lot of time studying the bible.
I'm curious, do you actually think I am wrong or are you just arguing for sport? I suppose it's a rhetorical question. It does seem obvious to me that the reason there are billions of Christians and Moslems and Jews is because any sect except the very least religious of them puts a high premium on "educating" their children to be if not the same thing at least a memetically closely related thing. And that this "education" involves threats about what will happen to you if you don't believe, threats that apply to you both in the afterlife they are telling you about, but threats which are quite real and physical in the world we all agree is real in much of the world. When I say it seems obvious, I really mean that I have examined a tremendous amount of evidence, with a small amount of it seen by my own eyes and an overwhelming amount of it from reading about and talking about what actually happens in the world. So it is not a prior, it is post. Do you seriously claim that training of kids in the religion is not something that happens much? Or do you object to my statements because you don't think raising a child to think it is more valuable to study Torah than to study Physics is morally inferior to raising your child to believe that they can gather evidence, and that thousands-of-years-old texts that make unbelievable claims should no more be believed than should brand new texts that make unbelievable claims?
That's not what my post is about. My post is about whether or not all atheists think that religious people are less moral than atheists. In the world in which I live, Jewish people make efforts to stop atheists from sending their children to Jewish kindergartens in which those children would be exposed to a bit of Torah study. I see nothing wrong with reading the Grimm Tales to children or the Torah.
I think that any atheists who feel they are morally superior for the reasons you describe are actually feeling morally superior to a straw-man representation of religious people. I know a fair number of religious people, and few of them have many or any of the behaviors that you describe.
Let's just assume I'm talking about the ones that do teach their children religion and a belief in god, or who send their children to religious classes to learn this stuff. Is this what you mean to say very few religious people do? I would submit you have an extremely narrow exposure to religious people, then. As a Catholic boy growing up in New York I was taught that I would spend eternity in hell if I did not love god, or even if I just didn't go to church on Sunday, although the second I could negate by going to confession. That is just an example of the casual crap my mind was loaded up with as a child in a not-particularly-observant family. It is essentially the essence of any religion with significant human membership that the story one must believe in is installed in the children of believers. And I feel superior to EVERYBODY who does that. At least in that particular regard. My children think it is the most natural thing in the world to try to figure out if god exists and, if so, what she might be like. They are clueless as to what it might be like to be told those answers and suffer threats or punishment or violence if they express doubts.
You listed several behaviors:

Not sure what you meant by this; I am assuming you mean things like some Jews and Muslims believing that men should wear hats under various conditions. I don't really see a moral issue with that, nor is a hat a particularly odd object, nor is the head a particularly odd place to wear a hat.

Not sure what you are referring to here either, but I don't see anything like this from any of my religious acquaintances. I did attend an Iftar with a Muslim friend one time, and the men and women were asked to meet in separate rooms, per Islamic tradition. Is this the sort of thing you mean? If so, I do not see anything particularly immoral or sinister about it.

This definitely is not the case with most religious people I know; most of them have good jobs, have good, well-rounded educations, etc. And, as ChristianKl pointed out, many religious people apparently spend less time studying the various holy books than many atheists.

I'm a bit surprised to hear an ex-Catholic hold this point of view; it was my understanding that the Catholic church in general was pro-education, pro-science, pro-logic, etc., and that Catholic schools generally impart a good, well-rounded education. Do you have experiences to the contrary?

You also mentioned:

It is true that some religious people believe in the power of prayer, but not in the rather naive way that you seem to indicate. Few if any religious people claim to be able to predictably and reliably influence events through prayer. Furthermore, few use prayer as a substitute for personal initiative - e.g. back when I was in high school the football team would pray prior to a game that no one would be hurt and for victory (and this was in a public school), but the players were also required to hone their skills by attending practice every day, and chances of injury were reduced by following the rules of the game, using protective equipment, etc. Similarly most religious parents encourage their kids to do well

If human values are not capable of becoming coherent, and humanity comes to know that, what should be done?

Split the cake into parts and provide it to the different extrapolated volitions that didn't cohere, after allocating some defensive patrol to keep the borders and prevent future war?
That itself would go against some values. E.R. Eddison, The Worm Ouroboros
True, but it would nevertheless make for a decent compromise. Do you have a better suggestion?
It isn't much of a compromise. It presumes enough coherence for everyone to agree on a supreme power that all shall obey, to divide the cake and enforce the borders. To see how the situation is actually handled, look around you, at the whole world now and in the past. Whenever there is no common resolve to leave each other alone, then as has been observed of old, the strong do what they will and the weak bear what they must. Europe managed it in the Treaty of Westphalia, but it took thirty years of slaughtering each other for them to decide that no-one was ever going to win, and draw up a massive agreement to disagree. Best of luck getting any such agreement today between (for example) jihadists and everyone else. Or SJWs and neoreactionaries. Adding superintelligence would be like squabbling children getting hold of nuclear weapons.
One faction might have values that lead to something highly disvalued by another faction (e.g., one faction values slavery, another opposes slavery for all beings, even those of the first faction).
Rather than use traditional army methods, it's probably more efficient to have the SI play the role of Sysop in this scenario, and just deny human actors access to base-layer reality; though if one wanted to allow communication between the different domains, the Sysop may still need to run some active defense against high-level information attacks.

What do you most disagree with this week?

Perhaps the notion that we're obligated not just to care about the future, but to want it to have our values. There's a step skipped in going from "I value X" to "I value other agents valuing X." This step is glossed over by saying that I have a utility function that values X. But my utility function is full of quasi-indexicals, a fancy way of saying that it has terms like "me" in it, as in "I value my happiness" or "I value the happiness of my people". If you copy that utility function into another agent's mind, the "my" will now refer to that agent.

If we look at the real human world we see immediately that people don't always want other agents to share their utility function. Kings like being king, and they want everyone else not to want to be king. Social parasites want other people not to be social parasites. Etc. We also see that, while people profess to care about people on the other side of the world, they don't. There is a decay of our concern for people with geographic distance, and with distance in time going forward. We care a lot more about people existing today than about people who will exist 100 years from now. I would argue that it is impossible for our evolved utility functions to say anything about what goes on outside our "influence cone" (parts of spacetime that can influence us or our descendants, or that we or our descendants can influence), and any care we feel about them is us following some abstract model we've built about ourselves, which will be the first thing to go if we ever do find real "human values".

I'd like the future to have nice things in it: life, consciousness, love. But to have my values? That's... kind of boring. I want the future to tell an interesting story. That probably requires having a lot of people who don't share my values. I know somebody's going to say, "Well, then that's your utility function!" Yeah, sorta... but it's not the sort of thing that "human values" suggests. It's one or two levels of abstraction above "
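The quasi-indexical point can be made concrete with a toy sketch (all names and numbers here are hypothetical, invented purely for illustration): a utility function whose text contains "my" picks out a different referent depending on which agent holds it.

```python
# Toy illustration of a quasi-indexical utility function: the term "my"
# binds to whichever agent holds the function, so copying the same
# schema into another mind changes what it refers to.
def make_utility(agent_name):
    """'I value my happiness' as held by agent_name."""
    return lambda world: world[agent_name]["happiness"]

# A hypothetical world state with made-up happiness levels.
world = {"alice": {"happiness": 9}, "bob": {"happiness": 2}}

u_alice = make_utility("alice")  # Alice's "my" points at Alice
u_bob = make_utility("bob")      # the same schema, copied into Bob

# Identical text, different referents: the copied function no longer
# tracks Alice's happiness.
print(u_alice(world), u_bob(world))  # prints: 9 2
```

The design point is that "valuing X" is not a portable object: instantiating the same schema in two agents yields two different utility functions, which is why "I value X" does not automatically imply "I value other agents having my utility function."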
The 'if we were smarter, thought clearer, etc. etc.' seems to be asking it to go beyond us. What else do you mean by 'growing up', and why should we value it if it isn't something we'd approve of?
There isn't a clear distinction, but CEV is exactly what the Amish have done. They took the values they had in the 18th century, tried to figure out what the minimal, essential values behind them were, and then developed a system for using those core values to extrapolate the Amish position on new developments, like electricity, the telephone, gasoline engines, the Internet, etc. It isn't a simple rejection of new things; they have an eclectic selection of new things that may be used in certain ways or for certain purposes.
This is an interesting clarification of your early point, but I don't see how this is a response to what I said. For one thing, you're ignoring the 'if we were smarter, thought clearer' part since of course the Amish can't do that since they're human. But really, you just gave one negative example. Okay, being Amish is not growing up. What is growing up, and why would we predictably not value it while also finding it proper to object to its being not valued?
When you let your kids grow up, you accept that they won't do things the way you want them to. They will have other values. You don't try to optimize them for your value system. Retaining values is one thing. FAI / CEV is designed to maximize a utility function based on your values. It corresponds to brainwashing your kids to have all of your values and stay as close to your value system as possible. Increasing smartness is beside the point.
If we value them getting to go and make their own choices, then that will be included in CEV. If we do not value them being brainwashed, it will not be included in CEV. I strongly suspect that both of these are the case.
I know that is the standard answer. I tried to discourage people from making it by saying, in the parent comment, I'm talking about a real and important distinction, which is the degree of freedom in values to give the next generation. Under standard CEV, it's zero. I don't think that parameter, the degree of freedom, should be thought of as a value, which we can plug any number we like into. It should be thought of as a parameter of the system, which has a predictable impact on the efficacy of the CEV system regardless of what values it is implementing. I don't think people allow their children freedom to make up their own minds because they value them doing so. They do it because we have centuries of experience showing that zero-freedom CEV doesn't work. The oft-attempted process of getting kids to hold the same values as their parents, just modified for the new environment, always turns out badly.
No, it's not. Zero is the number of degrees of freedom in the AI's utility function, not the next generation's utility functions.
When using the parent-child relationship as an instance of CEV, it is. The child takes the position of the AI.
You've completely lost me. Do you mean this AI is our child? Do you mean that the way we will have children in the more conventional sense will be an instance of CEV? If the former, I don't see a moral problem. A singleton doesn't get to be a person, even if it contains multitudes (much as the USA does not get to be a person, though I would hope a singleton would function better). If the latter... words fail me, at least for the moment, and I will wait for your confirmation before trying again.
Interesting - my interpretation was that 'I' would refer to Katja, not the AI, and that the future might not care about the details of music etc if we don't want the future to care about music per se. But perhaps just because indeed the alternative doesn't sound very good. I think conversations actually flip flop between the two interpretations, without explicitly flagging it.
I guess I disagree with the premise that we will have superintelligent successors who will think circles around us, and yet we get to specify in detail what ethical values they will have, and it will stick. Forever. So let's debate what values to specify. A parent would be crazy to think this way about a daughter, optimizing in detail the order of priorities that he intends to implant into her, and expecting them to stick. But if your daughter is a superintelligence, it's even crazier.
Suppose it's twenty years from now, and we know exactly what genes go into the heritable portion of intelligence and personality, which includes both stuff like the Big Five and the weird preferences twins sometimes share. Suppose further that genetic modification of children is possible and acceptable, and you and your partner have decided that you'll have a daughter, and naturally you want her IQ to be as high as possible (suppose that's 170 on today's scale). So she's going to be able to think circles around you, but be comparable to her augmented classmates.

But personality isn't as obvious. Do you really want her to be maximally agreeable? Extraverted? Open? The other two might be easy to agree on; you might decide to zero out her neuroticism without much debate, and maximize her conscientiousness without much more. But, importantly, her ability to outthink you doesn't mean she will outthink the personality you chose for her. Why would she want to? It's her personality.

That's what a non-crazy version looks like: we know that personality traits are at least partly heritable for humans, and so we can imagine manipulating what personality traits future humans have by manipulating their genes. We also have some idea of how raising children impacts their personality and methods of relating with other people, and we can similarly imagine manipulating their early environment to get the personalities and relationships that we want.

We can further strengthen the analogy by considering the next generation. Your daughter has found a partner and is considering having a granddaughter; the IQ manipulation technology has improved to the point where the granddaughter is expected to score the equivalent of 220 on today's scale, but there's still a comparable personality question. If you were highly open and decided that your daughter should be highly open too, it seems likely that your daughter will use similar logic to decide that your granddaughter should also be highly open.
The older I get and the more I think of the AI issues the more I realize how perfectly our universe is designed! I think about the process of growing up: I cherish the time I spent in each stage of life, unaware of what's to come later, because there are things to be learned that can only derive from that particular segment's challenges. Each stage has its own level of "foolishness", but that is absolutely necessary for those lessons to be learned! So too I think of the catastrophes I have endured that I would not have chosen, but that I would not trade for anything now due to the wisdom they provided me. I cannot see any way around the difficult life as the most supreme and loving teacher. This I think most parents would recognize as they wish for their kids: a life not too painful but not too easy, either. CEV assumes that there is an arrival point that is more valuable than the dynamic process we undergo daily. Much as we delight in imagining a utopia, a truly good future is one that we STRUGGLE to achieve, balancing one hard-won value against another, is it not? I have not yet heard a single concept that arrives at wisdom without a difficult journey. Even the idea of a SI that dictates our behavior so that all act within its accordance has destroyed free will, much like a God who has revoked human volition. This leads me to a seemingly inevitable conclusion that no universe is preferable to the one we inhabit (though I have yet to see the value of horrible events in my future that I still try like the devil to avoid!) But despite this 'perfection' we're seemingly unable to stop ourselves from destroying it.

What do you think of CEV as a proposal?

It seems Kurzweil's view of AI is different from this group's. This group seems to concentrate on autonomous AI, against which humanity as it exists now must be protected. Kurzweil, on the other hand, seems to see AI as enhancements to the human: the human would be able to install hardware which would augment her brain (and her body, actually). Extrapolating my view of Kurzweil's view, I suppose that over time the humans become less and less human, as the enhancements they can adopt become an increasing fraction of the "human" which is designing and adopting the next generation of enhanced enhancements. Maybe the Kurzweilian view takes the universe to the same place that lesswrongian autonomous UAIs take the universe: with mechanical intelligence taking over the primary spot from biological. But the transition seems more comfortable to contemplate, since at every step something that at least used to be human is taking the next step. In some sense, I see CEV and attempts to ensure FAI over UAI as a modern equivalent of Chasidic Judaism: preserving a "better" past against the encroachment of modern "sin" without actually succeeding in doing that. In some sense, scientifically educated rational atheists are the UAI threat to the orthodox believers. In some sense, I think I'd rather be extinct than live in a community guarded from the harsh realities by FAI, living out a cartoon version of a life which is centuries out of date. I suppose I think of my in-group as "dominant intelligences" instead of "humanity," and so I feel more kinship with the dominant intelligences, even when they are no longer human.
If CEV produces whatever people value, do you think it would produce the above because you have different values than other people, or from some other error? Also, it seems to me that avoiding a new technology (CEV) specifically because it will make your life too easy has a lot in common with living in a false world which is centuries out of date.
Yes. And thank you for phrasing it that way so I understand that is at least one explanation for my concern. It seems beyond likely to me that the CEV you get will depend heavily on just who you include in your definition of "humans" whose volition must be considered in defining CEV. Even if CEV were intended to be just that subset of volitions that "everybody" would agree on (if they were smart enough), will your definition of everybody include paranoid schizophrenics? People born with severely deformed brains? Sociopaths? Republicans? The French? My point being that our intuition is of a "common definition of human we can all agree on," but the reality of 7 billion live humans, plus a few billion more easy to anticipate, might have a non-intuitively large variation across its volition. So if CEV includes a "veto power" in its definition, granted to all humans defined broadly enough to include sociopaths, we lose many of the values that allow us to work cooperatively.

Further concerning me, I think it is likely that humanity benefits from a diversity in values. At one level, societies with different values have different levels of success under different challenges, and in something like survival of the fittest, the societies that thrive have values that work better than those that don't. At another level, within a society diversity in values serves the group: the nurturers are caretakers, the nerds technologists, the sociopaths become leaders and work in security. CEV as I have heard it described sounds like a core of values, a kernel that all FAI operating systems would have to include. It doesn't sound like a set of values, or a core of meta-values, that would somehow incorporate in a single kernel all the variation in values that has served humanity so well. So yes, I am concerned that CEV is impossible, but perhaps not provably impossible, and that any actual attempts to build a CEV will have more to do with the values of the people building CEV rather than some undefina
I read the book several times already and it makes me more and more pessimistic. Even if we make an SI follow CEV, at some point it might decide to drop it. It's SI, above all; it can find ways to do anything. Yet we can't survive without SI. So the CEV proposal is as good and as bad as any other proposal. My only hope is that moral values could be as fundamental as laws of nature, so that a very superintelligent AI will be very moral. Then we'll be saved. If not, then it will create Hell for all people and keep them there for eternity (meaning that even death could be a better way out, yet the SI will not let people die). What should we do?
I think the burden of answering your "why?" question falls to those who feel sure that we have the wisdom to create superintelligent, super-creative lifeforms who could think outside the box regarding absolutely everything except ethical values. For those, they would inevitably stay on the rails that we designed for them. The thought "human monkey-minds wouldn't on reflection approve of x" would forever stop them from doing x. In effect, we want superintelligent creatures to ethically defer to us the way Euthyphro deferred to the gods. But as we all know, Socrates had a devastating comeback to Euthyphro's blind deference: We should not follow the gods simply because they want something, or because they command something. We should only follow them if the things they want are right. Insofar as the gods have special insight into what's right, then we should do what they say, but only because what they want is right. On the other hand, if the gods' preferences are morally arbitrary, we have no obligation to heed them. How long will it take a superintelligence to decide that Socrates won this argument? Milliseconds? Then how do we convince the superintelligence that our preferences (or CEV extrapolated preferences) track genuine moral rightness, rather than evolutionary happenstance? How good a case do we have that humans possess a special insight into what is right that the superintelligence doesn't have, so that the superintelligence will feel justified in deferring to our values? If you think this is an automatic slam dunk for humans.... Why?
I don't think there's any significant barrier to making a superintelligence that deferred to us for approval on everything. It would be a pretty lousy superintelligence, because it would essentially be crippled by its strict adherence to our wishes (making it excruciatingly slow) but it would work, and it would be friendly.
Given that there is a very significant barrier to making children that deferred to us for approval on everything, why do you think the barrier would be reduced if instead of children, we made a superintelligent AI?
The 'child' metaphor for SI is not very accurate. SIs can be designed and, most importantly, we have control over what their utility functions are.
I thought it's supposed to work like this: The first generation of AI are designed by us. The superintelligence is designed by them, the AI. We have initial control over what their utility functions are. I'm looking for a good reason why we should expect to retain that control beyond the superintelligence transition. No such reasons have been given here. A different way to put my point: Would a superintelligence be able to reason about ends? If so, then it might find itself disagreeing with our conclusions. But if not - if we design it to have what for humans would be a severe cognitive handicap - why should we think that subsequent generations of SuperAI will not repair that handicap?
You're making the implicit assumption that a runaway scenario will happen. A 'cognitive handicap' would, in this case, simply prevent the next generation AI from being built at all. As I'm saying, it would be a lousy SI and not very useful. But it would be friendly.
As friendly as we are, anyway.
Because we are not SI, so we don't know what it will do and why. It might.
We never bother running a computer program unless we don't know the output and we know an important fact about the output. -- Marcello Herreshoff In this case, one of the important facts must be that it won't go around changing its motivational structure. If it isn't, we're screwed for the reason you give.

@Nozick: we are plugged into machines (the Internet) and virtual realities (movies, games). Do we think that is wrong? Probably it is a question about the level of connection to reality?

@Häggström: there is a contradiction in the definition of what is better: F1 is better than F because it has more to strive for, and F2 is better than F1 because it has less to strive for.

@CEV: time is only one dimension in the space of conditions which could affect our decisions. Human cultures choose cannibalism in some situations. SAI could see several possible future decisions depending on sur...

Is CEV intended to be specified in great technical depth, or is it intended to be plugged into a specification for an AI capable of executing arbitrary natural language commands in a natural language form?

Could groups commit to CEV somehow, so that it could better prevent conflict?

Is this different from the question "Can groups cooperate rationally?"
I have a sense that this is what people who join the intelligence services do. The intelligence services lie both to keep secrets and to manipulate people into doing things they don't really want to do. The group of people in those services who believe in what they are doing believe they are doing right, that the things they are preserving are more valuable than the lesser values of telling the truth and not killing "innocent" people. Within the intelligence services it would appear conflict is largely avoided, with rare but spectacular exceptions like Snowden. Indeed, this group value I am noticing would appear to apply to the police and the military. Anywhere where getting things done relies upon contravening the simple values of not using physical force against other people. Is this the kind of thing you mean to be asking about?
It depends on how many groups and how many people in each. The Tobin tax would have been great to avoid some international financial catastrophes. The Kyoto protocol and all other sorts of tragedy of the common problems indicate that this type of commitment is hard to make and harder to keep. An interesting book mixing this with transhumanism and bioethics is "Unfit for the future" by Julian Savulescu.

Would it be so bad to lock in our current values? (e.g. Compared to the other plausible dangers inherent in a transition to AI?)

I might not mind locking in my current values, but I sure don't want to lock in your current values. No, more seriously: yes, it would be bad. As I wrote in "The human problem",
Consider that homo sapiens is in some sense a creation of our primate ancestors. Would it have been bad if those primate ancestors had managed to put limits on the evolution that led to homo sapiens, such that evolution could never create a species which might supplant the ancestor? That would successfully force any newly evolved species to serve the old one: perhaps to use its technology to cure the diseases of the primate ancestor, to tile the world in bananas and coconuts, to devote itself to the preservation and enhancement of primate-ancestor culture? I guess it would be bad for homo sapiens, but not so bad for the primate ancestors? Would our primate ancestors be open to a charge that they were creating a race of slaves by limiting their evolution thus? Of course, it seems more than likely that creating a race of slaves would not be ruled out by our primate ancestors' CEV.

Ethics is just the heuristics genes use to get themselves copied. We're all trying to maximize our own expected utility, but since none of us wants to let any other become a dictator, there is a game-theoretic equilibrium where we agree to have rules like "murder is illegal": even though such a rule stops me from murdering you, it also stops you from murdering me. Our rational goal is to shrink the circle of people included in this decision to the smallest possible group that includes ourselves, hence why we wouldn't want to sacrifice our own interests fo...
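The game-theoretic equilibrium gestured at in this comment can be sketched in a few lines. The payoff numbers and the PENALTY constant below are invented purely for illustration: the point is only that once a penalty for rule-breaking is enforced, accepting the "murder is illegal" rule becomes a best response to everyone else accepting it.

```python
# Toy model of the social-contract equilibrium described above.
ACCEPT, REJECT = 0, 1
PENALTY = 2  # hypothetical cost of being punished for breaking the rule

# payoff[my_move][their_move] = my payoff (all numbers are made up)
payoff = [
    [3, 0],            # I accept:  mutual safety / I'm prey
    [4 - PENALTY, 1],  # I reject:  I prey but get punished / mutual predation
]

def best_response(their_move):
    """Return the move that maximizes my payoff against a fixed opponent."""
    return max((ACCEPT, REJECT), key=lambda my: payoff[my][their_move])

# With the penalty in place, accepting the rule is a best response to
# everyone else accepting it, i.e. the contract is self-enforcing.
print(best_response(ACCEPT) == ACCEPT)  # prints: True
```

Without the penalty (set PENALTY = 0), rejecting strictly dominates and the situation collapses into a prisoner's dilemma, which is one way to read the comment's claim that the rules exist precisely because each of us fears everyone else's defection.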