I'm posting this article on behalf of Brian Tomasik, who authored it but is at present too busy to respond to comments.

Update from Brian: "As of 2013-2014, I have become more sympathetic to at least the spirit of CEV specifically and to the project of compromise among differing value systems more generally. I continue to think that pure CEV is unlikely to be implemented, though democracy and intellectual discussion can help approximate it. I also continue to feel apprehensive about the conclusions that a CEV might reach, but the best should not be the enemy of the good, and cooperation is inherently about not getting everything you want in order to avoid getting nothing at all."


I'm often asked questions like the following: If wild-animal suffering, lab universes, sentient simulations, etc. are so bad, why can't we assume that Coherent Extrapolated Volition (CEV) will figure that out and do the right thing for us?



Most of my knowledge of CEV is based on Yudkowsky's 2004 paper, which he admits is obsolete. I have not yet read most of the more recent literature on the subject.


Reason 1: CEV will (almost certainly) never happen

CEV is like a dream for a certain type of moral philosopher: Finally, the most ideal solution for discovering what we really want upon reflection!

The fact is, the real world is not decided by moral philosophers. It's decided by power politics, economics, and Darwinian selection. Moral philosophers can certainly have an impact through these channels, but they're unlikely to convince the world to rally behind CEV. Can you imagine the US military -- during its AGI development process -- deciding to adopt CEV? No way. It would adopt something that ensures the continued military and political dominance of the US, driven by mainstream American values. Same goes for China or any other country. If AGI is developed by a corporation, the values will reflect those of the corporation or the small group of developers and supervisors who hold the most power over the project. Unless that group is extremely enlightened, CEV is not what we'll get.

Anyway, this is assuming that the developers of AGI can even keep it under control. Most likely AGI will turn into a paperclipper or else evolve into some other kind of Darwinian force over which we lose control.

Objection 1: "Okay. Future military or corporate developers of AGI probably won't do CEV. But why do you think they'd care about wild-animal suffering, etc. either?"

Well, they might not, but if we make the wild-animal movement successful, then in ~50-100 years when AGI does come along, the notion of not spreading wild-animal suffering might be sufficiently mainstream that even military or corporate executives would care about it, at least to some degree.

If post-humanity does achieve astronomical power, it will only be through AGI, so there's high value for influencing the future developers of an AGI. For this reason I believe we should focus our meme-spreading on those targets. However, this doesn't mean they should be our only focus, for two reasons: (1) Future AGI developers will themselves be influenced by their friends, popular media, contemporary philosophical and cultural norms, etc., so if we can change those things, we will diffusely impact future AGI developers too. (2) We need to build our movement, and the lowest-hanging fruit for new supporters are those most interested in the cause (e.g., antispeciesists, environmental-ethics students, transhumanists). We should reach out to them to expand our base of support before going after the big targets.

Objection 2: "Fine. But just as we can advance values like preventing the spread of wild-animal suffering, couldn't we also increase the likelihood of CEV by promoting that idea?"

Sure, we could. The problem is, CEV is not an optimal thing to promote, IMHO. It's sufficiently general that lots of people would want it, so for ourselves, the higher leverage comes from advancing our particular, more idiosyncratic values. Promoting CEV is kind of like promoting democracy or free speech: It's fine to do, but if you have a particular cause that you think is more important than other people realize, it's probably going to be better to promote that specific cause than to jump on the bandwagon and do the same thing everyone else is doing, since the bandwagon's cause may not be what you yourself prefer.

Indeed, for myself, it's possible CEV could be a net bad thing, if it would reduce the likelihood of paperclipping -- a future which might (or might not) contain far less suffering than a future directed by humanity's extrapolated values.


Reason 2: CEV would lead to values we don't like

Some believe that morality is absolute, in which case a CEV's job would be to uncover what that is. This view is mistaken, for the following reasons: (1) the existence of a separate realm of reality where ethical truths reside would violate Occam's razor, and (2) even if such truths did exist, why would we care what they were?

Yudkowsky and the LessWrong community agree that ethics is not absolute, so they have different motivations behind CEV. As far as I can gather, the following are two of them:

Motivation 1: Some believe CEV is genuinely the right thing to do

As Eliezer said in his 2004 paper (p. 29), "Implementing CEV is just my attempt not to be a jerk." Some may believe that CEV is the ideal meta-ethical way to resolve ethical disputes.

I have to differ. First, the set of minds included in CEV is totally arbitrary, and hence, so will be the output. Why include only humans? Why not animals? Why not dead humans? Why not humans that weren't born but might have been? Why not paperclip maximizers? Baby eaters? Pebble sorters? Suffering maximizers? Wherever you draw the line, there you're already inserting your values into the process.

And then once you've picked the set of minds to extrapolate, you still have astronomically many ways to do the extrapolation, each of which could give wildly different outputs. Humans have a thousand random shards of intuition about values that resulted from all kinds of little, arbitrary perturbations during evolution and environmental exposure. If the CEV algorithm happens to make some more salient than others, this will potentially change the outcome, perhaps drastically (butterfly effects).

Now, I would be in favor of a reasonable extrapolation of my own values. But humanity's values are not my values. There are people who want to spread life throughout the universe regardless of suffering, people who want to preserve nature free from human interference, people who want to create lab universes because it would be cool, people who oppose utilitronium and support retaining suffering in the world, people who want to send members of other religions to eternal torture, people who believe sinful children should burn forever in red-hot ovens, and on and on. I do not want these values to be part of the mix.

Maybe (hopefully) some of these beliefs would go away once people learned more about what these wishes really implied, but some would not. Take abortion, for example: Some non-religious people genuinely oppose it, and not for trivial, misinformed reasons. They have thought long and hard about abortion and still find it to be wrong. Others have thought long and hard and still find it to be not wrong. At some point, we have to admit that human intuitions are genuinely in conflict in an irreconcilable way. Some human intuitions are irreconcilably opposed to mine, and I don't want them in the extrapolation process.

Motivation 2: Some argue that even if CEV isn't ideal, it's the best game-theoretic approach because it amounts to cooperating on the prisoner's dilemma

I think the idea is that if you try to promote your specific values above everyone else's, then you're timelessly causing this to be the decision of other groups of people who want to push for their values instead. But if you decided to cooperate with everyone, you would timelessly influence others to do the same.

This seems worth considering, but I'm doubtful that the argument is compelling enough to take too seriously. I can almost guarantee that if I decided to start cooperating by working toward CEV, everyone else working to shape values of the future wouldn't suddenly jump on board and do the same.

Objection 1: "Suppose CEV did happen. Then spreading concern for wild animals and the like might have little value, because the CEV process would realize that you had tried to rig the system ahead of time by making more people care about the cause, and it would attempt to neutralize your efforts."

Well, first of all, CEV is (almost certainly) never going to happen, so I'm not too worried. Second of all, it's not clear to me that such a scheme would actually be put in place. If you're trying to undo pre-CEV influences that led to the distribution of opinions to that point, you're going to have a heck of a lot of undoing to do. Are you going to undo the abundance of Catholics because their religion discouraged birth control and so led to large numbers of supporters? Are you going to undo the over-representation of healthy humans because natural selection unfairly removed all those sickly ones? Are you going to undo the under-representation of dinosaurs because an arbitrary asteroid killed them off before CEV came around?

The fact is that who has power at the time of AGI will probably matter a lot. If we can improve the values of those who will have power in the future, this will in expectation lead to better outcomes -- regardless of whether the CEV fairy tale comes true.

87 comments

Humans have a thousand random shards of intuition about values that resulted from all kinds of little, arbitrary perturbations during evolution and environmental exposure. If the CEV algorithm happens to make some more salient than others, this will potentially change the outcome, perhaps drastically (butterfly effects).

There's a brief discussion of butterfly effects as a potential pitfall for CEV in this thread.

I have two objections.

  1. I do in fact value the things I value. I'm not playing at some game.
  2. I'm an idiot, and my values are incoherent.

Therefore, CEV. Regardless of FAI developments.

I can see that this would lead you to want CEV of your (competing) values. But why would it lead you to want CEV of both your and other people's values?
Because of the incoherent bit. I think that others are doing a better job of optimizing for the world I actually want to live in than I am in many cases.
If others are optimizing for the world you want, if they are promoting your values better than you yourself are promoting them, then why do you express a wish to live in a world also optimized for their values via CEV?
I can trivially account for divergent values without CEV: holodecks/experience machines. Surely holodecks are the lower bound for any utopian vision.
Actually, this would be a strong argument against CEV. If individual humans commonly have incoherent values (which they do), there is no concrete reason to expect an automated extrapolation process to magically make them coherent. I've noticed that CEV proponents have a tendency to argue that the "thought longer, understood more" part of the process will somehow fix all objections of this sort, but given the complete lack of detail about how this process is supposed to work you might as well claim that the morality fairy is going to descend from the heavens and fix everything with a wave of her magic wand. If you honestly think you can make an AI running CEV produce a coherent result that most people will approve of, it's up to you to lay out concrete details of the algorithm that will make this happen. If you can't do that, you've just conceded that you don't actually have an answer for this problem. The burden of proof here is on the party proposing to gamble humanity's future on a single act of software engineering, and the standard of evidence must be at least as high as that of any other safety-critical engineering.
Can you point me to some serious CEV proponents who argue that most people will approve of the results? I agree with you that this seems implausible, but it has never been clear to me that anyone serious actually asserts it. FWIW, it has seemed to me from the beginning that the result of the CEV strategy would likely include at least something that makes me go "Um... really? I'm not entirely comfortable with that." More generally, it seems unlikely to me that the system which best implements my values would feel comfortable or even acceptable to me, any more than the diet that best addresses my nutritional needs will necessarily conform to my aesthetic preferences about food.
At first I thought this comparison was absolutely perfect, but I'm not really sure about that anymore. With a diet, you have other values to fall back on which might make your decision to adopt an aesthetically displeasing regimen still be something that you should do. With CEV, it's not entirely clear to me why I would want to prefer CEV values over my own current ones, so there's no underlying reason for me to accept that I should accept CEV as the best implementation of my values. That got a little complicated, and I'm not sure it's exactly what I meant to say. Basically, I'm trying to say that while you may not be entirely comfortable with a better diet, you would still implement it for yourself since it's a rational thing to do, whereas if you aren't comfortable with implementing your own CEV, there's no rational reason to compel you to do so.
Sure. And even if I did accept CEV(humanity) as the best implementation of my values in principle, the question of what grounds I had to believe that any particular formally specified value system that was generated as output by some seed AI actually was CEV(humanity) is also worth asking. Then again, there's no underlying reason for me to accept that I should accept my current collection of habits and surface-level judgments and so forth as the best implementation of my values, either. So, OK, at some point I've got a superhuman value-independent optimizer all rarin' to go, and the only question is what formal specification of a set of values I ought to provide it with. So, what do I pick, and why do I pick it?
Isn't this begging the question? By 'my values' I'm pretty sure I literally mean 'my current collection of habits and surface-level judgements and so forth'. Could I have terminal values of which I am completely unaware in any way shape or form? How would I even recognize such things, and what reason do I have to prefer them over 'my values'. Did I just go in a circle?
Well, you tell me: if I went out right now and magically altered the world to reflect your current collection of habits and surface-level judgments, do you think you would endorse the result? I'm pretty sure I wouldn't, if the positions were reversed.
I would want you to change the world so that what I want is actualized, yes. If you wouldn't endorse an alteration of the world towards your current values, in what sense do you really 'value' said values? I'm going to need to taboo 'value', aren't I?
I don't know if you need to taboo it or not, but I'll point out that I asked you a question that didn't use that word, and you answered a question that did. So perhaps a place to start is by answering the question I asked in the terms that I asked it?

If you're trying to undo pre-CEV influences that led to the distribution of opinions to that point, you're going to have a heck of a lot of undoing to do.

This issue and related ones were raised in this post and its comments.

I don't think that people valuing eternal torture of other humans is much of a concern, because they don't value it nearly as much as the people in question disvalue being tortured. I bet there are a lot more people who care about animals' feelings and who care a lot more, than those who care about the aesthetics of brutality in nature. I think the majority of people have more instincts of concern for animals than their actions suggest, because now it is convenient to screw over animals as an externality of eating tasty food, and the animals suffering are ...

Well, at the moment, there are hundreds of environmental-preservation organizations and basically no organizations dedicated to reducing wild-animal suffering. Environmentalism as a cause is much more mainstream than animal welfare. Just like the chickens that go into people's nuggets, animals suffering in nature "are out of sight, and the connection between [preserving pristine habitats] and animals living terrible lives elsewhere is hard to visualize." It's encouraging that more LessWrongers are veg than average, although I think 12.4% is pretty typical for elite universities and the like as well. (But maybe that underscores your point.) An example post. I care a lot about suffering, a little about happiness, and none about other things. Yep!
Suppose most people agree on valuing the torture of a few people, and only a few people disagree. Would you be OK with the majority's values outweighing the minority's, if it's a large enough majority? If you're OK with that, and if this is not specific to the example of torture, then you are effectively saying that you value the extrapolated consensus values of humanity more than your own, even though you don't know what those values may be. That you value the (unspecified) CEV process, and whatever values it ends up generating, more than any other values you currently hold. Is that so? Even if you're OK with that, you'd be vulnerable to a "clone utility monster": if I can clone myself faster than average, then the values of me and my clones will come to dominate the global population. This seems true for almost any value aggregation process given a large enough majority (fast enough cloning).
No, I would not be okay with it. I don't terminally value CEV. I think it would be instrumentally valuable, because scenarios where everyone wants to torture a few people are not that likely. I would prefer that only my own extrapolated utility function controlled the universe. Unlike Eliezer Yudkowsky, I don't care that much about not being a jerk. But that is not going to happen. If this detail from the original paper still stands, the CEV is allowed to modify the extrapolating process. So if there was the threat of everyone having to race to clone themselves as much as possible for more influence, it might modify itself to give clones less weight, or prohibit cloning.
Prohibiting these things, and CEV self-modifying in general, means optimizing for certain values or a certain outcome. Where do these values come from? From the CEV's programmers. But if you let certain predetermined values override the (unknown) CEV-extrapolated values, how do you make these choices, and where do you draw the line?
I mean that the CEV extrapolated from the entire population before they start a clone race could cause that self-modification or prohibition, not something explicitly put in by the programmers.
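The clone-utility-monster worry in this subthread can be made concrete with a toy model of head-count value aggregation. This is only an illustrative sketch under the assumption that aggregation is "one mind, one vote"; the value labels and numbers are hypothetical, not anything from the CEV literature.

```python
# Toy model of head-count value aggregation, illustrating the
# "clone utility monster" worry: if each mind gets one vote, an
# agent that can copy itself cheaply can flip the aggregate
# outcome without anyone changing their mind.
from collections import Counter

def aggregate(population):
    """Return the value endorsed by the most minds (one mind, one vote)."""
    return Counter(population).most_common(1)[0][0]

# Before cloning: a 60/40 split of (hypothetical) values.
population = ["reduce_suffering"] * 60 + ["spread_life"] * 40
assert aggregate(population) == "reduce_suffering"

# The minority faction adds 30 clones of itself; the aggregate flips.
population += ["spread_life"] * 30
assert aggregate(population) == "spread_life"
```

The point of the sketch is just that any aggregation rule weighting minds by count inherits this vulnerability, which is why the parent comments discuss letting the extrapolation down-weight clones or prohibit the race entirely.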

It is quite simple to make a LessWrong account, and it would be helpful so that you can respond to comments.

If you think it might be difficult to get sufficient karma, you can also post a comment in the open thread asking for upvotes so that you can post. It's worked nicely before :)

Brian is very busy at the moment, and said he was reluctant to post partly because this would create the reasonable expectation that he would have enough time to respond to comments. I have slightly edited the italicized notice above to clarify this point.

There seem to be two objections here. The first is that CEV does not uniquely identify a value system; starting with CEV, you don't have actual values until you've identified the set of people/nonpeople you're including, an extrapolation procedure, and a reconciliation procedure. But when this is phrased as "the set of minds included in CEV is totally arbitrary, and hence, so will be the output," an essential truth is lost: while parts of CEV are left unspecified, other parts are specified, and so the output is not fully arbitrary. The set of CEV-compatibl...

Peter Wildeford:
How could you ever guarantee that? Do you think progress toward utilitarian values increases with intelligence/wisdom?
In context, I think it's clear that you have decided to exclude some (potential) minds from CEV. You could just as easily have decided to include them and said "valuing choice loses to others valuing their life". But, to be clear, I don't think that even if you limit it to "existing, thinking human minds at the time of the calculation", you will get some sort of unambiguous result.
What parts are specified? If the set of people is unspecified, the extrapolation procedure is unspecified, and the reconciliation procedure is unspecified, then what is left? No. For all value systems X that are held by some people, you could always apply the CEV to a set of people who hold X. Unless the extrapolation procedure does something funny, the CEV of that set of people would be X. Unless the extrapolation and the reconciliation procedures are trivial, computing the CEV of mankind would probably be beyond the ability of any physically plausible AGI, superintelligent or not. People here seem to assume AGI = omniscient deity, but there are no compelling technical reasons for that assumption. Most likely that's just a reflection of traditional religious beliefs.

The CEV of humanity is not likely to promote animal suffering. Most people don't value animal suffering. They value eating hamburgers, and aren't particularly bothered by the far away animal suffering that makes it possible for them to eat hamburgers. An FAI can give us hamburgers without causing any animal suffering.

Future humans may not care enough about animal suffering relative to other things, or may not regard suffering as being as bad as I do. As noted in the post, there are people who want to spread biological life as much as possible throughout the galaxy. Deep ecologists may actively want to preserve wild-animal suffering (Ned Hettinger: "Respecting nature means respecting the ways in which nature trades values, and such respect includes painful killings for the purpose of life support.") Future humans might run ancestor sims that happen to include astronomical numbers of sentient insects, most of which die (possibly painfully) shortly after birth. In general, humans have motivations to simulate minds similar to theirs, which means potentially a lot more suffering along for the ride.

How many people? How much of this is based on confusion and not actually confronting the scale of suffering involved? (Note that CEV is supposed to account for this, giving us not what we say we want, but what we would want if we were smarter and knew more.) I am not convinced that insects are sentient (though an FAI that understands how sentience works could tell me I'm wrong and I'd believe it). If insects do turn out to be sentient, it would not be hard (and would actually take fewer computational resources) to replace the insect's sentience with an abstract model of its behavior. Sure, if we are stupid about it, but we are already working on how not to be stupid about it. And seriously, a successful singularity should give us far more interesting things to do than running such simulations (or eating hamburgers).
CEV proponents can always avoid an unpalatable objection to CEV by saying that, "if we knew more and were smarter", we wouldn't act in such objectionable ways. But a reason has to be provided for thinking that greater intelligence or better knowledge would in fact cause us to act differently. In these discussions, I see a lot of handwaving, but not many actual reasons.
Yeah, it works well against naive objections that some humans behave badly and they have influence on the CEV. Note that I referred to "if we knew more and were smarter" after asking if proponents of preserving wild animal suffering have actually confronted the scale of suffering involved.
Thanks, JGWeissman. There are certainly some deep ecologists, like presumably Hettinger himself, who have thought long and hard about the scale of wild-animal suffering and still support preservation of ecology as is. When I talk with ecologists or environmentalists, almost always their reply is something like, "Yes, there's a lot of suffering, but it's okay because it's natural for them." One example: You can see many more examples here. A growing number of people have been convinced that wild-animal suffering should be reduced where feasible, but I think this is still a minority view. If more people thought about it harder, probably there would be more support, but ecological preservation is also a very strong intuition for some people. It's easy not to realize this when we're in our own bubbles of utilitarian-minded rationalists. :) Spreading life far and wide is less widespread as a value, but it's popular enough that the Panspermia Society is one of a few groups that feels this way. I also have a very smart friend who happens to share this goal, even though he acknowledges this would create a lot of suffering. As far as insects, it's not obvious that post-humans would care enough to undertake the approximation of their brains that you mention, because maybe it would make the simulation more complicated (=> expensive) or reduce its fidelity. There's an analogy with factory farming today: Sure, we could prevent animal suffering, but it's more costly. Still, yes, we can hope that post-humans would give enough weight to insect suffering to avoid this. And I agree insects may very well not be sentient, though if they are, the numbers of suffering minds would be astronomical. The work on nonperson predicates and computational hazards is great -- I'm glad you guys are doing that!
The argument seems to be less that the suffering is OK because it is natural than that any intervention we can make to remove it would cause nature to not work, as in removing predator species results in more herbivores, which leads to vegetation being over consumed, which leads to ecological collapse. I am sympathetic to this argument. On a large enough scale, this means no breathable atmosphere. So while I think that wild animal suffering is a bad thing, I will accept it for now as a cost of supporting human life. (Maybe you could remove all animals not actually symbiotic with plants, but this seems like a hell of a gamble, we would likely regret the unintended consequences, and it could be difficult to undo.) Once humanity can upload and live in simulations, we have more options. Do you think the typical person advocating ecological balance has evaluated how the tradeoffs would change given future technology? CEV is supposed to figure out what people would want if they were more rational. If rationalists tend to discard that intuition, it is not likely to have a strong effect on CEV. (Though if people without such strong intuitions are likely to become more rational, this would not be strong evidence. It may be useful to try raising the sanity waterline among people who demonstrate the intuition and see what happens.) I am completely against giving up the awesomeness of a good singularity because it is not obvious that the resulting society won't devote some tiny fraction of their computing power to simulations in which animals happen to suffer. The suffering is bad, but there are other values to consider here, that the scenario includes in far greater quantities.
Good point. Probably not, and for some, their views would change with new technological options. For others (environmentalist types especially), they would probably retain their old views. That said, the future-technology sword cuts both ways: Because most people aren't considering post-human tech, they're not thinking of (what some see as) the potential astronomical benefits from human survival. If 10^10 humans were only going to live at most another 1-2 billion years on Earth, their happiness could never outweigh the suffering of the 10^18 insects living on Earth at the same time. So if people aren't thinking about space colonization, why do they care so much about preserving humanity anyway? Two possible reasons are because they're speciesist and care more about humans or because they value things other than happiness and suffering. I think both are true here, and both are potentially problematic for CEV values. Yeah, that would be my concern. These days, "being rational" tends to select for people who have other characteristics, including being more utilitarian in inclination. Interesting idea about seeing how deep ecologists' views would change upon becoming more rational. We have different intuitions about how bad suffering is. My pain:pleasure exchange rate is higher than that of most people, and this means I think the expected suffering that would result from a Singularity isn't worth the potential for lots of happiness.
With bacon? There'd better be bacon!

This is a question about utilitarianism, not AI, but can anyone explain (or provide a link to an explanation of) why reducing the total suffering in the world is considered so important? I thought that we pretty much agreed that morality is based on moral intuitions, and it seems pretty counterintuitive to value the states of mind of people too numerous to sympathize with as highly as people here do.

It seems to me that reducing suffering in a numbers game is the kind of thing you would say is your goal because it makes you sound like a good person, rather ...

When I become directly acquainted with an episode of intense suffering, I come to see that this is a state of affairs that ought not to exist. My empathy may be limited, but I don't need to empathize with others to recognize that, when they suffer, their suffering ought to be relieved too. I don't pretend to speak on behalf of all other hedonistic utilitarians, however. Brian himself would probably disagree with my answer. He would instead reply that he "just cares" about other people's suffering, and that's that.
Knowing that you've abandoned moral realism, how would you respond to someone making an analogous argument about preferences or duties? For instance, "When a preference of mine is frustrated, I come to see this as a state of affairs that ought not to exist," or "When someone violates a duty, I come to see this as a state of affairs that ought not to exist." Granted, the acquaintance may not be as direct as in the case of intense suffering. But is that enough to single out pleasure and suffering?
Preventing suffering is what I care about, and I'm going to try to convince other people to care about it. One way to do that is to invent plausible thought experiments / intuition pumps for why it matters so much. If I do, that might help with evangelism, but it's not the (original) reason why I care about it. I care about it because of experience with suffering in my own life, feeling strong empathy when seeing it in others, and feeling that preventing suffering is overridingly important due to various other factors in my development.
Thanks, Brian. I know this is your position, I'm wondering if it's benthamite's as well.
I am not sure that the hedonistic utilitarian agenda is high status. The most plausible cynical/psychological critique of the hedonistic utilitarian agenda, is that they are too worried about ethical consistency and about coherently extrapolating a simple principle from their values.
Cooperation for mutual benefit. Potential alliance building. Signalling of reliability, benevolence, and capability. It's often beneficial to adopt a general policy of helping strangers whenever the personal price is low enough. And (therefore) the human mind is such that people mostly enjoy helping others as long as it's not too strenuous.
You could reduce human suffering to 0 by reducing the number of humans to 0, so there's got to be another value greater than reducing suffering. It seems plausible to me that suffering could serve some useful purpose & eliminating it (or seeking to eliminate it) might have horrific consequences.
Almost all hedonistic utilitarians are concerned with maximizing happiness as well as minimizing suffering, including Brian. The reason he talks about suffering so much is that he ranks a unit of suffering as, say, a -3 experience where most people would rank it as, say, a -1 experience. And he thinks that there is much more suffering than happiness in the world and that it is easier to prevent it. (Sorry if I got any of this wrong, Brian)
Thanks, Jabberslythe! You got it mostly correct. :) The one thing I would add is that I personally think people don't usually take suffering seriously enough -- at least not really severe suffering like torture or being eaten alive. Indeed, many people may never have experienced something that bad. So I put high importance on preventing experiences like these relative to other things.
I'm not strongly emotionally motivated to reduce suffering in general, but I realize that my own and other instances of suffering are examples of suffering in general, so I think it's a good policy to try to reduce world-suck. This is reasonably approximated by saying I would like to reduce unhappiness or increase happiness or some such thing.

Another thing to worry about with CEV is that the nonperson predicates chosen by whoever writes it may cover things that you consider people, or that you would not like to see destroyed at the end of an instrumental simulation.

Humans probably have no built-in intuitions about which things deserve ethical consideration at the precision required for a nonperson predicate -- one that can flag as nonpersons things useful for instrumental simulations, and yet not flag a fully detailed simulation of you or me as a nonperson. We ...

Interesting story. Yes, I think our intuitions about what kinds of computations we want to care about are easily bent and twisted depending on the situation at hand. In analogy with Dennett's "intentional stance," humans have a "compassionate stance" that we apply to some physical operations and don't apply to others. It's not too hard to manipulate these intuitions by thought experiments. So, yes, I do fear that other people may differ (perhaps quite a bit) in their views about what kinds of computations are suffering that we should avoid.

I think that there's a misunderstanding about CEV going on.

At some point, we have to admit that human intuitions are genuinely in conflict in an irreconcilable way.

I don't think an AI would just ask us what we want, and then do what suits most of us. It would consider how our brains work, and exactly what shards of value make us up. Intuition isn't a very good guide to what is the best decision for us - the point of CEV is that if we knew more about the world and ethics, we would do different things, and think different thoughts about ethics.

You might...

Peter Wildeford:
But not similar enough, I'd argue. For example, I value not farming nonhuman animals and making sure significant resources address world poverty (for a few examples). Not that many other people do. Hopefully CEV will iron that out so this minority wins over the majority, but I don't quite know how. (Comment disclaimer: Yes, I am woefully unfamiliar with CEV literature and unqualified to critique it. But hey, this is a comment in discussion. I do plan to research CEV more before I actually decide to disagree with it, assuming I do disagree with it after researching it further.)
Ben Pace:
Okay. Either, if we all knew more, thought faster, understood ourselves better, we would decide to farm animals, or we wouldn't. For people to be so fundamentally different that there would be disagreement, they would need massively complex adaptations / mutations, which are vastly improbable. Even if someone sits down, and thinks long and hard about an ethical dilemma, they can very easily be wrong. To say that an AI could not coherently extrapolate our volition, is to say we're so fundamentally unlike that we would not choose to work for a common good if we had the choice.
But why run this risk? The genuine moral motivation of typical humans seems to be weak. That might even be true of the people working for human and non-human altruistic causes and movements. What if what they really want, deep down, is a sense of importance or social interaction or whatnot? So why not just go for utilitarianism? By definition, that's the safest option for everyone to whom things can matter/be valuable. I still don't see what could justify coherently extrapolating "our" volition only. The only non-arbitrary "we" is the community of all minds/consciousnesses.
Ben Pace:
This sounds a bit like religious people saying "But what if it turns out that there is no morality? That would be bad!". What part of you thinks that this is bad? Because, that is what CEV is extrapolating. CEV is taking the deepest and most important values we have, and figuring out what to do next. You in principle couldn't care about anything else. If human values wanted to self-modify, then CEV would recognise this. CEV wants to do what we want most, and this we call 'right'. This is what you value, what you chose. Don't lose sight of invisible frameworks. If we're including all decision procedures, then why not computers too? This is part of the human intuition of 'fairness' and 'equality' too. Not the hamster's one.
Yes. We want utilitarianism. You want CEV. It's not clear where to go from there. FWIW, hamsters probably exhibit fairness sensibility too. At least rats do.
The point you quoted is my main objection to CEV as well. Right now there are large groups who have specific goals that fundamentally clash with some goals of those in other groups. The idea of "knowing more about [...] ethics" either presumes an objective ethics or merely points at you or where you wish you were.
Ben Pace:
The existence of moral disagreement is not an argument against CEV, unless all disagreeing parties know everything there is to know about their desires, and are perfect bayesians. Otherwise, people can be mistaken about what they really want, or what the facts prescribe (given their values). 'Objective ethics'? 'Merely points... at where you wish you were'? "Merely"!? Take your most innate desires. Not 'I like chocolate' or 'I ought to condemn murder', but the most basic levels (go to a neuroscientist to figure those out). Then take the facts of the world. If you had a sufficiently powerful computer, and you could input the values and plug in the facts, then the output would be what you wanted to do best. That doesn't mean whichever urge is strongest, but it takes into account the desires that make up your conscience, and the bit of you saying 'but that's not what's right'. If you could perform this calculation in your head, you'd get the feeling of 'Yes, that's what is right. What else could it possibly be? What else could possibly matter?' This isn't 'merely' where you wish you were. This is the 'right' place to be. This reply is more about the meta-ethics, but for interpersonal ethics, please see my response to peter_hurford's comment above.
The fact that people can be mistaken about what they really want is vanishingly small evidence that if they were not mistaken, they would find out they all want the same things.
A very common desire is to be more prosperous than one's peers. It's not clear to me that there is some "real" goal that this serves (for an individual) -- it could be literally a primary goal. If that's the case, then we already have a problem: two people in a peer group cannot both get all they want if both want to have more than any other. I can't think of any satisfactory solution to this. Now, one might say, "well, if they'd grown up farther together this would be solvable", but I don't see any reason that should be true. People don't necessarily grow more altruistic as they "grow up", so it seems that there might well be no CEV to arrive at. I think, actually, a weaker version of the UFAI problem exists here: sure, humans are more similar to each other than UFAI's need be to each other, but they still seem fundamentally different in goal systems and ethical views, in many respects.
Objective? Sure, without being universal. Human beings are physically/genetically/mentally similar within certain tolerances; this implies there is one system of ethics (within certain tolerances) that is best suited to all of us, which could be objectively determined by a thorough and competent enough analysis of humans. The edges of the bell curve on various factors might have certain variances. There might be a multi-modal distribution of fit (bimodal on men and women, for example), too. But, basically, one objective ethics for humans. This ethics would clearly be unsuited for cats, sharks, bees, or trees. It seems vanishingly unlikely that sapient minds from other evolutions would be suited for such an ethics, either. So it's not universal; it's not a code God wrote into everything. It's just the best way to be a human . . . as humans exposed to it would in fact judge, because it's fitted to us better than any of our current fumbling attempts.
Why not include primates, dolphins, rats, chickens, etc. into the ethics?
What would that mean? How would the chicken learn or follow the ethics? Does it seem even remotely reasonable that social behavior among chickens and social behavior among humans should follow the same rules, given the inherent evolutionary differences in social structure and brain reward pathways? It might be that CEV is impossible for humans, but there's at least enough basic commonality to give it a chance of being possible.

Why would the chicken have to learn to follow the ethics in order for its interests to be fully included in the ethics? We don't include cognitively normal human adults because they are able to understand and follow ethical rules (or, at the very least, we don't include them only in virtue of that fact). We include them because to them as sentient beings, their subjective well-being matters. And thus we also include the many humans who are unable to understand and follow ethical rules. We ourselves, of course, would want to be still included in case we lost the ability to follow ethical rules. In other words: Moral agency is not necessary for the status of a moral patient, i.e. of a being that matters morally.

The question is how we should treat humans and chickens (i.e. whether and how our decision-making algorithm should take them and their interests into account), not what social behavior we find among humans and chickens.

Constructing an ethics that demands that a chicken act as a moral agent is obviously nonsense; chickens can't and won't act that way. Similarly, constructing an ethics that demands humans value chickens as much as they value their own children is nonsense; humans can't and won't act that way. If you're constructing an ethics for humans to follow, you have to start by figuring out humans. It's not until after you've figured out how much humans should value the interests of chickens that you can determine how much to weigh the interests of chickens in how humans should act. And how much humans should weigh the value of chickens is by necessity determined by what humans are.
Well, if humans can't and won't act that way, too bad for them! We should not model ethics after the inclinations of a particular type of agent, but we should instead try and modify all agents according to ethics. If we did model ethics after particular types of agent, here's what would result: Suppose it turns out that type A agents are sadistic racists. So what they should do is put sadistic racism into practice. Type B agents, on the other hand, are compassionate anti-racists. So what they should do is diametrically opposed to what type A agents should do. And we can't morally compare types A and B. But type B is obviously objectively better, and objectively less of a jerk. (Whether type A agents can be rationally motivated (or modified so as) to become more B-like is a different question.)
Of course we can morally compare types A and B, just as we can morally compare an AI whose goal is to turn the world into paperclips and one whose goal is to make people happy. However, rather than "objectively better", we could be more clear by saying "more in line with our morals" or some such. It's not as if our morals came from nowhere, after all. See also: "The Bedrock of Morality: Arbitrary?"
Ben Pace:
Just to make clear, are you saying that we should treat chickens how humans want to treat them, or how chickens do? Because if the former, then yeah, CEV can easily find out whether we'd want them to have good lives or not (and I think it would see we do). But chickens don't (I think) have much of an ethical system, and if we incorporated their values into what CEV calculates, then we'd be left with some important human values, but also a lot of chicken feed.
Thanks, Benito. Do we know that we shouldn't have a lot of chicken feed? My point in asking this is just that we're baking in a lot of the answer by choosing which minds we extrapolate in the first place. Now, I have no problem baking in answers -- I want to bake in my answers -- but I'm just highlighting that it's not obvious that the set of human minds is the right one to extrapolate. BTW, I think the "brain reward pathways" between humans and chickens aren't that different. Maybe you were thinking about the particular, concrete stimuli that are found to be rewarding rather than the general architecture.
It does not imply that there exists even one basic moral/ethical statement any human being would agree with, and to me that seems to be a requirement for any kind of humanity-wide system of ethics. Your 'one size fits all' approach does not convince me, and your reasoning seems superficial and based on words rather than actual logic.
All humans as they currently exist, no. But is there a system of ethics that humans, even while currently disagreeing with some parts of it, would recognize as so superior at doing what they really want from an ethical system that they would switch to it? Even in the main? Maybe, indeed, human ethics are so dependent on alleles that vary within the population and on chance environmental factors that CEV is impossible. But there's no solid evidence requiring us to assume that a priori, either.

By analogy, consider a person who in 1900 wanted to put together the ideal human diet. Obviously, the diets in different parts of the world differed from each other extensively, and merely averaging all the diets that existed in 1900 would not be particularly conducive to finding an actual ideal diet. The person would have to do all the sorts of research that discovered the roles of various nutrients and micronutrients, et cetera. Indeed, he'd have to learn more than we currently know about them. And he'd have to work out the variations to react to various medical conditions, and he'd have to consider flavor (both innate response pathways and learned ones), et cetera. And then there's the limit of what foods can be grown where, what shipping technologies exist, how to approximate the ideal diet in differing circumstances.

It would be difficult, but eventually you probably could put together a dietary program (including understood variations) that would, indeed, suit humans better than any of the existing diets in 1900, both in nutrition and pleasure. It wouldn't suit sharks at all; it would not be a universal nutrition. But it would be an objectively determined diet just the same.
The problem with this diet is that it wouldn't be a diet; it would be many different diets. Lots of people are lactose intolerant and it would be stupid to remove dairy products from the diet of those who are not. Likewise, a vegetarian diet is not a "variation" of a non-vegetarian diet. Also, why are you talking about 1900? I think the fact that humans can't agree on even the most basic issues is pretty solid evidence. Also, even if everyone had the same subjective ethics, this still would result in objective contradictions. I'm not aware of any evidence that this problem is solvable at all.
Not similar enough to prevent massive conflicts - historically. Basically, small differences in optimisation targets can result in large conflicts.
And even more simply, if everyone has exactly the same optimization target "benefit myself at the expense of others", then there's a big conflict.
Ben Pace:
The existence of moral disagreement is not an argument against CEV, unless all disagreeing parties know everything there is to know about their desires, and are perfect bayesians. People can be mistaken about what they really want, or what the facts prescribe (given their values). I linked to this above, but I don't know if you've read it. Essentially, you're explaining moral disagreement by positing massively improbable mutations, but it's far more likely to be a combination of bad introspection and non-bayesian updating.
Um, different organisms of the same species typically have conflicting interests due to standard genetic diversity - not "massively improbable mutations". Typically, organism A acts as though it wants to populate the world with its offspring, and organism B acts as though it wants to populate the world with its offspring, and these goals often conflict - because A and B have non-identical genomes. Clearly, no "massively improbable mutations" are required in this explanation. This is pretty-much biology 101.
It's very hard for A and B to know how much their genomes differ, because they can only observe each other's phenotypes, and they can't invest too much time in that either. So they will mostly compete even if their genomes happen to be identical.
The kin recognition that you mention may be tricky, but kin selection is much more widespread - because there are heuristics that allow organisms to favour their kin without the need to examine them closely - like: "be nice to your nestmates". Simple limited dispersal often results in organisms being surrounded by their close kin - and this is a pretty common state of affairs for plants and fungi.
Ben Pace:
Oops. Yup, I missed something there. Well, for humans, we've evolved desires that work interpersonally (fairness, desire for others' happiness, etc.). I think that an AI which had our values written in would have no problem figuring out what's best for us. It would say, 'Well, there's this complex set of values that sum up to everyone being treated well (or something), and so each party involved should be treated well.' You're right, though, I hadn't made clear how this bit worked. Maybe this helps?

First, the set of minds included in CEV is totally arbitrary, and hence, so will be the output. Why include only humans? Why not animals? Why not dead humans? Why not humans that weren't born but might have been? Why not paperclip maximizers? Baby eaters? Pebble sorters? Suffering maximizers? Wherever you draw the line, there you're already inserting your values into the process.

I agree that it is impossible to avoid inserting your values, and CEV does not work as a meta-ethical method of resolving ethical differences. However, it may be effective as a ...