Troubles With CEV Part 1 - CEV Sequence

by diegocaleiro, 28th Feb 2012



The CEV Sequence Summary: The CEV sequence consists of three posts tackling important aspects of CEV. It covers conceptual, practical and computational problems of CEV's current form. On What Selves Are draws on methods from analytic philosophy to clarify the concept of self, which is necessary to understand whose volition is going to be extrapolated by a machine that implements the CEV procedure. Troubles with CEV Part 1 and Troubles with CEV Part 2, on the other hand, describe several issues the CEV project will face if it is actually implemented. Those issues are not of a conceptual nature. Many of the objections presented come from scattered discussions found on the web. Finally, some alternatives to CEV are considered.

 

Troubles with CEV Summary: Starting with a summary of CEV, we proceed to present several objections to it: first, specific objections to the use of Coherence, Extrapolation, and Volition. Here Part 1 ends. Then, in Part 2, we continue with objections related to the end product of performing a CEV, and finally, problems relating to the implementation of CEV. We then go on to praise CEV, pointing out particular strengths of the idea. We end by showing six alternatives to CEV that have been proposed and considering their vices and virtues.

Meta: I think Troubles With CEV Part 1 and Part 2 should be posted to Main, so in the comment section of Part 2 I have set up a place to vote for or against this upgrade.

 

Troubles with CEV Part 1

 

Summary of CEV

To begin with, let us recall the most important passages of Coherent Extrapolated Volition (CEV).

“Friendly AI requires:

1.  Solving the technical problems required to maintain a well-specified abstract invariant in a self-modifying goal system. (Interestingly, this problem is relatively straightforward from a theoretical standpoint.)

2.  Choosing something nice to do with the AI. This is about midway in theoretical hairiness between problems 1 and 3.

3.  Designing a framework for an abstract invariant that doesn't automatically wipe out the human species. This is the hard part.

But right now the question is whether the human species can field a non-pathetic force in defense of six billion lives and futures.”
“Friendliness is the easiest part of the problem to explain - the part that says what we want. Like explaining why you want to fly to London, versus explaining a Boeing 747; explaining toast, versus explaining a toaster oven.”

“To construe your volition, I need to define a dynamic for extrapolating your volition, given knowledge about you. In the case of an FAI, this knowledge might include a complete readout of your brain-state, or an approximate model of your mind-state. The FAI takes the knowledge of Fred's brainstate, and other knowledge possessed by the FAI (such as which box contains the diamond), does... something complicated... and out pops a construal of Fred's volition. I shall refer to the "something complicated" as the dynamic.”

This is essentially what CEV is: extrapolating Fred's mind, and everyone else's, in order to grok what Fred wants. This is done from a reading of Fred's psychological states, whether through unlikely neurological paths or through more coarse-grained psychological ones. There is reason to think that a complete readout of a brain is overwhelmingly more complicated than a very good descriptive psychological approximation. We must make sure, though, that understanding this approximation does not rely on our shared human psychology: the descriptive approximation has to be understandable by AGIs, not only by evolutionarily engineered humans. Continuing the summary:

“In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.”

“Had grown up farther together: A model of humankind's coherent extrapolated volition should not extrapolate the person you'd become if you made your decisions alone in a padded cell. Part of our predictable existence is that we predictably interact with other people. A dynamic for CEV must take a shot at extrapolating human interactions, not just so that the extrapolation is closer to reality, but so that the extrapolation can encapsulate memetic and social forces contributing to niceness.”

“the rule [is] that the Friendly AI should be consistent under reflection (which might involve the Friendly AI replacing itself with something else entirely).”

“The narrower the slice of the future that our CEV wants to actively steer humanity into, the more consensus required.”

“The dynamic of extrapolated volition refracts through that cognitive complexity of human minds which lead us to care about all the other things we might want; love, laughter, life, fairness, fun, sociality, self-reliance, morality, naughtiness, and anything else we might treasure.”

“It may be hard to get CEV right - come up with an AI dynamic such that our volition, as defined, is what we intuitively want. The technical challenge may be too hard; the problems I'm still working out may be impossible or ill-defined.”

“The same people who aren't frightened by the prospect of making moral decisions for the whole human species lack the interdisciplinary background to know how much complexity there is in human psychology, and why our shared emotional psychology is an invisible background assumption in human interactions, and why their Ten Commandments only make sense if you're already a human.”

“Even if our coherent extrapolated volition wants something other than a CEV, the programmers choose the starting point of this renormalization process; they must construct a satisfactory definition of volition to extrapolate an improved or optimal definition of volition. ”

 

Troubles with CEV

1) Stumbling on People, Detecting the Things CEV Will Extrapolate:

Concepts on which CEV relies may be ill-defined, lacking a stable, consistent structure in thingspace.

CEV relies on many concepts, most notably coherence, extrapolation and volition. We will discuss the problems of coherence and extrapolation shortly; for now I'd like to invoke a deeper layer of conceptual problems regarding the execution of a CEV-implementing machine. A machine executing CEV ought to be able to identify the kinds of entities whose volitions matter to us; that is, it must be able to grasp selfhood, or personhood. The concepts of self and person are intertwined and complex, and due to their complexity I have dedicated a separate text, On What Selves Are, to the incompleteness, anomalousness, and fine-grainedness of selves.

 

2) Troubles with coherence

2a) The Intrapersonal objection: The volitions of the same person in two different emotional states might differ; it's as if they were two different people. Is there any good criterion by which a person's “ultimate” volition may be determined? If not, is it certain that even the volitions of one person's multiple selves will converge? As explained in detail in Ainslie's “Breakdown of Will”, we are made of lots of tinier interacting time-slices whose conflicts cannot be ignored. My chocolate has value 3 now, 5 when it's in my mouth and 0 when I reconsider how quick the pleasure was and how long the fat will stay. Valuations conflict not only interpersonally but also intrapersonally. Variation in what we value can be correlated not only with different distances in time, but also with different emotional states, priming, background assumptions and other ways in which reality hijacks brains for a period.
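Ainslie's account in Breakdown of Will traces much of this intrapersonal conflict to hyperbolic discounting of delayed rewards. The sketch below is a minimal illustration, assuming the common hyperbolic form V = A / (1 + kD) and made-up reward sizes, times and rates (none of these numbers come from this post): the same agent prefers the larger, later reward while both are far off, then switches to the smaller, sooner one as it draws near. Exponential discounting, included for contrast, never produces such a reversal.

```python
import math
from typing import Callable, Tuple

# Hypothetical rewards as (amount, time at which it becomes available).
# The numbers are illustrative only.
SMALL_SOONER: Tuple[float, float] = (2.0, 10.0)
LARGE_LATER: Tuple[float, float] = (5.0, 15.0)


def hyperbolic(amount: float, delay: float, k: float = 1.0) -> float:
    """Hyperbolic discounting: V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)


def exponential(amount: float, delay: float, r: float = 0.05) -> float:
    """Exponential discounting: V = A * exp(-r * D), for contrast."""
    return amount * math.exp(-r * delay)


def preferred(now: float, discount: Callable[[float, float], float]) -> str:
    """Which reward the agent, standing at time `now`, currently values more."""
    (a_small, t_small), (a_large, t_large) = SMALL_SOONER, LARGE_LATER
    v_small = discount(a_small, t_small - now)
    v_large = discount(a_large, t_large - now)
    return "small-sooner" if v_small > v_large else "large-later"


if __name__ == "__main__":
    for now in (0.0, 5.0, 8.0, 9.0):
        print(now, preferred(now, hyperbolic), preferred(now, exponential))
```

With these numbers the hyperbolic agent flips at t = 23/3 (about 7.67), where 2 / (11 - t) = 5 / (16 - t), while the exponential agent's ratio of values stays constant and it keeps preferring the larger, later reward at every t.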

 

2b) The Biological Onion objection: Our volitions can be thought of as an onion, layers upon layers of beliefs and expectations. The suggestion made by CEV is that when you strip away the layers that do not cohere, you reach deeper regions of the onion. Now, here is the catch: what if there is no way to get coherence unless you strip away everything that is truly humane and are left only with that which is biological? What if, in the service of coherence, we strip away everything that matters and end up with only our biological drives? There is little in common between Eliezer, me and Al Qaeda terrorists, and most of it is in the so-called reptilian brain. We may end up with a set of goals and desires that are nothing more than “Eat, Survive, Reproduce,” which would qualify as a major loss in the scheme of things. In this specific case, what ends up dominating CEV is what evolution wants, not what we want. Instead of creating a dynamic with a chance of shaping the landscape of a Nice Place to Live, we end up with some exotic extrapolation of simple evolutionary drives. Let us call this failure mode Defeated by Evolution. We are Defeated by Evolution if at any time the destiny of Earth becomes nothing more than Darwinian evolution all over again, at a different level of complexity or at a different speed. So if CEV ends up stripping the biological onion of the goals that matter, extrapolating only a biological core, we are defeated by evolution.
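The worry can be made concrete with a deliberately crude toy model. Assume (this is not part of CEV's specification) that each person's volition is a stack of value layers, outermost first, and that "coherence" keeps only the values every person shares. All names and values below are hypothetical. If the only values everyone shares are biological drives, the coherent core collapses to them, which is exactly the Defeated by Evolution outcome.

```python
from typing import Dict, List, Set

# Hypothetical layered values, outermost (most idiosyncratic) layer first,
# biological core last. Purely illustrative.
BIOLOGICAL_CORE: Set[str] = {"eat", "survive", "reproduce"}

people: Dict[str, List[Set[str]]] = {
    "person_a": [{"chess", "opera"}, {"fairness", "honesty"}, BIOLOGICAL_CORE],
    "person_b": [{"football"}, {"loyalty", "honesty"}, BIOLOGICAL_CORE],
    "person_c": [{"poetry"}, {"obedience", "purity"}, BIOLOGICAL_CORE],
}


def all_values(layers: List[Set[str]]) -> Set[str]:
    """Flatten one person's onion into a single set of values."""
    return set().union(*layers)


def coherent_core(population: Dict[str, List[Set[str]]]) -> Set[str]:
    """Toy 'coherence': keep only the values that every person holds."""
    value_sets = [all_values(layers) for layers in population.values()]
    return set.intersection(*value_sets)


def defeated_by_evolution(core: Set[str]) -> bool:
    """True if nothing beyond the biological drives survived coherence."""
    return core <= BIOLOGICAL_CORE


if __name__ == "__main__":
    core = coherent_core(people)
    print(sorted(core))                 # ['eat', 'reproduce', 'survive']
    print(defeated_by_evolution(core))  # True
```

The outcome hinges on the coherence rule: if person_c also held "honesty", it would survive the intersection and the check would return False. The sketch only shows why an intersection-style notion of coherence is vulnerable to this failure mode.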

 

3) Troubles with extrapolation

3a) The Small Accretions Objection: Are small accretions of intelligence analogous to small accretions of time in terms of identity? Is extrapolated person X still a reasonable political representative of person X? Are X's values desirably preserved when she is given small accretions of intelligence? Would X allow her extrapolation to vote for her?

This objection is made through an analogy. For a very long time philosophers have argued about the immortality of the soul, its existence, its complexity and, last but not least, its identity with itself over time.

Advancements in the field of philosophy are sparse and usually controversial, and if we depended on a major advance in understanding the complexity of the soul we would be in a bad situation. Luckily, our analogy relies only on the issue of personal identity, which appears to have been treated in sufficient detail in Reasons and Persons, Derek Parfit's major contribution to philosophy, covering cases from fission and fusion to teleportation and identity over time. It is identity over time that concerns us here: are you the same person as the person you were yesterday? How about one year ago? Or ten years? Parfit helped the philosophical community by reframing the essential question: instead of asking whether X stays the same over time, he asks whether personal identity is what matters, that is, what we want to preserve when we deny others the right to shoot us. More recently he developed the question in full detail in “Is Personal Identity What Matters?” (2007), a long article in which all the objections to his original view are countered in minute detail.

We are left with a conception on which identity over time is not what matters, and psychological relatedness is the best candidate to take its place. Personal identity is dissolved into a quantitative, not qualitative, question: how much are you the same as the one you were yesterday? Here some percentage enters the field, and once you know how much you are like the person you were yesterday, there is no further question about how much you are the person you were yesterday.

We had been asking the wrong question for a long time, and we risk doing the same with CEV. What if extrapolation is a process that dissolves what matters about us and our volitions? What if there is no transitivity of what matters between me and me+1 or me+2 on the intelligence scale? Then my extrapolation will not preserve what had to be preserved in the first place. To extrapolate our volition in case we knew more, thought faster and had grown up farther together is to accrue small quantities of intelligence during the dynamic, and doing this may be risky. Even if some of our possible extrapolations would end up generating part of a Nice Place to Be, we must be sure that none of the other possible extrapolations actually happen. That is, we must make sure CEV doesn't extrapolate in a way that loses one slice of what matters at each step. Just as small accretions of time make you, every day, less the person you were back in 2010, small accretions of intelligence may gradually displace us from what is preserved. Maybe smarter versions of ourselves are not us at all: this is the Small Accretions Objection.
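The Small Accretions worry can be illustrated numerically. Suppose, purely as an assumption for illustration (neither Parfit nor CEV supplies such a formula), that psychological relatedness between consecutive extrapolation steps is a fixed fraction, say 0.95, and that relatedness compounds multiplicatively across steps. Each individual step then looks nearly identity-preserving, yet the chain as a whole drifts far from the starting person.

```python
def relatedness_after(steps: int, per_step: float = 0.95) -> float:
    """Relatedness between me and me+`steps`, assuming each step keeps
    a fixed fraction `per_step` of what the previous step had (hypothetical)."""
    return per_step ** steps


def first_step_below(threshold: float, per_step: float = 0.95) -> int:
    """Smallest number of steps after which relatedness drops below `threshold`."""
    if not (0.0 < per_step < 1.0) or not (0.0 < threshold <= 1.0):
        raise ValueError("need 0 < per_step < 1 and 0 < threshold <= 1")
    steps, r = 0, 1.0
    while r >= threshold:
        steps += 1
        r *= per_step
    return steps


if __name__ == "__main__":
    print(relatedness_after(1))    # 0.95: a single step looks harmless
    print(first_step_below(0.5))   # 14: steps needed to fall below one half
```

Multiplicative composition is the load-bearing assumption here; whether real extrapolation steps compose this way is exactly what the objection asks.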


4) Problems with the concept of Volition

4a) Blue-minimizing robots (see Yvain's post)

4b) Goals vs. Volitions

The machine's actions should be grounded in our preferences, but those preferences are complex and opaque, making our reports unreliable. To truly determine people's volitions, there must be a previously validated candidate predictor: we should test the predictor on its ability to describe current humans' volitions before we give it the task of comprehending extrapolated human volition.

4c) Want to want vs. Would want if we thought faster, grew up farther together

Eliezer suggests in CEV that we consider it a mistake to give Fred box A if he asked for box A believing it contained a diamond, when we know both that box B contains the diamond and that Fred wants the diamond. Fred's volition, we are told, is to have the diamond, and we must be careful to create machines that extrapolate volition, not mere wanting. This is good, but not enough. There is a sub-area of moral philosophy dedicated to understanding what we value, and even though it may seem at first that we value our volitions, the process that leads from wanting to having a volition is different from the one that leads from wanting to having a value. Values, as David Lewis argued, are what we want to want. Volitions, on the other hand, are what we would ultimately want under less stringent conditions. Currently CEV does not consider the iterated aspect of the things we value (the want-to-want aspect). This is problematic in case our volitions do not happen to be constrained by what we value, that is, by what we desire to desire. Suppose Fred knows that the diamond he thinks is in box A comes from a bloody conflict region. Fred hates bloodshed and truly desires not to have desires for such diamonds; he wants to be a person who doesn't want diamonds from conflict regions. Yet the flesh is weak and Fred, under the circumstances, really wants the diamond. Both Fred's current volition and his extrapolated volition would have him choose box B, if only he knew, and in neither case have Fred's values been duly considered. It may be argued that a good enough extrapolation would end up taking his disgust at war into account, but here we are talking not about a quantitative issue (how much improvement there was) but about a qualitative leap (what kind of thing should be preserved).
If it is the case, as I argue here, that we ought to preserve what we want to want, this must be done as a consideration separate from preserving our volitions, both current and extrapolated, not as an addendum to it.
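The three construals in Fred's case can be set side by side in a toy model. This is a hypothetical sketch, not how CEV is specified: World, Fred and the three construe_* functions are invented for illustration, and treating the second-order desire as a veto is only one possible way of giving values separate standing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class World:
    diamond_in: str               # box that actually contains the diamond
    diamond_from_conflict: bool   # whether that diamond is a conflict diamond


@dataclass
class Fred:
    believes_diamond_in: str      # Fred's (possibly false) belief
    wants_diamond: bool = True    # first-order desire
    # Second-order desire: a value, in Lewis's sense of what Fred wants to want.
    wants_not_to_want_conflict_diamonds: bool = True


def construe_want(fred: Fred, world: World) -> Optional[str]:
    """Mere wanting: the box Fred asks for, given his beliefs (world is ignored)."""
    return fred.believes_diamond_in if fred.wants_diamond else None


def construe_volition(fred: Fred, world: World) -> Optional[str]:
    """Volition: the box that actually satisfies Fred's desire for the diamond."""
    return world.diamond_in if fred.wants_diamond else None


def construe_with_values(fred: Fred, world: World) -> Optional[str]:
    """Volition checked, as a separate step, against what Fred wants to want."""
    if world.diamond_from_conflict and fred.wants_not_to_want_conflict_diamonds:
        return None  # Fred's value vetoes taking the diamond at all
    return construe_volition(fred, world)


if __name__ == "__main__":
    world = World(diamond_in="B", diamond_from_conflict=True)
    fred = Fred(believes_diamond_in="A")
    print(construe_want(fred, world))         # A
    print(construe_volition(fred, world))     # B
    print(construe_with_values(fred, world))  # None
```

Nothing in construe_volition, however accurate its model of the world, consults the second-order desire; the value only enters through the separate check, which mirrors the claim that values need their own consideration rather than an addendum.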

 

Continues in Part 2
