I first took note of "Coherent Extrapolated Volition" in 2006. I thought it was a brilliant idea, an exact specification of how to arrive at a better future: Figure out exactly how it is that humans make their existing choices, idealize that human decision procedure according to its own criteria, and then use the resulting "renormalized human utility function" as the value system of an AI. The first step is a problem in cognitive neuroscience, the second step is a conceptual problem in reflective decision theory, and the third step is where you make the Friendly AI.

For some reason, rather than pursuing this research program directly, people interested in CEV talk about using simulated human beings ("uploads", "ems", "whole-brain emulations") to do all the hard work. Paul Christiano just made a post called "Formalizing Value Extrapolation", but it's really about formalizing the safe outsourcing of value extrapolation to a group of human uploads. All the details of how value extrapolation is actually performed (e.g. the three steps listed above) are left completely unspecified. Another recent article proposed that making an AI with a submodule based on models of its makers' opinions is the fast way to Friendly AI. It's also been suggested to me that simulating human thinkers and running them for centuries of subjective time, until they reach agreement on the nature of consciousness, is a way to tackle that problem; and clearly the same "solution" could be applied to any other aspect of FAI design, strategy, and tactics.

Whatever its value as a thought experiment, in my opinion this idea of outsourcing the hard work to simulated humans has zero practical value, and we would be much better off if the minuscule sub-sub-culture of people interested in creating Friendly AI didn't think in this way. Daydreaming about how they'd solve the problem of FAI in Permutation City is a recipe for irrelevance.

Suppose we were trying to make a "C. elegans-friendly AI". The first thing we would do is take the first step mentioned above: we would try to figure out the C. elegans utility function or decision procedure. Then we would have to decide how to aggregate utility across multiple individuals. Then we would make the AI. Performing this task for H. sapiens is a lot more difficult, and qualitatively new factors enter at the first and second steps, but I don't see why it is fundamentally different, different enough that we need to engage in the rigmarole of delegating the task to uploaded human beings. It shouldn't be necessary, and we probably won't even get the chance to do so; by the time you have hardware and neuro-expertise sufficient to emulate a whole human brain, you will most likely have nonhuman AI anyway.

A year ago, I wrote: "My expectation is that the presently small fields of machine ethics and neuroscience of morality will grow rapidly and will come into contact, and there will be a distributed research subculture which is consciously focused on determining the optimal AI value system in the light of biological human nature. In other words, there will be human minds trying to answer this question long before anyone has the capacity to direct an AI to solve it. We should expect that before we reach the point of a Singularity, there will be a body of educated public opinion regarding what the ultimate utility function or decision method (for a transhuman AI) should be, deriving from work in those fields which ought to be FAI-relevant but which have yet to engage with the problem. In other words, they will be collectively engaging with the problem before anyone gets to outsource the necessary research to AIs."

I'll also link to my previous post about "practical Friendly AI". What I'm doing here is going into a fraction more detail about how you arrive at the Friendly value system. There, I basically said that you just get a committee together and figure it out, clearly an inadequate recipe, but in that article I was focused more on sketching the nature of an organization and a plan which would have some chance of genuinely creating FAI in the real world. Here, I'll say that working out the Friendly value system consists of: making a naturalistic explanation of how human decision-making occurs; determining the core essentials of that process, and applying its own metamoral criteria to arrive at a "renormalized" decision procedure that has been idealized according to human cognition's own preferences ("our wish if we knew more, thought faster, were more the people we wished we were"); and then implementing that decision procedure within an AI - this is where all the value-neutral parts of AI research come into play, such as AGI theory, the theory of value stability under self-modification, and so on. That is the sort of "value extrapolation" that we should be "formalizing" - and preparing to carry out in real life. 


This seems quite sane. Upvoted.

I am not an expert tho. What do the experts think of this?

I do have some concern with the neuroscience-only approach to value...

It shouldn't be necessary,

Why? What's your response to Eliezer's "complexity and fragility of values" argument?

Mitchell, sorry, you're clearly a smart person, but you seem to have a habit of repeating your arguments when people disagree with them. You have made this particular argument twice already, I heard you perfectly well both times, and so did the others I guess. Could you answer Paul's question instead?

What I'm doing here is not just restating a criticism, I am stating, in a very abbreviated form, an alternative.

Paul asked what my problem is with the scenario. Well, originally I thought he was proposing that in real life, we should rise to the challenge of Friendly AI by literally uploading the researchers. Then in your reply, you indicated that this scenario might also just function as a thought experiment, a first approximation to a presently unknown process that will reliably result in Friendly AI.

If someone is seriously suggesting that the path to FAI involves uploading the researchers, I say both that they are severely underestimating the difficulty of achieving successful whole-brain emulation, and that they are going to be throwing away their precious time by working on the wrong problem (WBE rather than FAI) when they should be tackling FAI directly. A lot of the time, I think people imagine that in such a scenario, even WBE isn't achieved by human effort; it's just figured out by a not-yet-friendly general-problem-solving proto-AI. We think we have an idea of how to make AIs that produce causal models, so we just get this proto-AI to make a causal model of someone, and abracadabra, we have the upload we wanted.

What I find pernicious about this mode of thought is that certain problems - the details of how you "extrapolate the values" of a particular thinking system, the details of how a whole-brain emulation is produced - are entirely avoided; it is assumed that we will have access to human simulations smart enough to solve the first problem, and causal modellers powerful enough to solve the second problem (and thereby provide the human simulations that will be used to solve the first problem). In effect, we assume the existence of ultrapowerful functions makeCausalModel() and askAgentToSolveThisProblem(), and then the algorithm for solving FAI is something like


that is, one asks the agent produced as a causal model of one of the researchers to "solve this problem".
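
A minimal Python sketch of that one-liner, under loud assumptions: only the names makeCausalModel() and askAgentToSolveThisProblem() come from the text above. Their arguments, the Agent class, and the solveFAI wrapper are hypothetical, made up purely for illustration. The stubs do nothing clever, which is rather the point.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical stand-in for an upload: a causal model of some person."""
    modeled_after: str


def makeCausalModel(person: str) -> Agent:
    # Assumed "ultrapowerful" primitive. In the scenario this would be
    # whole-brain emulation; here it only records whose model it claims to be.
    return Agent(modeled_after=person)


def askAgentToSolveThisProblem(agent: Agent, problem: str) -> str:
    # Assumed "ultrapowerful" primitive. No real implementation is known,
    # so the stub says so instead of pretending to return an answer.
    raise NotImplementedError(
        f"no known way to have a model of {agent.modeled_after} solve {problem!r}"
    )


def solveFAI(researcher: str) -> str:
    # The whole "algorithm": one nested call to the two assumed primitives.
    return askAgentToSolveThisProblem(makeCausalModel(researcher), "FAI")
```

Calling solveFAI("some researcher") just raises NotImplementedError: all of the difficulty lives inside the two functions that the scenario takes for granted.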

It's fun to write singularity-level pseudocode, and perhaps it will be scary when there are proto-AIs advanced enough to accept an instruction like that and start looking through their heuristics in search of a way to implement it. But I don't consider it a serious engagement with the problem of FAI. Solving the problem of FAI is more akin to solving one of the Millennium Problems: it's a big intellectual task which may be the work of a whole generation in several disciplines. Trying to achieve it by getting uploads to do it is asking for a post-singularity technology to help you achieve a singularity. The way to progress is to directly tackle the details of neuroscience and decision theory.

Regarding the other rationale, according to which we are not literally setting out to upload the researchers before they tackle the problem, we're just telling an imaginary AI to act as those uploaded researchers would ultimately advise it to act, after their 500 subjective years in cyberspace... something like Cyc might be able to parse that command, and Cyc is a present reality where uploads are not, so there's a fraction more realism here. But presumably the AI's diagnostics would eventually tell you something like "need for computing substrate 10^9 more powerful than current processor detected", that you could have figured out for yourself.

I see it as a little out of character for me to go bashing blue-sky speculation and flights of theoretical ingenuity. But as things stand, the attempt to define the perfectly safeguarded wish about who to upload, what to ask of them, and how to curate them in silico seems to define how people are approaching the problem of FAI, and that's ridiculous and unnecessary. There's plenty to think about and to work on, if one sets out to directly tackle the problem of value extrapolation. This "direct" approach ought to define the mainstream of FAI research, not the indirect approach of "get the uploads to solve the problem".

something like Cyc might be able to parse that command

Actually we might be able to explain that command to a math-only AI pretty soon. For example, in Paul's "formal instructions" approach, you point the AI at a long bitstring generated by a human, and ask the AI to find the most likely program that generated that bitstring under the universal prior. As a result you get something functionally similar to an upload. There are many potential problems with this approach (e.g. how do you separate the human from the rest of the world?) but they seem like the sort of problems you can solve with clever tricks, not insurmountable obstacles. And there might be other ways to use math-only AIs to boost uploading; we've only been thinking about this for several months.
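
For concreteness, "the most likely program under the universal prior" can be written down in the standard Solomonoff style. This is a sketch under assumptions, not necessarily the exact formalization in Paul's post: $U$ is a fixed universal prefix machine, $x$ is the human-generated bitstring, $\ell(p)$ is the length of program $p$ in bits, and $U(p) = x*$ means that the output of $p$ begins with $x$.

```latex
P(p) \propto 2^{-\ell(p)}, \qquad
p^{*} = \arg\max_{p \,:\, U(p) = x*} 2^{-\ell(p)}
      = \arg\min_{p \,:\, U(p) = x*} \ell(p)
```

On this reading, the most likely program is simply the shortest one whose output starts with $x$. Finding it exactly is uncomputable in general, so any real system would have to approximate.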

But presumably the AI's diagnostics would eventually tell you something like "need for computing substrate 10^9 more powerful than current processor detected", that you could have figured out for yourself.

A probabilistic answer will suffice, I think, and it doesn't seem to require that much computing power. It's suspicious that you can make educated guesses about what you're going to think tomorrow, but a strong AI somehow cannot. I'd expect a strong AI to be better at answering math questions than a human.

ETA: if solving the object-level problem is a more promising approach than bootstrapping through the meta-problem, then I certainly want to believe that is the case. Right now I feel we only have a viable attack on the meta-problem. If you figure out a viable attack on the object-level problem, be sure to let us know!