It’s the year 2045, and Dr. Evil and the Singularity Institute have been in a long and grueling race to be the first to achieve machine intelligence, thereby controlling the course of the Singularity and the fate of the universe. Unfortunately for Dr. Evil, SIAI is ahead in the game. Its Friendly AI is undergoing final testing, and Coherent Extrapolated Volition is scheduled to begin in a week. Dr. Evil learns of this news, but there’s not much he can do, or so it seems. He has succeeded in developing brain scanning and emulation technology, but the emulation speed is still way too slow to be competitive.
There is no way to catch up with SIAI's superior technology in time, but Dr. Evil suddenly realizes that maybe he doesn’t have to. CEV is supposed to give equal weighting to all of humanity, and surely uploads count as human. If he had enough storage space, he could simply upload himself, and then make a trillion copies of the upload. The rest of humanity would end up with less than 1% weight in CEV. Not perfect, but he could live with that. Unfortunately he only has enough storage for a few hundred uploads. What to do…
Ah ha, compression! A trillion identical copies of an object would compress down to be only a little bit larger than one copy. But would CEV count compressed identical copies to be separate individuals? Maybe, maybe not. To be sure, Dr. Evil gives each copy a unique experience before adding it to the giant compressed archive. Since they still share almost all of the same information, a trillion copies, after compression, just manages to fit inside the available space.
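The compression trick is easy to sketch: general-purpose compressors encode repeated data as cheap back-references, so the marginal cost of each near-identical copy is tiny. A toy illustration in Python (with `zlib` standing in, hypothetically, for whatever archive format Dr. Evil uses):

```python
import random
import zlib

# One "upload": a blob of pseudo-random (hence incompressible) data.
random.seed(0)
base = bytes(random.randrange(256) for _ in range(10_000))

# A thousand copies, each stamped with a short "unique experience".
archive = b"".join(base + f"experience #{i}".encode() for i in range(1000))

one_copy = len(zlib.compress(base, 9))
all_copies = len(zlib.compress(archive, 9))

print(f"raw: {len(archive):,} bytes, compressed: {all_copies:,} bytes")
print(f"marginal cost per copy: ~{(all_copies - one_copy) / 1000:.0f} bytes")
```

zlib's 32 KB window only lets it see the immediately preceding copy; a real archiver with long-range deduplication would do even better, which is the effect the story relies on.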
Now Dr. Evil sits back and relaxes. Come next week, the Singularity Institute and the rest of humanity are in for a rather rude surprise!
The point about demarcating individuals is important for ethical theories generally (and decision theories that make use of spooky 'reference classes'). Bostrom's Duplication of experience paper illustrates the problem further.
Also, insofar as CEV is just a sort of idealized deliberative democracy, this points to the problem of emulations with systematically unrepresentative values rapidly becoming the majority when emulation hardware is cheap.
Any ethical theory that depends on demarcating individuals, or "counting people", appears doomed.
It seems likely that in the future, "individuals" will be constantly forked and merged/discarded as a matter of course. And like forking processes in Unix, such operations will probably make use of copy-on-write memory to save resources. Intuitively it makes little sense to attach a great deal of ethical significance to the concept of "individual" in those circumstances.
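The fork analogy can be made concrete with a minimal copy-on-write container (a made-up class, purely to illustrate the resource-sharing point): forked "individuals" share one backing store, and physical duplication happens only at the first divergent write.

```python
class CowDict:
    """Minimal copy-on-write mapping: forks share storage until first write."""

    def __init__(self, data=None):
        self._data = data if data is not None else {}
        self._owned = True  # sole owner may mutate in place

    def fork(self):
        # Both parent and child now share storage, so neither owns it.
        child = CowDict(self._data)
        child._owned = False
        self._owned = False
        return child

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if not self._owned:
            self._data = dict(self._data)  # copy only on write
            self._owned = True
        self._data[key] = value
```

After `child = parent.fork()`, both views point at the same dict; only when one side mutates does it pay for its own copy. A sketch like this makes the intuition vivid: if "how many individuals exist" reduces to "how many data structures exist", the answer changes the moment anyone writes a byte.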
Is it time to give up, and start looking for ethical theories that don't depend on a concept of "individual"? I'm curious what your thoughts are.
Arguably, the concept of "individual" is incoherent even with ordinary humans, for at least two reasons.
First, one could argue that the human brain doesn't operate as a single agent in any meaningful sense, but instead consists of a whole bunch of different agents struggling to gain control of external behavior -- and what we perceive as our stream of consciousness is mostly just delusional confabulation giving rise to the fiction of a unified mind thinking and making decisions. (The topic was touched upon in this LW post and the subsequent discussion.)
Second, it's questionable whether the concept of personal identity across time is anything more than an arbitrary subjective preference. You believe that a certain entity that is expected to exist tomorrow can be identified as your future self, so you assign it a special value. From the evolutionary perspective, it's clear why humans have this value, and the concept is more or less coherent assuming the traditional biological constraints on human life, but it completely breaks down once this assumption is relaxed (as discussed in this recent thread). Therefore, one could argue that the idea of an "individual" existin...
Thinking about this a bit more, and assuming CEV operates on the humans that exist at the time of its application: Why would CEV operate on humans that do exist, and not on humans that could exist? It seems this is what Dr. Evil is taking advantage of, by densely populating identity-space around him and crowding out the rest of humanity. But this could occur for many other reasons: certain cultures encouraging high birth rates, certain technologies or memes being popular at the time of CEV-activation that affect the wiring of the human brain, or certain historical turns that shape the direction of mankind.

A more imaginative scenario: what if another scientist, who knows nothing about FAI and CEV, finds it useful to address a problem by copying himself into trillions of branches, each examining a certain hypothesis, and all the branches are merged/discarded when the answer is found. Let's further say that CEV t-zero occurs when the scientist is deep in a problem-solving cycle. Would the FAI take each branch as a separate human/vote? This scenario involves no intent to defraud the system. It also is not manipulation of a proxy, as there is a real definitional problem here whose answer is not easily apparent to a human. Applying CEV to all potential humans that could have existed in identity-space would deal with this, but pushes CEV further and further into uncomputable territory.
To do the latter, you would need a definition of "human" that can not just distinguish existing humans from existing non-humans, but also pick out all human minds from the space of all possible minds. I don't see how to specify this definition. (Is this problem not obvious to everyone else?)
For example, we might specify a prototypical human mind, and say that "human" is any mind which is less than a certain distance from the prototypical mind in design space. But then the CEV of this "humankind" is entirely dependent on the prototype that we pick. If the FAI designers are allowed to just pick any prototype they want, they can make the CEV of "humanity" come out however they wish, so they might as well have the FAI use the CEV of themselves. If they pick the prototype by taking the average of all existing humans, then that allows the same attack described in my post.
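To see how sensitive this is, here is a toy sketch (the two-trait "design space", the coordinates, and the radius are all made up for illustration): membership in "humanity" is a ball around the prototype, so moving the prototype moves the whole extension at once.

```python
def is_human(mind, prototype, radius):
    """Toy membership test: 'human' = within `radius` of `prototype`
    in a made-up design space (here, vectors of trait values)."""
    dist = sum((a - b) ** 2 for a, b in zip(mind, prototype)) ** 0.5
    return dist <= radius

# The extension of "humanity" -- and hence whose volition gets
# extrapolated -- shifts wholesale with the choice of prototype:
population = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
print([is_human(m, prototype=(0.0, 0.0), radius=2.0) for m in population])
# → [True, True, False]
print([is_human(m, prototype=(5.0, 5.0), radius=2.0) for m in population])
# → [False, False, True]
```

Nothing about the population changed between the two lines; only the designers' choice of prototype did.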
One could of course define the CEV in terms of some previous population, say circa the year 2000. But then you might wonder why it is fair to give higher weight to those groups that managed to reproduce from 1900 to 2000, and so defining it in terms of year 1900 people might be better. But then how far back do you go? How is the increase in population that Dr. Evil manages to achieve for his descendants less legitimate than all the prior gains of various groups in previous periods?
Simple solution: Build an FAI to optimize the universe to your own utility function instead of humanity's average utility function. They will be nearly the same thing anyway (remember, you were tempted to have the FAI use the average human utility function instead, so clearly, you sincerely care about other people's wishes). And in weird situations in which the two are radically different (like this one), your own utility function more closely tracks the intended purpose of an FAI.
The thing I've never understood about CEV is how the AI can safely read everyone's brain. The whole point of CEV is that the AI is unsafe unless it has a human value system, but before it can get one, it has to open everyone's heads and scan their brains!? That doesn't sound like something I'd trust a UFAI to do properly.
I bring this up because without knowing how the CEV is supposed to occur it is hard to analyse this post. I also agree with JoshuaZ that this didn't deserve a top-level post.
Presumably by starting with some sort of prior, and incrementally updating off of available information (the Web, conversation with humans, psychology literature, etc). At any point it would have to use its current model to navigate tradeoffs between the acquisition of new information about idealised human aims and the fulfillment of those aims.
This does point to another more serious problem, which is that you can't create an AI to "maximize the expected value of the utility function written in this sealed envelope" without a scheme for interpersonal comparison of utility functions (if you assign 50% probability to the envelope containing utility function A, and 50% probability to the envelope containing utility function B, you need an algorithm to select between actions when each utility function alone would favor a different action). See this OB post by Bostrom.
CEV is more like figuring out an ethical theory, than it is about running around fighting fires, granting wishes, and so on. The latter part is the implementation of the ethical theory. That part - the implementation - has to be consultative or otherwise responsive to individual situations. But the first part, CEV per se - deciding on principles - is not going to require peering into the mind of every last human being, or even very many of them.
It is basically an exercise in applied neuroscience. We want to understand the cognitive basis of human rationality and decision-making, including ethical and metaethical thought, and introduce that into an AI. And it's going to be a fairly abstract thing. Although human beings love food, sex, and travel, there is no way that these are going to be axiomatic values for an AI, because we are capable of coming up with ideas about what amounts to good or bad treatment of organisms or entities with none of those interests. So even if our ethical AI looks at an individual human being and says, that person should be fed, it won't be because its theo...
Summing up the only counterhacks presented, not including deeper discussions of the other issues people had with CEV.
- Taking into account only variances from one mind to another, so that very similar minds cluster and their volition is taken into account, but not given any great preference. (Problem: normal human majorities are also made into minorities.)
- Taking into account the cycle time of humans.
- Taking into account unique experiences, weighted by hours of unique experience.
- Doing CEV on possible human minds, instead of present human minds.
This seems to be another case where explicit, overt reliance on a proxy drives a wedge between the proxy and the target.
One solution is to do the CEV in secret and only later reveal this to the public. Of course, as a member of said public, I would instinctively regard with suspicion any organization that did this, and suspect that the proffered explanation (some nonsense about a hypothetical "Dr. Evil") was a cover for something sinister.
EDIT: Doesn't work, see Wei Dai below.
This isn't a bug in CEV, it's a bug in the universe. Once the majority of conscious beings are Dr. Evil clones, then Dr. Evil becomes a utility monster and it gets genuinely important to give him what he wants.
But allowing Dr. Evil to clone himself is bad; it will reduce the utility of all currently existing humans except Dr. Evil.
If a normal, relatively nice but non-philosopher human ascended to godhood, ve would probably ignore Dr. Evil's clones' wishes. Ve would destroy the clones and imprison the doctor, because ve...
Wait, increase utility according to what utility function? If it's an aggregate utility function where Dr. Evil has 99% weight, then why would that precommitment increase utility?
In a sense, this kind of thing (value drift due to differential reproduction) is already happening. For example, see this article about the changing racial demographics of the USA:
An increasing Latin-American population in the US seems to...
I just realized there's an easier way to "hack" CEV. Dr. Evil just needs to kill everyone else, or everyone who disagrees with him.
What if the influence is weighted by degree of divergence from the already-scanned minds, something like a reverse PageRank? All Dr. Evils would cluster, and therefore count as a bit above one vote. Also, this could cover the human spectrum better, being less influenced by cultural factors. I guess this would give outliers much more influence, but if outliers are in all directions, would they cancel each other out? What else could go terribly wrong with this?
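A crude sketch of the idea (a toy, not a claim about how CEV would actually measure minds): represent each mind as a trait vector, and give each mind one vote divided by the size of its cluster of near-duplicates under some distance threshold. The vectors, threshold, and distance metric here are all invented for illustration.

```python
def divergence_weights(minds, threshold=0.1):
    """Each mind's vote is 1 / (number of minds within `threshold`
    of it), so a huge cluster of near-identical copies shares
    roughly one vote among its members."""
    def distance(a, b):
        # Normalized Hamming distance over equal-length trait vectors.
        return sum(x != y for x, y in zip(a, b)) / len(a)

    return [1.0 / sum(1 for other in minds if distance(m, other) <= threshold)
            for m in minds]

# 200 identical Dr. Evil uploads vs. one ordinary human:
evils = [(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)] * 200
minds = evils + [(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)]
weights = divergence_weights(minds)
print(sum(weights[:200]), weights[200])  # the whole clone army ≈ one vote
```

The outlier worry is visible in the sketch too: a single sufficiently weird mind gets a full vote no matter how unrepresentative it is.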
Imagine if elections worked that way: one party, one vote, so the time cubists would get only slightly less influence than the Democrats. I dunno...
Since Dr. Evil is human, it shouldn't be that bad. Extrapolated volition kicks in, making his current evil intentions irrelevant, possibly even preferring to reverse the voting exploit.
That is not the most inconvenient possible world.
The conservative assumption to make here is that Dr. Evil is, in fact, evil; and that, after sufficient reflection, he is still evil. Perhaps he's brain-damaged; it doesn't matter.
Given that, will he be able to hijack CEV? Probably not, now that the scenario has been pointed out, but what other scenarios might be overlooked?
That's terrible. You're letting in people who are mutated in all sorts of ways through stupid, random, 'natural' processes, but not those who have the power of human intelligence overriding the choices of the blind idiot god. If the extropians/transhumanists make any headway with germline genetic engineering, I want those people in charge.
Ten thousand years later, postkangaroo children learn from their history books about Kutta, the one who has chosen to share the future with his marsupial brothers and sisters :)
I think this can be dealt with in terms of measure. In a series of articles, "Minds, Measure, Substrate and Value", I have been arguing that copies cannot be considered equally, without regard to substrate: we need to take account of measure for a mind, and the way in which the mind is implemented will affect its measure. (Incidentally, some of you argued against the series: after a long delay [years!], I will be releasing Part 4, which will deal with a lot of these objections.)
Without trying to present the full argument here, the mini...
What if we used a two-tiered CEV? A CEV applied to a small, hand selected group of moral philosophers could be used to determine weighting rules and ad hoc exceptions to the CEV that runs on all of humanity to determine the utility function of the FAI.
Then when the CEV encounters the trillion Dr. Evil uploads, it will consult how the group of moral philosophers would have wanted it handled if "they knew more, thought faster, were more the people we wished we were, had grown up farther together", which would be something like weighting them together as one person.
And who would select the initial group? Oh, I know! We can make it a 3-tiered system, and have CEV applied to an even smaller group choose the group of moral philosophers!
Wait... my spidey-sense is tingling... I think it's trying to tell me that maybe there's a problem with this plan.
Insofar as the Evil clones are distinct individuals, they seem to be almost entirely potentially distinct. They will need to receive more computing resources before they can really diverge into distinct agents.
I would expect CEV to give the clones votes only to the extent that CEV gives votes to potential individuals. But the number of potential clones of normal humans is even greater than Evil's trillion, even accounting for their slightly greater actuality. So, I think that they would still be outvoted.
That's an interesting point, but I'm having trouble seeing it as worthy of a top-level post. Maybe if you had a solution proposed also.
I can see why you might feel that way, if this was just a technical flaw in CEV that can be fixed with a simple patch. But I've been having a growing suspicion that the main philosophical underpinning of CEV, namely preference utilitarianism, is seriously wrong, and this story was meant to offer more evidence in that vein.
Why should anyone choose aggregation of preference over a personal FAI, other than under explicit pressure? Whatever obligations you feel (as part of your preference, as opposed to as part of an imaginary game where you play fair) will be paid in full according to your personal preference. This explicit pressure to include other folks in the mix can only be exerted by those present, and presumably "in the know", so there is no need to include the dead or potential future folk. Whatever sympathy you have for them, you'll have the ability to express through the personal FAI. The virtue of laziness in FAI design again (this time, moral laziness).
If that was your point, I wish you had gone into more detail about that in a top-level article.
I just noticed that my old alter ego came up with a very similar "hack" two years ago:
I think it might be possible to patch around this by weighting people by their projected future cycle count. Otherwise, I fear that you may end up with a Repugnant Conclusion even without an adversary -- a very large number of happy emulated people running very slowly would outweigh a smaller number of equally happy people running at human-brain-speed. Of course, this still gives an advantage to the views of those who can afford more computing power, but it's a smaller advantage. And perhaps our CEV would be to at least somewhat equalize the available computing power per person.
Doesn't the "coherent" aspect of "coherent extrapolated volition" imply, generally speaking, that it's not democracy-of-values, so to speak? That is to say, CEV of humanity is supposed to output something that follows on from the extrapolated values of both the guy on the street corner holding a "Death to all fags" sign who's been arrested twice for assaulting gay men outside bars, and the queer fellow walking past him -- if prior to the implementation of a CEV-using FAI the former should successfully mobilize to raise the visibi...
Good post. It seems I missed it, I probably wasn't around that time in June. That is one extreme example of a whole range of problems with blindly implementing a Coherent Extrapolated Volition of that kind.
And then Dr. Evil, forced to compete with 999,999,999,999 copies of himself that all want to rule, is back to square one. Would you see multiplying your competition by 999,999,999,999 as a solution to how to rule the universe? If you were as selfish as Dr. Evil, and intelligent enough to try attempting to take over the universe, wouldn't it occur to you that the copies are all going to want to be the one in charge? Perhaps it won't, but if it did, would you try multiplying your competition, then? If not, then maybe part of the solution to this is makin...
But of course an AI realizes that satisfying the will of a trillion copies of Dr. Evil wasn't what his/her programmers intended.
Pun being, this legendary bad argument is surprisingly strong here. I know, I shouldn't be explaining my jokes.
Of course, the AI realizes that its programmers did not want it doing what the programmers intended, but what the CEV intended instead, so this response fails completely.