Taken from some old comments of mine that never did get a satisfactory answer.
1) One of the justifications for CEV was that extrapolating from an American in the 21st century and from Archimedes of Syracuse should give similar results. This seems to assume that change in human values over time is mostly "progress" rather than drift. Do we have any evidence for that, except saying that our modern values are "good" according to themselves, so whatever historical process led to them must have been "progress"?
2) How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition? If Eliezer wants the AI to look at humanity and infer its best wishes for the future, why can't he task it with looking at himself and inferring his best idea of how to fulfill humanity's wishes? Why must this particular thing be spelled out in a document like CEV and not left to the mysterious magic of "intelligence", and what other such things are there?
Yes, we do. First, we have an understanding of the mechanisms and processes that produced old and modern values, and many of the same mechanisms and processes used for "ought" questions are also used for "is" questions. Our ability to answer "is" questions accurately has improved dramatically, so we know the mechanisms have improved. S... (read more)
If you're openly making a fooming AGI, and if people think you have a realistic chance of success and treat you seriously, then I'm very sure that all major world governments, armies, etc. (including your own) as well as many corporations and individuals will treat you as a supervillain - and it won't matter in the least what your goals might be, CEV or no.
I'm pretty sure that it is not purely progress, that 'drift' plays a big part. I see (current) human values as having three sources.
For the same reason they voluntarily do anything that doesn't perfectly align with their own personal volition: because they understand that they can accomplish more of their own desires by joining a coalition and cooperating, even though that means working to fulfill other people's desires to the same extent that they work to fulfill their own.
A mad scientist building an AI in his basement doesn't have to compromise with anyone, ... until he has to go out and get funding, that is.
On 2: maybe CEV IS EY's own personal volition :)
More seriously, probably game theoretic reasons. Why would anyone want to work with/fund EY if it was his own volition that was being implemented?
*Disclaimer: I didn't read any other comments, so this might just echo what someone else said
I would vote +10 each for those two questions.
That's exactly my objection to CEV. No-one acts on anything but their personal desires and values, by definition. Eliezer's personal desire might be to implement CEV of humanity (whatever it turns out to be). I believe, however, that for well over 99% of humans this would not be the best possible outcome they might desire. At best it might be a reasonable compromise, but that would depend entirely on what the CEV actually ended up being.
I hadn't seen this before, but it strikes me as irredeemably silly. If we're picking a specific person (or set of people) from antiquity to compare, are we doing so randomly? If so, the results will be horrifying. If not, then we're picking them according to some standard - and why don't we just encapsulate that standard directly?
That is an anthropomorphic representation of the 'values' of a gene allele. It is not the value of actual humans or chimpanzees.
In questions like this, it's very important to keep in mind the difference between state of knowledge about preference (which corresponds to explicitly endorsed moral principles, such as "slavery bad!"; this clearly changed), and preference itself (which we mostly don't understand, even if our minds define what it is). Since FAI needs to operate according to preference, and not our state of knowledge about preference, any changes in our state of knowledge (moral principles) are irrelevant, except for where they have a chance of reflecting changes ... (read more)
If something like Julian Jaynes' notion of a recent historical origin of consciousness from a prior state of bicameralism is true, we might be in trouble there.
More generally, you need to argue that culture is a negligible part of cognitive architecture; I strongly doubt that is the case.
What do you believe about these immutable, universal preferences?
Here are some potential problems I see with these theorized builtin preferences, since we don't know what they actually are yet:
For 1), the sense I got was that it assumes no progress, and furthermore that if you perform an extrapolation that pleases 21st-century Americans but would displease Archimedes or any other random Syracusan, your extrapolation-bearing AGI is going to tile the universe with American flags or episodes of Seinfeld.
For 2), it feels like a No True Scotsman issue. If by some definition of current, personal volition you exclude anything that isn't obviously a current, personal desire by way of deeming it insincere, then you've just made your point tautological. Do yo... (read more)
If Archimedes and the American happen to extrapolate to the same volition, why should that be because the American has values that are a progression from those of Archimedes? It's logically possible that both are about the same distance from their shared extrapolated volition, but they share one because they are both human. Archimedes could even have values that are closer than the American's.
Changes in human values seem to have generally involved expanding the subset of people with moral worth, especially post-enlightenment. This suggests to me that value change isn't random drift, but it's only weak evidence that the changes reflect some inevitable fact of human nature.
I'm still wondering how you'd calculate a CEV. I'm still wondering how you'd calculate one human's volition. Hands up all those who know their own utility function. ... OK, how do you know you've got it right?
No, a complete question to which CEV is an answer needs to be pre-FOOM. All an AI needs to know about morality before it is superintelligent is (1) how to arrive at a CEV-answer by looking at things and doing calculations and (2) how to look at things without breaking them and do calculations without breaking everything else.
This is a great post and some great points are made in discussion too.
Is it possible to make exact models exhibiting some of these intuitive points? For example, there is a debate about whether extrapolated human values would depend strongly on cognitive content or whether they could be inferred just from cognitive architecture. (This could be a case of metamoral relativism, in which the answer simply depends on the method of extrapolation.) Can we come up with simple programs exhibiting this dichotomy, and simple constructive "methods of extrapolati... (read more)
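To make that question slightly more concrete, here is a minimal toy sketch (my own construction - the agents, weights, and blending rule are all made up for illustration, not anything proposed in the post) of two "methods of extrapolation" applied to agents that share an innate architecture but differ in learned cultural content. One method extrapolates from architecture alone and returns the same answer for every agent; the other also uses content and returns different answers, so whether values "converge" is itself a property of the chosen method.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    # "Architecture": an innate weighting over two basic drives,
    # shared by all agents in this toy model.
    innate_weights: tuple
    # "Content": culturally learned adjustments to those drives.
    learned_weights: tuple

def extrapolate_architecture_only(agent: Agent) -> tuple:
    """Method A: ignore learned content, extrapolate from the innate weighting."""
    return agent.innate_weights

def extrapolate_with_content(agent: Agent) -> tuple:
    """Method B: blend innate and learned weights (arbitrary 50/50 blend)."""
    return tuple(0.5 * i + 0.5 * l
                 for i, l in zip(agent.innate_weights, agent.learned_weights))

# Two agents with the same architecture but different cultural content.
archimedes = Agent(innate_weights=(1.0, 1.0), learned_weights=(0.2, 1.8))
modern_american = Agent(innate_weights=(1.0, 1.0), learned_weights=(1.6, 0.4))

for name, agent in [("Archimedes", archimedes),
                    ("21st-century American", modern_american)]:
    # Method A prints identical extrapolations for both agents;
    # method B prints different ones.
    print(name,
          extrapolate_architecture_only(agent),
          extrapolate_with_content(agent))
```

Obviously no real extrapolation would be a fixed 50/50 blend; the point is only that the content-vs-architecture dichotomy can already be exhibited by programs this small, which suggests the question is at least amenable to exact toy models.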
I am honestly not sure what to say to people who ask this question with genuine incredulity, besides (1) "Don't be evil" and (2) "If you think clever arguments exist that would just compel me to be evil, see rule 1."
I don't understand your answer. Let's try again. If "something like CEV" is what you want to implement, then an AI pointed at your volition will derive and implement CEV, so you don't need to specify it in detail beforehand. If CEV isn't what you want to implement, then why are you implementing it? Assume all your altruistic considerations, etc., are already folded into the definition of "you want" - just like a whole lot of other stuff-to-be-inferred is folded into the definition of CEV.
ETA: your "don't be evil" looks like a confusion of levels to me. If you don't want to be evil, there's already a term for that in your volition - no need to add any extra precautions.
The sane answer is that it solves a cooperation problem, i.e. people will not kill you for trying it and may instead donate money. As we can see here, this is not the position that Eliezer seems to take. He goes for the 'signal naive morality via incomprehension' approach.
You have a personal definition for evil, like everyone else. Many people have definitions of good that include things you see as evil; some of your goals are in conflict. Taking that into account, how can you precommit to implementing the CEV of the whole of humanity when you don't even know for sure what that CEV will evaluate to?
To put this another way: why not extrapolate from you, and maybe from a small group of diverse individuals whom you trust, to get the group's CEV? Why take the CEV of all humanity? Inasmuch as these two CEVs differ, why would you not prefer your own CEV, since it more closely reflects your personal definitions of good and evil?
I don't see how this can be consistent unless you start out with "implementing humanity's CEV" as a toplevel goal, and any divergence from that is slightly evil.
I do not accept the proposition that modern values are superior to ancient values. We're doing better in some regards than the ancients; worse in other regards. To the extent that we've made any progress at all, it's only because the societies that adopted truly terrible moral principles (e.g. communism) failed.
Please clarify: do you think there's some objective external standard or goal, according to which we've been progressing in some areas and regressing in others?
If you're aware of what that goal is, why haven't you adopted it as your personal morals, achieving 100% progression?
If you're not aware of what it is, why do you think it exists and what do you know about it?