Nobel prize winners differ from ordinary folk mostly in their smarts, but CEV already asks what we'd think if we were smarter. I don't see any reason to think doing great science is strongly correlated with moral character, and if you were looking to select for moral character, I'm sure there'd be better Schelling points to aim for.
Most practicality concerns could be addressed by taking a subset through lottery. The argument for Nobel prize winners seems to rest on the programming difficulty point.
I think you underestimate the possibility of serious things going wrong when taking the CEV of a demographically, neurologically and ideologically similar group with unusually large egos.
In other words, the CEV initial dynamic shouldn't be regarded as discovering what a group of people most desire collectively "by definition" - it is imperfect. If a universal CEV implementation is more difficult for human programmers to do well than a selective CEV, then a selective CEV might not only extrapolate the desires of the group in question more accurately, but also do a better job of reflecting the most effectively extrapolated desires of humanity as a whole.
I am wary of using arguments along the lines of "CEV is better for everyone than CEV". If calculating based on a subset happens to be the most practical instrumentally useful hack for implementing CEV then an even remotely competent AI can figure that out itself.
I would still implement the CEV option but I'd do it for real reasons.
Many other groups of minds, for example the FAI programming team themselves, would of course seem too arbitrary to gather sufficient support for the idea.
Depends what you mean by "sufficient support". The only sufficient support that particular subgroup needs is stealth.
Our chief weapon is ostracism. Ostracism and disrespect, disrespect and ostracism...our two weapons are ostracism and disrespect...and mockery. Our three weapons...
I favor a diaspora CEV. Why compromise between wildly divergent CEVs of subsets if you don't actually have to? In more concrete terms, I'm in favor of holodecking psychopaths.
veil of ignorance
The idea is that you must set up a mechanism that lets the AI itself draw good specific judgments, so that if you find yourself needing to rely on your own judgment, that may indicate you have failed that necessary requirement and need to go back to the drawing board.
Furthermore, the desirability of the CEV output to the average human in existence today should be weighed against the desires of (for example) sentient human uploads created in a post-singularity scenario.
No, it really shouldn't. CEV is inclusive. That is, if we care about post human sentient uploads then CEV accounts for that better than we can. That's the whole point. (This holds unless 'desirability' is defined to mean some other arbitrary thing independent of volition of the type we are talking about.)
A prerequisite for the creation of superintelligent AI must surely be the acquisition of detailed knowledge of the workings of the human brain.
Surely not "surely".
This is a topic I have recently thought a bit about, although by no means as much as you. I largely agree with your post, although I'm not quite sure that Nobel laureates (excluding the Peace Prize) are so much better a choice than anyone else. The Nobel Prize is not awarded for being a noble person, after all. You don't have to wear a halo. Since selection effects other than intellect might play a significant role here, I wouldn't use this group. A random sample of humankind might be better in that regard. I don't know.
You did a good job with this post and in dealing with the difficult original topic, a topic others may have been shying away from because of its difficulty (that's my guess) - in any case, it shows up here less often than I would expect, for whatever reasons.
Unpacking the concept of "difficult", it seems your writings never suffer from specific defects caused by impatience in reading, thinking, or writing: clicking "comment" before finishing, transitioning from reading to writing before understanding, that sort of thing.
I just got struck by an idea that seems too obvious, too naive, to possibly be true, and which horrified me, causing my brain to throw a huge batch of rationalizations at it to stop me from believing something so obviously low-status. I'm currently very undecided, but since it seems like the kind of thing I can't handle on my own, I'll just leave a transcript of my uncensored internal monologue here:
I unfortunately lack time at the moment; rather than write a badly-thought-out response to the complete structure of reasoning considered, I will for the moment write fully-thought-out thoughts on minor parts thereof that my (?) mind/curiosity has seized on.
'As for “taking over the world by proxy”, again SUAM applies.': this sentence stands out, but glancing upwards and downwards does not immediately reveal what SUAM refers to. Ctrl+F and looking at all appearances of the term SUAM on the page does not reveal what SUAM refers to. The first page of Goo...
I don't see why the FAI creators would base the CEV on anyone other than themselves except to the extent that they need to do so for political reasons. The result of this would by definition be optimal for the creators.
I argue that the practical difficulty of incorporating all humans into the CEV in the first place is unduly great, and that the programming challenge is also made more difficult by virtue of this choice.
Agreed. IMO, CEV is too silly to be worth much in the way of criticism.
Nobel Laureates are highly abnormal people, and not only in intelligence. I would be rather concerned about what might be expected of us all if our future were based on their CEV.
I don't see why it is relevant whether a set of people is a Schelling point; you don't seem to be analyzing FAI design as a coordination game. If you were using the term metaphorically, please do not use a technical game theory term. Can you clarify this?
On the other hand, given their high level of civilisation and the quality of character necessary for a person to dedicate his life to science, ceteris paribus I’d be more confident of Nobel Prize winners falling into a niceness attractor in comparison to a universal CEV.
The Nobel prize (minus the Peace Prize) is roughly an award for western academic achievement, and is disproportionately awarded to Ashkenazic Jews (27%, which is nine times what you might expect by population). Those three factors do not add up to strong global agreement. Extrapolating from your favorite...
Personally I have little regard for veil of ignorance arguments, on the basis that there is no such thing as a veil of ignorance. No, I would not want the al-Qaeda programmers to nominate a group of humans (presumably Islamic fanatics) and extrapolate their volition – I would rather they used all of humanity. But so what?
This veil of ignorance is unlike the Rawlsian one. The FAI programmer really is ignorant about features of his morality.
Out of interest, can you give a rough idea of your probability estimate that a functioning superintelligent AI can be created in a reasonable time-scale without our having first gained a detailed understanding of the human brain - i.e. that a superintelligence is built without the designers reverse-engineering an existing intelligence to any significant extent?
Edit: because there is nothing rational about interpreting words like "surely" literally when they are obviously being used in a casual or innocently rhetorical way.
because there is nothing rational about interpreting words like "surely" literally when they are obviously being used in a casual or innocently rhetorical way.
You and Nesov either did not interpret your use of 'surely' (in context) to mean the same thing, or Nesov thought that additional clarification was needed (a statement which you do not seem to agree with). I'm failing to parse your use of the word rational in this context.
Intention: Helpful information. I may not respond to a reply.
Note: I appreciate that at this point CEV is just a sketch. However, it’s an interesting topic and I don’t see that there’s any harm in discussing certain details of the concept as it stands.
1. Summary of CEV
Eliezer Yudkowsky describes CEV (Coherent Extrapolated Volition) here. Superintelligent AI is a powerful genie, and genies can’t be trusted. Friendly AI requires the AI to take as input the entire value computation of at least one human brain, because failing to take into consideration even a relatively small element of the human value set, while optimising well in several other respects, is likely to be a disaster. CEV is Yudkowsky’s attempt at outlining a Friendly AI volition-extrapolating dynamic: a process in which the AI takes human brainstates, combines them with its own vast knowledge, and outputs suitable actions to benefit humans.
Note that extrapolating volition is not some esoteric invention of Eliezer’s; it is a normal human behaviour. To use his example: suppose we are given two boxes, A and B, only one of which contains a diamond that Fred desires. Fred asks us for box A because he incorrectly believes that it contains the diamond, whereas we know that the diamond is in box B. If we give him box B, we are extrapolating Fred’s volition (albeit over a short distance).
Yudkowsky roughly defines certain quantities that are likely to be relevant to the functioning of the CEV dynamic:
Spread describes the case in which the extrapolated volition is unpredictable. Quantum randomness or other computational problems may make it difficult to say with strong confidence (for example) whether person A would like to be given object X tomorrow – if the probability computed is 30%, rather than 0.001%, there is significant spread in this case.
Muddle is a measure of inconsistency. For example, person A might resent being given object Y tomorrow, but also resent not being given object Y tomorrow.
Distance measures the degree of separation between one’s current self and the extrapolated self, i.e. how easy it would be to explain a given instance of extrapolated volition to someone. In the case of Fred and the diamond the distance is very short, but superintelligent AI could potentially compute Fred’s extrapolated volition to such a distance that it seems incomprehensible to Fred.
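As a rough illustration of the first two quantities, here is a toy sketch in Python. It is not Yudkowsky’s formalism: his document defines spread, muddle and distance only informally. The sketch simply assumes two things. First, spread can be scored as the binary entropy of the predicted probability that a person wants an outcome. Second, muddle shows up when a person would resent both an action and its absence. The function names `spread` and `is_muddled` are hypothetical.

```python
import math


def spread(p: float) -> float:
    """Toy spread score (an assumption, not Yudkowsky's definition):
    binary entropy, in bits, of the probability p that the extrapolated
    person wants the outcome. 0 means a confident prediction; 1 means
    a coin flip."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def is_muddled(resents_given: bool, resents_not_given: bool) -> bool:
    """Toy muddle check: the extrapolated volition is inconsistent if the
    person would resent both receiving the object and not receiving it."""
    return resents_given and resents_not_given


if __name__ == "__main__":
    print(round(spread(0.3), 3))    # the 30% case
    print(spread(0.00001) < 0.001)  # the 0.001% case
    print(is_muddled(True, True))   # person A and object Y
```

Under this scoring, the 30% case scores about 0.88 bits and the 0.001% case scores well under 0.001 bits. Distance is left out, because how far the extrapolated self has moved from the current one has no obvious one-line proxy.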
To quote Yudkowsky (I assume that the following remains approximately true today):
As of May 2004, my take on Friendliness is that the initial dynamic should implement the coherent extrapolated volition of humankind.
In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.
Yudkowsky adds that “it should be easier to counter coherence than to create coherence” where coherence refers to strong, un-muddled and un-spread agreement between multiple individual volitions with no strong disagreement from any others; and that “the initial dynamic for CEV should be conservative about saying ‘yes’ and listen carefully for ‘no’” – the superintelligent optimisation process should seek more consensus before steering humanity into narrow slices of the future, relative to the degree of consensus it needs before steering humanity away from some particular narrow slice of the future (about which it has been warned by elements of the CEV).
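The asymmetry between creating and countering coherence can be pictured as two different thresholds. The following is a minimal sketch under loudly stated assumptions. Each extrapolated volition is reduced to a single attitude in [-1, 1] toward one narrow slice of the future. The cut-offs (90% strong support to steer toward the slice, 10% strong opposition to steer away) are illustrative numbers, not figures from Yudkowsky’s document, and the function `decide` is hypothetical.

```python
def decide(volitions, create_threshold=0.9, counter_threshold=0.1):
    """Toy asymmetric consensus rule; thresholds are illustrative assumptions.

    volitions: extrapolated attitudes toward one narrow slice of the future,
    each in [-1, 1] (positive = wants it, negative = opposes it).
    """
    n = len(volitions)
    if n == 0:
        return "abstain"
    strong_yes = sum(1 for v in volitions if v > 0.5) / n
    strong_no = sum(1 for v in volitions if v < -0.5) / n
    # Listen carefully for "no": a small fraction of strong opposition suffices.
    if strong_no >= counter_threshold:
        return "steer away"
    # Be conservative about "yes": near-unanimous strong support is required.
    if strong_yes >= create_threshold:
        return "steer toward"
    return "abstain"


if __name__ == "__main__":
    print(decide([0.9] * 95 + [-0.9] * 5))   # strong consensus, small dissent
    print(decide([0.9] * 85 + [-0.9] * 15))  # 15% strong opposition
    print(decide([0.9] * 70 + [0.0] * 30))   # support too thin to act on
```

Checking for strong opposition before checking for support is the design choice that encodes “conservative about saying ‘yes’”. A 15% minority blocks an action that 85% want, and 70% indifferent-but-unopposed support is not enough to act on.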
CEV is an initial dynamic; it doesn’t necessarily have to be the perfect dynamic of human volition for Friendly AI, but it should be good enough to allow the AI to extrapolate an optimal dynamic of volition, to which we can then switch over if we so desire. “The purpose of CEV as an initial dynamic is not to be the solution, but to ask what solution we want”.
Also, “If our extrapolated volitions say we don’t want our extrapolated volitions manifested, the system replaces itself with something else we want, or else...undergoes an orderly shutdown”.
Finally, Yudkowsky suggests that as a safeguard, a last judge of impeccable judgement could be trusted with putting the seal of approval on the output of the CEV dynamic; if something seems to have gone horribly wrong, beyond mere future shock, he can stop the output from being enacted.
2. CEV of all humankind vs. CEV of a subset of humankind
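Let us accept that coherent extrapolated volition, in general, is the best (only?) solution that anyone has provided to the problem of AI friendliness. I can see four ways of implementing a CEV initial dynamic:
1. A single CEV incorporating all of humankind.
2. Multiple CEVs running side by side (for example, one for each person or group).
3. The CEV of a single individual.
4. A single CEV incorporating only a certain subset of humankind.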
As Yudkowsky discusses in his document, whilst the second option might perhaps be a reasonable final dynamic (who knows?) it isn’t a suitable initial dynamic. This is because if there is more than one CEV running, the way in which the CEV dynamic works in a general sense cannot be re-written without someone’s individual CEV being violated, and the idea behind the initial dynamic is that a superior dynamic may develop from it.
The third option is obviously sub-optimal, because of the danger that any individual person might be a psychopath – a person whose values are in general markedly hostile to other humans. Knowing more and thinking smarter might lead a given psychopath’s more humane values to win out, but we can’t count on that. In a larger group of people, the law of large numbers applies: the proportion of psychopaths will tend toward their low frequency in the population as a whole, so the risk that they dominate the extrapolation diminishes.
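The law-of-large-numbers point can be made concrete with a binomial calculation. This sketch assumes that group members are drawn independently from a population in which a fixed fraction are psychopaths. The 1% used below is an illustrative figure, not an estimate from the original text. The sketch then asks how likely it is that psychopaths form a strict majority of the group. It does not capture correlated failures, such as a small group falling under a single influence, which the Safety section raises separately. The function `p_majority` is hypothetical.

```python
from math import comb


def p_majority(n: int, rate: float) -> float:
    """Probability that more than half of n independently drawn people
    belong to a subgroup with population frequency `rate` (binomial tail).
    Assumes independent draws; correlated influence is not modelled."""
    k_min = n // 2 + 1  # smallest strict majority
    return sum(
        comb(n, k) * rate**k * (1 - rate) ** (n - k)
        for k in range(k_min, n + 1)
    )


if __name__ == "__main__":
    for n in (1, 11, 101):
        print(n, p_majority(n, 0.01))
```

With a 1% base rate, a single individual carries a 1% risk. For a group of 11, the chance of a psychopathic majority is already below one in a billion. For 101 it is astronomically small.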
Yudkowsky favours the first option, a CEV of all humankind; I am more in favour of the fourth option, an initial CEV dynamic incorporating the minds of only a certain subset of humans. I would like to compare these two options on six relevant criteria:
I Schelling points [edit: apologies for the questionable use of a game theory term for the sake of concision]
Clearly, incorporating the minds of all humankind into the initial dynamic is a Schelling point – a solution that people would naturally generate for themselves in the absence of any communication. So full marks to a universal CEV on this criterion.
Answer quickly: what specific group of people – be that a group of people who meet each other regularly, or a group who are distinguished in some other way – would you nominate, if you had to choose a certain subset of minds to participate in the initial dynamic?
What springs to my mind is Nobel Prize winners, and I suspect that this too is a Schelling point. This seems like a politically neutral selection of distinguished human beings (particularly if we exclude the Peace Prize) of superlative character and intellect. Whether some people would object strongly to this selection is one question, but certainly I expect that many humans, supposing they were persuaded for other reasons that the most promising initial dynamic is one incorporating a small group of worthy humans only, would consider Nobel Prize winners to be an excellent choice to rally around.
Many other groups of minds, for example the FAI programming team themselves, would of course seem too arbitrary to gather sufficient support for the idea.
II Practicality of implementation
One problem with a universal CEV that I have never seen discussed is how feasible it would actually be to take extremely detailed recordings of the brain states of all of the humans on Earth. All of the challenges involved in creating Friendly AI are of course extreme. But ceteris paribus, one additional extremely challenging problem is one too many.
A prerequisite for the creation of superintelligent AI must surely be the acquisition of detailed knowledge of the workings of the human brain. However, our having the ability to scan one human brain in extreme detail does not imply that it is economically feasible to scan 7 billion or more human brains in the same way. It might well come to pass that the work on FAI is complete, but we still lack the means to actually collect detailed knowledge of all existing human minds. A superintelligent AI would develop its own satisfactory means of gathering information about human brains with minimal disruption, but as I understand the problem we need to input all human minds into the AI before switching it on and using it to do anything for us.
Even if the economic means do exist, consider the social, political and ideological obstacles. How do we deal with people who don’t wish to comply with the procedure?
Furthermore, let us suppose that we manage to incorporate all or almost all human minds into the CEV dynamic. Yudkowsky admits the possibility that the thing might just shut itself down when we run it – and he suggests that we shouldn’t alter the dynamic too many times in an attempt to get it to produce a reasonable-looking output, for fear of prejudicing the dynamic in favour of the programmers’ preferences and away from humanity’s CEV.
It would be one thing if this merely represented the (impeccably well-intentioned) waste of a vast amount of money, and the time of some Nobel Prize winners. But if it also meant that the economic, political and social order of the entire world had been trampled over in the process of incorporating all humans into the CEV, the consequences could be far worse. Enthusiasm for a second round with a new framework at some point in the future might be rather lower in the second scenario than in the first.
III Safety
In his document on CEV, Yudkowsky states that there is “a real possibility” that (in a universal CEV scenario) the majority of the planetary population might not fall into a niceness attractor when their volition is extrapolated.
The small group size of living scientific Nobel Prize winners (or any other likely subset of humans) poses certain problems for a selective CEV that the universal CEV lacks. For example, they might all come under the influence of a single person or ideology that is not conducive to the needs of wider humanity.
On the other hand, given their high level of civilisation and the quality of character necessary for a person to dedicate his life to science, ceteris paribus I’d be more confident of Nobel Prize winners falling into a niceness attractor in comparison to a universal CEV. How much trust are we willing to place in the basic decency of humankind – to what extent is civilisation necessary to create a human who would not be essentially willing to torture innocent beings for his own gratification? Perhaps by the time humanity is technologically advanced enough to implement AGI we’ll know more about that, but at our current state of knowledge I see little reason to give humans in general the benefit of the doubt.
Yudkowsky asks, “Wouldn’t you be terribly ashamed to go down in history as having meddled...because you didn’t trust your fellows?” Personally, I think that shutting up and multiplying requires us to make our best estimate of what is likely to benefit humankind (including future humans) the most, and run with that. I’d not be ashamed if in hindsight my estimate was wrong, since no-one can be blamed for having imperfect knowledge.
IV Aesthetic standards
In his document, Yudkowsky discusses the likelihood of certain volitions cancelling one another out whilst others add together; metaphorically speaking, “love obeys Bose-Einstein statistics while hatred obeys Fermi-Dirac statistics”. This supports the idea that extrapolating volition is likely to produce at least some useful output – i.e. having minimal spread and muddle, ideally at not too far a distance.
In a universal CEV this leads us to believe that Pakistani-Indian mutual hatred, for example, cancels out (particularly since coherence is easier to counter than to create) whereas their mutual preferences form a strong signal.
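The “love obeys Bose-Einstein statistics while hatred obeys Fermi-Dirac statistics” metaphor can be illustrated with a deliberately simple linear aggregation. The sketch assumes, purely for illustration, that each extrapolated volition is a map from outcomes to weights in [-1, 1]. It also assumes the dynamic sums these weights and keeps only outcomes with strong average agreement. The real CEV dynamic is not specified at this level of detail. The group labels and function names are hypothetical placeholders.

```python
def aggregate(volitions):
    """Sum each outcome's weight across all volitions (toy linear model)."""
    totals = {}
    for prefs in volitions:
        for outcome, weight in prefs.items():
            totals[outcome] = totals.get(outcome, 0.0) + weight
    return totals


def coherent_signals(volitions, min_agreement=0.8):
    """Keep outcomes whose average weight is strong in either direction."""
    n = len(volitions)
    return {
        outcome: total
        for outcome, total in aggregate(volitions).items()
        if abs(total) / n >= min_agreement
    }


# Hypothetical groups: each wants to harm the other, neither wants to be
# harmed, and both want shared prosperity.
group_a = {"harm_group_b": 1.0, "harm_group_a": -1.0, "shared_prosperity": 1.0}
group_b = {"harm_group_b": -1.0, "harm_group_a": 1.0, "shared_prosperity": 1.0}

print(coherent_signals([group_a, group_b]))  # {'shared_prosperity': 2.0}
```

Each group’s wish to harm the other is opposed by the other group’s wish not to be harmed, so those weights sum to zero. The preference both groups share adds up to a strong signal.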
The problem of aesthetic standards concerns the quality of the signal that might cohere within the CEV. Love seems to be a strong human universal, and so we would expect love to play a strong role in the output of the initial dynamic. On the other hand, consider the difference in intelligence and civilisation between the bulk of humanity and a select group such as Nobel Prize winners. Certain values shared by such a select group, for example the ability to take joy in the merely real, might be lost amidst the noise of the relatively primitive values common to humanity as a whole.
Admittedly, we can expect “knowing more” and “growing up farther together” to improve the quality of human values in general. Once an IQ-80 tribesman gains more knowledge and thinks faster, and is exposed to rational memes, he might well end up in exactly the same place as the Nobel Prize winners. But the question is whether it’s a good idea to rely on a superb implementation of these specifications in an initial dynamic, rather than taking out the insurance policy of starting with substantially refined values in the first place – bearing in mind what is at stake.
A worst case scenario, assuming that other aspects of the FAI implementation work as planned, is that the CEV recommends an ignoble future for humanity – for example orgasmium – which is not evil, but is severely lacking in aesthetic qualities that might have come out of a more selective CEV. Of course, the programmers or the last judge should be able to veto an undesirable output. But if (as Yudkowsky recommends) they only trust themselves to tweak the dynamic a maximum of three times in an effort to improve the output before shutting it off for good if the results are still deemed unsatisfactory, this does not eliminate the problem.
V Obtaining a signal
It seems to me that the more muddle and spread there is within the CEV, the greater the challenge that exists in designing an initial dynamic that outputs anything whatsoever. Using a select group of humans would ensure that these quantities are minimised as far as possible. This is simply because they are likely to be (or can be chosen to be) a relatively homogeneous group of people, who have relatively few directly conflicting goals and possess relatively similar memes.
Again, why make the challenge of FAI even more difficult than it needs to be? Bear in mind that failure to implement Friendly AI increases the likelihood of uFAI being created at some point.
VI Fairness
In his document on CEV, Yudkowsky does go some way to addressing the objections that I have raised. However, I do not find him persuasive on this subject:
Suppose that our coherent extrapolated volition does decide to weight volitions by wisdom and kindness – a suggestion I strongly dislike, for it smacks of disenfranchisement. I don’t think it wise to tell the initial dynamic to look to whichever humans judge themselves as wiser and kinder. And if the programmers define their own criteria of “wisdom” and “kindness” into a dynamic’s search for leaders, that is taking over the world by proxy. You wouldn’t want the al-Qaeda programmers doing that, right?
Firstly, the question of disenfranchisement. As I suggested earlier, this constitutes a refusal to shut up and multiply when dealing with a moral question. “Disenfranchisement” is a drop in the ocean of human joy and human suffering that is at stake when we discuss FAI. As such, it is almost completely irrelevant as an item of importance in itself (of course there are other consequences involved in the choice between universal CEV and a degree of disenfranchisement – but they have been discussed already, and are beside the point of the strictly moral question.) This is especially the case since we are only talking about the initial dynamic here, which may well ultimately develop into a universal CEV.
Secondly, there is the mention of al-Qaeda. In the context of earlier mentions of al-Qaeda programmers in the document on CEV, Yudkowsky appears to be positing a “veil of ignorance” – we should behave in creating the FAI as we would want al-Qaeda programmers to behave. This is strange, because in a similar veil of ignorance problem – the modesty argument – Robin Hanson argued that we should act as though there is a veil of ignorance surrounding whether it is ourselves or someone else who is wrong in some question of fact, whereas Eliezer argued against the idea.
Personally I have little regard for veil of ignorance arguments, on the basis that there is no such thing as a veil of ignorance. No, I would not want the al-Qaeda programmers to nominate a group of humans (presumably Islamic fanatics) and extrapolate their volition – I would rather they used all of humanity. But so what? I am quite happy using my own powers of judgement to decide that al-Qaeda’s group is inferior to humanity as a whole, but Nobel Prize winners (for example) are a better choice than humanity as a whole.
As for “taking over the world by proxy”, again the principle of “shut up and multiply” (SUAM) applies.
3. Conclusion
I argue that a selective CEV incorporating a fairly small number of distinguished human beings may be preferable to a CEV incorporating all of humanity. I argue that the practical difficulty of incorporating all humans into the CEV in the first place is unduly great, and that the programming challenge is also made more difficult by virtue of this choice. I consider any increase in the difficulty of bringing FAI into existence to be positively dangerous, because it lengthens the window of time available for unscrupulous programmers to create uFAI.
Setting aside the problem of getting the initial dynamic to work at all, I also consider it to be possible for the output of a selective CEV to be more desirable to the average human than the output of a universal CEV. The initial dynamic is the creation of human programmers, who are fallible in comparison to a superintelligent AI; their best attempt at creating a universal CEV dynamic may lead to the positive values of many humans being discarded, lost in the noise.
In other words, the CEV initial dynamic shouldn't be regarded as discovering what a group of people most desire collectively "by definition" - it is imperfect. If a universal CEV implementation is more difficult for human programmers to do well than a selective CEV, then a selective CEV might not only extrapolate the desires of the group in question more accurately, but also do a better job of reflecting the most effectively extrapolated desires of humanity as a whole.
Furthermore, the desirability of the CEV output to the average human in existence today should be weighed against the desires of (for example) sentient human uploads created in a post-singularity scenario. Shutting up and multiplying demands that FAI programmers and other people of influence set aside concerns about being “jerks” when estimating the probability that extrapolating the volition of humanity en masse is the best way of meeting their own moral standards.