
Note: I appreciate that at this point CEV is just a sketch. However, it’s an interesting topic and I don’t see that there’s any harm in discussing certain details of the concept as it stands.

1. Summary of CEV

Eliezer Yudkowsky describes CEV – Coherent Extrapolated Volition – here. A superintelligent AI is a powerful genie, and genies can’t be trusted; Friendly AI requires the AI to take as input the entire value computation of at least one human brain, because failing to take into account even a relatively small element of the human value set, while optimising in several other respects, is likely to be a disaster. CEV is Yudkowsky’s attempt at outlining a Friendly AI volition-extrapolating dynamic: a process in which the AI takes human brain states, combines them with its own vast knowledge, and outputs suitable actions to benefit humans.

Note that extrapolating volition is not some esoteric invention of Eliezer’s; it is a normal human behaviour. To use his example: suppose there are two boxes, A and B, only one of which contains a diamond that Fred desires. Fred asks us for box A because he incorrectly believes the diamond is in box A, but we know that in fact it is in box B. If we give him box B instead, we are extrapolating Fred’s volition (albeit over a short distance).

Yudkowsky roughly defines certain quantities that are likely to be relevant to the functioning of the CEV dynamic:

Spread describes the case in which the extrapolated volition is unpredictable. Quantum randomness or other computational problems may make it difficult to say with strong confidence (for example) whether person A would like to be given object X tomorrow – if the probability computed is 30%, rather than 0.001%, there is significant spread in this case.

Muddle is a measure of inconsistency. For example, person A might resent being given object Y tomorrow, but also resent not being given it.

Distance measures the degree of separation between one’s current self and the extrapolated self, i.e. how easy it would be to explain a given instance of extrapolated volition to someone. In the case of Fred and the diamond the distance is very short, but superintelligent AI could potentially compute Fred’s extrapolated volition to such a distance that it seems incomprehensible to Fred.
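The three quantities above can be made concrete with a toy numerical sketch. This is purely my illustration, not part of Yudkowsky's proposal; the functions and figures below are invented for the example, treating an extrapolated preference as a probability and muddle as joint resentment of an action and its omission.

```python
def spread(p_yes: float) -> float:
    """Spread is high when the extrapolated preference is unpredictable.
    Modelled here as the variance of a Bernoulli outcome, scaled to [0, 1]:
    maximal at p = 0.5, zero at p = 0 or p = 1."""
    return 4 * p_yes * (1 - p_yes)

def muddle(pref_if_given: float, pref_if_withheld: float) -> float:
    """Muddle is high when the person resents both an action and its
    omission: measured here as the product of the two resentments
    (the negative parts of the two preferences)."""
    return max(0.0, -pref_if_given) * max(0.0, -pref_if_withheld)

# Person A from the text: a 30% computed chance of wanting object X
# tomorrow gives large spread (~0.84); 0.001 gives negligible spread.
print(spread(0.30))
print(spread(0.001))

# Person A resents being given Y (-0.6) and also resents not being
# given it (-0.5): a muddled, inconsistent volition (~0.3).
print(muddle(-0.6, -0.5))
```

Distance is harder to caricature numerically, since it concerns how far the extrapolated self has travelled from the current self; in the Fred example above the distance is near zero.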

To quote Yudkowsky (I assume that the following remains approximately true today):

As of May 2004, my take on Friendliness is that the initial dynamic should implement the coherent extrapolated volition of humankind.

In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.

Yudkowsky adds that “it should be easier to counter coherence than to create coherence” where coherence refers to strong, un-muddled and un-spread agreement between multiple individual volitions with no strong disagreement from any others; and that “the initial dynamic for CEV should be conservative about saying ‘yes’ and listen carefully for ‘no’” – the superintelligent optimisation process should seek more consensus before steering humanity into narrow slices of the future, relative to the degree of consensus it needs before steering humanity away from some particular narrow slice of the future (about which it has been warned by elements of the CEV).

CEV is an initial dynamic; it doesn’t necessarily have to be the perfect dynamic of human volition for Friendly AI, but the dynamic should be good enough that it allows the AI to extrapolate an optimal dynamic of volition to which we can then switch over if desirous. “The purpose of CEV as an initial dynamic is not to be the solution, but to ask what solution we want”.

Also, “If our extrapolated volitions say we don’t want our extrapolated volitions manifested, the system replaces itself with something else we want, or else...undergoes an orderly shutdown”.

Finally, Yudkowsky suggests that as a safeguard, a last judge of impeccable judgement could be trusted with putting the seal of approval on the output of the CEV dynamic; if something seems to have gone horribly wrong, beyond mere future shock, he can stop the output from being enacted.

2. CEV of all humankind vs. CEV of a subset of humankind

Let us accept that coherent extrapolated volition, in general, is the best (only?) solution that anyone has provided to the problem of AI friendliness. I can see four ways of implementing a CEV initial dynamic:

• Implement a single CEV dynamic incorporating all humans, the output of which affects everyone.
• Implement an individual CEV dynamic for each individual human.
• Implement a single CEV dynamic incorporating one human only, the output of which affects everyone.
• Implement a single CEV dynamic incorporating a limited subset of humans, the output of which affects everyone.

As Yudkowsky discusses in his document, whilst the second option might perhaps be a reasonable final dynamic (who knows?), it isn’t a suitable initial dynamic. This is because, if more than one CEV is running, the general workings of the CEV dynamic cannot be rewritten without violating someone’s individual CEV; and the whole idea behind the initial dynamic is that a superior dynamic may develop from it.

The third option is obviously sub-optimal, because of the danger that any individual person might be a psychopath – a person whose values are in general markedly hostile to other humans. Knowing more and thinking smarter might lead a given psychopath’s more humane values to win out, but we can’t count on that. In a larger group of people, the law of large numbers applies and the risk diminishes.
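The law-of-large-numbers point can be checked with a rough calculation. This is my own illustration, and the hostility probability of 1% is an invented figure: if each independently chosen participant is "markedly hostile" with some small probability, the chance that a majority of the group is hostile falls off extremely fast with group size.

```python
from math import comb

def p_hostile_majority(n: int, p: float) -> float:
    """Probability that more than half of n independently chosen
    people are hostile, with per-person hostility probability p
    (a binomial tail sum)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# A single-person CEV bears the full per-person risk (here 1%).
print(p_hostile_majority(1, 0.01))

# With 101 participants the risk of a hostile majority is
# astronomically small.
print(p_hostile_majority(101, 0.01))
```

The sketch assumes independence, which is exactly what the worry about a group falling under a single hostile ideology (discussed under Safety below) would undermine.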

Yudkowsky favours the first option, a CEV of all humankind; I am more in favour of the fourth option, an initial CEV dynamic incorporating the minds of only a certain subset of humans. I would like to compare these two options on six relevant criteria:

I Schelling points [edit: apologies for the questionable use of a game theory term for the sake of concision]

Clearly, incorporating the minds of all humankind into the initial dynamic is a Schelling point – a solution that people would naturally generate for themselves in the absence of any communication. So full marks to a universal CEV on this criterion.

Answer quickly: what specific group of people – be that a group of people who meet each other regularly, or a group who are distinguished in some other way – would you nominate, if you had to choose a certain subset of minds to participate in the initial dynamic?

What springs to my mind is Nobel Prize winners, and I suspect that this too is a Schelling point. This seems like a politically neutral selection of distinguished human beings (particularly if we exclude the Peace Prize) of superlative character and intellect. Whether some people would object strongly to this selection is one question, but certainly I expect that many humans, supposing they were persuaded for other reasons that the most promising initial dynamic is one incorporating a small group of worthy humans only, would consider Nobel Prize winners to be an excellent choice to rally around.

Many other groups of minds, for example the FAI programming team themselves, would of course seem too arbitrary to gather sufficient support for the idea.

II Practicality of implementation

One problem with a universal CEV that I have never seen discussed is how feasible it would actually be to take extremely detailed recordings of the brain states of all of the humans on Earth. All of the challenges involved in creating Friendly AI are of course extreme. But ceteris paribus, one additional extremely challenging problem is one too many.

A prerequisite for the creation of superintelligent AI must surely be the acquisition of detailed knowledge of the workings of the human brain. However, our having the ability to scan one human brain in extreme detail does not imply that it is economically feasible to scan 7 billion or more human brains in the same way. It might well come to pass that the work on FAI is complete, but we still lack the means to actually collect detailed knowledge of all existing human minds. A superintelligent AI would develop its own satisfactory means of gathering information about human brains with minimal disruption, but as I understand the problem we need to input all human minds into the AI before switching it on and using it to do anything for us.

Even if the economic means do exist, consider the social, political and ideological obstacles. How do we deal with people who don’t wish to comply with the procedure?

Furthermore, let us suppose that we manage to incorporate all or almost all human minds into the CEV dynamic. Yudkowsky admits the possibility that the thing might just shut itself down when we run it – and he suggests that we shouldn’t alter the dynamic too many times in an attempt to get it to produce a reasonable-looking output, for fear of prejudicing the dynamic in favour of the programmers’ preferences and away from humanity’s CEV.

It would be one thing if this merely represented the (impeccably well-intentioned) waste of a vast amount of money, and the time of some Nobel Prize winners. But if it also meant that the economic, political and social order of the entire world had been trampled over in the process of incorporating all humans into the CEV, the consequences could be far worse. Enthusiasm for a second round with a new framework at some point in the future might be rather lower in the second scenario than in the first.

III Safety

In his document on CEV, Yudkowsky states that there is “a real possibility” that (in a universal CEV scenario) the majority of the planetary population might not fall into a niceness attractor when their volition is extrapolated.

The small group size of living scientific Nobel Prize winners (or any other likely subset of humans) poses certain problems for a selective CEV that the universal CEV lacks. For example, they might all come under the influence of a single person or ideology that is not conducive to the needs of wider humanity.

On the other hand, given their high level of civilisation and the quality of character necessary for a person to dedicate his life to science, ceteris paribus I’d be more confident of Nobel Prize winners falling into a niceness attractor in comparison to a universal CEV. How much trust are we willing to place in the basic decency of humankind – to what extent is civilisation necessary to create a human who would not be essentially willing to torture innocent beings for his own gratification? Perhaps by the time humanity is technologically advanced enough to implement AGI we’ll know more about that, but at our current state of knowledge I see little reason to give humans in general the benefit of the doubt.

Yudkowsky asks, “Wouldn’t you be terribly ashamed to go down in history as having meddled...because you didn’t trust your fellows?” Personally, I think that shutting up and multiplying requires us to make our best estimate of what is likely to benefit humankind (including future humans) the most, and run with that. I’d not be ashamed if in hindsight my estimate was wrong, since no-one can be blamed for having imperfect knowledge.

IV Aesthetic standards

In his document, Yudkowsky discusses the likelihood of certain volitions cancelling one another out whilst others add together; metaphorically speaking, “love obeys Bose-Einstein statistics while hatred obeys Fermi-Dirac statistics”. This supports the idea that extrapolating volition is likely to produce at least some useful output – i.e. having minimal spread and muddle, ideally at not too far a distance.

In a universal CEV this leads us to believe that Pakistani-Indian mutual hatred, for example, cancels out (particularly since coherence is easier to counter than to create) whereas their mutual preferences form a strong signal.

The problem of aesthetic standards concerns the quality of the signal that might cohere within the CEV. Love seems to be a strong human universal, and so we would expect love to play a strong role in the output of the initial dynamic. On the other hand, consider the difference in intelligence and civilisation between the bulk of humanity and a select group such as Nobel Prize winners. Certain values shared by such a select group, for example the ability to take joy in the merely real, might be lost amidst the noise of the relatively primitive values common to humanity as a whole.

Admittedly, we can expect “knowing more” and “growing up farther together” to improve the quality of human values in general. Once an IQ-80 tribesman gains more knowledge and thinks faster, and is exposed to rational memes, he might well end up in exactly the same place as the Nobel Prize winners. But the question is whether it’s a good idea to rely on a superb implementation of these specifications in an initial dynamic, rather than taking out the insurance policy of starting with substantially refined values in the first place – bearing in mind what is at stake.

A worst case scenario, assuming that other aspects of the FAI implementation work as planned, is that the CEV recommends an ignoble future for humanity – for example orgasmium – which is not evil, but is severely lacking in aesthetic qualities that might have come out of a more selective CEV. Of course, the programmers or the last judge should be able to veto an undesirable output. But if (as Yudkowsky recommends) they only trust themselves to tweak the dynamic a maximum of three times in an effort to improve the output before shutting it off for good if the results are still deemed unsatisfactory, this does not eliminate the problem.

V Obtaining a signal

It seems to me that the more muddle and spread there is within the CEV, the greater the challenge that exists in designing an initial dynamic that outputs anything whatsoever. Using a select group of humans would ensure that these quantities are minimised as far as possible. This is simply because they are likely to be (or can be chosen to be) a relatively homogeneous group of people, who have relatively few directly conflicting goals and possess relatively similar memes.

Again, why make the challenge of FAI even more difficult than it needs to be? Bear in mind that failure to implement Friendly AI increases the likelihood of uFAI being created at some point.

VI Fairness

In his document on CEV, Yudkowsky does go some way to addressing the objections that I have raised. However, I do not find him persuasive on this subject:

Suppose that our coherent extrapolated volition does decide to weight volitions by wisdom and kindness – a suggestion I strongly dislike, for it smacks of disenfranchisement. I don’t think it wise to tell the initial dynamic to look to whichever humans judge themselves as wiser and kinder. And if the programmers define their own criteria of “wisdom” and “kindness” into a dynamic’s search for leaders, that is taking over the world by proxy. You wouldn’t want the al-Qaeda programmers doing that, right?

Firstly, the question of disenfranchisement. As I suggested earlier, this constitutes a refusal to shut up and multiply when dealing with a moral question. “Disenfranchisement” is a drop in the ocean of human joy and human suffering that is at stake when we discuss FAI. As such, it is almost completely irrelevant as an item of importance in itself (of course there are other consequences involved in the choice between universal CEV and a degree of disenfranchisement, but they have been discussed already, and are beside the point of the strictly moral question). This is especially the case since we are only talking about the initial dynamic here, which may well ultimately develop into a universal CEV.

Secondly, there is the mention of al-Qaeda. In the context of earlier mentions of al-Qaeda programmers in the document on CEV, Yudkowsky appears to be positing a “veil of ignorance” – we should behave in creating the FAI as we would want al-Qaeda programmers to behave. This is strange, because in a similar veil of ignorance problem – the modesty argument – Robin Hanson argued that we should act as though there is a veil of ignorance surrounding whether it is ourselves or someone else who is wrong in some question of fact, whereas Eliezer argued against the idea.

Personally I have little regard for veil of ignorance arguments, on the basis that there is no such thing as a veil of ignorance. No, I would not want the al-Qaeda programmers to nominate a group of humans (presumably Islamic fanatics) and extrapolate their volition – I would rather they used all of humanity. But so what? I am quite happy using my own powers of judgement to decide that al-Qaeda’s group is inferior to humanity as a whole, but Nobel Prize winners (for example) are a better choice than humanity as a whole.

As for “taking over the world by proxy”, again shut up and multiply (SUAM) applies.

3. Conclusion

I argue that a selective CEV incorporating a fairly small number of distinguished human beings may be preferable to a CEV incorporating all of humanity. I argue that the practical difficulty of incorporating all humans into the CEV in the first place is unduly great, and that the programming challenge is also made more difficult by this choice. I consider any increase in the difficulty of bringing FAI into existence to be positively dangerous, because it lengthens the window of time available for unscrupulous programmers to create uFAI.

Setting aside the problem of getting the initial dynamic to work at all, I also consider it to be possible for the output of a selective CEV to be more desirable to the average human than the output of a universal CEV. The initial dynamic is the creation of human programmers, who are fallible in comparison to a superintelligent AI; their best attempt at creating a universal CEV dynamic may lead to the positive values of many humans being discarded, lost in the noise.

In other words, the CEV initial dynamic shouldn't be regarded as discovering what a group of people most desire collectively "by definition"; it is imperfect. If a universal CEV implementation is more difficult for human programmers to do well than a selective CEV, then a selective CEV might not only extrapolate the desires of the group in question more accurately, but also do a better job of reflecting the most effectively extrapolated desires of humanity as a whole.

Furthermore, desirability of the CEV output to the average human in existence today should be weighed against the desires of (for example) sentient human uploads created in a post-singularity scenario. Shutting up and multiplying demands that FAI programmers and other people of influence set aside concerns about being “jerks” when estimating the probability that extrapolating the volition of humanity en masse is the best way of meeting their own moral standards.