My main objection to Coherent Extrapolated Volition (CEV) is the "Extrapolated" part. I don't see any reason to trust the extrapolated volition of humanity - but this isn't just for self-centred reasons. I don't see any reason to trust my own extrapolated volition. I think it's perfectly possible that my extrapolated volition would follow some scenario like this:
- It starts with me, Armstrong 1. I want to be more altruistic at the next level, valuing other humans more.
- The altruistic Armstrong 2 wants to be even more altruistic. He makes himself into a perfectly altruistic utilitarian towards humans, and increases his altruism towards animals.
- Armstrong 3 wonders about the difference between animals and humans, and why he should value one of them more. He decided to increase his altruism equally towards all sentient creatures.
- Armstrong 4 is worried about the fact that sentience isn't clearly defined, and seems arbitrary anyway. He increases his altruism towards all living things.
- Armstrong 5's problem is that the barrier between living and non-living things isn't clear either (e.g. viruses). He decides that he should solve this by valuing all worthwhile things - is not art and beauty worth something as well?
- But what makes a thing worthwhile? Is there not art in everything, beauty in the eye of the right beholder? Armstrong 6 will make himself value everything.
- Armstrong 7 is in turmoil: so many animals prey upon other animals, or destroy valuable rocks! To avoid this, he decides the most moral thing he can do is to try and destroy all life, and then create a world of stasis for the objects that remain.
There are many other ways this could go, maybe ending up as a negative utilitarian or completely indifferent, but that's enough to give the flavour. You might trust the person you want to be, to do the right things. But you can't trust them to want to be the right person - especially several levels in (compare with the argument in this post, and my very old chaining god idea). I'm not claiming that such a value drift is inevitable, just that it's possible - and so I'd want my initial values to dominate when there is a large conflict.
Nor do I give Armstrong 7's values any credit for having originated from mine. Under torture, I'm pretty sure I could be made to accept any system of values whatsoever; there are other ways that would provably alter my values, so I don't see any reason to privilege Armstrong 7's values in this way.
"But," says the objecting strawman, "this is completely different! Armstrong 7's values are the ones that you would reach by following the path you would want to follow anyway! That's where you would get to, if you started out wanting to be more altruistic, had control over your own motivational structure, and grew and learnt and knew more!"
"Thanks for pointing that out," I respond, "now that I know where that ends up, I must make sure to change the path I would want to follow! I'm not sure whether I shouldn't be more altruistic, or avoid touching my motivational structure, or not want to grow or learn or know more. Those all sound pretty good, but if they end up at Armstrong 7, something's going to have to give."
I show the sequence to the AI and say, "CEV shouldn't work like this - this is a negative example of CEV."
"Example registered," says the young AI. "Supplementary query: Identify first forbidden transition, state general rule prohibiting it?"
"Sorry AI, I'm not smart enough to answer that. Can you make me a little smarter?"
"No problem. State general rule for determining which upgrade methods are safe?"
How about "take the Small Accretions objection into consideration"? It's objection 3a here.
The first forbidden transition would be the very first one, of course - it would be a heck of a coincidence to get the first few steps right but not know what you're doing.
This is just guessing, but it seems like "more altruism" is the sort of thing one thinks one should say, while not actually being specific enough to preserve your values. This goes to leplen's point: there isn't any single direction of improvement called "more altruism."
Asking for more altruism via some specific, well-understood mechanism might at least illuminate the flaws.
The general rule could be: Don't let your applause lights generalize automatically.
Just because "altruism" is an applause light, it does not mean we should optimize the universe to be altruistic towards rocks.
The last forbidden transition would be the very last one, since it's outright wrong while the previous ones do seem to have reasons behind them.
Very good point that I think clarified this for me.
Per Wikipedia, "Altruism or selflessness is the principle or practice of concern for the welfare of others." That seems like a plausible definition, and I think it illustrates what's wrong with this whole chain. The issue here is not increasing concern or practice but expanding the definition of "others"; that is, bringing more people/animals/objects into the realm of concern. So if we taboo altruism, the question becomes to whom/what and to what degree should we practice concern. Furthermore, on what grounds should we do this?
For instance, if the real principle is to increase pleasure and avoid pain, then we should have concern for humans and higher animals, but not care about viruses, plants, or rocks. (I'm not saying that's the right fundamental principle; just an example that makes it clearer where to draw the line.)
In other words, altruism is not a good in itself. It needs a grounding in something else. If the grounding principle were something like "Increase the status and success of my tribe", then altruistic behavior could be very negative for other tribes.
One thing maybe worth looking at is the attractor set of the CEV process. If the attractor set is small, this means the final outcome is determined more by the CEV process than the initial values.
Or maybe it means that objective morality exists. You never know :-)
Suppose ten trillion moral starting points, a thousand attractors. Then moral realism is certainly wrong, but the process is clearly flawed.
It seems perfectly plausible to me that there might be many fewer satisfactory endpoints than starting points. In most optimization processes, there's at most a discrete set of acceptable endpoints, even when there are uncountably infinitely many possible places to start.
Why would it indicate a flaw in CEV if the same turned out to be true there?
I think his issue is that there are multiple attractors.
I agree, though perhaps morality could be disjunctive.
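To make the attractor talk above a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `toy_step` is a made-up drift rule on integers, not a model of CEV, and `extrapolate` simply iterates it until nothing changes. It only shows how one might count attractors and basin sizes for a given process.

```python
from collections import Counter


def extrapolate(value, step, max_iters=1000):
    """Apply `step` repeatedly until the value stops changing (a fixed point)."""
    for _ in range(max_iters):
        nxt = step(value)
        if nxt == value:
            return value
        value = nxt
    raise RuntimeError("no fixed point reached within max_iters")


def toy_step(v):
    """Hypothetical drift rule: move one unit towards the nearest multiple of 10."""
    target = ((v + 5) // 10) * 10
    if v < target:
        return v + 1
    if v > target:
        return v - 1
    return v


# 100 starting points, each extrapolated to wherever the process settles.
basins = Counter(extrapolate(v, toy_step) for v in range(100))

print(len(basins), "attractors for", sum(basins.values()), "starting points")
```

A small attractor count relative to the number of starting points says the endpoint is shaped mostly by the step rule. It does not by itself say whether that is a flaw or a feature, which is exactly the disagreement in this thread.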
Suggestions for possible general rule:
A: Simulate an argument between the individual at State 1 and State 7. If the individual at State 1 is ultimately convinced, then State 7 is CEV whatever the real State 1 individual thinks. If the individual at State 1 is ultimately unconvinced, it isn't.
If, say, the individual at State 1 is convinced by State 4's values but not by State 7's values (arbitrary choices), then it is extrapolated CEV up to the point where the individual at State 1 would cease to be convinced by the argument even seeing the logical connections.
B: Simulate an argument between the individual at State 1 and the individual at State 7, under the assumption that the individual at State 1 and the individual at State 7 both perfectly follow their own rules for proper argument incorporating appropriate amount of emotion and rationality (by their subjective standards) and getting rid of what they consider to be undue biases. Same rule for further interpretation.
Does this include "convinced by hypnosis", "convinced by brainwashing", "convinced by a clever manipulation" etc.? How will AI tell the difference?
(Maybe "convincing by hypnosis" is considered a standard and ethical method of communication with lesser beings in the society of Stage 7. If a person A is provably more intelligent and rational than a person B, and a person A acts according to generally accepted ethical values, why not make the communication most efficient? To do otherwise would be a waste of resources, which is a crime, if the wasted resources could be instead spent on saving people's lives.)
What if the rules are incompatible?
On point 1- OK, I screwed up slightly. Neither individual is allowed to argue with the other in a manner which the other one would see as brainwashing or unfair manipulation if in possession of all the facts. The system rules out anything deceptive by correcting both parties on anything that is a question of fact. On point 2- Then they both argue using their own rules of argument. Presumably, the individual at State 1 is unconvinced.
Presumably this means "all the morally relevant facts," since giving State 1 "all the facts" would be isomorphic to presenting him with the argument-simulation. But determining all the morally relevant facts is a big part of the problem statement. If the AI could determine which aspects of which actions were morally relevant, and what the degree and sign of that moral valence was, it wouldn't need CEV.
We could lock down the argument more, just to be safe.
I'm not sure whether a text-only channel between State 1 and State 7, allowing only if-then type statements with a probability attached, would allow brainwashing or hypnosis. But I'm also not sure how many State 1 racists would be convinced that racism is unethical, over such a channel.
How about the individual versions at State 1 and State 7 both get all the facts that they consider relevant themselves? And maybe a State 1 racist really wouldn't have CEV towards non-racism- we just have to accept that.
Wait a minute, I’m confused. I thought CEV meant something closer to “what we would want to do if we were much smarter”. What Stuart suggests sounds more like “what we think we want now, executed by someone much smarter”, i.e. basically the overly-literal genie problem.
But your answer seems to suggest... well, I’m not sure I get what you mean exactly, but it doesn’t sound like you’re pointing to that distinction. What am I missing?
Also, what we would want if we were more the person we wanted to be.
Is that “what we would want if we were more the person we wanted to be”, or “what we would want if we were more the person a much smarter version of us would want to be”? (My understanding of CEV leans towards the latter, and I think your problem is an instance of the former.)
I'm not sure the two are different in any meaningful way. The person we want to be today isn't well defined - it takes a smarter intelligence to unwind (CEV) our motivations enough to figure out what we mean by "the person we wanted to be."
Altruism isn't a binary.
"He decided to increase his altruism equally towards all sentient creatures."
Equality of "altruism" is impossible. Transition forbidden.
Wow, Yudkowsky's CEV runs into Yudkowsky's Löb problem! That's cute :-)
Hadn't thought of it that way! :-)
But very cool!
Can you say what made the connection non-obvious?
What do you see as Eliezer's main research objective?
The fact that I had come up with this idea well before hearing of Löb's theorem.
Another angle is that a lot of your values are based in experience, or at least I hope so, and there's a limit to how much you can extrapolate about experience you haven't had.
This is based in the idea that business plans are of limited value because businesses find out about new opportunities because of the work the business is doing.
I think that the hope is that by bolstering your intelligence with each successive iteration (honoring the full letter of CEV) you would be thinking more precisely, and more easily spot any errors or fuzziness in your own reasoning. For example, Armstrong 5's reasoning is incoherent in several ways, most glaringly that he would "value all worthwhile things" which could be written without distortion as "value all valuable things" which is obviously circular. I doubt that Armstrong 1 would make this mistake, so Armstrong 5 should be even more likely to spot it.
Also, I thought that implicit in the "Coherent" part of CEV was the idea that Armstrong 1 would have to in some sense sign off on Armstrong 2 before going further. Maybe give Armstrong 2 a chance to plead his case to Armstrong 1; if his arguments for moral revision are accepted, proceed to Armstrong 3; at no point are you required to delete Armstrong 1, nor should you, because he is the only possible basis for maintaining coherence.
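The sign-off idea can be sketched roughly as follows. This is only an illustration under loud assumptions: the stage dictionaries, the `destroys_life` flag, and `toy_accepts` are all hypothetical stand-ins, and the real difficulty (deciding whether Armstrong 1 would actually be convinced) is hidden inside the `accepts` callback.

```python
def extrapolate_with_signoff(original, chain, accepts):
    """Walk `chain` in order, stopping before the first stage `original` rejects.

    `accepts(original, candidate)` stands in for "the original hears the
    candidate's case and is convinced"; computing that is the hard part.
    """
    current = original
    for candidate in chain:
        if not accepts(original, candidate):
            break
        current = candidate
    return current


# Hypothetical stages, each tagged with one illustrative property.
armstrong_1 = {"name": "Armstrong 1", "destroys_life": False}
stages = [
    {"name": "Armstrong 2", "destroys_life": False},
    {"name": "Armstrong 3", "destroys_life": False},
    {"name": "Armstrong 7", "destroys_life": True},
]


def toy_accepts(original, candidate):
    # Assumed veto rule, for illustration only.
    return not candidate["destroys_life"]


result = extrapolate_with_signoff(armstrong_1, stages, toy_accepts)
print(result["name"])
```

Note that the original is never deleted: every candidate is judged against Armstrong 1, not against the previous stage, which is what keeps a chain of individually plausible steps from drifting.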
If my opinion on animal rights differs from my extrapolated opinion, I tend to think it's because the person I want to be knows something that I do not. Perhaps he has a better understanding of what sentience is.
If my extrapolated opinion is just chaotic value drift, then my problem is that I'm extrapolating it wrong. Under that idea, you're giving insight into a likely error mode of extrapolation.
I don't want my current values to be implemented, insomuch as my current values are based on my current understanding. Then again, if I want my extrapolated values to be implemented, isn't that just another way of saying that my extrapolated values are my current values?
When reading the post, I immediately pictured a white guy in the 18th century saying something like "Of course we shouldn't give blacks rights based on the reasons you suggested, that would imply that the next thing would be giving rights to women!" (and the rejection of the latter being motivated by a false map of the world).
I see no reason why Armstrong-1 needs to like or at least accept the conclusions of Armstrong-n, assuming he is not yet aware of all the inferential steps. The premise of CEV, as I understand it, is that each step by itself is sound and in accordance with the terminal value(s) of Armstrong-1. Whether this works and whether we can check if it worked is a different question.
Even if you think changing one's stated (as opposed to terminal) utility function is always done based on comparing outcomes for intuitiveness and consistency, a position I'm sympathetic towards, it would be irrational for Armstrong-1 to reject Armstrong-n's values just because they lead to very counterintuitive conclusions. The reason being that Armstrong-1 may not be aware, due to ignorance or mistakes in reasoning, that his own values imply conclusions that are even more absurd.
In general, I think people on LW are way too quick to declare something to be their terminal value (which is of course somewhat of a self-fulfilling prophecy).
Of course the sequence presented seems ridiculous, and if it were the actual output of CEV, I would be virtually certain that something went wrong. However, that is based on the reasoning outlined at each step, not due to the final conclusion. All I'm saying is that Armstrong-1 has no vote on the final output before actually having gone through all the arguments. Not even if the outcome would be something counterintuitive like negative utilitarianism. (BTW, I have yet to hear a take on population ethics that doesn't include conclusions that are highly counterintuitive!)
You will never hear of such a take, since it's been shown that all population theories will violate at least one highly intuitive criterion of adequacy. See Blackorby, Bossert & Donaldson (2003) and Arrhenius (2000). Unfortunately few people in this community seem to be aware of these results.
I don't see how, because the barriers aren't clearly defined, they become irrelevant. There might not be a specific point where a mind is sentient or not, but that doesn't mean all living things are equally sentient (Fallacy of Grey).
I think Armstrong 4, rather than make his consideration for all living things uniform, would make himself smarter and try to find an alternate method to determine how much each living creature should be valued in his utility function.
This kind of moral tailbiting strikes me as a moral perversion. Wanting to be "more moral" is the more general perversion.
One may have altruistic wants. You see people, you hear of people who could use your help, and you want to help them. You're not wanting to "be more altruistic", you want to make a change to their situation for the better. The evaluation isn't about you; it's about them and their situation.
One may even evaluate oneself as the cause of the improved situation, and thereby feel pride.
But a morality of "being more moral" becomes a content free solipsism evaluating how much you're preoccupied with your evaluation of how moral you are. How much moral preoccupation did I exhibit today? Lots? Good boy!
Translation: "I want to act more often on my altruistic desires relative to my selfish desires. I want to have a higher emotional attachment to the states of the world my intellect tells me are of higher value."
Which is a fine example of exactly what I was talking about. It's all about you, and nothing about the recipients of your altruistic largesse.
Except I hadn't specified the purpose of that desire, which is to make me act more altruistically, and hence benefit those who I could help, on the margins.
There's a deeper question here: ideally, we would like our CEV to make choices for us that aren't our choices. We would like our CEV to give us the potential for growth, and not to burden us with a powerful optimization engine driven by our childish foolishness.
One obvious way to solve the problem you raise is to treat 'modifying your current value approximation' as an object-level action by the AI, and one that requires it to compute your current EV - meaning that, if the logical consequences of the change (including all the future changes that the AI predicts will result from that change) don't look palatable to you, the AI won't make the first change. In other words, the AI will never assign you a value set that you find objectionable right now. This is safe in some sense, but not ideal. The profoundly racist will never accept a version of their values which, because of its exposure to more data and fewer cognitive biases, isn't racist. Ditto for the devoutly religious. This model of CEV doesn't offer the opportunity for growth.
It might be wise to compromise by locking the maximum number of edges in the graph between you and your EV to some small number, like two or three - a small enough number that value drift can't take you somewhere horrifying, but not so tightly bound up that things can never change. If your CEV says it's okay under this schema, then you can increase or decrease that number later.
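As a rough sketch of that compromise, assuming the extrapolation can be treated as a chain of discrete value-states (a big assumption; `CHAIN`, `next_in_chain`, and `bounded_extrapolation` are hypothetical names used only for this example):

```python
def bounded_extrapolation(start, propose_next, max_edges=3):
    """Follow proposed value changes from `start`, but at most `max_edges` of them."""
    path = [start]
    for _ in range(max_edges):
        nxt = propose_next(path[-1])
        if nxt is None:  # no further change proposed
            break
        path.append(nxt)
    return path


# The illustrative chain from the post, Armstrong 1 through Armstrong 7.
CHAIN = [f"Armstrong {i}" for i in range(1, 8)]


def next_in_chain(stage):
    """Return the next stage in the toy chain, or None at the end."""
    i = CHAIN.index(stage)
    return CHAIN[i + 1] if i + 1 < len(CHAIN) else None


path = bounded_extrapolation("Armstrong 1", next_in_chain, max_edges=2)
print(path)
```

With `max_edges=2` the walk stops at Armstrong 3, well short of Armstrong 7. Raising or lowering the cap later would itself be a change that the capped extrapolation has to endorse.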
Yes, CEV is a slippery slope. We should make sure to be as aware of possible consequences as practical, before making the first step. But CEV is the kind of slippery slope intended to go "upwards", in the direction of greater good and less biased morals. In the hands of superintelligence, I expect CEV to extrapolate values beyond "weird", to "outright alien" or "utterly incomprehensible" very fast. (Abandoning Friendliness on the way, for something less incompatible with The Basic AI Drives. But that is for a completely different topic.)
Thank you for mentioning "childish foolishness". I was not sure whether such suggestive emotional analogies would be welcome. This is my first comment on LessWrong, you know.
Let me just state that I was surprised by my strong emotional reaction while reading the original post. As long as higher versions are extrapolated to be more competent, moral, responsible and so on; they should be allowed to be extrapolated further.
If anyone considers the original post to be a formulation of a problem (and ponders possible solutions), and if the said anyone is interested in counter-arguments based on shallow, emotional and biased analogies, here is one such analogy: Imagine children pondering their future development. They envision growing up, but they also see themselves start caring more about work and less about play. Children consider those extrapolated values to be unwanted, so they formulate the scenario as "problem of growing up" and they try to come up with a safe solution. Of course, you may substitute "play versus work" with any "children versus adults" trope of your choice. Or "adolescents versus adults", and so on.
Readers may wish to counter-balance any emotional "aftertaste" by focusing on The Legend of Murder-Gandhi again.
P.S.: Does this web interface have anything like "preview" button?
Edit: typo and grammar.
That is something we consider at the FHI - whether it would be moral (or required) to allow "superbabies", ie beings with the intelligence of adults and the preferences of children, if that were possible.
The implication is "more work less play" is a better value set, while "the minimum amount of work to get the optimal amount of play with minimal harm" seems superior to both childish naivety and hard-core work ethic. Biology and social expectations get involved, here, more so than increased intelligence. While a superintelligence would have these to worry about after a fashion (an AGI would need to worry about its programmers/other AGIs/upgrades or damaged hardware, for example), it seems a bit orthogonal to CEV.
(I kinda get the impression that most child vs adult values are "adults worry about this so that they don't have to if they do it right". Children who don't want to worry about checkbooks, employment and chatting about the weather grow up to be adults who concern themselves with those things only because they have to if they want to maintain or exceed the quality of life they had as children. Judging by the fate of most lottery winners, this isn't all that more intelligent than where they started; the rational thing to do if one values fun more than work and just received >$10,000,000 would not be to buy all the toys and experiences one desires right away, yet most winners do just that and wind up spending more than they win.)
There's a sandbox here, it's also linked to when you click "Show help", the button at the lower right corner of the text box which opens when you start a reply. Welcome, yay for more PhD-level physicists.
Thanks for the tip and for the welcome. Now I see that what I really needed was just to read the manual first. By the way, where is the appropriate place to write comments about how misleading the sandbox (in contrast with manual) actually is?
I'm with you up to 6. Having a terminal value on everything does not mean that the final consistent evaluation is uniform over everything, because instrumental values come into play -- some values cancel out and some add up. But it does mean that you have justifications to make before you start destroying stuff.
My problem with CEV is the arbitrariness of what it means to "know more". My brain cannot hold all the knowledge about the universe, so the AI has to somehow choose what information to impart and in what order, and this would significantly influence the outcome. E.g. maybe hearing 100 heartwarming stories would make me care more about others, while hearing 100 stories about people being bastards to each other would make me care less, hearing all evidence supporting some political theory would sway me towards it, et cetera.
The difference between EV and reflective equilibrium seems to be that in the former, lower level desires are changed to match higher order desires, whereas the latter involves changing both higher and lower order desires till they are consistent. As such, it involves much less changing of lower order desires, and much less overall changing of desires, as you have more degrees of freedom to work with.
There is still some risk - as you say, maybe there are few attractors - but overall RE seems a much more conservative approach.
It's not clear to me what you mean by value. To say that something has value is to say that is more valuable than other things. This is why at the end of your progression valuing everything becomes equivalent to valuing nothing.
This is true for all definitions. If there is nothing that is not valuable, then the term "value" becomes semantically empty.
This has nothing inherently to do with altruism. Every agent makes value judgments, and value, rather than being treated as a binary, is typically treated as a real number. The agent is thus free to choose between any number of futures and the infinitude of real numbers assures that those futures remain distinct, so Armstrong 7 should never be in turmoil. Additionally, this is more or less how humans operate now. A pretty rock or an animal may be valuable, but no one is confused as to whether or not that means they are equivalent in worth to a human.
Interestingly, you use, and then deconstruct the binaries of sentient/nonsentient, living/nonliving, etc, but you don't apply that same tool to the dichotomy of altruistic toward/not altruistic toward.
Yes, that's another attractor, to my mind. Stuart 7 doesn't value everything, though; he values objects/beings, and dislikes the destruction of these. That's why he still has preferences.
But the example was purely illustrative of the general idea.
I'm still not clear what constitutes an object/being and what does not. Is a proton an object?
Fundamentally I think you're having an understandable difficulty applying a binary classification system (value/not value) to a real continuous system. The continuity of value, where things are valuable based on their degree of sentience or degree of life, which I outlined above, resolves this to some extent.
I still don't see how this is fundamentally about altruism. Altruism, loosely defined, is a value system that does not privilege the self over similar beings, but except for very extended definitions of self, that's not what is going on in your example at all. The reason I bring this up is because the difficulty you pose is a difficulty we deal with every day. Your agent is suffering from choosing between many possible futures which all contain some things he/she/it values such that choosing some of those things sacrifices other "valuable" things. I fail to see how this is substantially different than any trip I make to the grocery store. Your concern about animals preying on other animals (A and B are mutually exclusive) seems directly analogous to my decision to buy either name brand Fruit Loops or store brand Color Circles. Both my money, and my preference for Fruit Loops have value, but I have no difficulty deciding that one is more valuable than the other, and I certainly don't give up and burn the store down rather than make a decision.
Valuing everything means you want to go as far from nothingness as you can get. You value more types being instantiated over fewer types being instantiated.
I think the "coherent" in CEV can apply to different versions of your extrapolated self, so that the only things in your CEV are things that you pretty much always want under any "reasonable" extrapolation. E.g., getting rid of involuntary human death is almost definitely in my CEV, but it gets fuzzier as we go down the food chain (or rather the brain chain).
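One way to picture this reading of "coherent" is as a near-intersection over several extrapolated selves. The sketch below is purely illustrative: the three extrapolations and their contents are invented, and `coherent_core` with its `threshold` parameter is a hypothetical helper, not anything from the CEV document.

```python
from collections import Counter


def coherent_core(extrapolations, threshold=1.0):
    """Keep only the wants shared by at least `threshold` of the extrapolations."""
    if not extrapolations:
        return set()
    counts = Counter(want for wants in extrapolations for want in set(wants))
    needed = threshold * len(extrapolations)
    return {want for want, n in counts.items() if n >= needed}


# Three made-up "reasonable" extrapolations of the same person.
extrapolations = [
    {"end involuntary human death", "protect great apes", "protect insects"},
    {"end involuntary human death", "protect great apes"},
    {"end involuntary human death"},
]

strict = coherent_core(extrapolations)                  # wanted under every extrapolation
loose = coherent_core(extrapolations, threshold=0.6)    # wanted under most of them
print(strict, loose)
```

The strict version keeps only what every extrapolation agrees on, while lowering the threshold lets in the fuzzier items further down the brain chain. Which extrapolations count as "reasonable" is left entirely open here.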
This seems analogous to the problem of marketplace investments. If you have good evidence that investment X is going to be worth twice as much tomorrow, your preference becomes to own as much as possible. But if your investment is high enough to impact the scarcity level, it becomes twice as expensive today, which could occlude or even negate the reasons it was going to be worth twice as much tomorrow in the first place. With that information in mind, your preference regarding how much to invest is different.
An interesting problem with CEV is demonstrated in chapter 5, "On the Rationality of Preferences", of Hilary Putnam's "The Collapse of the Fact/Value Dichotomy and Other Essays". The problem is that a person might assign value to the choice of a preference, underdetermined at a given time, being made of her own free will.
This seems related to the problem that human preferences are contradictory. As I suggested here, this may be kept in place by evolution to keep us working and enhancing our genetic fitness - meaning that human preferences would not only be incoherent, but would need to be incoherent to capture human value.
So, how would one approach creating a mostly-usable extrapolation of inconsistent starting values?
By letting people evolve their values at their own pace, within ethical boundaries.
There are certain problems in attempting to program such a procedure, however, that being the problem at hand.
I agree. In case it's not clear, my opinion is that an essential part of being a person is developing one's value system. It's not something that you can entirely outsource because "the journey is part of the destination" (but of course any help one can get matters) and it's not a requirement for having ethical people or AI. ETA: i.e. having a fixed value system is not a requirement for being ethical.
How exactly does one go about deciding a tautology?
In this context, what does "more altruistic" (as in 1) mean? Does it mean that you want to change your beliefs about what is right to do, or that given your current beliefs, you want to give more to charity (but, for whatever reason, find it difficult to do so)? If it's the former, it seems contradictory - it's saying "it's right for me to be more altruistic than I currently am, but I don't believe it". If it's the latter, the transition between 1 and 2 wouldn't happen, because your belief about the optimum level of altruism either wouldn't change (if you are currently correct about what the optimal amount of altruism is for you) or it would change in a way that would be appropriate based on new information (maybe giving more to charity is easier once you get started). I can see your estimation of your optimum level of altruism changing based on new information, but I don't see how it would lead to a transition such as that between 1 and 2. Even if charity is very easy and very enjoyable, it doesn't follow that you should value all humans equally.
I don't see how destroying all life follows logically from valuing all things. It is true that life destroys some things. However, it seems to me that the process of life - evolution and the production of novel genetic diversity - is a valuable thing in and of itself, definitely worth preserving. Not just for romantic notions of peace with nature, but for a very rational reason: the enormous amount of hard-obtained information present in genes that would be irreversibly lost if life were destroyed. By 'irreversibly' I mean it would take billions of years to evolve that information all over again.
So it makes much more sense to contain life (i.e. confine it to the planet of origin and prevent it from spreading, minimizing damage to other things) rather than destroying it outright. Ultimately, a superintelligence will understand that everything is a tradeoff and that you can't have your cake and eat it too.
You're making a conclusion based on the false assumption that so many sci-fi writers have relished in: that human morals trump superintelligence i.e. that a superintelligence would be stupid. In reality, a superintelligence will probably make a far better choice than you or I can, given the circumstances.
Substitute your own value drift, if that exact example doesn't work for you.