Today's post, Moral Error and Moral Disagreement, was originally published on 10 August 2008. A summary (taken from the LW wiki):


How can you make errors about morality?

Discuss the post here (rather than in the comments to the original post).

This post is part of the Rerunning the Sequences series, where we'll be going through Eliezer Yudkowsky's old posts in order so that people who are interested can (re-)read and discuss them. The previous post was Sorting Pebbles Into Correct Heaps, and you can use the sequence_reruns tag or RSS feed to follow the rest of the series.

Sequence reruns are a community-driven effort. You can participate by re-reading the sequence post, discussing it here, posting the next day's sequence reruns post, or summarizing forthcoming articles on the wiki. Go here for more details, or to have meta discussions about the Rerunning the Sequences series.

When a paperclip maximizer and a pencil maximizer do different things, they are not disagreeing about anything, they are just different optimization processes.

Just to make sure I understand this, suppose a pencil maximizer and a paperclip maximizer meet each other while tiling deep space. They communicate (or eat parts of each other and evaluate the algorithms embedded therein) and discover that they are virtually identical except for the pencil/paperclip preference. They further discover that each is the creation of a species of sentient beings, that the two species originated in different galaxies, and that both failed the AI test. The two species shared far more than the pencil/paperclip preference that set them apart. Neither can find a flaw in the rationality algorithm that the other employs. Is P(my-primary-goal-should-change) < P(my-primary-goal-should-change | the-evidence-in-this-scenario) for either agent? If not, this implies that the agents believe their primary goal to be arbitrary yet still worth keeping intact forever without change, e.g. pencils and paperclips are their basic morality and there was no simpler basic morality like "do what my creators want me to do". Had there been one, the probability of the paperclip/pencil maximization goal should receive a significant update upon discovering that two different species with so much in common accidentally ordered their own destruction by arbitrary artifacts.
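For what it's worth, the question can be restated with Bayes' theorem. This is only a sketch: the symbols G (for "my primary goal should change") and E (for "the evidence in this scenario") are shorthand introduced here, and it assumes 0 < P(G) < 1 and P(E) > 0.

```latex
P(G \mid E) = \frac{P(E \mid G)\, P(G)}{P(E)}
\quad\Longrightarrow\quad
P(G \mid E) > P(G)
\iff P(E \mid G) > P(E)
\iff P(E \mid G) > P(E \mid \neg G)
```

So the agents should update toward changing their goal exactly when meeting such a near-twin is more likely in worlds where the goal should change than in worlds where it should not.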

Also, imagine that our basic morality is not as anthropomorphically nice as "What will save my friends, and my people, from getting hurt? How can we all have more fun? ..." and is instead "What will most successfully spread my genetic material?". The nice anthropomorphic questions we are aware of may be only an approximation of our true basic morality, one good enough that we don't have (or need) conscious access to the real thing. Why should we arbitrarily accept the middle level instead of accepting the "abortion is wrong" or "maximize our genetic material" morals at face value?

I find it interesting that single cells got together and built themselves an almost-friendly AI for the propagation of genetic material that is now talking about replacing genetic material with semiconductors. Or was it the Maximization Of Maximization Memes meme that got the cells going in the first place and is still wildly successful and planning its next conquest?

Is P(my-primary-goal-should-change) < P(my-primary-goal-should-change | the-evidence-in-this-scenario) for either agent? If not, this implies that the agents believe their primary goal to be arbitrary yet still worth keeping intact forever without change, e.g. pencils and paperclips are their basic morality and there was no simpler basic morality like "do what my creators want me to do"

This strikes me as a little anthropomorphic. Maximizers would see their maximization targets as motivationally basic; they might develop quite complex behaviors in service to those goals, but there is no greater meta-motivation behind them. If there were, they wouldn't be maximizers. This is so alien to human motivational schemes that I think using the word "morality" to describe it is already a little misleading, but insofar as it is a morality it's defined in terms of the maximization target: a paperclipper would consider rewriting its motivational core if and only if it could be convinced that doing so would ultimately generate more paperclips than the alternative.

I wouldn't call that arbitrary, though, at least not from the perspective of the maximizer; doing so would be close to calling joy or happiness arbitrary from a human perspective, although there really isn't any precise analogy in our terms.

Reading makes me think that a rational agent, even if its greatest motivation is to maximize its paperclip production, would be able to determine that its desire for paperclips was more arbitrary than its tools for rationality. It could perform simulations or thought experiments to determine its most likely origins and find that while many possible origins lead to the development of rationality there are only a few paths that specifically generate paperclip maximization. Equally likely are pencil maximization and smiley-face maximization, and even some less likely things like human-friendliness maximization will use the same rationality framework because it works well in the Universe. There's justification for rationality but not for paperclip maximization.

That also means that joy and happiness are not completely arbitrary for humans because they are tools used to maximize evolutionary fitness, which we can identify as the justification for the development of those emotions. Some of the acquired tastes, fetishes, or habits of humans might well be described as arbitrary, though.

Eliezer points out that Bob might not (probably doesn't) know all that is entailed by morality_Bob, because morality_Bob is an idealized abstract dynamic. But then, why the _Bob suffix?

Most people conceive of morality as involving both personal ideals which can vary from person to person - I might place more emphasis on family, while you focus on athletic excellence - and interpersonal areas like justice in which "we're all in it together." Let's start with the former. If Bob's a family man and Sally's a super-athlete, do they disagree on what is important? Not necessarily; they may well agree that Bob should go to his sister's concert rather than exercise tonight, and Sally should exercise rather than help plan her brother's wedding. Sally feels guilty when she doesn't exercise and Bob says it's appropriate that she feels guilty. And so on. Of course, either or both of them may be mistaken, and they can coherently disagree, but it would be extremely bizarre for Sally to say, "I can see that your idealized abstract dynamic leads to this emphasis on family, but still what you should do is forget all that and strive for athletic excellence."

But what about justice? If their idealized abstract dynamics for "justice" differ, then that really would be a disagreement! Well, OK, but the difference hasn't been shown, and it's not enough to show that their terminal values differ. After all, as we've just discussed, some terminal values are personal ideals that need not bear directly on justice. And since both Bob and Sally conceive justice interpersonally, the idealized abstract dynamic for justice_Bob makes direct reference to that of justice_Sally and vice versa. If when both are at their rational best (edit: and both honestly pursuing justice), Bob can't convince Sally that X is a fair rule and Sally can't convince Bob that Y is a virtue of just persons, then neither X nor Y is mandated by justice. If they want to keep discussing together how to treat each other, they will have to keep looking to find other rules and virtues that they can agree on.

But doesn't this raise the possibility that "justice" is empty of content altogether; that there's nothing they can agree on? In concept, yes. In practice, I don't see much probability there, given the psychological unity of humanity. (Encounters with aliens are another story, and it may be simply that no justice is to be had there.)

But couldn't psychopaths reject justice altogether? Sure, but that doesn't mean there's a different kind of justice, justice_P, that psychopaths value. It just means that justice doesn't interest them.

With personal ideals, rational difference has been demonstrated to be likely (IMO at least), but difference need not mean disagreement. With justice, rational difference has not been demonstrated to be likely. Therefore I suggest we drop the _Bob and _Sally suffixes until further notice.

When a paperclip maximizer and a pencil maximizer do different things, they are not disagreeing about anything, they are just different optimization processes.

I don't get this. Different optimization processes disagree about what to optimize, no?

Different optimization processes disagree about what to optimize, no?

No. The paperclip maximizer believes that it makes no reasoning errors in striving to maximize paperclips, and the pencil maximizer agrees. And vice versa. And neither of them conceives of a property of agent-independent "to-be-optimized-ness", much less attributes such a property to anything.

Edit: Nor, for that matter, do ordinary moralists conceive of an agent-independent "to-be-optimized". "Should" always applies to an agent doing something, not to the universe in general. However, often enough people assume that everyone should try to accomplish a certain goal, in which case people will talk about "what ought to be".

It's the difference between "different preferences about what should be" vs "different opinions about what is".

By 'disagreement' Eliezer means the latter.

If they were both conceptual beings, couldn't they argue about whether it "is right" to maximize pencils or paper clips?

They could, but they don't have to disagree about any observation to do that. 'Should' statements cannot be made from 'is' statements. Confusingly, 'is (morally) right' is a 'should' statement, not an 'is' statement.

'Should' statements cannot be made from 'is' statements.

Do you notice the difficulty in your own statement there?

If I say, "We should derive 'should' statements from 'is' statements", you can't refute my should statement; you can only contradict it. You might try to prove it impossible to derive 'should' from 'is'—but even assuming you succeed, proving an impossibility is by your own statement proving only what is, not what should be.

"Hume's Guillotine" always cuts itself in half first.

You're right, in that I can't refute the core statement of a system of ethics.

Perhaps genies should grant wishes, but developing a system that creates a moral imperative for genies to grant wishes doesn't make genies or grant wishes. Even if you believe that it is morally right and proper to build perpetual motion machines, you don't actually get to build perpetual motion machines.

Okay. Now take step two—try to show that, in fact, a 'should' really cannot be derived from an 'is'.

"Perpetual motion machines cannot be built" can be demonstrated to be true based on empirically-observable facts. If "'Should' statements cannot be made from 'is' statements" is a true 'is' statement, it will also be possible to show it is true based entirely on empirically-observable facts, right?

The usual mistake people make at this point is to claim that various "shoulds" contradict what "is". But what people think should be is not proof of what is. No matter how hard people believe genies should give wishes, it won't bring them into existence. What people believe morality should say doesn't prove what morality is.

(Unless, of course, you argue that morality is whatever people say it should be. But then you're deriving your should - morality - from what is - what people say.)

Nope. Incompleteness shows that there are some true statements which cannot be proven to be true.

However, empirically observed facts in the absence of a moral imperative do not create a moral imperative. Typically ethics are formed around a value judgement and then molded and polished by facts. I see that you are trying to trap me by saying that "I believe that this is better" is a fact, rather than allowing the value judgement "This is better" to stand.

Morality is, among other things, subjective. There is no basis in fact to prefer any system over any other system, any more than there is a basis in fact to prefer one genre of movies over another. I prefer internal consistency to internal inconsistency, and I believe that the majority of people who tend to think things through also prefer that, but I have no factual basis for that preference.

Claiming that falling down (as opposed to up) is a moral act, while not technically refutable, is hard to swallow.

They would do well to taboo the vague "right" and maybe discuss what their respective extrapolated volitions are, whether "maximize pencils" is actually the pencil maximizer's preference or an unfortunate result of a hardware glitch which on reflection should be corrected, what constitutes a "pencil" for the purpose of proper pencil-maximization, what the proper attitude towards risk is (dependence of utility on the number of pencils can take many shapes, even if it's a given that it's monotonic and depends on nothing else), etc.
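To make the risk point concrete, here is a minimal sketch in Python. The two utility functions, the lottery numbers, and all the names are hypothetical illustrations, not anything from the original post. Both utilities are monotonic in the pencil count and depend on nothing else, yet they disagree about whether to take a gamble.

```python
import math

def expected_utility(utility, lottery):
    """Expected utility of a lottery given as (probability, pencil_count) pairs."""
    return sum(p * utility(n) for p, n in lottery)

def linear(n):
    # Risk-neutral: every extra pencil is worth the same.
    return float(n)

def concave(n):
    # Risk-averse: diminishing returns on pencils (square root is just one choice).
    return math.sqrt(n)

# Hypothetical choice: 100 pencils for sure, or a coin flip between 0 and 250.
sure_thing = [(1.0, 100)]
gamble = [(0.5, 0), (0.5, 250)]

def prefers_gamble(utility):
    """True if the gamble has strictly higher expected utility than the sure thing."""
    return expected_utility(utility, gamble) > expected_utility(utility, sure_thing)

print("linear prefers gamble:", prefers_gamble(linear))
print("concave prefers gamble:", prefers_gamble(concave))
```

Under these assumptions the linear maximizer takes the coin flip (125 expected pencils against 100) while the concave one keeps the sure 100, so "maximize pencils" alone does not settle the attitude towards risk.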

...great. Now I'm wondering whether a paperclip made of lead could be called a pencil and thus reconcile their optimization processes.

Excellent idea. This might well be the result of negotiations between two optimizers, preferable to fighting it out for control of the resources.

Two optimizers might negotiate based on the reality of their goals and power, instead of fighting over who owns the label "right"? What crazy talk!

And don't tell them. I don't want them joining forces.