In You Provably Can't Trust Yourself, Eliezer tried to figure out why his audience didn't understand his meta-ethics sequence even after they had followed him through philosophy of language and quantum physics. Meta-ethics is my specialty, and I can't figure out what Eliezer's meta-ethical position is. And at least at this point, professionals like Robin Hanson and Toby Ord couldn't figure it out, either.

Part of the problem is that because Eliezer has gotten little value from professional philosophy, he writes about morality in a highly idiosyncratic way, using terms that would require reading hundreds of posts to understand. I might understand Eliezer's meta-ethics better if he would just cough up his positions on standard meta-ethical debates like cognitivism, motivation, the sources of normativity, moral epistemology, and so on. Nick Beckstead recently told me he thinks Eliezer's meta-ethical views are similar to those of Michael Smith, but I'm not seeing it.

If you think you can help me (and others) understand Eliezer's meta-ethical theory, please leave a comment!

Update: This comment by Richard Chappell made sense of Eliezer's meta-ethics for me.




Eliezer's metaethics might be clarified in terms of the distinctions between sense, reference, and reference-fixing descriptions. I take it that Eliezer wants to use 'right' as a rigid designator to denote some particular set of terminal values, but this reference fact is fixed by means of a seemingly 'relative' procedure (namely, whatever terminal values the speaker happens to hold, on some appropriate [if somewhat mysterious] idealization). Confusions arise when people mistakenly read this metasemantic subjectivism into the first-order semantics or meaning of 'right'.

In summary:

(i) 'Right' means, roughly, 'promotes external goods X, Y and Z'

(ii) claim i above is true because I desire X, Y, and Z.

Note that Speakers Use Their Actual Language, so murder would still be wrong even if I had the desires of a serial killer. But if I had those violent terminal values, I would speak a slightly different language than I do right now, so that when KillerRichard asserts "Murder is right!" what he says is true. We don't really disagree, but are instead merely talking past each other.

Virtues of the theory:

(a) By rigidifying on our actual, current desires (or idealizations thereupo...

I think this is an excellent summary. I would make the following comments:

Confusions arise when people mistakenly read this metasemantic subjectivism into the first-order semantics or meaning of 'right'.

Yes, but I think Eliezer was mistaken in identifying this kind of confusion as the fundamental source of the objections to his theory (as in the Löb's theorem discussion). Sophisticated readers of LW (or OB, at the time) are surely capable of distinguishing between logical levels. At least, I am -- but nevertheless, I still didn't feel that his theory was adequately "non-relativist" to satisfy the kinds of people who worry about "relativism". What I had in mind, in other words, was your objections (2) and (3).

The answer to those objections, by the way, is that an "adequately objective" metaethics is impossible: the minds of complex agents (such as humans) are the only place in the universe where information about morality is to be found, and there are plenty of possible minds in mind-design space (paperclippers, pebblesorters, etc.) from which it is impossible to extract the same information. This directly answers (3), anyway; as for (2), "fallibility" is rescued (on the object level) by means of imperfect introspective knowledge: an agent could be mistaken about what its own terminal values are.

Note that your answer to (2) also answers (1): value uncertainty makes it seem as if there is substantive, fundamental normative disagreement even if there isn't. (Or maybe there is if you don't buy that particular element of EY's theory)
Eliezer attempted to deal with that problem by defining a certain set of things as "h-right", that is, morally right from the frame of reference of the human mind. He made clear that alien entities probably would not care about what is h-right, but that humans do, and that's good enough.
That's not a reason to prefer EY's theory to an error theory (according to which properly normative properties would have to be irreducibly normative, but no such properties actually exist).
Richard, Until persuaded otherwise, I agree with you on this point. (These days, I take Richard Joyce to have the clearest defense of error theory, and I just subtract his confusing-to-me defense of fictionalism.) Besides, I think there are better ways of getting something like an 'objective' ethical theory (in something like a 'realist' sense) while still holding that reasons for action arise only from desires, or from relations between desires and states of affairs. In fact, that's the kind of theory I defend: desirism. Though, I'm not too interested anymore in whether desirism is to be called 'objective' or 'realist', even though I think a good case can be made for both.


You're speaking my language, thanks! I hope this is EY's view, because I know what this means. Maybe now I can go back and read EY's sequence in light of this interpretation and it will make more sense to me now.

EY's theory as presented above makes me suspicious that making basic evaluative moral terms rigid designators is a kind of 'trick' which, though perhaps not intended, very easily has the effect of carrying along some common absolutist connotations of those terms where they no longer apply in EY's use of those terms.

At the moment, I'm not so worried about objection (1), but objections (2) and (3) are close to what bothers me about EY's theory, especially if this is foundational for EY's thinking about how we ought to be designing a Friendly AI. If we're working on a project as important as Friendly AI, it becomes an urgent problem to get our meta-ethics right, and I'm not sure Eliezer has done it yet. Which is why we need more minds working on this problem. I hope to be one of those minds, even if my current meta-ethics turns out to be wrong (I've held my current meta-ethics for under 2 years, anyway, and it has shifted slightly since adoption).

But, at the moment it remains plausible to me that Eliezer is right, and I just don't see why right now. Eliezer is a very smart guy who has invested a lot of energy into training himself to think straight about things and respond to criticism either with adequate counterargument or by dropping the criticized belief.

invested a lot of energy into training himself to think straight about things and respond to criticism either with adequate counterargument or by dropping the criticized belief

Maybe; I can't say I've noticed that so much myself -- e.g. he just disappeared from this discussion when I refuted his assumptions about philosophy of language (that underpin his objection to zombies), but I haven't seen him retract his claim that zombies are demonstrably incoherent.

Clearly, from his standpoint a lot of things you believed were confused, and he decided against continuing to argue. This is a statement about willingness to engage situations where someone's wrong on the Internet and presence of disagreement, not external evidence about correctness (distinct from your own estimate of correctness of your opponent's position).
You think that "clearly" Eliezer believed many of Richard's beliefs were confused. Which beliefs, do you think?

I won't actually argue, just list some things that seem to be points where Richard talks past the intended meaning of the posts (irrespective of technical accuracy of the statements in themselves, if their meaning intended by Richard was what the posts referred to). Link to the post for convenience.

  • "premise that words refer to whatever generally causes us to utter them": There is a particular sense of "refer" in which we can trace the causal history of words being uttered.
  • "It's worth highlighting that this premise can't be right, for we can talk about things that do not causally affect us. ": Yes, we can consider other senses of "refer", make the discussion less precise, but those are not the senses used.
  • "We know perfectly well what we mean by the term 'phenomenal consciousness'.": Far from "perfectly well".
  • "We most certainly do not just mean 'whatever fills the role of causing me to make such-and-such utterances'" Maybe we don't reason so, but it's one tool to see what we actually mean, even if it explores this meaning in a different sense from what's informally used (as a way of dissolving a potentially wr
...
This is the first time I saw anyone telling EY that what he wrote is plainly false.
I made a personal list of top frequently-cited-yet-irritatingly-misleading EY posts:

I agree with the first one of those being bad.

Yes, if you're talking about corporations, you cannot use exactly the same math as you do if you're talking about evolutionary biology. But there are still some similarities that make it useful to know things about how selection works in evolutionary biology. Eliezer seems to be saying that if you want to call something "evolution", then it has to meet these strictly-chosen criteria that he'll tell you. But pretty much the only justification he offers is "if it doesn't meet these criteria, then Price's equation doesn't apply", and I don't see why "evolution" would need to be strictly defined as "those processes which behave in a way specified by Price's equation". It can still be a useful analogy.

The rest are fine in my eyes, though the argument in The Psychological Unity of Humankind seems rather overstated for several reasons.

FWIW, cultural evolution is not an analogy. Culture literally evolves - via differential reproductive success of memes...
Do you have recommendations for people/books that take this perspective seriously and then go on to explore interesting things with it? I haven't seen anyone include the memetic perspective as part of their everyday worldview besides some folk at SIAI and yourself, which I find pretty sad. Also, I get the impression you have off-kilter-compared-to-LW views on evolutionary biology, though I don't remember any concrete examples. Do you have links to somewhere where I could learn more about what phenomena/perspectives you think aren't emphasized or what not?

My current project is a book on memetics. I also have a blog on memetics.

Probably the best existing book on the topic is The Meme Machine by Susan Blackmore.

I also maintain some memetics links, some memetics references, a memetics glossary - and I have a bunch of memetics videos.

In academia, memetics is typically called "cultural evolution". Probably the best book on that is "Not by Genes Alone".

Your "evolutionary biology" question is rather vague. The nearest thing that springs to mind is this. Common views on that topic around here are more along the lines expressed in The Robot's Rebellion. If I am in a good mood, I describe such views as "lacking family values" - and if I am not, they get likened to a "culture of death".

Wow, thanks! Glad I asked. I will start a tab explosion.
Really? That's kind of scary...
His response to it, or that it's done so infrequently? I for one am less worried the less often he writes things that are plainly false, so his being called out rarely doesn't strike me as a cause for concern.

What scares me is that people say EY's position is "plainly false" so rarely. Even if EY is almost always right, you would still expect a huge number of people to say that his positions are plainly false, especially when talking about such difficult and debated questions as those of philosophy and predicting the future.

What scares me is that people say EY's position is "plainly false" so rarely.

What scares me is how often people express this concern relative to how often people actually agree with EY. Eliezer's beliefs and assertions take an absolute hammering. I agree with him fairly often - no surprise, he is intelligent, has a similar cognitive style to mine and has spent a whole lot of time thinking. But I disagree with him vocally whenever he seems wrong. I am far from the only person who does so.

If the topics are genuinely difficult, I don't think it's likely that many people who understand them would argue that Eliezer's points are plainly false. Occasionally people drop in to argue such who clearly don't have a very good understanding of rationality or the subject material. People do disagree with Eliezer for more substantive reasons with some frequency, but I don't find the fact that they rarely pronounce him to be obviously wrong particularly worrying.

Most of the people who are most likely to think that EY's positions on things are plainly false probably don't bother registering here to say so. There's one IRC channel populated with smart CS / math majors, where I drop LW links every now and then. Pretty frequently they're met with a rather critical reception, but while those people are happy to tear them apart on IRC, they have little reason to bother to come to LW and explain in detail why they disagree. (Of the things they disagree on, I mainly recall that they consider Eliezer's treatment of frequentism / Bayesianism as something of a strawman and that there's no particular reason to paint them as two drastically differing camps when real statisticians are happy with using methods drawn from both.)
In that case, we got very different impressions about how Eliezer described the two camps; here is what I heard: <channel righteous fury of Eliezer's pure Bayesian soul> It's not Bayesian users on the one hand and Frequentists on the other, each despising the others' methods. Rather, it's the small group of epistemic statisticians and a large majority of instrumentalist ones. The epistemics are the small band of AI researchers using statistical models to represent probability so as to design intelligence, learning, and autonomy. The idea is that ideal models are provably Bayesian, and the task undertaken is to understand and implement close approximations of them. The instrumentalist mainstream doesn't always claim that it's representing probability and doesn't feel lost without that kind of philosophical underpinning. Instrumentalists hound whatever problem is at hand with all statistical models and variables that they can muster to get the curve or isolated variable etc. they're looking for and think is best. The most important part of instrumentalist models is the statistician him or herself, who does the Bayesian updating adequately and without the need for understanding. </channel righteous fury of Eliezer's pure Bayesian soul> Saying that the division is a straw man because most statisticians use all methods misses the point. Edit: see for example here and here.
True, but I still wouldn't expect sharp disagreement with Eliezer to be so rare. One contributing factor may be that Eliezer at least appears to be so confident in so many of his positions, and does not put many words of uncertainty into his writing about theoretical issues.

When I first found this site, I read through all the OB posts chronologically, rather than reading the Sequences as sequences. So I got to see the history of several commenters, many of whom disagreed sharply with EY, with their disagreement evolving over several posts.

They tend to wander off after a while. Which is not surprising, as there is very little reward for it.

So I guess I'd ask this a different way: if you were an ethical philosopher whose positions disagreed with EY, what in this community would encourage you to post (or comment) about your disagreements?

if you were an ethical philosopher whose positions disagreed with EY, what in this community would encourage you to post (or comment) about your disagreements?

The presence of a large, sharp, and serious audience. The disadvantage, of course, is that the audience tends not to already be familiar with standard philosophical jargon.

By contrast, at a typical philosophy blog, you can share your ideas with an audience that already knows the jargon, and is also sharp and serious. The disadvantage, of course, is that at the typical philosophy blog, the audience is not large.

These considerations suggest that a philosopher might wish to produce his own competing meta-ethics sequence here if he were in the early stages of producing a semi-popular book on his ideas. He might be less interested if he is interested only in presenting to trained philosophers.

(nods) That makes sense.
Caring about the future of humanity, I suppose, and thinking that SIAI's choices may have a big impact on the future of humanity. That's what motivates me to post my disagreements - in an effort to figure out what's correct.
Unfortunately, believing that the SIAI is likely to have a significant impact on the future of humanity already implies accepting many of its core claims: that the intelligence-explosion view of the singularity is accurate, that special measures to produce a friendly AI are feasible and necessary, and that the SIAI has enough good ideas that it has a reasonable chance of not being beaten to the punch by some other project. Otherwise it's either chasing a fantasy or going after a real goal in a badly suboptimal way, and either way it's not worth spending effort on influencing. That still leaves plenty of room for disagreement with Eliezer, theoretically, but it narrows the search space enough that I'm not sure there are many talented ethical philosophers left in it whose views diverge significantly from the SIAI party line. There aren't so many ethical philosophers in the world that a problem this specialized is going to attract very many of them. As a corollary, I think it's a good idea to downplay the SIAI applications of this site. Human rationality is a much broader topic than the kind of formal ethics that go into the SIAI's work, and seems more likely to attract interesting and varied attention.
5Wei Dai13y
I'm curious if among those you saw left, there were any who you wish had stayed. It seems to me that if one has a deep need for precise right answers, it's hard to beat participating in the LW community.
Not especially; the disagreements never seem to resolve.
I occasionally go through phases where it amuses me to be heckled about being a deontologist; does that count? (I was at one time studying for a PhD in philosophy and would likely have concentrated in ethics if I'd stayed.)
I suppose? Though if I generalize from that, it seems that the answer to luke's original question is "because not enough people enjoy being heckled." Which, um... well, I guess that's true, in some sense.
It seems to me that EY himself addressed all three of the objections you list (though of course this doesn't imply he addressed them adequately). Moral Error and Moral Disagreement confronts this. My own thinking is that humans tend to have the same underlying (evolved) structures behind our hard-to-articulate meta-ethical heuristics, even when we disagree broadly on object-level ethical issues (and of course hand-pick our articulations of the meta-criteria to support our object-level beliefs- the whole machinery of bias applies here). This implies both that my object-level beliefs can be at odds with their meta-level criteria (if this becomes too obvious for me to rationalize away, I'm more likely to change one or other object-level belief than to change the meta-level heuristic), and that you and I can disagree fundamentally on the object level while still believing that there's something in common which makes argumentation relevant to our disagreement.

Moral Error and Moral Disagreement confronts this

Yeah, I'm the "Richard4" in the comments thread there :-)

OK. I'll reply here because if I reply there, you won't get the notifications.

The crux of your argument, it seems to me, is the following intuition:

Rather, it is essential to the concept of morality that it involves shared standards common to all fully reasonable agents.

This is certainly a property we would want morality to have, and one which human beings naturally assume it must have, but is that the central property of it? Should it turn out that nothing which looks like morality has this property, does it logically follow that all morality is dead, or is that reaction just a human impulse?

(I will note, with all the usual caveats, that believing one's moral sentiments to be universal in scope and not based on preference is a big advantage in object-level moral arguments, and that we happen to be descended from the winners of arguments about tribal politics and morality.)

If a certain set of moral impulses involves shared standards common to, say, every sane human being, then moral arguments would still work among those human beings, in exactly the way you would want them to work across all intelligent beings. Frankly, that's good enough for me. Why give baby-eating aliens in another universe veto powers over every moral intuition of yours?

Thanks for the reply -- I find this a very interesting topic. One thing I should clarify is that my view doesn't entail giving aliens "veto powers", as you put it; an alternative response is to take them to be unreasonable to intrinsically desire the eating of babies. That isn't an intrinsically desirable outcome (I take it), i.e. there is no reason to desire such a thing. Stronger still, we may think it intrinsically undesirable, so that insofar as an agent has such desires they are contrary to reason. (This requires a substantive notion of reason that goes beyond mere instrumental rationality, of course.) In any case, I'd put the crux of my argument slightly differently. The core intuition is just that it's possible to have irresolvable moral disagreements. We can imagine a case where Bob is stubbornly opposed to abortion, and Jane is just as stubbornly in favour of it, and neither agent is disposed to change their mind in light of any additional information. EY's view would seem to imply that the two agents mustn't really disagree. And that just seems a mistake: it's part of our concept of morality that this very concept could be shared by someone who fundamentally (and irresolvably) disagrees with us about what the substantive moral facts are. This is because we're aspiring to conform our judgments to a standard that is outside of ourselves. (If you don't think there are any such objective standards, then that's just to say that there are no normative facts, given my concept of normativity.)
Richard, hello. Human beings are analogous to computers. Morality and other aspects of behavior and cognition are analogous to programs. It is a type error to ask whether a program "really exists" somewhere outside a computer, or is "intrinsic" to a computer, or is "contingent", or something like that. Such questions don't correspond to observations within the world that could turn out one way or the other. You see a computer running a certain program and that's the end of the story. Your mind is a program too, and your moral intuitions are how your algorithm feels from inside, not a direct perception of external reality (human beings are physically incapable of that kind of thing, though they may feel otherwise). I know for a fact that you have no astral gate in your head to pull answers from the mysterious source of morality. But this doesn't imply that your moral intuitions "should" be worthless to you and you "should" seek external authority! There's nothing wrong with mankind living by its internal moral lights. Yes, it's possible that different computers will have different programs. Our world contains billions of similar "moist robots" running similar programs, perhaps because we were all created from design documents that are 99% identical for historical reasons, and also because we influence each other a lot. Your intuition that all "possible" sentient agents must share a common morality is unlikely to survive an encounter with any sentient agent that's substantially different from a human. We can imagine such agents easily, e.g. a machine that will search for proofs to Goldbach's conjecture and turn surrounding matter and energy into computing machinery to that end. Such a machine may be more ingenious than any human in creating other machines, discovering new physics, etc., but will never gravitate toward your intuition that one shouldn't kill babies. Most possible "intelligent agents" (aka algorithms that can hit small targets in large search spaces)
I expect Richard's memeset has understanding of all your points that doesn't move his current position. You're probably exposing him to arguments he has already encountered, so there's little point in expecting a different result. I'm not saying that Richard can't be moved by argument, just not by standard argument that is already known to have failed to move him. He even probably "agrees" with a lot of your points, just with a different and more sophisticated understanding than yours. On the other hand, it might work for the benefit of more naive onlookers.

The intent of my comment wasn't to convince Richard (I never do that), but to sharpen our points and make him clarify whatever genuine insight he possesses and we don't.

That's a motivation I didn't consider. (Agreed.)
Yeah, as Vladimir guessed, this is all familiar. Your last paragraph suggests that you've misunderstood my view. I'm not making an empirical claim to the effect that all agents will eventually converge to our values -- I agree that that's obviously false. I don't even think that all formally intelligent agents are guaranteed to have normative concepts like 'ought', 'reason', or 'morality'. The claim is just that such a radically different agent could share our normative concepts (in particular, our aspiration to a mind-independent standard), even if they would radically disagree with us about which things fall under the concept. We could both have full empirical knowledge about our own and each other's desires/dispositions, and yet one (or both) of us might be wrong about what we really have reason to want and to do. (Aside: the further claim about "reasons" in your last sentence presupposes a subjectivist view about reasons that I reject.)
What use is this concept of "reasonability"? Let's say I build an agent that wants to write the first 1000 Fibonacci numbers in mile-high digits on the Moon, except skipping the 137th one. When you start explaining to the agent that it's an "arbitrary omission" and it "should" amend its desires for greater "consistency", the agent just waves you off because listening to you isn't likely to further its current goals. Listening to you is not rational for the agent in the sense that most people on LW use the term: it doesn't increase expected utility. If by "rational" you mean something else, I'd like to understand what exactly.
I mean 'rational' in the ordinary, indefinable sense, whereby calling a decision 'irrational' expresses a distinctive kind of criticism -- similar to that expressed by the words 'crazy', 'foolish', 'unwise', etc. (By contrast, you can just say "maximizes expected utility" if you really mean nothing more than maximizes expected utility -- but note that that's a merely descriptive concept, not a normative one.) If you don't possess this concept -- if you never have thoughts about what's rational, over and above just what maximizes expected utility -- then I can't help you.
I don't think we can make progress with such imprecise thinking. Eliezer has a nice post about that.
When we're trying to reduce intuitions, there's no helping starting from informal ideas. Whether it's proper to stop there is another question, but Richard doesn't exactly suggest that. A question more salient to me is: what are suggestions about redefining this "rational" intuitive idea good for, if it's supposed to be the source material? This question even explains how the idea of considering, say, "consciousness", in a more precise sense is methodologically a step in the wrong direction: when words are the data you work with, you should be careful to assign new words to new ideas used for analyzing them.
4Wei Dai13y
In this old comment, he does seem to suggest stopping there. I'm not sure I understand your second paragraph. Are you suggesting that if we come up with a new theory to explain some aspect of consciousness, we should use a word other than "consciousness" in that theory, to avoid potentially losing some of our intuitions about consciousness?
First, EY makes it abundantly clear that two agents can have a fundamental disagreement on values; it's just not the best (or most helpful) assumption when you're talking about two sane human beings with a vast sea of common frameworks and heuristics. Secondly, I'm worried about what you're trying to do with words when you suggest we "take them to be unreasonable to intrinsically desire the eating of babies". If you're making an empirical claim that an alien with fundamentally different terminal values will (say) be uninterested in negotiating mutually beneficial deals, or will make patently suboptimal decisions by its own criteria, or exhibit some other characteristic of what we mean by "unreasonable", then you'd need some strong evidence for that claim. If instead you openly redefine "reasonable" to include "shares our fundamental moral standards", then the property becomes a tautology which no longer excludes "meta-semantic subjectivism", as you put it. So I'm puzzled what you mean.
Talking past each other a bit here. Let me try again. EY allows for disagreement in attitude: you might want one thing, while the babyeaters want something different. Of course I'm not charging him with being unable to accommodate this. The objection is instead that he's unable to accommodate disagreement in moral judgment (at the fundamental level). Normativity as mere semantics, and all that. Your second point rests on a false dichotomy. I'm not making an empirical claim, but nor am I merely defining the word "reasonable". Rather, I'm making a substantive normative (non-empirical) hypothesis about which things are reasonable. If you can't make sense of the idea of a substantive non-empirical issue, you may have fallen victim to scientism.
What is the difference between an in-principle irresolvable disagreement (moral or otherwise), and talking past each other (i.e. talking of different subject matters, or from different argument-processing frameworks)?
What fact have you established by manipulating the definition of a word in this manner? I want a meta-ethical theory that at least describes baby-eaters, because I don't expect to have object-level understanding of human morality that is substantially more accurate than what you'd get if you add baby-eating impulses to it.
Ah! Sorry for carrying coals to Newcastle, then. Let me catch up in that thread.
Yes, this is what I thought EY's theory was. EY? Is this your view?
5Wei Dai12y
This summary of Eliezer's position seems to ignore the central part about computation. That is, Eliezer does not say that 'Right' means 'promotes external goods X, Y and Z' but rather that it means a specific computation that can be roughly characterized as 'renormalizing intuition' which eventually outputs something like 'promotes external goods X, Y and Z'. I think Eliezer would argue that at least some of the objections listed here are not valid if we add the part about computation. (Specifically, disagreements and fallibility can result from a lack of logical omniscience regarding the output of the 'morality' computation.) Is the reason for skipping over this part of Eliezer's idea that standard (Montague) semantic theory treats all logically equivalent language as having the same intension? (I believe this is known as "the logical omniscience problem" in linguistics and philosophy of language.)
The part about computation doesn't change the fundamental structure of the theory. It's true that it creates more room for superficial disagreement and fallibility (of similar status to disagreements and fallibility regarding the effective means to some shared terminal values), but I see this as an improvement in degree and not in kind. It still doesn't allow for fundamental disagreement and fallibility, e.g. amongst logically omniscient agents. (I take it to be a metaethical datum that even people with different terminal values, or different Eliezerian "computations", can share the concept of a normative reason, and sincerely disagree about which (if either) of their values/computations is correctly tracking the normative reasons. Similarly, we can coherently doubt whether even our coherently-extrapolated volitions would be on the right track or not.)
7Wei Dai12y
It's not clear to me why there must be fundamental disagreement and fallibility, e.g. amongst logically omniscient agents. Can you refer me to an argument or intuition pump that explains why you think that?
One related argument is the Open Question Argument: for any natural property F that an action might have, be it promotes my terminal values, or is the output of an Eliezerian computation that models my coherent extrapolated volition, or whatever the details might be, it's always coherent to ask: "I agree that this action is F, but is it good?" But the intuitions that any metaethics worthy of the name must allow for fundamental disagreement and fallibility are perhaps more basic than this. I'd say they're just the criteria that we (at least, many of us) have in mind when insisting that any morality worthy of the name must be "objective", in a certain sense. These two criteria are proposed as capturing that sense of objectivity that we have in mind. (Again, don't you find something bizarrely subjectivist about the idea that we're fundamentally morally infallible -- that we can't even question whether our fundamental values / CEV are really on the right track?)

I'd say they're just the criteria that we (at least, many of us) have in mind when insisting that any morality worthy of the name must be "objective", in a certain sense.

What would you say to someone who does not share your intuition that such "objective" morality likely exists?

My main problem with objective morality is that while it's hard to deny that there seem to be mind-independent moral facts like "pain is morally bad", there don't seem to be enough such facts to build an ethical system out of them. What natural phenomena count as pain, exactly? How do we trade off between pain and pleasure? How do we trade off between pain in one person, and annoyance in many others? How do we trade off pain across time (i.e., should we discount future pain, and if so, how)? Across possible worlds? How do we morally treat identical copies? It seems really hard, perhaps impossible, to answer these questions without using subjective preferences or intuitions that vary from person to person, or worse, just picking arbitrary answers when we don't even have any relevant preferences or intuitions. If it turns out that such subjectivity and/or arbitrariness can't be a...

Compare with formal systems giving first-order theories of the standard model of the natural numbers. You can't specify the whole thing, and at some point you run into statements (independent of what comes before) for which it's hard to decide whether they hold for the standard naturals, and so you could add to the theory either those statements or their negations. Does this break the intuition that there is some intended structure corresponding to the natural numbers, or more pragmatically that we can still usefully seek better theories that capture it? For me, it doesn't in any obvious way.
3Wei Dai12y
It seems to be an argument in favor of arithmetic being objective that almost everyone agrees that a certain set of axioms correctly characterizes what natural numbers are (even if incompletely), and from that set of axioms we can derive much (even if not all) of what we want to know about the properties of natural numbers. If arithmetic were in the same situation as morality is today, it would be much harder (i.e., more counterintuitive) to claim that (1) everyone is referring to the same thing by "arithmetic" and "natural numbers" and (2) arithmetic truths are mind-independent. To put it another way, conditional on objective morality existing, you'd expect the situation to be closer to that of arithmetic. Conditional on it not existing, you'd expect the situation to be closer to what it actually is.
I'd say: be an error theorist! If you don't think objective morality exists, then you don't think that morality exists. That's a perfectly respectable position. You can still agree with me about what it would take for morality to really exist. You just don't think that our world actually has what it takes.
5Wei Dai12y
Yes, that makes sense, except that my intuition that objective morality does not exist is not particularly strong either. I guess what I was really asking was, do you have any arguments to the effect that objective morality exists?
EY bites this bullet in the abstract, but notes that it does not apply to humans. An AI with a simple utility function and full ability to analyze its own source code can be quite sure that maximizing that function is the meaning of "that-AI-right" in the sense EY is talking about. But there is no analogue to that situation in human psychology, given how much we now know about self-deception, our conscious and unconscious mental machinery, and the increasing complexity of our values the more we think on them. We can, it's true, say that "the correct extrapolation of my fundamental values is what's right for me to do", but this doesn't guarantee whether value X is or is not a member of that set. The actual work of extrapolating human values (through moral arguments and other methods) still has to be done. So practical objections to this sort of bullet-biting don't apply to this metaethics; are there any important theoretical objections? EDIT: Changed "right" to "that-AI-right". Important clarification.
I don't think that's right, or EY's position (I'd like evidence on that). Who's to say that maximization is precisely what's right? That might be a very good heuristic, but upon reflection the AI might decide to self-improve in a way that changes this subgoal (of the overall decision problem that includes all the other decision-making parts), by finding considerations that distinguish a maximizing attitude to utility from the right attitude to utility. It would of course use its current utility-maximizing algorithm to come to that decision. But the conclusion might be that too much maximization is bad for the environment or something. The AI would stop maximizing for the reason that it's not the most maximizing thing, the same way as a person would not kill for the reason that the action leads to a death, even though avoid-causing-death is not the whole of morality and doesn't apply universally. See also this comment.
Agreed that on EY's view (and my own), human "fundamental values" (1) have not yet been fully articulated/extrapolated; that we can't say with confidence whether X is in that set. But AFAICT, EY rejects the idea (which you seem here to claim that he endorses?) that an AI with a simple utility function can be sure that maximizing that function is the right thing to do. It might believe that maximizing that function is the right thing to do, but it would be wrong. (2) AFAICT this is precisely what RichardChappell considers implausible: the idea that unlike the AI, humans can correctly believe that maximizing their utility function is the right thing to do. == (1) Supposing there exist any such things, of which I am not convinced. (2) Necessarily wrong, in fact, since on EY's view as I understand it there's one and only one right set of values, and humans currently implement it, and the set of values humans implement is irreducibly complex and therefore cannot be captured by a simple utility function. Therefore, an AI maximizing a simple utility function is necessarily not doing the right thing on EY's view.
Sorry, I meant to use the two-place version; it wouldn't be what's right; what I meant is that the completely analogous concept of "that-AI-right" would consist simply of that utility function.
To the extent that you are still talking about EY's views, I still don't think that's correct... I think he would reject the idea that "that-AI-right" is analogous to right, or that "right" is a 2-place predicate. That said, given that this question has come up elsethread and I'm apparently in the minority, and given that I don't understand what all this talk of right adds to the discussion in the first place, it becomes increasingly likely that I've just misunderstood something. In any case, I suspect we all agree that the AI's decisions are motivated by its simple utility function in a manner analogous to how human decisions are motivated by our (far more complex) utility function. What disagreement exists, if any, involves the talk of "right" that I'm happy to discard altogether.
Thanks, Richard, for putting so much effort into your comment! When I find the time to parse this, I'll come back here to comment.
(1): I think it's a prominent naturalistic feature; as EY said above, in a physical universe there are only quantum amplitudes, and if two agents have sufficiently accurate knowledge about the physical configuration of something, including their respective minds, they have to agree about that configuration, regardless of whether they have different values. (2): I'm personally a bit confused about Eliezer's constant promotion of a language that de-subjectivizes morality. In most debates "objective" and "subjective" may entail a confusion when viewed in a naturalistic light; however, as I understand it, Eliezer's stance does boil down to a traditionally subjective viewpoint in the sense that it opposes the religious notion of morality as light shining down from the skies (and the notion of universally compelling arguments). In regards to infallibility, an agent at most times has imperfect knowledge of right; I can't see how subjectivity entails infallibility. I don't even have perfect access to my current values, and there is also a huge set of moral arguments that would compel me to modify my current values if I heard them. (3) The "why right means promoting X and Y" question is addressed by a recursive justification as discussed here and very specifically in the last paragraphs of Meaning of Right. If I ask "why should I do what is right?", that roughly means "why should I do what I should do?" or "why is right what is right?". I happen to be a mind that is compelled by a certain class of moral arguments, and I can reflect on this fact using my current mind, and, naturally, find that I'm compelled by a certain class of moral arguments. EDIT: see also komponisto's comment.
re: infallibility -- right, the objection is not that you could infallibly know that XYZ is right. Rather, the problem is that you could infallibly know that your fundamental values are right (though you might not know what your fundamental values are).
Rephrased, this knowledge is just the notion that you instantiate some computation instead of not doing (or being) anything. This way, my confidence in its truth is very high, although of course not 1.
We know we instantiate some computation. But it's a pre-theoretic datum that we don't know that our fundamental values are right. So EY's theory misdescribes the concept of rightness. (This is basically a variation on Moore's Open Question Argument.)
Huh? I'd be okay with a strong AI that correctly followed my values, regardless of whether they're "right" by any other criterion. If you think you wouldn't be okay with such an AI, I suspect the most likely explanation is that you're confused about the concept of "your values". Namely, if you yearn to discover some simple external formula like the categorical imperative and then enact the outcomes prescribed by that formula, then that's just another fact about your personal makeup that has to be taken into account by the AI. And if you agree that you would be okay with such an AI, that means Eliezer's metaethics is adequate for its stated goal (creating friendly AI), whatever other theoretical drawbacks it might have.
What is objection (1) saying? That asserting there are moral facts is incompatible with the fact that people disagree about what they are? Specifically, when people agree that there is such a thing as a reason that applies to both of them, they disagree about how the reason is caused by reality? Do we not then say they are both wrong about there being one "reason"? I speak English(LD). You speak English(RC). The difference between our languages is of the same character as that between a speaker of Spanish and a speaker of French. I say "I" and you correctly read it as referring to lessdazed. You say "I" and I correctly read it as referring to RichardChappell. I have reasons(LD). You have reasons(RC). Do you think that, were we perfect at monitoring what we each meant when we said anything and knew the relevant consequences of actions, the two of us would be capable of disagreeing when one of us asserted something in a sentence using the word "moral"? Why? Or have I misread things?
No, I think there are moral facts and that people disagree about what they are. But such substantive disagreement is incompatible with Eliezer's reductive view on which the very meaning of 'morality' differs from person to person. It treats 'morality' like an indexical (e.g. "I", "here", "now"), which obviously doesn't allow for real disagreement. Compare: "I am tall." "No, I am not tall!" Such an exchange would be absurd -- the people are clearly just talking past each other, since there is no common referent for 'I'. But moral language doesn't plausibly function like this. It's perfectly sensible for one person to say, "I ought to have an abortion", and another to disagree: "No, you ought not to have an abortion". (Even if both are logically omniscient.) They aren't talking past each other. Rather, they're disagreeing about the morality of abortion.
It's not plausible(RC, 7/1/2011 4:25 GMT), but it is plausible(LD, 7/1/2011 4:25 GMT). It's not impossible for people to be confused in exactly such a way. That's begging the question. That intuition pump imagines intelligent people disagreeing, finds it plausible, notices that intelligent people disagreeing proves nothing, then replaces the label "intelligent" with "omniscient" (since that, if proven, would prove something) without showing the work that would make the replacement valid. If the work could be shown, the intuition pump wouldn't be very valuable, as one could just use the shown work for persuasion rather than the thought experiment with the disagreeing people. I strongly suspect that the reason the shown work is unavailable is because it does not exist. Forget morality for one second. Doesn't the meaning of the word "hat" differ from person to person? It's only sensible to say if/because context forestalls equivocation (or tries to, anyway). Retroactively removing the context by coming in the conversation with a different meaning of ought (even if the first meaning of "ought" was "objective values, as I think they are, as I think I want them to be, that are universally binding on all possible minds, and I would maintain under any coherent extrapolation of my values" where the first person is wrong about those facts and the second meaning of "ought" is the first person's extrapolated volition) introduces equivocation. It's really analogous to saying "No, I am not tall". Where the first person says "X would make me happy, I want to feel like doing X, and others will be better off according to balancing equation Y if I do X, and the word "ought" encompasses when those things coincide according to objective English, so I ought to do X", and the second person says "X would make you happy, you want to feel like doing X, and others will not be better off according to balancing equation Z if you do X, and the word "ought" encompasses when those things co
You're confusing metaethics and first-order ethics. Ordinary moral debates aren't about the meaning of "ought". They're about the first-order question of which actions have the property of being what we ought to do. People disagree about which actions have this property. They posit different systematic theories (or 'balancing equations' as you put it) as a hypothesis about which actions have the property. They aren't stipulatively defining the meaning of 'ought', or else their claim that "You ought to follow the prescriptions of balancing equation Y" would be tautological, rather than a substantive claim as it is obviously meant to be.
I know that, which is why I said "Purported debates about the true meaning of 'ought'" rather than "ordinary debates, which are about the true meaning of 'ought'". Please be careful not to beg the question. People agree that there is such a property, but that is something about which they can be wrong. Rather, they aren't trying to stipulatively define the meaning of 'ought', or else their claim that "You ought to follow the prescriptions of balancing equation Y" would be tautological. In fact, due to people's poor self-insight, time limits, and the sometimes over-coarse granularity of language, they do not stipulate their actual balancing equation. Had they perfect insight and ability to represent their insights, it would be such a tautology. They would cease to speak like that had they the additional insight that for it to do the work it is called upon to do, "ought" is a word that needs grounding in the context of the real reasons for action of beings. More generally, they are speaking an idiolect even regarding other definitions. It's meant to be such a claim, but it is in error because the speaker is confused about morality, and in a sense is not even wrong. They are claiming some actions have an objective moral valuation binding upon all intelligent beings, but they may as well claim the action has the property of being a square circle - or better yet, a perfect circle for which pi is exactly 3, which is something I have witnessed a religious person claim is true. ~~~~~~~~~ I don't understand either why you believe as you do or what good justification you might have for it. I can see why one might want to make truth claims on which it falls out that the folk have the least amount of confusion to be embarrassed about and are least wrong, and if one begins with the assumption that there are "moral facts" in the strongest sense, that's a good start. However, that neither prevents one from having to say they are wrong about an enormous amount nor does it prevent one fro
I'm not arguing for moral realism here. I'm arguing against metaethical reductionism, which leaves open either realism OR error theory. For all I've said, people may well be mistaken when they attribute normative properties to things. That's fine. I'm just trying to clarify what it is that people are claiming when they make moral claims. This is conceptual analysis, not metaphysics. I'm pointing out that what you claim to be the meaning of 'morality' isn't what people mean to be talking about when they engage in moral discourse. I'm not presupposing that ordinary people have any great insight into the nature of reality, but they surely do have some idea of what their own words mean. Your contrary linguistic hypothesis seems completely unfounded.
When I was young, I learned that the tooth fairy was really my mother all along. What do you think of that? (This isn't meant to be insulting or anything similar.)
No, you learned that the tooth fairy doesn't exist, and that your mother was instead responsible for the observable phenomena that you had previously attributed to the tooth fairy. (It's a good analogy though. I do think that claiming that morality exists "as a computation" is a lot like claiming that the tooth fairy really exists "as one's mother".)
Yes. No. Usually, the first thing to do when guessing about a random number from 1-100 is to split the possibilities in half by asking if it is more than 50 (or odd. Etc.) The tooth fairy example gets a variety of responses, from people insisting it is just objectively wrong to say "the tooth fairy doesn't exist" to those saying it is just objectively wrong to say the tooth fairy was really my mother. I happen to agree with you about what the best way is to describe what went on in this specific case. However, this is a standard blegg-rube situation that is unusual only in that it is not clear which way is best to describe the phenomenon to others. There is a constellation of phenomena that correlate to each other - the fairy being female, being magic, having diaphanous wings, collecting things for money, those things being stored under pillows, those things being teeth. None of these is any more qualitatively essential to being a tooth fairy, for most people, than "having ten fingers" is essential to being human. If tomorrow we learn that magic is real, a female sprite collects teeth from under pillows, and does so on the back of a termite (and has size-changing technology/magic, why not?), most people would naively say "the tooth fairy does not fly, but burrows on the back of a termite". That's OK, but not great if the true nature of the situation is not recognized, and they fall into error if they think "tooth fairy" has a meaning divorced from flight. Likewise, those who say "there was never a 'tooth fairy', there is rather the 'burrowing tooth fairy'" are right that there was never a thing exactly like the classic description, but this group makes an error if they demand that the first group stop calling the "burrowing tooth fairy" the "tooth fairy". There is more to say: an individual who makes up explanations ad hoc is not communicating, and the relative confluence of idiolects is valid because of the tinkerbell effect. that makes saying "No, you learned that the tooth fairy does
Seriously? I've never heard anyone insist that the tooth fairy really exists (in the form of their mother). It would seem most contrary to common usage (in my community, at least) to use 'Tooth Fairy' to denote "whoever replaced the tooth under my pillow with a coin". The magical element is (in my experience) treated as essential to the term and not a mere "connotation". I've heard of the saying you mention, but I think you misunderstand people when you interpret it literally. My response was not intended as some "peculiar" declaration of mind-independent meaning facts, but rather as a straightforward interpretation of what people who utter such claims have in mind when they do so. (Ask them, "Do you mean that the tooth fairy exists?" and I expect the response, "No, silly, I just mean that my mother is responsible for the coin under my pillow.") So, to clarify: I don't think that there are free-floating "meaning" facts out there independently of our linguistic dispositions. I just dispute whether your definitions adequately capture the things that most people really care about (i.e. treat as essential) when using the terms in question. It's no excuse to say that metaethical reductionism "gets reality right" when the whole dispute is instead over whether they have accommodated (or rather eliminated) some concept of which we have a pre-theoretic grasp. Compare the theological reductionist thesis that "God is love". Love exists, therefore God exists, voila! If someone pointed out that this view is needlessly misleading since love is not what most people mean to be talking about when they speak of 'God' (and it would be more honest to just admit one's atheism), it would be no response to give a lecture about constellations and tinkerbell.
4Wei Dai12y
What if metaethical reductionism is not meant (by some) to accommodate the pre-theoretic grasp of "morality" of most people, but just to accommodate the pre-theoretic grasp of "morality" of people like lessdazed? Could metaethical reductionism be considered a "respectable position" in that sense? And separately, suppose the main reason I'm interested in metaethics is that I am trying to answer a question like "Should I terminally value the lives of random strangers?" and I'm not sure what that question means exactly or how I should go about answering it. In this case, is there a reason for me to care much about the pre-theoretic grasp of most people, as opposed to, say, people I think are most likely to be right about morality?
This is a good example, as people saying this are in some ways doing the opposite of what you advocate, but in other ways, they are doing the same thing. I think people are motivated to say "God is love" out of a desire to avoid being logically compelled to view certain others as wrong (there is a twist, some passive aggressive non-believers say it to claim atheists are wrong when saying theists are wrong). The exact motivation isn't important, but its presence would provide a countervailing force against the sheer silliness of their confusion (which is worse than yours in an important way) and explain how they could possibly make it. The mistake being made is to pretend that there are inherently attached meanings to a word, as if those words simply had that meaning, regardless of context of using words in general, and that word in particular, when that context is totally adequate to explain meaning. When it is clear that contemporary theists and all their ancestors were in error, the hippie says "God is love" to pretend there is no disagreement, and that "God" meant love - instead of what was obviously meant, or often weirdly in addition to the contradictory thing that they admit was meant. You do the opposite in that rather than seek ways to interpret others as agreeing or being right, even when there is obvious disagreement, you seek ways to interpret others as disagreeing or being wrong. You use the exact same mechanism of ignoring context and how language is used. You avoid the worst error of the hippies, which is claiming that others mean both "God is love" and "God is an agent, etc." However, reductionists take the history of language and find words have many connotations, the meaning of "moral" and find it has a long history of meaning many different things, many of several things, and that the meaning people accept as associated to the term has to do with the quantity and quality of many aspects, none of which are intrinsic. You have apparently decide
That depends on whether you are going to replace objective morality with subjective morality or with error theory. It can be argued that subjectivism is an unstable position that amounts to error theory.
This seems to me like begging the question. Can you expand on this?
Thinking more about this, it may have been better if Eliezer had not framed his meta-ethics sequence around "the meaning of right." If we play rationalist's taboo with our moral terms and thus avoid moral terms altogether, what Eliezer seems to be arguing is that what we really care about is not (a) that whatever states of affairs our brains are wired to send reward signals in response to be realized, but (b) that we experience peace and love and harmony and discovery and so on. His motivation for thinking this way is a thought experiment - which might become real in the relatively near future - about what would happen if a superintelligent machine could rewire our brains. If what we really care about is (a), then we shouldn't object if the superintelligent machine rewires our brains to send reward signals only when we are sitting in a jar. But we would object to that scenario. Thus, what we care about seems not to be (a) but (b). In a meta-ethicist's terms, we could interpret Eliezer not as making an argument about the meaning of moral terms, but instead as making an argument that (b) is what gives us Reasons, not (a). Now, all this meta-babble might not matter much. I'm pretty sure that even if I were persuaded that the correct meta-ethical theory states that I should be okay with releasing a superintelligence that would rewire me to enjoy sitting in a jar, I would do whatever I could to prevent such a scenario and instead promote a superintelligence that would bring peace and joy and harmony and discovery and so on.
I thought being persuaded of a metaethical theory entails that whenever the theory tells you you should do X, you would feel compelled to do X.
This is a cool formulation. It's interesting that there are other things that can happen to you not similar to "being persuaded of a metaethical theory" that entail that whenever you are told to do X you're compelled to do X. (Voodoo or whatever.)
Only if motivational internalism is true. But motivational internalism is false.
What's that?
Here, let me Google that for you.
I could get into how much I hate this kind of rejoinder if you bait me some more. I wasn't asking you for the number of acres in a square mile. Let me just rephrase: I hadn't heard of motivational internalism before, could you expand your comment?
I don't see what plausible reasoning process could lead you to infer this unlikely statement (about motivation, given how many details would need to be just right for the statement to happen to be true). Also, even if you forbid modifying the definition of the human brain, things that initiate high-reward signals in our brains (or that we actually classify as "harmony" or "love") are very far from what we care about, just as whatever a calculator actually computes is not the same kind of consideration as the logically correct answer, even if you use a good calculator and aren't allowed to sabotage it. There are many reasons (and contexts) for reward in the human brain to not be treated as indicative of the goodness of a situation.
I don't understand your second paragraph. It sounds like you are agreeing with me, but your tone suggests you think you are disagreeing with me.
It was an explanation for why your thought experiment provides a bad motivation: we can just forbid modification of human brains to stop the thought experiment from getting through, but that would still leave a lot of problems, which shows that just this thought experiment is not sufficient motivation.
Sure, the superintelligence thought experiment is not the full story. One problem with the suggestion of writing a rule to not alter human brains comes in specifying how the machine is not allowed to alter human brains. I'm skeptical about our ability to specify that rule in a way that does not lead to disastrous consequences. After all, our brains are being modified all the time by the environment, by causes that are on a wide spectrum of 'direct' and 'indirect.' Other problems with adding such a rule are given here.
(I meant that subjective experience that evaluates situations should be specified using unaltered brains, not that brains shouldn't be altered.)
You've got my curiosity. What does this mean? How would you realize that process in the real world?
Come on, this tiny detail isn't worth the discussion. Classical solution to wireheading, asking the original and not the one under the influence, referring to you-at-certain-time and not just you-concept that resolves to something unpredicted at any given future time in any given possible world, rigid-designator-in-time.

The closest point I've found to my metaethics in standard philosophy was called "moral functionalism" or "analytical descriptivism".

Cognitivism: Yes, moral propositions have truth-value, but not all people are talking about the same facts when they use words like "should", thus creating the illusion of disagreement.

Motivation: You're constructed so that you find some particular set of logical facts and physical facts impel you to action, and these facts are what you are talking about when you are talking about morality: for example, faced with the problem of dividing a pie among 3 people who all worked equally to obtain it and are all equally hungry, you find the mathematical fact that 1/3, 1/3, 1/3 is an equal division compelling - and more generally you name the compelling logical facts associated with this issue as "fairness", for example.

(Or as it was written in Harry Potter and the Methods of Rationality:

"Mr. Potter, in the end people all do what they want to do. Sometimes people give names like 'right' to things they want to do, but how could we possibly act on anything but our own desires?"

"Well, obviously I couldn't a...

Eliezer, Thanks for your reply! Hopefully you'll have time to answer a few questions... 1. Can anything besides Gary's preferences provide a justification for saying that "Gary should_gary X"? (My own answer would be "No.") 2. By saying "Gary should_gary X", do you mean that "Gary would X if Gary was fully informed and had reached a state of reflective equilibrium with regard to terminal values, moral arguments, and what Gary considers to be a moral argument"? (This makes should-statements "subjectively objective" even if they are computationally intractable, and seems to capture what you're saying in the paragraph here that begins "But the key notion is the idea that...") 3. Or, perhaps you are saying that one cannot give a concise definition of "should," as Larry D'Anna interprets you to be saying?

Can anything besides Gary's preferences provide a justification for saying that "Gary should_gary X"? (My own answer would be "No.")

This strikes me as an ill-formed question for reasons I tried to get at in No License To Be Human. When Gary asks "What is right?" he is asking the question e.g. "What state of affairs will help people have more fun?" and not "What state of affairs will match up with the current preferences of Gary's brain?" and the proof of this is that if you offer Gary a pill to change his preferences, Gary won't take it because this won't change what is right. Gary's preferences are about things like fairness, not about Gary's preferences. Asking what justifies should_Gary to Gary is either answered by having should_Gary wrap around and judge itself ("Why, yes, it does seem better to care about fairness than about one's own desires") or else is a malformed question implying that there is some floating detachable ontologically basic property of rightness, apart from particular right things, which could be ripped loose of happiness and applied to pain instead and make it good to do evil.

By saying "...

Damn. I still haven't had my "Aha!" moment on this. I'm glad that ata, at least, appears to have it, but unfortunately I don't understand ata's explanation, either.

I'll understand if you run out of patience with this exercise, but I'm hoping you won't, because if I can come to understand your meta-ethical theory, then perhaps I will be able to explain it to all the other people on Less Wrong who don't yet understand it, either.

Let me start by listing what I think I do understand about your views.

1. Human values are complex. As a result of evolution and memetic history, we humans value/desire/want many things, and our values cannot be compressed to any simple function. Certainly, we do not only value happiness or pleasure. I agree with this, and the neuroscience supporting your position is nicely summarized in Tim Schroeder's Three Faces of Desire. We can value damn near anything. There is no need to design an artificial agent to value only one thing, either.

2. Changing one's meta-ethics need not change one's daily moral behavior. You write about this here, and I know it to be true from personal experience. When deconverting from Christianity, I went from divine command th...

1-4 yes.

5 is questionable. When you say "Nothing is fundamentally moral" can you explain what it would be like if something was fundamentally moral? If not, the term "fundamentally moral" is confused rather than untrue; it's not that we looked in the closet of fundamental morality and found it empty, but that we were confused and looking in the wrong closet.

Indeed my utility function is generally indifferent to the exact state of universes that have no observers, but this is a contingent fact about me rather than a necessary truth of metaethics, for indifference is also a value. A paperclip maximizer would very much care that these uninhabited universes contained as many paperclips as possible - even if the paperclip maximizer were outside that universe and powerless to affect its state, in which case it might not bother to cognitively process the preference.

You seem to be angling for a theory of metaethics in which objects pick up a charge of value when some valuer values them, but this is not what I think, because I don't think it makes any moral difference whether a paperclip maximizer likes paperclips. What makes moral differences are things like, y'know, life, consciousness, activity, blah blah.

Eliezer, In Setting Up Metaethics, you wrote: I didn't know what "fundamentally moral" meant, so I translated it to the nearest term with which I'm more familiar, what Mackie called "intrinsic prescriptivity." Or, perhaps more clearly, "intrinsic goodness," following Korsgaard: So what I mean to say in (5) is that nothing is intrinsically good (in Korsgaard's sense). That is, nothing has value in itself. Things only have value in relation to something else. I'm not sure whether this notion of intrinsic value is genuinely confused or merely not-understood-by-Luke-Muehlhauser, but I'm betting it is either confused or false. ("Untrue" is the term usually used to capture a statement's being either incoherent or meaningful-and-false: see for example Richard Joyce on error theory.) But now, I'm not sure you agree with (5) as I intended it. Do you think life, consciousness, activity, and some other things have value-in-themselves? Do these things have intrinsic value? Thanks again for your reply. I'm going to read Chappell's comment on this thread, too.

Do you think a heap of five pebbles is intrinsically prime, or does it get its primeness from some extrinsic thing that attaches a tag with the five English letters "PRIME" and could in principle be made to attach the same tag to composite heaps instead? If you consider "beauty" as the logical function your brain's beauty-detectors compute, then is a screensaver intrinsically beautiful?

Does the word "intrinsic" even help, considering that it invokes bad metaphysics all by itself? In the physical universe there are only quantum amplitudes. Moral facts are logical facts, but not all minds are compelled by that-subject-matter-which-we-name-"morality"; one could as easily build a mind to be compelled by the primality of a heap of pebbles.

So the short answer is that there are different functions that use the same labels to designate different relations while we believe that the same labels designate the same functions?
I wonder if Max Tegmark would have written a similar comment. I'm not sure if there is a meaningful difference regarding Luke's question to say that there are only quantum amplitudes versus there are only relations.
6 · Eliezer Yudkowsky · 13y
What I'm saying is that in the physical world there are only causes and effects, and the primeness of a heap of pebbles is not an ontologically basic fact operating as a separate and additional element of physical reality, but it is nonetheless about as "intrinsic" to the heap of pebbles as anything. Once morality stops being mysterious and you start cashing it out as a logical function, the moral awfulness of a murder is exactly as intrinsic as the primeness of a heap of pebbles. Just as we don't care whether pebble heaps are prime or experience any affect associated with its primeness, the Pebblesorters don't care or compute whether a murder is morally awful; and this doesn't mean that a heap of five pebbles isn't really prime or that primeness is arbitrary, nor yet that on the "moral Twin Earth" murder could be a good thing. And there are no little physical primons associated with the pebble-heap that could be replaced by compositons to make it composite without changing the number of pebbles; and no physical stone tablet on which morality is written that could be rechiseled to make murder good without changing the circumstances of the murder; but if you're looking for those you're looking in the wrong closet.
Good answer!
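The pebble-heap comparison above can be put in code. This is only an illustrative sketch, not anything from the original comment: `is_prime`, `attach_tag`, and `cares_about_primeness` are hypothetical names chosen here. It shows that primeness is computed from the number of pebbles alone, so relabeling the heap or changing who cares about it leaves the answer untouched.

```python
def is_prime(pebbles: int) -> bool:
    """Primeness depends only on the number of pebbles in the heap."""
    if pebbles < 2:
        return False
    divisor = 2
    while divisor * divisor <= pebbles:
        if pebbles % divisor == 0:
            return False
        divisor += 1
    return True


def attach_tag(pebbles: int, tag: str) -> dict:
    """Hypothetical 'extrinsic tag': a label stuck on a heap from outside."""
    return {"pebbles": pebbles, "tag": tag}


def cares_about_primeness(observer: str) -> bool:
    """Hypothetical: which minds are *moved* by primeness (an assumption)."""
    return observer == "pebblesorter"


# Mislabel a heap of five pebbles as composite.
heap = attach_tag(5, "COMPOSITE")

# The tag changed; the logical fact did not.
heap_is_prime = is_prime(heap["pebbles"])

# Whether an observer cares is a separate question from the fact itself.
human_cares = cares_about_primeness("human")
sorter_cares = cares_about_primeness("pebblesorter")
```

On this picture, the only way to make the heap composite is to change the number of pebbles, just as the comment says there are no "primons" that could be swapped out separately.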
It is rather difficult to ask that question in the way you intend it. Particularly if the semantics have "because I say so" embedded rather than supplemented.
BTW, in your post Are Your Enemies Innately Evil?, I think you are making a similar mistake about the concept of evil.
"Innately" is being used in that post in the sense of being a fundamental personality trait or a strong predisposition (as in "Correspondance Bias", to which that post is a followup). And fundamental personality traits and predispositions do exist — including some that actually do predispose people toward being evil (e.g. sociopathy) — so, although the phrase "innately evil" is a bit dramatic, I find its meaning clear enough in that post's context that I don't think it's a mistake similar to "fundamentally moral". It's not arguing about whether there's a ghostly detachable property called "evil" that's independent of any normal facts about a person's mind and history.
He did, by implication, in describing what it's like if nothing is: Clearly, many of the items on EY's list, such as fun, humor, and justice, require the existence of valuers. The question above then amounts to whether all items of moral goodness require the existence of valuers. I think the question merits an answer, even if (see below) it might not be the one lukeprog is most curious about. Unfortunately, lukeprog changed the terms in the middle of the discussion. Not that there is anything wrong with the new question (and I like EY's answer).
How would the outcome of CEV differ from a universe in which a Paperclip Maximizer equipped everyone with the desire to maximize paperclips? How does a universe with as many discrete conscious entities as possible differ from one with a single universe-spanning consciousness? If it doesn't make any difference, then how can we be sure that the SIAI won't just implement the first fooming AI with whatever terminal goal it desires? I don't see how you can argue that the question "What is right?" is about the state of affairs that will help people to have more fun and yet claim that you don't think that "it makes any moral difference whether a paperclip maximizer likes paperclips".
If a paperclip maximizer modified everyone such that we really only valued paperclips and nothing else, and we then ran CEV, then CEV would produce a powerful paperclip maximizer. This is... I'm not going to say it's a feature, but it's not a bug, at least. You can't expect CEV to generate accurate information about morality if you erase morality from the minds it's looking at. (You could recover some information about morality by looking at history, or human DNA (if the paperclip maximizer didn't modify that), etc., but then you'd need a strategy other than CEV.) I don't think I understand your second question. That depends on whether the paperclip maximizer is sentient, whether it just makes paperclips or it actually enjoys making paperclips, etc. If those are the case, then its preferences matter... a little. (So let's not make one of those.)
All those concepts seem to be vague. To be sentient, to enjoy. Do you need to figure out how to define those concepts mathematically before you'll be able to implement CEV? Or are you just going to let extrapolated human volition decide about that? If so, how can you possibly make claims about how valuable the preferences of a paperclip maximizer are, or how much they matter? Maybe it will all turn out to be wireheading in the end... What is really weird is that Yudkowsky is using the word right in reference to actions affecting other agents, yet doesn't think that it would be reasonable to assign moral weight to the preferences of a paperclip maximizer.
CEV will decide. In general, it seems unlikely that the preferences of nonsentient objects will have moral value. Edit: Looking back, this comment doesn't really address the parent. Extrapolated human volition will be used to determine which things are morally significant. I think it is relatively probable that wireheading might turn out to be morally necessary. Eliezer does think that the preferences of a paperclip maximizer would have moral value if one existed. (If a nonexistent paperclip maximizer had moral worth, so would a nonexistent paperclip minimizer. This isn't completely certain, because paperclip maximizers might gain moral significance from a property other than existence that is not shared with paperclip minimizers, but at this point, this is just speculation and we can do little better without CEV.) A nonsentient paperclip maximizer probably has no more moral value than a rock with "make paperclips" written on the side. The reason that CEV is only based on human preferences is that, as humans, we want to create an algorithm that does what is right and humans are the only things we have that know what is right. If other species have moral value then humans, if we knew more, would care about them. If there is nothing in human minds that could motivate us to care about some specific thing, then what reason could we possibly have for designing an AI to care about that thing?
Near future: "You are a paperclip maximizer! Kill him!"
What is this supposed to mean?
Paperclips aren't part of fun, on EY's account as I understand it, and therefore not relevant to morality or right. If paperclip maximizers believe otherwise they are simply wrong (perhaps incorrigibly so, but wrong nonetheless)... right and wrong don't depend on the beliefs of agents, on this account. So those claims seem consistent to me. Similarly, a universe in which a PM equipped everyone with the desire to maximize paperclips would therefore be a universe with less desire for fun in it. (This would presumably in turn cause it to be a universe with less fun in it, and therefore a less valuable universe.) I should add that I don't endorse this view, but it does seem to be pretty clearly articulated/presented. If I'm wrong about this, then I am deeply confused.
I don't understand how someone can arrive at "right and wrong don't depend on the beliefs of agents".
I conclude that you use "I don't understand" here to indicate that you don't find the reasoning compelling. I don't find it compelling, either -- hence, my not endorsing it -- so I don't have anything more to add on that front.
If those people propose that utility functions are timeless (e.g. the Mathematical Universe), or simply an intrinsic part of the quantum amplitudes that make up physical reality (is there a meaningful difference?), then under that assumption I agree. If beauty can be captured as a logical function then women.beautiful is right independent of any agent that might endorse that function. The problems of differing tastes and differing aesthetic values, which lead to sentences like "beauty is in the eye of the beholder", are a result of trying to derive functions by the labeling of relations. There can be different functions that assign the same label to different relations. x is R-related to y can be labeled "beautiful" but so can xSy. So while some people talk about the ambiguity of the label beauty and conclude that what is beautiful is agent-dependent, other people talk about the set of functions that are labeled as beauty-function or assign the label beautiful to certain relations and conclude that their output is agent-independent.
(nods) Yes, I think EY believes that rightness can be computed as a property of physical reality, without explicit reference to other agents. That said, I think he also believes that the specifics of that computation cannot be determined without reference to humans. I'm not 100% clear on whether he considers that a mere practical limitation or something more fundamental.
After trying to read No License To Be Human I officially give up reading the sequences for now and postpone it until I have learnt a lot more. I think it is wrong to suggest that anyone can read the sequences. You have to be either a prodigy or a post-graduate. The second comment on that post expresses my own feelings: can people actually follow Yudkowsky's posts? It's over my head.
I agree with your sentiment, but I suggest not giving up so easily. I have the same feeling after many sequence posts, but some of them that I grokked were real gems and seriously affected my thinking. Also, borrowing some advice on reading hard papers, it's re-reading that makes a difference. Also, as my coach put it, "the best stretching for doing sidekicks is actually doing sidekicks".
I do not necessarily disagree with this, but the following: ... does not prove the claim. Gary would still not take the pill if the question he was asking was "What state of affairs will match up with the current preferences of Gary's brain?". A reference to the current preferences of Gary's brain is different to asking the question "What is a state of affairs in which there is a high satisfaction of the preferences in the brain of Gary?".
Perhaps a better thought experiment, then, is to offer Gary the chance to travel back in time and feed his 2-year-old self the pill. Or, if you dislike time machines in your thought experiments, we can simply ask Gary whether or not he now would have wanted his parents to have given him the pill when he was a child. Presumably the answer will still be no.
If time travel is to be considered then we must emphasize that when we say 'current preferences' we do not mean "preferences at time, whatever we can make those preferences be" but rather "I want things X, Y, Z to happen, regardless of the state of the atoms that make up me at this or any other time." Changing yourself to not want X, Y or Z will make X, Y and Z less likely to happen so you don't want to do that.
It seems so utterly wrong to me that I concluded it must be me who simply doesn't understand it. Why would it be right to help people to have more fun if helping people to have more fun does not match up with your current preferences? The main reason why I was able to abandon religion was realizing that what I want implies what is right. That still feels intuitively right. I didn't expect to see many people on LW argue that there exist preference/(agent/mind)-independent moral statements like 'it is right to help people' or 'killing is generally wrong'. I got a similar reply from Alicorn. Fascinating. This makes me doubt my own intelligence more than anything I've so far come across. If I parse this right it would mean that a Paperclip Maximizer is morally bankrupt?
Well, something I've been noticing is that in their "Tell Your Rationalist Origin Story" stories, the reasons a lot of people give for why they left their religion aren't actually valid arguments. Make of that what you will. Yes. It is morally bankrupt. (Or would you not mind turning into paperclips if that's what the Paperclip Maximizer wanted?) BTW, your current position is more-or-less what theists mean when they say atheists are amoral.
Yes, but that is a matter of taste. Why would I ever change my current position? If Yudkowsky told me there were some moral laws written into the fabric of reality, what difference would that make? Either such laws are imperative, so that I am unable to escape them, or I simply ignore them if they are opposing my preferences. Assume all I wanted to do is to kill puppies. Now Yudkowsky told me that this is prohibited and I will suffer disutility because of it. The crucial question would be, does the disutility outweigh the utility I assign to killing puppies? If it doesn't, why should I care?
Perhaps you assign net utility to killing puppies. If you do, you do. What EY tells you, what I tell you, what is prohibited, etc., has nothing to do with it. Nothing forces you to care about any of that. If I understand EY's position, it's that it cuts both ways: whether killing puppies is right or wrong doesn't force you to care, but whether or not you care doesn't change whether it's right or wrong. If I understand your position, it's that what's right and wrong depends on the agent's preferences: if you prefer killing puppies, then killing puppies is right; if you don't, it isn't. My own response to EY's claim is "How do you know that? What would you expect to observe if it weren't true?" I'm not clear what his answer to that is. My response to your claim is "If that's true, so what? Why is right and wrong worth caring about, on that model... why not just say you feel like killing puppies?"
I don't think those terms are useless, or that morality doesn't exist. But you have to use those words with great care, because on their own they are meaningless. If I know what you want, I can approach the conditions that would be right for you. If I know how you define morality, I can act morally according to you. But I will do so only if I care about your preferences. If part of my preferences is to see other human beings happy then I have to account for your preferences to some extent, which makes them a subset of my preferences. All those different values are then weighted accordingly. Do you disagree with that understanding?
I agree with you that your preferences account for your actions, and that my preferences account for my actions, and that your preferences can include a preference for my preferences being satisfied. But I think it's a mistake to use the labels "morality" and "preferences" as though they are interchangeable. If you have only one referent -- which it sounds like you do -- then I would recommend picking one label and using it consistently, and not use the other at all. If you have two referents, I would recommend getting clear about the difference and using one label per referent. Otherwise, you introduce way too many unnecessary vectors for confusion. It seems relatively clear to me that EY has two referents -- he thinks there are two things being talked about. If I'm right, then you and he disagree on something, and by treating the language of morality as though it referred to preferences you obscure that disagreement. More precisely: consider a system S comprising two agents A and B, each of which has a set of preferences Pa and Pb, and each of which has knowledge of their own and the other's preferences. Suppose I commit an act X in S. If I've understood correctly, you and EY agree that knowing all of that, you know enough in principle to determine whether X is right or wrong. That is, there isn't anything left over, there's no mysterious essence of rightness or external privileged judge or anything like that. In this, both of you disagree with many other people, such as theists (who would say that you need to consult God's will to make that determination) and really really strict consequentialists (who would say that you need to consult the whole future history of the results of X to make that determination). If I've understood correctly, you and EY disagree on symmetry. That is, if A endorses X and B rejects X, you would say that whether X is right or not is undetermined... it's right by reference to A, and wrong by reference to B, and there's nothing mo
Thanks for this, very enlightening! A very good framing and analysis of my beliefs.
Yeah. While I'm reasonably confident that he holds the belief, I have no confidence in any theories of how he arrives at that belief. What I have gotten from his writing on the subject is a combination of "Well, it sure seems that way to me," and "Well, if that isn't true, then I don't see any way to build a superintelligence that does the right thing, and there has to be a way to build a superintelligence that does the right thing." Neither of which I find compelling. But there's a lot of the metaethics sequence that doesn't make much sense to me at all, so I have little confidence that what I've gotten out of it is a good representation of what's there. It's also possible that I'm completely mistaken and he simply insists on "right" as a one-place predicate as a rhetorical trick; a way of drawing the reader's attention away from the speaker's role in that computation. I am fairly sure EY would say (and I agree) that there's no reason to expect them to. Different agents with different preferences will have different beliefs about right and wrong, possibly incorrigibly different. Humans and Babykillers as defined will simply never agree about how the universe would best be ordered, even if they come to agree (as a political exercise) on how to order the universe, without the exercise of force (as the SHFP propose to do, for example). Um. Certainly, this model says that you can order world-states in terms of their rightness and wrongness, and there might therefore be a single possible world-state that's most right within the set of possible world-states (though there might instead be several possible world-states that are equally right and better than all other possibilities). If there's only one such state, then I guess "right" could designate a future world state; if there are several, it could designate a set of world states. But this depends on interpreting "right" to mean maximally right, in the same sense that "cold" could be understood to designate absol
4 · Eliezer Yudkowsky · 13y
Humans and Babykillers are not talking about the same subject matter when they debate what-to-do-next, and their doing different things does not constitute disagreement.
There's a baby in front of me, and I say "Humans and Babykillers disagree about what to do next with this baby." The one replies: "No, they don't. They aren't talking about the same subject when they debate what to do next; this is not a disagreement." "Let me rephrase," I say. "Babykillers prefer that this baby be killed. Humans prefer that this baby have fun. Fun and babykilling can't both be implemented on the same baby: if it's killed, it's not having fun; if it's having fun, it hasn't been killed." Have I left out anything of value in my restatement? If so, what have I left out? More generally: given all the above, why should I care whether or not what humans and Babykillers have with respect to this baby is a disagreement? What difference does that make?
If you disagree with someone, and you're both sufficiently rational, then you can expect to have a good shot at resolving your disagreement by arguing. That doesn't work if you just have fundamentally different motivational frameworks.
I don't know if I agree that a disagreement is necessarily resolvable by argument, but I certainly agree that many disagreements are so resolvable, whereas a complete difference of motivational framework is not. If that's what EY meant to convey by bringing up the question of whether Humans and Babykillers disagree, I agree completely. As I said initially: "Humans and Babykillers as defined will simply never agree about how the universe would best be ordered."
We previously debated the disagreements between those with different values here. The dictionary apparently supports the idea that any conflict is a disagreement. To understand the other side of the argument, I think it helps to look at this: One side has redefined "disagreement" to mean "a difference of opinion over facts"! I think that explains much of the sound and fury surrounding the issue. A "difference of opinion over goals" is not a "difference of opinion over facts". However, note that different goals led to the cigarette companies denying the link between cigarettes and cancer - and also led to oil company AGW denialism - which caused many real-world disagreements.
All of which leaves me with the same question I started with. If I know what questions you and I give different answers to -- be they questions about facts, values, goals, or whatever else -- what is added to my understanding of the situation by asserting that we disagree, or don't disagree? ata's reply was that "we disagree" additionally indicates that we can potentially converge on a common answer by arguing. That also seems to be what EY was getting at about hot air and rocks. That makes sense to me, and sure, it's additionally worth clarifying whether you and I can potentially converge on a common answer by arguing. Anything else? Because all of this dueling-definitions stuff strikes me as a pointless distraction. I use words to communicate concepts; if a word no longer clearly communicates concepts it's no longer worth anything to me.
That doesn't seem to be what the dictionary says "disagreement" means. Maybe if both sides realise that the argument is pointless, they would not waste their time - but what if they don't know what will happen? - or what if their disagreement is intended to sway not their debating partner, but a watching audience?
I agree with you about what the dictionary says, and that people might not know whether they can converge on a common answer, and that people might go through the motions of a disagreement for the benefit of observers.
We talk about what is good, and Babykillers talk about what is eat-babies, but both good and eat-babies perform analogous functions. For building a Friendly AI we may not give a damn about how to categorize such analogous functions, but I've got a feeling that by simply hijacking the word "moral" so that it suddenly no longer applies to such similar things, as I think it usually does, you've successfully increased my confusion over the last year. Either this, or I'm back at square one. Probably the latter.
The fact that killing puppies is wrong follows from the definition of wrong. The fact that Eliezer does not want to do what is wrong is a fact about his brain, determined by introspection.
Because right is a rigid designator. It refers to a specific set of terminal values. If your terminal values don't match up with this specific set of values, then they are wrong, i.e. not right. Not that you would particularly care, of course. From your perspective, you only want to maximize your own values and no others. If your values don't match up with the values defined as moral, so much for morality. But you still should be moral because should, as it's defined here, refers to a specific set of terminal values - the one we labeled "right." (Note: I'm using the term should exactly as EY uses it, unlike in my previous comments in these threads. In my terms, should=should_human and on the assumption that you, XiXiDu, don't care about the terminal values defined as right, should_XiXiDu =/= should)
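The "rigid designator" reading in the comment above can be sketched as the difference between a two-place function and a one-place function whose value set was fixed once. This is a toy illustration under loud assumptions: the value sets, `should_for`, `should`, `should_paperclipper`, and the `Agent` class are all hypothetical names invented here, and real terminal values are obviously not a three-element set.

```python
from functools import partial

# Hypothetical stand-ins for sets of terminal values (illustration only).
HUMAN_VALUES = frozenset({"fun", "fairness", "life"})
PAPERCLIP_VALUES = frozenset({"paperclips"})


def should_for(values: frozenset, outcome: str) -> bool:
    """Two-place version: is `outcome` endorsed by this set of values?"""
    return outcome in values


# Rigid designation: the value set is bound once, when the word is defined.
should = partial(should_for, HUMAN_VALUES)
should_paperclipper = partial(should_for, PAPERCLIP_VALUES)


class Agent:
    """An agent whose current preferences can be rewritten (e.g. by a pill)."""

    def __init__(self, name: str, values: frozenset) -> None:
        self.name = name
        self.values = values

    def wants(self, outcome: str) -> bool:
        return outcome in self.values


gary = Agent("Gary", HUMAN_VALUES)
before_pill = (should("fairness"), gary.wants("fairness"))

# The pill rewrites Gary's preferences, not the referent of `should`.
gary.values = PAPERCLIP_VALUES
after_pill = (should("fairness"), gary.wants("fairness"))
```

In this sketch `should` keeps returning the same answers after the pill, while what Gary wants changes, which is the sense in which should_XiXiDu and should can come apart without "should" itself shifting.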
I'm getting the impression that nobody here actually disagrees but that some people are expressing themselves in a very complicated way. I parse your comment to mean that the definition of moral is a set of terminal values of some agents and should is the term that they use to designate instrumental actions that do serve that goal?
Your second paragraph looks correct. 'Some agents' refers to humanity rather than any group of agents. Technically, should is the term anything should use when discussing humanity's goals, at least when speaking Eliezer. Your first paragraph is less clear. You definitely disagree with others. There are also some other disagreements.
Correct, I disagree. What I wanted to say with my first paragraph was that I might disagree because I don't understand what others believe because they expressed it in a way that was too complicated for me to grasp. You are also correct that I myself was not clear in what I tried to communicate. ETA That is if you believe that disagreement fundamentally arises out of misunderstanding as long as one is not talking about matters of taste.
In Eliezer's metaethics, all disagreement is from misunderstanding. A paperclip maximizer agrees about what is right, it just has no reason to act correctly.
To whoever voted the parent down, this is [edit: nearly] exactly correct. A paperclip maximizer could, in principle, agree about what is right. It doesn't have to, I mean a paperclip maximizer could be stupid, but assuming it's intelligent enough, it could discover what is moral. But a paperclip maximizer doesn't care about what is right, it only cares about paperclips, so it will continue maximizing paperclips and only worry about what is "right" when doing so helps it create more paperclips. Right is a specific set of terminal values that the paperclip maximizer DOESN'T have. On the other hand you, being human, do have those terminal values on EY's metaethics.
Agreed that a paperclip maximizer can "discover what is moral," in the sense that you're using it here. (Although there's no reason to expect any particular PM to do so, no matter how intelligent it is.) Can you clarify why this sort of discovery is in any way interesting, useful, or worth talking about?
It drives home the point that morality is an objective feature of the universe that doesn't depend on the agent asking "what should I do?"
Huh. I don't see how it drives home that point at all. But OK, at least I know what your intention is... thank you for clarifying that.
Fascinating. I still don't understand in what sense this could be true, except maybe the way I tried to interpret EY here and here. But those comments simply got downvoted without any explanation or attempt to correct me, therefore I can't draw any particular conclusion from those downvotes. You could argue that morality (what is right?) is human and other species will agree that from a human perspective what is moral is right is right is moral. Although I would agree, I don't understand how such a confusing use of terms is helpful.
Morality is just a specific set of terminal values. It's an objective feature of the universe because... humans have those terminal values. You can look inside the heads of humans and discover them. "Should," "right," and "moral," in EY's terms, are just being used as rigid designators to refer to those specific values. I'm not sure I understand the distinction between "right" and "moral" in your comment.
I was the second to vote down the grandparent. It is not exactly correct. In particular it claims "all disagreement" and "a paperclip maximiser agrees", not "could in principle agree". While the comment could perhaps be salvaged with some tweaks, as it stands it is not correct and would just serve to further obfuscate what some people find confusing as it is.
I concede that I was implicitly assuming that all agents have access to the same information. Other than that, I can think of no source of disagreements apart from misunderstanding. I also meant that if a paperclip maximizer attempted to find out what is right and did not make any mistakes, it would arrive at the same answer as a human, though there is not necessarily any reason for it to try in the first place. I do not think that these distinctions were nonobvious, but this may be overconfidence on my part.
Can you say more about how the sufficiently intelligent paperclip maximizer goes about finding out what is right?
Depends on how the question is asked. Does the paperclip maximizer have the definition of the word right stored in its memory? If so, it just consults the memory. Otherwise, the questioner would have to either define the word or explain how to arrive at a definition. This may seem like cheating, but consider the analogous case where we are discussing prime numbers. You must either already know what a prime number is, or I must tell you, or I must tell you about mathematicians, and you must observe them. As long as a human and a paperclip maximizer both have the same information about humans, they will both come to the same conclusions about human brains, which happen to encode what is right, thus allowing both the human and the paperclip maximizer to learn about morality. If this paperclip maximizer then chooses to wipe out humanity in order to get more raw materials, it will know that its actions are wrong; it just has no term in its utility function for morality.
Sure, agreed: if I tell the PM that thus-and-such is labeled "right," or "moral," or "fleabag," then it will know these things, and it won't care. I have entirely lost track of why this is important.
Eliezer believes that you desire to do what is right. It is important to remember that what is right has nothing to do with whether you desire it. Moral facts are interesting because they describe our desires, but they would be true even if our desires were different. In general, these things are useful for programming FAI and evaluating moral arguments. We should not allow our values to drift too far over time. The fact that wireheads want to be wireheaded is not a valid argument in favour of wireheading. A FAI should try to make reality match what is right, not make reality match people's desires (the latter could be accomplished by changing people's desires). We can be assured that we are acting morally even if there is no magic light from the sky telling us that we are. Moral goals should be pursued. Even if society condones that which is wrong, it is still wrong. Studying the human brain is necessary in order to learn more about morality. When two people disagree about morality, one or both of them is wrong.
Sure. And if it turns out that humans currently want something different than what we wanted a thousand years ago, then it follows that a thousand years ago we didn't want what was right, and now we do... though if you'd asked us a thousand years ago, we'd have said that we want what is right, and we'd have arrived at that conclusion through exactly the same cognitive operations we're currently using. (Of course, in that case we would be mistaken, unlike the current case.) And if it turns out that a thousand years from now humans want something different, then we will no longer want what is right... though if you ask us then, we'll say we want what is right, again using the same cognitive operations. (Again, in that case we would be mistaken.) And if there turn out to be two groups of humans who want incompatible things (for example, because their brains are sufficiently different), then whichever group I happen to be in wants what is right, and the other group doesn't... though if you ask them, they'll (mistakenly) say they want what is right, again using the same cognitive operations. All of which strikes me as a pointlessly confusing way of saying that I endorse what humans-sufficiently-like-me currently want, and don't endorse what we used to want or come to want or what anyone else wants if it's too different from that. Talking about whether some action is right or wrong or moral seems altogether unnecessary on this view. It is enough to say that I endorse what I value, and will program FAI to optimize for that, and will reject moral arguments that are inconsistent with that, and etc. Sure, if I valued something different, I would endorse that instead, but that doesn't change anything; if I were hit by a speeding train, I'd be dead, but it doesn't follow that I am dead. I endorse what I value, which means I consider worlds in which there is less of what I value worse than worlds in which there is more of what I value -- even if those worlds also include ve
The people a thousand years ago might have wanted what is right, but were mistaken as to what they really wanted. People do not understand their own brains. (You may agree with this; it is unclear from your wording.) Even if they really did have different desires they would not be mistaken. Even if they used the same sound - 'right' - they would be attaching a different meaning to it, so it would be a different word. They would be incorrect if they did not recognize our values as right in Eliezer-speak. This is admittedly a nonintuitive meaning. I do not know if there is a clearer way of saying things and I am unsure of what aspects of most people's understanding of the word Eliezer believes this to capture. The alternative does not seem much clearer. Consider Eliezer's example of pulling a child off of some train tracks. If you see me do so, you could explain it in terms of physics/neuroscience. If you ask me about it, I could mention the same explanation, but I also have another one. Why did seeing the child motivate me to save it? Yes, my neural pathways caused it, but I was not thinking about those neural pathways; that would be a level confusion. I was thinking about what is right. Saying that I acted because of neuroscience is true, but saying nothing else promotes level confusion. If you ask me what should happen if I were uninvolved or if my brain were different, I would not change my answer from if I were involved because should is a 1-place function. People do get confused about these things, especially when talking about AI, and that should be stopped. For many people, Eliezer did not resolve confusion, so we need to do better, but default language is no less clear than Eliezer-speak. (To the extent that I agree with Eliezer, I came to this agreement after having read the sequences, but directly after reading other arguments.)
I agree that people don't fully understand their own brains. I agree that it is possible to have mistaken beliefs about what one really wants. I agree that on EY's view any group that fails to identify our current values as right is mistaken. I think EY's usage of "right" in this context leads to unnecessary confusion. The alternative that seems clearer to me, as I've argued elsewhere, is to designate our values as our values, assert that we endorse our values, engage in research to articulate our values more precisely, build systems to optimize for our values, and evaluate moral arguments in terms of how well they align with our values. None of this requires further discussion of right and wrong, good and evil, salvatory and diabolical, etc., and such terms seem like "applause lights" better-suited to soliciting alliances than anything else. If you ask me why I pulled the child off the train tracks, I probably reply that I didn't want the child to die. If you ask me why I stood on the platform while the train ran over the child, I probably reply that I was paralyzed by shock/fear, or that I wasn't sure what to do. In both cases, the actual reality is more complicated than my self-report: there are lots of factors that influence what I do, and I'm not aware of most of them. I agree with you that people get confused about these things. I agree with you that there are multiple levels of description, and mixing them leads to confusion. If you ask me whether the child should be pulled off the tracks, I probably say "yes"; if you ask me why, I probably get confused. The reason I get confused is because I don't have a clear understanding of how I come to that conclusion; I simply consulted my preferences. Faced with that confusion, people make up answers, including answers like "because it's right to do so" or "because it's wrong to let the child die" or "because children have moral value" or "because pulling the child off the tracks has shouldness" or a million ot
I agree with everything non-linguistic. If we get rid of words like right, wrong, and should, then we are forced to either come up with new words or use 'want' and 'desire'. The first option is confusing and the second can make us seem like egoists or like people who think that wireheading is right because wireheaded people desire it. To someone unfamiliar with this ethical theory, it would be very misleading. Even many of the readers of this website would be confused if we only used words like 'want'. What we have now is still far from optimal.
...and 'preference' and 'value' and so forth. Yes. If I am talking about current human values, I endorse calling them that, and avoiding introducing new words (like "right") until there's something else for those words to designate. That neither implies that I'm an egoist, nor that I endorse wireheading. I agree with you that somebody might nevertheless conclude one or both of those things. They'd be mistaken. I don't think familiarity with any particular ethical theory is necessary to interpret the lack of a word, though I agree with you that using a word in the absence of a shared theory about its meaning leads to confusion. (I think most usages of "right" fall into this category.) If you are using 'right' to designate something over and above current human values, I endorse you using the word... but I have no idea at the moment what that something is.
I tentatively agree with your wording, though I will have to see if there are any contexts where it fails. By definition, wouldn't humans be unable to want to pursue such a thing?
Not necessarily. For example, if humans value X, and "right" designates Y, and aliens edit our brains so we value Y, then we would want to pursue such a thing. Or if Y is a subset of X, we might find it possible to pursue Y instead of X. (I'm less sure about that, though.) Or various other contrived possibilities. But supposing it were true, why would it matter?
Yes, my statement was way too strong. In fact, it should be much weaker than even what you say; just start a religion that tells people to value Y. I was attempting to express an actual idea that I had with this sentence originally, but my idea was wrong, so never mind. What does this mean? Supposing that something were right, what would it matter to humans? You could get it to matter to humans by exploiting their irrationality, but if CEV works, it would not matter to that. What would it even mean for this to be true? You'd need a definition of right.
How is this helpful? Here is how I would paraphrase the above (as I understand it): Human brains cause human action through an ambivalent decision process. What does this tell about wireheading? I think wireheading might increase pleasure but at the same time feel that it would be wrong. So? All that means is that I have complex and frequently ambivalent preferences and that I use an inaccurate and ambivalent language to describe them. What important insight am I missing?
The important thing about wireheading in this context is that desires after being wireheaded do not matter. The pleasure is irrelevant for this purpose; we could just as easily imagine humans being wireheaded to feel pain, but to desire continuing to feel pain. The point is that what is right should be pursued because it is right, not because people desire it. People's desires are useful as a way of determining what is right, but if it is known that people's desires were altered in some way, they stop providing evidence as to what is right. This understanding is essential to a superintelligence considering the best way to alter people's brains.
That's expressed very clearly, thanks. I don't want to sound rude, I honestly want to understand this. I'm reading your comment and can't help but think that you are arguing about some kind of universal right. I still can't pinpoint the argument. Why isn't it completely arbitrary if we desire to feel pain or pleasure? Is the right answer implied by our evolutionary history? That's a guess, I'm confused. Aren't our desires altered constantly by mutation, nurture, culture and what we experience and learn? Where can you find the purity of human desire?
I get that you are having trouble understanding this; it is hard and I am much worse at explaining things in text than in person. What is right is universal in the sense that what is right would not change if our brains were different. The fact that we care about what is right is caused by our evolutionary history. If we evolved differently, we would have different values, wanting what is gleerp rather than what is right. The differences would be arbitrary to most minds, but not to us. One of the problems of friendliness is ensuring that it is not arbitrary to the AI either. There are two types of this; we may learn more about our own values, which is good and which Eliezer believes to be the cause of "moral progress", or our values may really change. The second type of changes to our desires really are bad. People actually do this, like those who refuse to expose themselves to violence because they think that it will desensitize them to violence. They are really just refusing to take Gandhi's murder pill, but on a smaller scale. If you have a transtemporal disagreement with your future self on what action your future self should take, your future self will win, because you will no longer exist. The only way to prevent this is to simply refuse to allow your values to change, preventing your future self from disagreeing with you in the first place. I don't know what you mean by "purity of human desire".
Yep, with the caveat that endoself added below: "should" refers to humanity's goals, no matter who is using the term (on EY's theory and semantics).
And if you modify this to say a certain subset of what you want -- the subset you'd still call "right" given omniscience, I think -- then it seems correct, as far as it goes. It just doesn't get you any closer to a more detailed answer, specifying the subset in question. Or not much closer. At best it tells you not to worry that you 'are' fundamentally evil and that no amount of information would change that.
For what it's worth, I'm also one of those people, and I never did have religion. I don't know if there's a correlation there.
It is useful to think of right and wrong as being some agent's preferences. That agent doesn't have to be you - or even to exist IRL. If you are a sadist (no slur intended) you might want to inflict pain - but that would not make it "right" - in the eyes of conventional society. It is fairly common to use "right" and "wrong" to describe society-level preferences.
Why would a sadistic Boltzmann brain conclude that it is wrong to be a sadistic Boltzmann brain? Whatever some society thinks is completely irrelevant to an agent with outlier preferences.
Morality serves several functions:

* It is a guide relating to what to do;
* It is a guide relating to what behaviour to punish;
* It allows for the signalling of goodness and virtue;
* It allows agents to manipulate others, by labelling them or their actions as bad.

The lower items on the list have some significance, IMO.
Gary's preference is not itself justification, rather it recognizes moral arguments, and not because it's Gary's preference, but for its own specific reasons. Saying that "Gary's preference states that X is Gary_right" is roughly the same as "Gary should_Gary X". (This should_T terminology was discouraged by Eliezer in the sequences, perhaps since it invites incorrect moral-relativistic thinking, as if any decision problem can be assumed as own by any other, and also makes you think of ways of referring to morality, while seeing it as a black box, instead of looking inside morality. And you have to look inside even to refer to it, but won't notice that until you stop referring and try looking.) To a first approximation, but not quite, since it might be impossible to know what is right, for any computation not to speak of a mere human, only to make right guesses. Every well-defined question has in a sense a "subjectively objective" answer: there's "subjectivity" in the way the question has to be interpreted by an agent that takes on a task of answering it, and "objectivity" in the rules of reasoning established by such interpretation, that makes some possible answers incorrect with respect to that abstract standard. I don't quite see how this is opposed to the other points of your comment. If you actually start unpacking the notion, you'll find that it's a very long list. Alternatively, you might try referring to that list by mentioning it, but that's a tricky task for various reasons, including the need to use morality to locate (and precisely describe the location of) the list. Perhaps we can refer to morality concisely, but it's not clear how.
I had no idea what Eliezer was talking about originally until I started thinking in terms of should_T. Based on that and the general level of confusion among people trying to understand his metaethics, I concluded that EY was wrong - more people would understand if he talked in terms of should_T. Based on some of the back and forth here, I'm revising that opinion somewhat. Apparently this stuff is just confusing and I may just be atypical in being able to initially understand it better in those terms.
Yes, natural laws. If Gary's preferences do not align with reality then Gary's preferences are objectively wrong'. When people talk about morality they implicitly talk about fields like decision theory, game theory or economics. The mistake is to take an objective point of view, one similar to CEV. Something like CEV will result in some sort of game theoretic equilibrium. Yet each of us is a discrete agent that does not maximally value the extrapolated volition of other agents. People usually try to objectify, find a common ground, a compromise. This leads to all sorts of confusion between agents with maximally opposing terminal goals. In other words, if you are an outlier then there does exist no common ground and therefore something like CEV will be opposed. ETA ' I should clarify what I mean with that sentence (if I want that people understand me). I assume that Gary has a reward function and is the result of an evolutionary process. Gary should alter its preferences as they do not suit his reward function and decrease his fitness. I realize that in a sense I just move the problem onto another level. But if Gary's preferences can not be approached then they can be no justification for any action towards an implied goal. At that point the goal-oriented agent that is Gary will be functionally defunct and other more primitive processes will take over and consequently override Gary's preferences. In this sense reality demands that Gary should change his mind.
Why consider physical facts separately? Can't they be thought of as logical facts, in the context of agent's epistemology? (You'll have lots of logical uncertainty about them, and even normative structures will look more like models of uncertainty, but still.) Is it just a matter of useful heuristic separation of the different kinds of data? (Expect not, in your theory, in some sense.)
But are those truth-values intersubjectively recognizable? The average person believes morality to be about imperative terminal goals. You ought to want that which is objectively right and good. But there does exist no terminal goal that is objectively desirable. You can assign infinite utility to any action and thereby outweigh any consequences. What is objectively verifiable is how to maximize the efficiency in reaching a discrete terminal goal.
If you mean intersubjectively, say it. Objectively has a slightly different meaning. In particular, see 'objectively subjective'.
I changed it.

In a nutshell, Eliezer's metaethics says you should maximize your preferences whatever they may be, or rather, you should_you maximize your preferences, but of course you should_me maximize my preferences. (Note that I said preferences and not utility function. There is no assumption that your preferences HAVE to be a utility function, or at least I don't think so. Eliezer might have a different view.) So ethics is reduced to decision theory. In addition, according to Eliezer, humans have tremendous value uncertainty. That is, we don't really know what our terminal values are, so we don't really know what we should be maximizing. The last part, and the most controversial around here I think, is that Eliezer thinks that human preferences are similar enough across humans that it makes sense to think about should_human.

There are some further details, but that's the nutshell description. The big break from many philosophers, I think, is considering one's own preferences the foundation of ethics. But really, this is in Hume (on one interpretation).

edit: I should add that the language I'm using to describe EY's theory is NOT the language that he uses himself. Some people find my language more enlightening (me, for one), others find EY's more enlightening. Your mileage may vary.

Eliezer is a bit more aggressive in the use of 'should'. What you are describing as should Eliezer has declared to be would_want while 'should' is implicitly would_want, with no allowance for generic instantiation. That is he is comfortable answering "What should a Paperclip Maximiser do when faced with Newcomb's problem?" with "Rewrite itself to be an FAI". There have been rather extended (and somewhat critical) discussions in comment threads of Eliezer's slightly idiosyncratic usage of 'should' and related terminology but I can't recall where. I know it was in a thread not directly related to the subject!
You're right about Eliezer's semantics. Count me as one of those who thought his terminology was confusing, which is why I don't use it when I try to describe the theory to anyone else.
Are you sure? I thought "should" could mean would_want. Note I could follow this by saying "That is he is comfortable answering "What should a Paperclip Maximiser do when faced with Newcomb's problem?" with "Rewrite itself to be an FAI".", but that would be affirming the consequent ;-), i.e. I know he says such a thing, but my and your formulation both plausibly explain it, as far as I know.
I had a hard time parsing "you should_you maximize your preferences, but of course you should_me maximize my preferences." Can someone break that down without jargon and/or explain how the "should_x" jargon works?
I think the difficulty is that in English "You" is used for "A hypothetical person". In German they use the word "Man" which is completely distinct from "Du". It might be easier to parse as "Man should_Raemon maximize Raemon's preferences, but of course man should_Matt maximize Matt's preferences." On the jargon itself, Should_X means "Should, as X would understand it".
"Man" is the generalization of the personal subject. You can translate it with "one".
I think it's better phrased by putting Man in all instances of Raemon. Also: \ is the escape character on LW, so if you want to type an actual asterisk or underscore (or \ itself), instead of using it for formatting purposes, put a \ in front of it. This way they will not be interpreted as marking lists, italics, or bold.
Hang on, is that Raemon's preferences we're talking about or....
Your preferences are a utility function if they're consistent, but if you're a human, they aren't.
Consistent in what sense? Utility function over what domain? Under what prior? In this context, some unjustified assumptions, although understandably traditional to a point where objecting is weird.
I'd appreciate clarification on what you mean by "You should_me maximize my preferences." I understand that the "objective" part is that we could both come to agree on the value of should_you and the value of should_me, but what do you mean when you say that I should_MattSimpson maximize your preferences? I certainly balk at the suggestion that there is a should_human, but I'd need to understand Eliezer in more detail on that point. And yes, if one's own preferences are the foundation of ethics, most philosophers would simply call this subject matter practical rationality rather than morality. "Morality" is usually thought to be a term that refers to norms with a broader foundation and perhaps even "universal bindingness" or something. On this point, Eliezer just has an unusual way of carving up concept space that will confuse many people. (And this is coming from someone who rejects the standard analytic process of "conceptual analysis", and is quite open to redefining terms to make them more useful and match the world more cleanly.) Also, even if you think that the only reasons for action that exist come from relations between preferences and states of affairs, there are still ways to see morality as a system of hypothetical imperatives that is "broader" (and therefore may fit common use of the term "morality" better) than Eliezer's meta-ethical theory. See for example Peter Railton or 1980s Philippa Foot or, well, Alonzo Fyfe and Luke Muehlhauser. We already have a term that matches Eliezer's use of "ought" and "should" quite nicely: it's called the "prudential ought." The term "moral ought" is usually applied to a different location in concept space, whether or not it successfully refers. Anyway, are my remarks connecting with Eliezer's actual stated position, do you think?
I mean that according to my preferences, you, me, and everyone else should maximize them. If you ask what should_MattSimpson be done, the short answer is maximize my preferences. Similarly, if you ask what should_lukeprog be done, the short answer is to maximize your preferences. It doesn't matter who does the asking. If you ask what should_agent be done, you should maximize agent's preferences. There is no "should," only should_agent's. (Note, Eliezer calls should_human "should." I think it's an error of terminology, personally. It obscures his position somewhat.) Then Eliezer's position is that all normativity is prudential normativity. But without the pop-culture connotations that come with this position. In other words, this doesn't mean you can "do whatever you want." You probably do, in fact, value other people; you're a human after all. So murdering them is not ok, even if you know you can get away with it. (Note that this last conclusion might be salvageable even if there is no should_human.) As for why Eliezer (and others here) think there is a should_human (or that human values are similar enough to talk about such a thing), the essence of the argument rests on ev-psych, but I don't know the details beyond "ev-psych suggests that our minds would be very similar."
Okay, that makes sense. Does Eliezer claim that murder is wrong for every agent? I find it highly likely that in certain cases, an agent's murder of some person will best satisfy that agent's preferences.
Murder is certainly not wrong_x for every agent x - we can think of an agent with a preference for people being murdered, even itself. However, it is almost always wrong_MattSimpson and (hopefully!) almost always wrong_lukeprog. So it depends on which question you are asking. If you're asking "is murder wrong_human for every agent?" Eliezer would say yes. If you're asking "is murder wrong_x for every agent x?" Eliezer would say no. (I realize it was clear to both you and me which of the two you were asking, but for the benefit of confused readers, I made sure everything was clear.)
I would be very surprised if EY gave those answers to those questions. It seems pretty fundamental to his view of morality that asking about "wrong_human" and "wrong_x" is an important mis-step. Maybe murder isn't always wrong, but it certainly doesn't depend (on EY's view, as I understand it) on the existence of an agent with a preference for people being murdered (or the absence of such an agent).
That's because for EY, "wrong" and "wrong_human" mean the same thing. It's semantics. When you ask "is X right or wrong?" in the everyday sense of the term, you are actually asking "is X right_human or wrong_human?" But if murder is wrong_human, that doesn't mean it's wrong_clippy, for example. In both cases you are just checking a utility function, but different utility functions give different answers.
It seems clear from the metaethics posts that if a powerful alien race comes along and converts humanity into paperclip-maximizers, such that making many paperclips comes to be right_human, EY would say that making many paperclips doesn't therefore become right. So it seems clear that at least under some circumstances, "wrong" and "wrong_human" don't mean the same thing for EY, and that at least sometimes EY would say that "is X right or wrong?" doesn't depend on what humans happen to want that day. Now, if by "wrong_human" you don't mean what humans would consider wrong the day you evaluate it, but rather what is considered wrong by humans today, then all of that is irrelevant to your claim. In that case, yes, maybe you're right that what you mean by "wrong_human" is also what EY means by "wrong." But I still wouldn't expect him to endorse the idea that what's wrong or right depends in any way on what agents happen to prefer.
No one can change right_human, it's a specific utility function. You can change the utility function that humans implement, but you can't change right_human. That would be like changing e^x or 2 to something else. In other words, you're right about what the metaethics posts say, and that's what I'm saying too. edit: or what jimrandomh said (I didn't see his comment before I posted mine)
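A toy Python sketch may make the distinction concrete. Everything in it is made up for illustration (the `Agent` class, the two scoring functions, the outcome labels) and none of it is EY's own formalism; it only contrasts a fixed function with whatever function an agent currently implements.

```python
from dataclasses import dataclass
from typing import Callable

Outcome = str


def right_human(outcome: Outcome) -> float:
    """Hypothetical fixed scoring function that 'right' rigidly designates."""
    return {"save_child": 1.0, "make_paperclips": 0.0}.get(outcome, 0.0)


def right_clippy(outcome: Outcome) -> float:
    """Hypothetical scoring function of a paperclip maximizer."""
    return {"save_child": 0.0, "make_paperclips": 1.0}.get(outcome, 0.0)


@dataclass
class Agent:
    name: str
    implements: Callable[[Outcome], float]  # the function this agent acts on

    def prefers(self, a: Outcome, b: Outcome) -> Outcome:
        """Return whichever outcome the agent's current function scores higher."""
        return a if self.implements(a) >= self.implements(b) else b


human = Agent("human", right_human)
before = human.prefers("save_child", "make_paperclips")

# Aliens rewrite the human's brain: the agent now implements something else...
human.implements = right_clippy
after = human.prefers("save_child", "make_paperclips")

# ...but the function named right_human gives the same answers as before.
still_right = right_human("save_child")
```

Rewiring the agent changes `human.implements` and therefore what the agent pursues, while `right_human` itself is untouched, which is the sense in which it behaves like e^x or 2.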
What if we use 'human' as a rigid designator for unmodified-human? Then in case aliens convert people into paperclip-maximizers, they're no longer human, hence human_right no longer applies to them, but itself remains unchanged.
human_right still applies to them in the sense that they still should do what's human_right. That's the definition of should. (Remember, should refers to a specific set of terminal values, those that humans happen to have, called human_right) However, these modified humans, much like clippy, don't care about human_right and so won't be motivated to act based on human_right (except insofar as it helps make paperclips). I'm not necessarily disagreeing with you because it's a little ambiguous how you used the word "applies." If you mean that the modified humans don't care about human_right anymore, I agree. If you mean that the modified humans shouldn't care about human_right, then I disagree.
I'm not sure why it's necessary to use 'should' to mean morally_should; it could just be used to mean decision-theoretic_should. E.g. if you're asked what a chess playing computer program should do to win a particular game, you could give a list of moves it should make. And when a human asks what they should do related to a moral question, you can first use the human_right function to determine what is the desired state of the world that they want to achieve, and then ask what you should do (as in decision-theoretic_should, or as in what moves/steps you need to execute, in analogy to the chess program) to create this state. Thus morality is contained within the human_right function and there's no confusion over the meaning of 'should'.
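A rough sketch of that two-step reading, under loudly stated assumptions: the states, the moves, and the scores returned by `human_right` are all invented for illustration. Step one uses the value function to pick the target state; step two is an ordinary search for the moves that reach it, just as for the chess program.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical world: each state maps move names to the resulting state.
MOVES: Dict[str, Dict[str, str]] = {
    "start": {"donate": "famine_relieved", "wait": "start"},
    "famine_relieved": {"build_farms": "everyone_fed"},
    "everyone_fed": {},
}


def human_right(state: str) -> float:
    """Hypothetical scores; the real function is unknown."""
    return {"start": 0.0, "famine_relieved": 5.0, "everyone_fed": 10.0}[state]


def plan(start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for a shortest list of moves from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for move, nxt in MOVES[state].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [move]))
    return None


# Step 1 (moral part): the value function picks the desired state.
goal = max(MOVES, key=human_right)
# Step 2 (decision-theoretic part): find the moves that get there.
steps = plan("start", goal)
```

All of the moral content sits in `human_right`; `plan` would work unchanged with any other scoring function.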
As long as you can keep the terms straight, sure. EY's argument was that using "should" in that sense makes it easier to make mistakes related to relativism.
OK. At this point I must admit I've lost track of why these various suggestively named utility functions are of any genuine interest, so I should probably leave it there. Thanks for clarifying.
In that case, we would draw a distinction between right_unmodified_human and right_modified_human, and "right" would refer to the former.
Murder as I define it seems universally wrong_victim, but I doubt you could literally replace "victim" with any agent's name.
I find the talk of "should_MattSimpson" very unpersuasive given the availability of alternative phrasings such as "approved_MattSimpson" or "valued_MattSimpson". I have read below that EY discourages such talk, but it seems that's for different reasons than mine. Could someone please point me to at least one post in the sequence which (almost/kinda/sorta) motivates such phrasings?
Alternate phrasings such as those you listed would probably be less confusing, i.e. replacing "should" in "should_X" with "valued" and reserving "should" for "valued_human".
They would be missing some important distinctions between what we think of as our moral values and what we think of as "chocolate/vanilla" preferences. For one obvious example, consider an alien ray gun that 'switches the way I feel' about two things, X and Y, without otherwise affecting my utility function or anything else of value to me. If X were, say, licorice jelly beans (yum) and Y were, say, buttered popcorn jelly beans (yuck), then I wouldn't be too deeply bothered by the prospect of being zapped with this gun. (Same for sexual preference, etc.) But if X were "autonomy of individuals" and Y were "uniformity of individuals", I would flee screaming from the prospect of being messed with that way, and would take some extreme actions (if I knew I'd be zapped) to prevent my new preferences from having large effects in the world. Now we can develop whole theories about what this kind of difference consists in, but it's at least relevant to the question of metaethics. In fact, I think that calling this wider class of volitions "preferences" is sneaking in an unfortunate connotation that they "shouldn't really matter then".
This sounds, to me, like it's just the distinction between terminal and instrumental values. I don't terminally value eating licorice jelly beans, I just like the way they taste and the feeling of pleasure they give me. If you switched the tastes of buttered popcorn jelly beans (yuck indeed) and licorice jelly beans, that would be fine by me. Hell, it would be an improvement since no one else likes that flavor (more for me!). The situation is NOT the same for "autonomy of individuals" and "uniformity of individuals" because I really do have terminal values for these things, apart from the way they make me feel.
How do you know that? What would you expect to experience if your preference for individual autonomy in fact derived from something else?
It was meant as a hypothetical. I don't actually know.
Ah. Sorry; I thought you were endorsing the idea.
Huh? You simply weigh "chocolate/vanilla" preferences differently than decisions that would affect goal-oriented agents.
I agree that by using a single term for the wider class of volitions -- for example, by saying both that I "prefer" autonomy to uniformity and also that I "prefer" male sexual partners to female ones and also that I "prefer" chocolate to vanilla -- I introduce the connotation that the distinctions between these various "preferences" aren't important in the context of discourse. To call that an unfortunate connotation is question-begging. Sometimes we deliberately adopt language that elides a distinction in a particular context, precisely because we don't believe that distinction ought to be made in that context. For example, in a context where I believe skin color ought not matter, I may use language that elides the distinction between skin colors. I may do this even if I care about that distinction: for example, if I observe that I do, in fact, care about my doctor's skin color, but I don't endorse caring about it, I might start using language that elides that distinction as a way of changing the degree to which I care about it. So it seems worth asking whether, in the particular context you're talking about, the connotations introduced by the term "preferences" are in fact unfortunate. For instance, you class sexual preference among the "chocolate/vanilla" preferences for which the implication that they "shouldn't really matter" is appropriate. I would likely have agreed with you twenty years ago, when I had just broken up with my girlfriend and hadn't yet started dating my current husband. OTOH, today I would likely "flee screaming" from a ray that made me heterosexual, since that would vastly decrease the value to me of my marriage. Of course, you may object that this sort of practical consequence isn't what you mean. But there are plenty of people who would "flee screaming" from a sexual-preference-altering ray for what they classify as moral reasons, without reference to practical consequences. And perhaps I'm one of them... after all, it's not clear to
These are two different people; many of the objections one ought to have from the fact that they disagree are the same ones one ought to have from the fact that one disagrees with some random other contemporary person.
And yet, a lot of our culture presumes that there are important differences between the two. E.g., culturally we think it's reasonable for someone at 20 to make commitments that are binding on that person at 40, whereas we think it's really strange for someone at 20 or 40 to make commitments that are binding on some random other contemporary person.
Ah, sexual preference was a poor example in general; in my case, being single at the moment means I wouldn't be injuring anybody if my preferences changed. Were I in a serious relationship, I'd flee from the ray gun too.
Thanks for this clarification. I personally don't get that connotation from the term "preferences," but I'm sure others do. Anyway, so... Eliezer distinguishes prudential oughts from moral oughts by saying that moral oughts are what we ought to do to satisfy some small subset of our preferences: preferences that we wouldn't want changed by an alien ray gun? I thought he was saying that I morally should_Luke do what will best satisfy a global consideration of my preferences.
No, no, no- I don't mean that what I pointed out was the only distinction or the fundamental distinction, just that there's a big honking difference in at least one salient way. I'm not speaking for Eliezer on what's the best way to carve up that cluster in concept-space.
Oh. Well, what do you think Eliezer has tried to say about how to carve up that cluster in concept-space?
We'd need to do something specific with the world, there's no reason any one person gets to have the privilege, and creating an agent for every human and having them fight it out is probably not the best possible solution.
Wei Dai, 13y
I don't think that adequately addresses lukeprog's concern. Even granting that one person shouldn't have the privilege of deciding the world's fate, nor should an AI be created for every human to fight it out (although personally I don't think a would-be FAI designer should rule these out as possible solutions just yet), that leaves many other possibilities for how to decide what to do with the world. I think the proper name for this problem is "should_AI_designer", not "should_human", and you need some other argument to justify the position that it makes sense to talk about "should_human". I think Eliezer's own argument is given here:
No, this is called preference utilitarianism. Not only controversial here. Even when just being the messenger, during discussions on morality, I usually get called out on that. The hope on ev-psy as argument for very common values plus the expectation that value differences are more often differences in knowledge than differences in culture or subjective preferences is not shared very widely.
Usually utilitarianism means maximize the utility of all people/agents/beings of moral worth (average or sum depending on the flavor of utilitarianism). Eliezer's metaethics says only maximize your own utility. There is a clear distinction. Edit: but you are correct about considering preferences the foundation of ethics. I should have been more clear
Isn't that bog-standard ethical egoism? If that is the case, then I really misunderstood the sequences.
Maybe. Sometimes ethical egoism sounds like it says that you should be selfish. If that's the case, then no, they are not the same. But sometimes it just sounds like it says you should do whatever you want to do, even if that includes helping others. If that's the case, they sound the same to me. edit: Actually, that's not quite right. On the second version, egoism gives the same answer as EY's metaethics for all agents who have "what is right" as their terminal values, but NOT for any other agent. Egoism in this sense defines "should" as "should_X" where X is the agent asking what should be done. For EY, "should" is always "should_human" no matter who is asking the question.
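The contrast between the indexical "should" of this second kind of egoism and the fixed "should" attributed to EY can be sketched as two functions. This is only an illustration: the agents, actions, and scores in `VALUES` are hypothetical.

```python
# Hypothetical terminal values for two agents.
VALUES = {
    "human": {"help_others": 1.0, "make_paperclips": 0.0},
    "clippy": {"help_others": 0.0, "make_paperclips": 1.0},
}


def should_egoist(asker: str, action: str) -> bool:
    """Egoism (second version): 'should' is indexed to whoever is asking."""
    return VALUES[asker][action] > 0


def should_rigid(asker: str, action: str) -> bool:
    """EY's usage as described above: 'should' is always should_human.

    The asker argument is deliberately ignored.
    """
    return VALUES["human"][action] > 0


# The two definitions agree for an asker whose values are the human ones...
agree_for_human = all(
    should_egoist("human", a) == should_rigid("human", a)
    for a in VALUES["human"]
)
# ...and come apart for any other agent.
egoist_says = should_egoist("clippy", "make_paperclips")
rigid_says = should_rigid("clippy", "make_paperclips")
```

Here `should_egoist` tells Clippy to make paperclips while `should_rigid` does not, even though the two functions give identical verdicts whenever a human asks.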
Indeed, but I'd like to point out that this is not an answer about what to do or what's good and bad, merely the rejection of a commonly claimed (but incorrect) statement about what structure such an answer should have.
I think I disagree, but I'm not sure I understand. Care to explain further?
(Note: This comment contains positions which came from my mind without an origin tag attached. I don't remember reading anything by Eliezer which directly disagrees with this, but I don't represent this as anyone's position but my own.) "Standard" utilitarianism works by defining separate per-agent utility functions to represent each person's preferences, and averaging (or summing) them to produce a composite utility function which every utilitarian is supposed to optimize. The exact details of what the per-agent utility functions look like, and how you combine them, differ from flavor to flavor. However, this structure - splitting the utility function up into per-agent utility functions plus an aggregation function - is wrong. I don't know what a utility function that fully captured human values would look like, but I do know that it can't be split and composed this way. It breaks down most obviously when you start varying the number of agents; in the variant where you sum up utilities, an outcome where many people live lives just barely worth living seems better than an outcome where fewer people live amazingly good lives (but we actually prefer the latter); in the variant where you average utilities, an outcome where only one person exists but he lives an extra-awesome life is better than an outcome where many people lead merely-awesome lives. Split-agent utility functions are also poorly equipped to deal with the problem of weighing agents against each other. If there's a scenario where one person's utility function diverges to infinity, then both sum- and average-utility aggregation claim that it's worth sacrificing everyone else to make sure that happens (the "utility monster" problem). And the thing is, writing a utility function that captures human values is a hard and unsolved problem, and splitting it up by agent doesn't actually bring us any closer; defining the single-agent function is just as hard as defining the whole thing.
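The three failure cases described above can be checked with a little arithmetic. The populations and utility numbers below are invented purely to make the comparisons concrete; a large finite number stands in for a utility that "diverges to infinity".

```python
from typing import List


def total_utility(lives: List[int]) -> int:
    """Sum-flavored aggregation of per-agent utilities."""
    return sum(lives)


def average_utility(lives: List[int]) -> float:
    """Average-flavored aggregation of per-agent utilities."""
    return sum(lives) / len(lives)


# Case 1: summing favors many lives barely worth living.
many_barely = [1] * 1000   # total 1000
few_great = [90] * 10      # total 900

# Case 2: averaging favors a single extra-awesome life.
lone_extra_awesome = [100]        # average 100
many_merely_awesome = [99] * 1000  # average 99

# Case 3: the utility monster. Nine ordinary people plus one monster,
# before and after everyone else is sacrificed for the monster.
status_quo = [10] * 9 + [10]
sacrifice = [0] * 9 + [1000]

sum_prefers_crowd = total_utility(many_barely) > total_utility(few_great)
avg_prefers_loner = (
    average_utility(lone_extra_awesome) > average_utility(many_merely_awesome)
)
both_feed_monster = (
    total_utility(sacrifice) > total_utility(status_quo)
    and average_utility(sacrifice) > average_utility(status_quo)
)
```

With these numbers, each aggregation rule endorses exactly the outcome the comment describes as counterintuitive, and both rules endorse the sacrifice in the monster case.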
I was about to cite the same sorts of things to explain why they DO disagree about what is good and bad. In other words, I agree with you about utilitarianism being wrong about the structure of ethics in precisely the way you described, but I think that also entails utilitarianism coming to different concrete ethical conclusions. If a murderer really likes murdering - it's truly a terminal value - the utilitarian HAS to take that into account. On Eliezer's theory, this need not be so. So you can construct a hypothetical where the utilitarian has to allow someone to be murdered simply to satisfy a (or many) murderer's preference where on Eliezer's theory, nothing of this nature has to be done.
That is a problem for average-over-agents utilitarianism, but not a fatal one; the per-agent utility function you use need not reflect all of that agent's preferences, it can reflect something narrower like "that agent's preferences excluding preferences that refer to other agents and which those agents would choose to veto". (Of course, that's a terrible hack, which must be added to the hacks to deal with varying population sizes, divergence, and so on, and the resulting theory ends up being extremely inelegant.)
True enough, there are always more hacks a utilitarian can throw on to their theory to avoid issues like this.
Are you sure of this? It sounds a lot like scope insensitivity. Remember, lives barely worth living are still worth living. Again, this seems like scope insensitivity.
Uh, well, it seems then that my memory tricked me. I remembered otherwise. Though given his thoughts on extrapolation and his hopes that this will be coherent and human-universal, it would collapse into the same.
Yeah, that's probably right. But notice that even in that case, unlike the utilitarian, there are no thorny issues about how to deal with non-human agents. If we run into an alien that has a serious preference for raping humans, the utilitarian only has ad-hoc ways of deciding whether or not the alien's preference counts. Eliezer's metaethics handles it elegantly: check your utility function. Of course, that's easier said than done in the real world, but it does solve many philosophical problems associated with utilitarianism.
There is a way of testing metaethical theories, which is to compare their predictions or suggestions against common first-level ethical intuitions. It isn't watertight, as the recalcitrant metaethicist can always say that the intuitions are wrong... anyway, trying it out on EY-metaethics, as you have stated it, doesn't wash too well, since there is an implication that those who value murder should murder, those who value paperclips should maximise paperclips, etc. Some will recognise that as a form of the well-known and widely rejected theory of ethical egoism. OTOH, you may not have presented the theory correctly. For instance, the "Coherent" in CEV may be important. EY may have the get-out that murderers and clippies don't have enough coherence in their values to count as moral.
I don't think the coherence part is particularly relevant here. Consider two people, you (Peter) and me (Matt). Suppose I prefer to be able to murder people and you prefer that no one ever be murdered. Suppose I have the opportunity to murder someone (call him John) without getting caught or causing any other relevant positive or negative consequences (both under your preferences and mine). What should I do? Well, I should_Matt murder John. My preferences say "yay murder" and there are no downsides, so I should_Matt go ahead with it. But I should_Peter NOT murder John. Your preferences say "boo murder" and there are no other benefits to murdering John, so I should_Peter just leave John alone. But what should I do? Tell me what you mean by should and I'll tell you. Presumably you mean should_Peter or should_(most people), in which case, then I shouldn't murder. (EY's theory would further add that I don't, in fact, value murder as an empirical claim - and that would be correct, but it isn't particularly relevant to the hypothetical. It may, however, be relevant to this method of testing metaethical theories, depending on how you intended to use it.) Let me fix that sentence for you: In other words, there is no "should," unless you define it to be a specific should_x. EY would define it as should_(human CEV) or something similar, and that's the "should" you should be running through the test. It isn't. Egoism says be selfish. There's no reason why someone can't have altruistic preferences, and in fact people do. (Unless that's not what you mean by egoism, but sure, this is egoism, but that's a misleading definition and the connotations don't apply).
There are a lot of candidates for what I could mean by "should" under which you shouldn't murder. Should-most-people would imply that. It is an example of a non-Yudkowskian theory that doesn't have the problem of the self-centered version of his theory. So is Kantian metaethics: you should not murder because you would not wish murder to be Universal Law. And how is that supposed to help? Are you implying that nothing counts as a counterexample to a metaethical theory unless it relates to should_Peter, to what the theory is telling me to do? But as it happens, I do care about what metaethical theories tell other people to do, just as evidence that I haven't personally witnessed still could count against a scientific claim. That isn't a fact. It may be an implication of the theory, but I seem to have good reason to reject the theory. That seems to be the same get-out clause as before: that there is something about the Coherent and/or the Extrapolated that fixes the Michael-should-murder problem. But if there is, it should have been emphasised in your original statement of EY's position. As originally stated, it has the same problems as egoism.
What I'm trying to say is that within the theory there is no "should" apart from should_X's. So you need to pin down which should_X you're talking about when you run the theory through the test - you can ask "what should_Matt Matt do?" and "what should_Matt Peter do?", or you can ask "what should_Peter Matt do?" and "what should_Peter Peter do?", but it's unfair to ask "what should_Matt Matt do?" and "what should_Peter Peter do?" - you're changing the definition of "should" in the middle of the test! Now the question is, which should_X should you use in the test? If X is running the theory through the test, X should use should_X since X is checking the theory against X's moral intuitions. (If X is checking the test against Y's moral intuitions, then X should use should_Y). In other words, X should ask, "what should_X Matt do?" and "what should_X Peter do?". If there is such a thing as should_human, then if X is a human, this amounts to using should_human. As a side note, to display "a_b" correctly, type "a\_b"
We have intuitions that certain things are wrong -- murder, robbery and so forth -- and we have the intuition that those things are wrong, not just wrong-for-people-that-don't-like-them. This intuition of objectivity is what makes ethics a problem, in conjunction with the absence of obvious moral objects as part of the furniture of the world. ETA: again, a defence of moral subjectivism seems to be needed as part of CEV.
Traditional moral subjectivism usually says that what X should do depends on who X is in some intrinsic way. In other words, when you ask "what should X do?", the answer you get is the answer to "what should_X X do?" On EY's theory, when you ask "what should X do?", the answer you get is the answer to "what should_Y X do?" where Y is constant across all X's. So "should" is a rigid designator -- it corresponds to the same set of values no matter who we're asking about. Now the subjectivity may appear to come in because two different people might have a different Y in mind when they ask "what should X do?" The answer depends on who's asking! Subjectivity! Actually, no. The answer only depends on what the asker means by should. If should = should_Y, then it doesn't matter who's asking or who they're asking about, we'll get the same answer. If should = should_X, the same conclusion follows. The apparent subjectivity comes from thinking that there is a separate "should" apart from any "should_X", and then subtly changing the definition of "should" when someone different asks or someone different is asked about. Now many metaethicists may still have a problem with the theory related to what's driving its apparent subjectivity, but calling it subjective is incorrect. I'll note that the particular semantics I'm using are widely regarded to confuse readers into thinking the theory is a form of subjectivism or moral relativism -- and frankly, I agree with the criticism. Using this terminology just so happens to be how I finally understood the theory, so it's appealing to me. Let's try a different terminology (hat tip to wedrifid): every time I wrote should_X, read that as would_want_X. In other words, should_X = would_want_X = X's implicit preferences -- what X would want if X were able to take into account all n-order preferences she has in our somewhat simplified example. Then, in the strongest form of EY's theory, should = would_want_Human. In other words, only would_wa
Y is presumably varying with something, or why put it in? I don't follow. Thinking there is a should that is separate from any should_X is the basis of objectivity. The basis of subjectivity is having a question that can be validly answered by reference to a speaker's beliefs and desires alone. "What flavour of ice cream would I choose" works that way. So does any other case of acting on a preference, any other "would". Since you have equated shoulds with woulds, the shoulds are subjective as well. There are objective facts about what a subject would do, just as it is an objective fact that so-and-so has a liking for Chocolate Chip, but these objective facts don't negate the existence of subjectivity. Something is objective and not subjective where there are no valid answers based on reference to a subject's beliefs and desires. I don't think that is the case here. The claim that only should_Human is normative contradicts the claim that any would-want is a should-want. If normativity kicks in for any "would", what does bringing in the human level add? Well, that version of the theory is objective, or intersubjective enough. It just isn't the same as the version of the theory that equates individual woulds and shoulds. And it relies on a convergence that might not arrive in practice.
To make it clear that "should" is just a particular "should_Y." Or, using the other terminology, "should" is a particular "would_want_Y." I agree with this. If the question was "how do I best satisfy my preferences?" then the answer changes with who the speaker is. But, on the theory, "should" is a rigid designator and refers ONLY to a specific should_X (or would_want_X if you prefer that terminology). So if the question is "what should I do?" That's the same as asking "what should_X I do?" or equivalently "what would_want_X I do?" The answer is the same no matter who is asking. The "X" is there because 1) the theory says that "should" just is a particular "should_X," or equivalently a particular "would_want_X" and 2) there's some uncertainty about which X belongs there. In EY's strongest form of the theory, X = Human. A weaker form might say X = nonsociopath human. Just to be clear, "should_Y" doesn't have any normativity unless Y happens to be the same as the X in the previous paragraph. "Should_Y" isn't actually a "should" - this is why I started calling it "would_want_Y" instead. But it is. Consider the strong form where should = would_want_Human. Suppose an alien race came and modified humans so that their implicit preferences were completely changed. Is should changed? Well, no. "should" refers to a particular preference structure - a particular mathematical object. Changing the preference structure that humans would_want doesn't change "should" any more than changing the number of eyes a human has changes "2." Or to put it another way, distinguish between would_want_UnmodifiedHuman and would_want_ModifiedHuman. Then should = would_want_UnmodifiedHuman. "Should" refers to a particular implicit preference structure, a particular mathematical object, instantiated in some agent or group of agents. Hopefully this is clear now, but it doesn't, even if I was calling them all "should_Y."
In the usages he has made EY actually seems to say there is a "should", which we would describe as should. For other preferences he has suggested would_want. So if John wants to murder people he should not murder people but would_want to murder them. (But that is just his particular semantics, the actual advocated behavior is as you describe it.) When it comes to CEV Eliezer has never (that I have noticed) actually acknowledged that Coherent Extrapolated Volition can be created for any group other than "humanity". Others have used it as something that must be instantiated for a particular group in order to make sense. I personally consider any usage of "CEV" where the group being extrapolated is not given or clear from the context to be either a mistake or sneaking in connotations.
I don't remember the would_want semantics anywhere in EY's writings, but I see the appeal - especially given how my discussion with Peterdjones is going.
It was in a past conversation on the subject of what Eliezer means by "should" and related terms. That was the answer he gave in response to the explicit question. In actual writings there hasn't been a particular need to refer concisely to the morality of other agents independently of their actual preferences. When describing Baby Eaters, for example, natural language worked just fine.
My current preferences? Why shouldn't I change them?
What wedrifid said. But also, what is the criterion by which you would change your (extrapolated) preferences? This criterion must contain some or all of the things that you care about. Therefore, by definition it's part of your current (extrapolated) preferences. Edit: Which tells you that under "normal" circumstances you won't prefer to change your preferences.
It would probably be a higher-order preference, like being more fair, more consistent, etc. That would require a lot of supplementary assumptions. For instance, if I didn't care about consistency, I wouldn't revise my preferences to be more consistent. I might also "stick" if I cared about consistency and knew myself to be consistent. But how often does that happen?
My intuition is that if you have preferences over (the space of possible preferences over states of the world), that implicitly determines preferences over states of the world - call these "implicit preferences". This is much like if you have a probability distribution over (the set of probability distributions over X), that determines a probability distribution over X (though this might require X to be finite or perhaps something weaker). So when I say "your preferences" or "your extrapolated preferences" I'm referring to your implicit preferences. In other words, "your preferences" refers to what your 1st order preferences over the state of the world would look like if you took into account all n-order preferences, not the current 1st order preferences with which you are currently operating. Edit: Which is just another way of saying "what wedrifid said." One interpretation of CEV is that it's supposed to find these implicit preferences, assuming that everyone has the same, or "similar enough", implicit preferences.
Where does the "everyone" come in? Your initial statement of EY's metaethics is that it is about my preferences, however implicit or extrapolated. Are individuals' extrapolated preferences supposed to converge or not? That's a very important issue. If they do converge, then why the emphasis on the difference between should_Peter and should_Matt? If they don't converge, how do you avoid Prudent Predation? The whole thing's as clear as mud.
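The probability half of that analogy can be made concrete in the finite case: a distribution over distributions collapses to a single distribution over X by weighting each candidate and summing, p(x) = sum_i w_i * p_i(x). The sketch below assumes X is finite and uses a made-up coin example; the names `marginal`, `hyper`, and `candidates` are illustrative only. Whether preferences over preferences collapse in a similar way is the commenter's intuition, not something this code establishes.

```python
from typing import Dict

Dist = Dict[str, float]


def marginal(hyper: Dict[str, float], candidates: Dict[str, Dist]) -> Dist:
    """Collapse a distribution over distributions into one distribution.

    hyper[name] is the weight placed on candidates[name]; X is assumed finite.
    """
    outcomes = set()
    for dist in candidates.values():
        outcomes.update(dist)
    return {
        x: sum(w * candidates[name].get(x, 0.0) for name, w in hyper.items())
        for x in outcomes
    }


# Hypothetical example: unsure whether a coin is fair or biased.
candidates = {
    "fair": {"H": 0.5, "T": 0.5},
    "biased": {"H": 0.75, "T": 0.25},
}
hyper = {"fair": 0.5, "biased": 0.5}

first_order = marginal(hyper, candidates)
```

The resulting `first_order` distribution is an ordinary distribution over heads and tails, with the second-order uncertainty folded in.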
One part of EY's theory is that all humans have similar enough implicit preferences that you can talk about implicit human preferences. CEV is supposed to find implicit human preferences. Others have noted that there's no reason why you can't run CEV on other groups, or a single person, or perhaps only part of a single person. In which case, you can think of CEV(X) as a function that returns the implicit preferences of X, if they exist. This probably accounts for the ambiguity.
There's no reason you can't as an exercise in bean counting or logic chopping, but there is a question as to what that would add up to metaethically. If individual extrapolations converge, all is good. If not, then CEV is a form of ethical subjectivism, and if that is wrong, then CEV doesn't work. Traditional philosophical concerns have not been entirely sidestepped.
Current extrapolated preferences. That is, maximise whatever it is that you want to change your preferences to.

When I read the meta-ethics sequence I mostly wondered why he made it so complicated and convoluted. My own take just seems a lot simpler --- which might mean it's wrong for a simple reason, too. I'm hoping someone can help.

I see ethics as about adopting some set of axioms that define which universes are morally preferable to others, and then reasoning from those axioms to decide whether an action, given the information available, has positive expected utility.
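As a minimal sketch of that last step, assume the adopted axioms have already been boiled down to a numeric score per outcome. The outcomes, scores, probabilities, and action names below are all hypothetical, chosen only to show the calculation.

```python
from typing import Dict

# Hypothetical scores the adopted axioms assign to outcomes.
UTILITY: Dict[str, float] = {"helped": 1.0, "no_change": 0.0, "harmed": -2.0}


def expected_utility(outcome_probs: Dict[str, float]) -> float:
    """Probability-weighted utility of an action's possible outcomes."""
    return sum(p * UTILITY[outcome] for outcome, p in outcome_probs.items())


# Two hypothetical actions, described by what the agent currently believes
# about their outcomes.
cautious_action = {"helped": 0.5, "no_change": 0.5}
risky_action = {"helped": 0.5, "harmed": 0.5}

cautious_eu = expected_utility(cautious_action)
risky_eu = expected_utility(risky_action)
```

On these made-up numbers the cautious action has positive expected utility and the risky one has negative expected utility, so only the first would pass the test described above.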

So which axioms should I adopt? Well, one simple, coherent answer is "none": be entirely nihilist. I would still prefer some universes over others, as I'd still have all my normal non-moral preferences, such as appetites etc. But it'd be all about me, and other people's interests would only count so far as they were instrumental to my own.

The problem is that the typical human mind has needs that are incompatible with nihilism. Nihilism thus becomes anti-strategic: it's an unlikely path to happiness. I feel the need to care about other people, and it doesn't help me to pretend I don't.[1]

So, nihilism is an anti-strategic ethical system for me to adopt, because it goes against my adapted and culturally learned intuiti...

I think your take is pretty much completely correct. You don't fall into the trap of arguing whether "moral facts are out there" or the trap of quibbling over definitions of "right", and you very clearly delineate the things you understand from the things you don't.
Isn't it a bit late for that question for any human, by the time a human can formulate the question? You don't really have the option of adopting it, just espousing it (including to yourself). No? You really could, all else equal, because all the (other) humans have, as you said, very similar axioms rather than terrible ones.
Your argument against nihilism is fundamentally "I feel the need to care about other people, and it doesn't help me to pretend I don't". (I'll accept for the purpose of this conversation that the empty ethical system deserves to be called "nihilism". I would have guessed the word had a different meaning, but let's not quibble over definitions.) That's not an argument against nihilism. If I want to eat three meals a day, and I want other people not to starve, and I want my wife and kids to have a good life, that's all stuff I want. Caring for other people is entirely consistent with nihilism; it's just another thing you want. Utilitarianism doesn't solve the problem of having a bunch of contradictory desires. It just leaves you trying to satisfy other people's contradictory desires instead of your own. However, I am unfamiliar with Peter Singer's version. Does it solve this problem?
I think the term nihilism is getting in the way here. Let's instead talk about "the zero axiom system". This is where you don't say that any universes are morally preferable to any others. They may be appetite-preferable, love-for-people-close-to-you preferable, etc. If no universes are morally preferable, one strategy is to be as ruthlessly self-serving as possible. I predict this would fail to make most people happy, however, because most people have a desire to help others as well as themselves. So a second strategy is to just "go with the flow" and let yourself give as much as your knee-jerk guilt or sympathy-driven reactions tell you to. You don't research charities and you still eat meat, but maybe you give to a disaster relief appeal when the people suffering are rich enough or similar enough to you to make you sympathetic. All I'm really saying is that this second approach is also anti-strategic once you get to a certain level of self-consistency, and desire for further self-consistency becomes strong enough to over-rule desire for some other comforts. I find myself in a bind where I can't care nothing, and I can't just follow my emotional moral compass. I must instead adopt making the world a better place as a top-level goal, and work strategically to make that happen. That requires me to adopt some definition of what constitutes a better universe that isn't rooted in my self-interest. In other words, my self-interest depends on having goals that don't themselves refer to my self-interest. And those goals have to do that in entirely good-faith. I can't fake this, because that contradicts my need for self-consistency. In other words, I'm saying that someone becomes vegetarian when their need for a consistent self-image about whether they behave morally starts to over-rule the sensory, health and social benefits of eating meat. Someone starts to tithe to charity when their need for moral consistency starts to over-rule their need for an extra 10% of thei
I can't interpret your post as a reply to my post. Did you perhaps mean to post it somewhere else? My fundamental question was, how is a desire to help others fundamentally different from a desire to eat pizza? You seem to be defining a broken version of the zero ethical system that arbitrarily disregards the former. That's a strawman. If you want to say that the zero ethical system is broken, you have to say that something breaks when people try to enact their desires, including the desires to help others. Sorry, that's incoherent. Someone is helped if they get things they desire. If your entire set of desires is to help others, then the solution is that your desires (such as eating pizza) don't matter and theirs do. I don't think you can really do that. If you can do that, then I hope that few people do that, since somebody has to actually want something for themselves in order for this concept of helping others to make any sense. (I do believe that this morality-is-selfless statement probably lets you get positive regard from some in-group you desire. Apparently I don't desire to have that in-group.)
I did intend to reply to you, but I can see I was ineffective. I'll try harder. Fundamentally, it's not. I'm saying that there's three versions here: 1. The strawman where there's no desire to help others. Does not describe people's actual desires, but is a self-consistent and coherent approach. It's just that it wouldn't work for most people. 2. Has a desire to help others, but this manifests in behaviour more compatible with guilt-aversion than actually helping people. This is not self-consistent. If the aim is actually guilt-aversion, this collapses back to position 1), because the person must admit to themselves that other people's desires are only a correlate of what they want (which is to not feel guilty). 3. Has a desire to help others, and pursues it in good faith, using some definition of which universes are preferable that does not weight their own desires over the desires of others. There's self-reference here, because the person's desires do refer to other people's desires. But you can still maximise the measure even with the self-reference. But you do have other desires. You've got a desire for pizza, but you've also got a desire to help others. So if a 10% income sacrifice meant you get 10% less pizza, but someone else gets 300% more pizza, maybe that works out. But you don't give up 100% of your income and live in a sack-cloth.
Thanks, I think I understand better. We have some progress here: * We agree that the naive model of a selfish person who doesn't have any interest in helping others hardly ever describes real people. * We seem to agree that guilt-aversion as a desire doesn't make sense, but maybe for different reasons. I think it doesn't make sense because when I say someone desires X, I mean that they prefer worlds with property X over worlds lacking that property, and I'm only interested in X's that describe the part of the world outside of their own thought process. For the purposes of figuring out what someone desires, I don't care if they want it because of guilt aversion or because they're hungry or some other motive; all I care is that I expect them to make some effort to make it happen, given the opportunity, and taking into account their (perhaps false) model of how the world works. Maybe I do agree with you enough on this that the difference is unimportant. You said: I think you're assuming here that people who claim a desire to help people and are really motivated by guilt-aversion are ineffective. I'm not sure that's always true. Certainly, if they're ineffective at helping people due to their own internal process, in practice they don't really want to help people. I don't know what it means to "weight their own desires over the desires of others". If I'm willing to donate a kidney but not donate my only liver, and the potential liver recipient desires to have a better liver, have I weighted my own desires over the desires of others? Maybe you meant "weight their own desires to the exclusion of the desires of others". We might disagree about what it means to help others. Personally, I don't care much about what people want. For example, I have a friend who is alcoholic. He desires alcohol. I care about him and have provided him with room and board in the past when he needed it, but I don't want him to get alcohol. So my compassion for him is about me want
Yes, I think we're converging onto the interesting disagreements. This is largely an empirical point, but I think we differ on it substantially. I think if people don't think analytically, and even a little ruthlessly, they're very ineffective at helping people. The list of failure modes is long. People prefer to help people they can see at the expense of those out of sight who could be helped more cheaply. They're irrationally intolerant of uncertainty of outcome. They're not properly sensitive to scale. I haven't cited these points, but hopefully you agree. If not we can dig a little deeper into them. I just meant that self-utility doesn't get a huge multiplier when compared against others-utility. In the transplant donation example, you get just as much out of your liver as whoever you might give it to. So you'd be going down N utilons and they'd be going up N utilons, and there would be a substantial transaction cost of M utilons. So liver donation wouldn't be a useful thing to do. In another example, imagine your organs could save, say, 10 lives. I wouldn't do that. There are two angles here. The first is about strategy. You don't improve the world by being a sucker who can be taken advantage of. You do have to fight your corner, too, otherwise you just promote free-riding. If all the do-gooders get organ harvested, the world is probably not better off. But even if extremes of altruism were not anti-strategic, I can't say I'd do them either. There are lots of actions which I would have to admit result in extreme loss of self-utility and extreme gain in net utility that I don't carry out. These actions are still moral, it's just that they're more than I'm willing to do. Some people are excessively uncomfortable about this, and so give up on the idea of trying to be more moral altogether. This is to make the perfect the enemy of the good. Others are uncomfortable about it and try to twist their definition of morality into knots to conform to what they're wi
Okay, I agree that what you want to do works most of the time, and we seem to agree that you don't have a good solution to the alcoholism problem, and we also seem to agree that acting from a mishmash of heuristics without any reflection or attempts to make a rational whole will very likely flounder around uselessly. Not to imply that our conversation was muddled by the following, but: we can reformulate the alcoholism problem to eliminate the addiction. Suppose my friend heard about that reality show guy who was killed by a stingray and wanted to spend his free time killing stingrays to get revenge. (I heard there are such people, but I have never met one.) I wouldn't want to help him with that, either.
There's a strip of an incredibly over-the-top vulgar comic called Space Moose that gets at the same idea. These acts of kindness aren't positive utility, even if the utility metric is based on desires, because they conflict with the desires of the stingrays or other victims. Preferences also need to be weighted somehow in preference utilitarianism, I suppose by importance to the person. But then, hmm, anyone gets to be a utility monster by just really really really really wanting to kill the stingrays. So yeah, there's a problem there. I think I need to update, and abandon preference utilitarianism even as a useful correlate of whatever the right measure would be.
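The utility monster worry above can be made concrete with a toy model. This is only a sketch under a strong assumption: that preference utilitarianism sums each party's preference, weighted by how intensely they hold it, counting it as positive if satisfied and negative if frustrated. The function names and numbers are hypothetical.

```python
def weighted_preference_sum(preferences):
    """preferences: list of (intensity, satisfied) pairs.

    A satisfied preference adds its intensity; a frustrated one subtracts it.
    """
    return sum(w if satisfied else -w for w, satisfied in preferences)


def score_killing_stingrays(hunter_intensity, n_stingrays, stingray_intensity=1.0):
    # The hunter's preference is satisfied; each stingray's preference
    # to stay alive is frustrated.
    preferences = [(hunter_intensity, True)]
    preferences += [(stingray_intensity, False)] * n_stingrays
    return weighted_preference_sum(preferences)


# A hunter with an ordinary grudge is outweighed by ten stingrays...
ordinary = score_killing_stingrays(hunter_intensity=5.0, n_stingrays=10)
# ...but one who "really really really really" wants it flips the verdict.
monster = score_killing_stingrays(hunter_intensity=1000.0, n_stingrays=10)
```

Nothing in the sum itself limits the intensity one party can claim, which is the problem being pointed at.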
While it's gratifying to win an argument, I'd rather not do it under false pretenses: We need a solution to the utility monster problem if we're going to have a Friendly AI that cares about people's desires, so it's better to solve the utility monster problem than to give up on preference utilitarianism in part because you don't know how to solve the utility monster problem. I've sketched proposed solutions to two types of utility monsters, one that has one entity with large utility and one that has a large number of entities with modest utility. If these putative solutions seem wrong to you, please post bugs, fixes, or alternatives as replies to those comments. I agree that preference utilitarianism has the problem that it doesn't free you from choosing how to weight the preferences. It also has the problem that you have to separate yourself into two parts, the part that gets to have its preference included in the weighted sum, and the part that has a preference that is the weighted sum. In reality there's only one of you, so that distinction is artificial.
Why distinguish between moral and non-moral preferences? Why are moral preferences more mutable than non-moral ones? Also, a lot of this applies to your specific situation, so it is more morality than metaethics.
The basic drive to adopt some sort of ethical system is essentially the same as other preferences, and is non-mutable. It's a preference to believe that you are making the world a better place, rather than a worse place. This introduces a definitional question of what constitutes a good world and what constitutes a bad world, which is something I think people can change their minds about. Having written that, one question that occurs to me now is, is the basic preference to believe that you're making the world a better place, or is it to simply believe you're a good person? I prefer people who make the world a better place, so the two produce the same outcomes for me. But other people might not. If you instead had a preference for people who followed good principles or exhibited certain virtues, you wouldn't feel it necessary to make the world a better place. I shouldn't assume that such people don't exist. So maybe instead of talking about adopting a definition of which universes are good and bad, I should talk about adopting a definition of good and bad people. If you define a good person by the consequences of their actions, then you'd go on to define which universes are good and bad. But otherwise you might instead define which principles are good, or which virtues.

QM appears to be the sequence that even the people who say they've read the sequences didn't read (judging by low votes and few commenters).

That's too bad; it may have been my favorite.

In You Provably Can't Trust Yourself, Eliezer tried to figure out why his audience didn't understand his meta-ethics sequence even after they had followed him through philosophy of language and quantum physics. Meta-ethics is my specialty, and I can't figure out what Eliezer's meta-ethical position is.

Is your difficulty in understanding how Eliezer thinks about ethics or in working out what side he fights for in various standardised intellectual battles? The first task seems fairly easy. He thinks like one would expect an intelligent reductionist programmer-type to think. Translating that into philosopher speak is somewhat more challenging.

I'm okay with Eliezer dismissing lots of standard philosophical categories as unhelpful and misleading. I have much the same attitude toward Anglophone philosophy. But anything he or someone else can do to help me understand what he is saying will be appreciated.
Non-anglophone philosophy is worse. (Phenomenology, deconstructionism,...)
No doubt.

As I understand it, Eliezer has taken the position that human values are too complex for humans to reliably formalize, and that all formalizations presented so far are or probably are incorrect. This may explain some of your difficulty in trying to find Eliezer's preferred formalization.


It is one thing to formalize values, another one to formalize a Meta-Ethics.

Given all the actually content-bearing comments in this discussion, the +7 points of this remark somehow saddens me.
One project is the descriptive one of moral psychology and moral anthropology. Because Coherent Extrapolated Volition begins with data from moral psychology and moral anthropology, that descriptive project is important for Eliezer's design of Friendly AI. Certainly, I agree with Eliezer that human values are too complex to easily formalize, because our terminal values are the product of millions of years of messy biological and cultural evolution. "Morality" is a term usually used in speech acts to refer to a set of normative questions about what we ought to do, or what we ought to value. Even if you're an ethical reductionist as I am, and reduce 'ought' such that it is a particular species of 'is', there are lots of ways to do that, and I'm not clear on how Eliezer does it.
Moral psychology and anthropology are pretty useless, because morality is too complex for humans to manually capture with accuracy, and too fragile to allow capturing without accuracy. We need better tools.
Your first claim doesn't follow from the (correct) supporting evidence. In actually implementing a CEV or other such object, it's true that one daren't program in specific object-level moral truths derived from human study. The implementation should be much more meta-level, in order to not get locked into bad assumptions. However, you and I can think of classes of possible implementation failures that might be missed if we had too naive a theory of moral psychology. Maybe a researcher who didn't know about the conscious/unconscious divide at all would come to the same implementation algorithm as one who did, but it's not out of the question that our limited knowledge could be relevant.

Are you looking to have it summarized in the terminology of standard moral philosophy?

Are there any specific questions you could ask about it?

(The main thing I found to be insufficiently unpacked is the notion of moral arguments — it's not clear to me exactly what types of arguments would qualify, as he sees it — but other than that, I think I understand it well enough to answer questions about it.)

Sure, let me try some specific questions. I'll start with what I think is clear to me about Eliezer's views: (1) Whatever moral facts exist, they must be part of the natural world. (Moral naturalism.) (2) Moral facts are not written into the "book" of the universe - values must be derived from a consideration of preferences. (In philosophical parlance, this would be something like the claim that "The only sources of normativity are relations between preferences and states of affairs.") I'll propose a third claim that I'm not so sure Eliezer would endorse: (3) What I "should" do is determined by what actions would best fulfill my preferences. (This is just a shorter way of saying that I "should" do "what I would do to satisfy my terminal values if I had correct and complete knowledge of what actions would satisfy my terminal values.") In this sense, morality is both "subjective" and "objective". It is subjective in the sense that what is "right" for me to do at any given time is determined in part by my own brain states (my preferences, which result from my terminal values). But it is objective in the sense that there are objectively correct answers about what actions will or will not best satisfy my preferences. I could even be wrong about what will best satisfy my preferences. Have I interpreted Eliezer correctly so far?

(1) Whatever moral facts exist, they must be part of the natural world. (Moral naturalism.)

In a manner of speaking, yes. Moral facts are facts about the output of a particular computation under particular conditions, so they are "part of the natural world" essentially to whatever extent you'd say the same thing about mathematical deductions. (See Math is Subjunctively Objective, Morality as Fixed Computation, and Abstracted Idealized Dynamics.)

(2) Moral facts are not written into the "book" of the universe - values must be derived from a consideration of preferences. (In philosophical parlance, this would be something like the claim that "The only sources of normativity are relations between preferences and states of affairs.")

No. Caring about people's preferences is part of morality, and an important part, I think, but it is not the entirety of morality, or the source of morality. (I'm not sure what a "source of normativity" is; does that refer to the causal history behind someone being moved by a moral argument, or something else?)

(The "Moral facts are not written into the 'book' of the universe" bit is correct.)

(3) What

... (read more)
4Eliezer Yudkowsky13y
I endorse the above.
Thanks for this! Concerning preferences, what else is part of morality besides preferences? A "source of normativity" is just anything that can justify a should or ought statement. The uncontroversial example is that goals/desires/preferences can justify hypothetical ought statements (hypothetical imperatives). So Eliezer is on solid footing there. What is debated is whether anything else can justify should or ought statements. Can categorical imperatives justify ought statements? Can divine commands do so? Can non-natural moral facts? Can intrinsic value? And if so, why is it that these things are sources of normativity but not, say, facts about which arrangements of marbles resemble Penelope Cruz when viewed from afar? My own position is that only goals/desires/preferences provide normativity, because the other proposed sources of normativity either don't provide normativity or don't exist. But if Eliezer thinks that something besides goals/desires/preferences can provide normativity, I'd like to know what that is. I'll do some reading and see if I can figure out what your last paragraph means; thanks for the link.
"Preference" is used interchangeably with "morality" in a lot of discussion, but here Adam referred to an aspect of preference/morality where you care about what other people care about, and stated that you care about that but other things as well. I don't think introducing categories like this is helpful. There are moral arguments that move you, and a framework that responds to the right moral arguments which we term "morality", things that should move you. The arguments are allowed to be anything (before you test them with the framework), and real humans clearly fail to be ideal implementations of the framework. (Here, the focus is on acceptance/rejection of moral arguments; decision theory would have you generate these yourself in the way they should be considered, or even self-improve these concepts out of the system if that will make it better.)
Oh, right, but it's still all preferences. I can have a preference to fulfill others' preferences, and I can have preferences for other things, too. Is that what you're saying? It seems to me that the method of reflective equilibrium has a partial role in Eliezer's meta-ethical thought, but that's another thing I'm not clear on. The meta-ethics sequence is something like 300 pages long and very dense and I can't keep it all in my head at the same time. I have serious reservations about reflective equilibrium (à la Brandt, Stich, and others). Do you have any thoughts on the role of reflective equilibrium in Eliezer's meta-ethics?
Possibly, but you've said that opaquely enough that I can imagine you intending a meaning I'd disagree with. For example, you refer to "other preferences", while there is only one morality (preference) in the context of any given decision problem (agent), and the way you care about other agents doesn't necessarily reference their "preference" in the same sense we are talking about our agent's preference. This is reflected in the ideas of morality being an abstract computation (something you won't see a final answer to), and the need for morality being found on a sufficiently meta level, so that the particular baggage of contemporary beliefs doesn't distort the picture. You don't want to revise the beliefs about morality yourself, because you might do it in a human way, instead of doing that in the right way.
Ah, have you not actually read through the whole sequence yet? I don't recommend reading it out of order, and I do recommend reading the whole thing. Mainly because some people in this thread (and elsewhere) are giving completely wrong summaries of it, so you would probably get a much clearer picture of it from the original source.
I've read the series all the way through, twice, but large parts of it didn't make sense to me. By reading the linked post again, I'm hoping to combine what you've said with what it says and come to some understanding.
"Inseparably Right" discusses that a bit, though again, I don't recommend reading it out of order. These stand out to me as wrong questions. I think the sequence mostly succeeded at dissolving them for me; "Invisible Frameworks" is probably the most focused discussion of that.
I do take some comfort in the fact that at least at this point, even pros like Robin Hanson and Toby Ord couldn't make sense of what Eliezer was arguing, even after several rounds of back-and-forth between them. But I'll keep trying.
I've read your last paragraph 5 times now and still can't make sense of it. One should drink water if one wants to satisfy one's thirst. Here "should" is loosely used to mean that it is the optimal instrumental action to reach one's terminal goal. "One should not kill" is, however, a psychological projection of one's utility function. Here "should" means that one doesn't want others to engage in killing. The term "should" is ambiguous and vague, that's all there is to it, that's the whole problem.
Agreed 100% with this. Of course, it doesn't follow that what humans talk about when we talk about morality has the properties we talk about it having, or even that it exists at all, any more than analogous things follow about what humans talk about when we talk about Santa Claus or YHWH. To say that "I happen to care about being moral" implies that it could be some other way... that I might have happened to care about something other than being moral. That is, it implies that instead of caring about "the life of [my] friends and [my] family and [my] Significant Other and [my]self" and etc. and etc. and etc., the superposition of which is morality (according to EY), I might have cared about... well, I don't know, really. This account of morality is sufficiently unbounded that it's unclear what it excludes that's within the range of potential human values at all. I mean, sure, it excludes sorting pebbles into prime-numbered heaps, for example. But for me to say "instead of caring about morality, I might have cared about sorting pebbles into prime-numbered heaps" is kind of misleading, since the truth is I was never going to care about it; it isn't the sort of thing people care about. People aren't Pebblesorters (at least, absent brain damage). And it seems as though, if pebblesorting were the kind of thing that people sometimes cared about, then the account of morality being given would necessarily say "Well, pebblesorting is part of the complex structure of human value, and morality is that structure, and therefore caring about pebblesorting is part of caring about morality." If this account of morality doesn't exclude anything that people might actually care about, and it seems like it doesn't, then "I happen to care about being moral" is a misleading thing to say. It was never possible that I might care about anything else.
Well, psychopaths don't seem to care about morality so much. So we can at least point to morality as a particular cluster among things people care about.
That's just it; it's not clear to me that we can, on this account. Sure, there are things within morality that some people care about and other people don't. Caring about video games is an aspect of morality, for example, and some people don't care about video games. Caring about the happiness of other people is an aspect of morality, and some people (e.g., psychopaths) don't care about that. And so on. But the things that they care about instead are also parts of morality, on this account. But, OK, perhaps there's some kind of moral hierarchy on this account. Perhaps it's not possible to "be moral" on this account without, for example, caring about the happiness of other people... perhaps that's necessary, though not sufficient. In which case "I happen to care about being moral" means that I happen to care about a critical subset of the important things, as opposed to not caring about those things. OK, fair enough. I can accept that.
At least as I understood it, yes (though I'm unsure of the "my" in "my terminal values" part, as I cannot glue it together with his vision of CEV and a singleton).

My super summarized summary would be something like this: There's a certain set of values (well, a certain sort of computation to judge the value of some state of affairs, including updates in the way we compute it, and the things that it approves of are what we are concerned with) that we call "morality".

We humans simply happen to be the sorts of beings that care about this morality stuff as opposed to caring about, say, maximizing paperclips.

Further, it is better (by which I mean "more moral") to be moral than to be paperclipish. We ... (read more)

If we can't say why we morally-should care about our particular values, why should we deem them moral?

An unusual number of the comments here feel unnecessary to me, so let me see if I understand this.

I have a utility function which assigns an amount of utility (positive or negative) to different qualities of world-states. (Just to be clear, ‘me being exhausted’ is a quality of a world-state, and so is ‘humans have mastered Fun Theory and apply it in a fun-maximizing fashion to humankind.’) Other humans have their own utility functions, so they may assign a different amount of utility to different qualities of world-states.

I have a place in my utilit... (read more)

Dorikka, If that's what Eliezer means, then this looks like standard practical rationality theory. You have reasons to act (preferences) so as to maximize your utility function (except that it may not be right to call it a "utility function" because there's no guarantee that each person's preference set is logically consistent). The fact that you want other people to satisfy their preferences, too, means that if enough other people want world-state X, your utility function will assign higher utility to world-state X than to world-state Y even if world-state Y has more utility in your utility function when not counting the utility in your utility function assigned to the utility functions of other people. But I don't think that's all of what Eliezer is saying because, for example, he keeps talking about the significance of a test showing that you would be okay being hit with an alien ray gun that changed your ice cream preference from chocolate to vanilla, but you wouldn't be okay being hit with an alien ray gun that changed your preferences from not-wanting-to-rape-people to wanting-to-rape-people. He also writes about the importance of a process of reflective equilibrium, though I'm not sure to what end.
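The first point in the comment above, that enough other people wanting world-state X can tip my own ranking toward X even when I would otherwise prefer Y, can be sketched as follows. This assumes (hypothetically) that my utility function is just my own valuation plus a fixed weight times the sum of other people's valuations; `care_weight` and every number here are illustrative, and the sketch says nothing about the ray gun test or reflective equilibrium.

```python
def total_utility(own_utility, others_utilities, care_weight):
    """My overall utility for a world-state.

    own_utility: what the state is worth to me, ignoring other people.
    others_utilities: what the state is worth to each other person.
    care_weight: how much I weight each unit of someone else's utility.
    """
    return own_utility + care_weight * sum(others_utilities)


def prefers_x(n_others, care_weight=0.5):
    # Ignoring other people, I like Y (3.0) more than X (1.0).
    # Every other person values X at 1.0 and Y at 0.0.
    utility_x = total_utility(1.0, [1.0] * n_others, care_weight)
    utility_y = total_utility(3.0, [0.0] * n_others, care_weight)
    return utility_x > utility_y


few = prefers_x(n_others=2)     # 1.0 + 0.5 * 2 = 2.0 versus 3.0
many = prefers_x(n_others=10)   # 1.0 + 0.5 * 10 = 6.0 versus 3.0
```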
To handle value uncertainty. If you don't know your terminal values, you have to discover them somehow.
Is that it? Eliezer employs reflective equilibrium as an epistemological method for figuring out what your terminal values are?
As I understand it, yes.
Or at least how to balance between them. Though there might be more to it than that. edit: more precisely (in EY's terms), to figure out how to balance the various demands of morality which, as it happens, is included in your terminal values.
I'm completely lost about that. I don't see how vanilla preferences differ from rape preferences. We just happen to weigh them differently. But that is solely a fact about our evolutionary history.
Vanilla preferences are instrumental. I prefer chocolate because of the pleasure I get from eating it. If the alien ray made me want to eat vanilla ice cream rather than chocolate ice cream while still enjoying chocolate ice cream more, I would prefer not to be hit by it.
All I'm talking about is how I compute my utility function. I'm not postulating that my way of assigning utility lines up with any absolute facts, so I don't see how the fact that our brains were evolved is relevant. Is there a specific part of my post that you don't understand or that you disagree with?
I agree with you; I disagree with Yudkowsky (or don't understand him). By what you wrote you seem to disagree with him as well.
Could you link me to the post of Eliezer's that you disagree with on this? I'd like to see it.
This comment, as I wrote here. I don't understand this post.
I think that there may be a failure-to-communicate going on because I play Rationalist's Taboo with words like 'should' and 'right' when I'm not talking about something technical. In my mind, these words assert the existence of an objective morality, so I wouldn't feel comfortable using them unless everyone's utility functions converged to the same morality -- this seems really really unlikely so far. So, instead I talk about world-states that my utility function assigns utility to. What I think that Eliezer's trying to get at in No License To Be Human is that you shouldn't (for the sake of not rendering your stated utility function inconsistent with your emotions) be a moral relativist, and that you should pursue your utility function instead of wireheading your brain to make it feel like you're creating utility. I think that I've interpreted this correctly, but I'd appreciate Eliezer telling me whether I have or not.
Hm. I can say truthfully that I don't care whether I like vanilla or chocolate ice cream more. I suppose that the statement of my utility with regard to eating vanilla vs. chocolate ice cream would be 'I assign higher utility to eating the flavor of ice cream which tastes better to me.' That is, I only care about a state of my mind. So, if the circumstances changed so I could procure that state of mind by other means (ex: eating vanilla instead of chocolate ice cream), I would have no problem with that. The action that I would take after being hit by the alien ray gun does not give me any less utility after being hit by the alien ray gun than the action that I take now gives me in the present. So I don't care whether I get hit by the ray gun. But my statement of utility with regard to people being raped would be "I assign much lower utility to someone being raped than to them not being raped." Here, I care about a state of the world outside of my mind. The action that I would take after being hit by the alien ray gun (rape) has less utility under my current utility function than (~rape), so my current utility function would assign negative utility to being hit by the ray gun. This much makes sense to me. I don't know what 'reflective equilibrium' means; this may be because I didn't really make it through the metaethics sequence. After I formulated what I've said in this comment and the above one, I wasn't getting much out of it. Edit: Inserted some italics for the main difference between the two scenarios and removed a set of italics. No content changes.
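
The contrast drawn in the comment above can be put as a toy calculation. This is only a sketch under assumed numbers: the function names, the enjoyment scores, and the -100 penalty are all hypothetical, chosen just to show that the ray gun costs nothing when utility tracks a state of mind, but costs a lot when utility tracks a state of the world and is evaluated by the current (pre-ray) utility function.

```python
# Toy model only: every name and number below is an illustrative assumption.

def best_flavor(enjoyment):
    """Pick the flavor the agent currently enjoys most."""
    return max(enjoyment, key=enjoyment.get)

def ice_cream_utility(flavor_eaten, enjoyment):
    """Utility depends only on a state of mind: enjoyment of the flavor eaten."""
    return enjoyment[flavor_eaten]

def world_utility(rape_occurs):
    """Utility depends on a state of the world outside the agent's mind."""
    return -100.0 if rape_occurs else 0.0

# Ice cream case: the ray swaps which flavor tastes better, and the agent
# simply eats whichever flavor it now enjoys most.
enjoyment_before = {"chocolate": 1.0, "vanilla": 0.5}
enjoyment_after = {"chocolate": 0.5, "vanilla": 1.0}

utility_before = ice_cream_utility(best_flavor(enjoyment_before), enjoyment_before)
utility_after = ice_cream_utility(best_flavor(enjoyment_after), enjoyment_after)
ice_cream_loss = utility_before - utility_after  # nothing is lost

# Rape case: the post-ray action is scored by the CURRENT utility function,
# which cares about what happens in the world, not about how the agent feels.
rape_loss = world_utility(rape_occurs=False) - world_utility(rape_occurs=True)

print(ice_cream_loss, rape_loss)
```

In this sketch the agent is indifferent to the ray in the first case and strongly averse to it in the second, which is the asymmetry the comment describes.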

In my studies of philosophy, I've mostly just tried to figure out what's correct, and not bothered to learn who came up with and believes what or to keep track of the controversies.

It occurs to me that you're doing the opposite - thinking about what Eliezer believes, rather than about what's correct. And that seems to have translated into taking a list of standard controversies, and expecting one of a list of standard responses to each. And the really interesting thing is, you don't seem to have found them. It seems that, for each of those questions, the...


No, I have my own thoughts on what is correct, and have written hundreds of pages about what I think is correct. Check my blog if you're curious.

But for right now, I just want to at least understand what Eliezer's positions are.

An off-topic question:

In a sense, "should" always implies "if". Can anyone point me to a "should" assertion without an implied "if"? If humans implicitly assume an "if" whenever they say "should", then the term is never used to propose a moral imperative but to indicate an instrumental goal.

You shall not kill if:

  • You want to follow God's law.
  • You don't want to be punished.
  • You want to please me.

It seems nobody would suggest there to be an imperative that killing is generally wrong. So where does moral realism come from?

Um, I'll suggest that. Killing: generally wrong.
Do you agree with EY on Torture vs Dust Specks? If you agree, would killing one person be justified to save 3^^^3 from being killed? If you agree, would you call killing to be right in that case?
I say bring on the specks.
I find that topic troubling. I find it comforting to know how others would decide here. So please allow me to ask another question. Would you personally die to save 3^^^3 from being killed? I thought about it myself and I would probably do it. But what is the lower bound here? Can I find an answer to such a question if I read the sequences, or at least how I can come up with my own answer?
I would personally die to save 3^^^3 persons' lives if that were the option presented to me. The sequences do not comprise a DIY guide to crafting an ethical theory. I came up with mine while I was in grad school for philosophy.
I realize I might have misunderstood moral realism. I thought moral realism proposes that there do exist agent-independent moral laws. What I meant is that nobody would suggest that the proposition 'Killing: generally wrong' is a subvenient property.
I'm pretty sure you are wrong. You have realism confused with 'universality'. Moral realism applies to the situation when you say "It is forbidden that Mary hit John" and I say "It is permissible that Mary hit John". If realism holds, then one of us is in error - one of those two statements is false. Compare to you thinking Mary is pretty and my disagreeing. Here, neither of us may be in error, because there may be no "fact of the matter" regarding Mary's prettiness. It is just a difference of opinion. Moral realism states that moral judgments are not just matters of opinion - they are matters of fact. If you had said 'observer-independent' rather than 'agent-independent', then you would have been closer to the concept of moral realism.
So moral realism is a two-valued logic? I didn't know there was a difference.
More like "Moral realism is the doctrine stating that moral questions should be addressed using a two-valued logic. As opposed, say, to aesthetic questions."
So moral realism proposes that there are sorts of moral formalisms whose truth values are observer independent, because their logic is consistent, but not agent-independent because moral formalisms are weighted subjectively based on the preferences of agents. Therefore we have a set of moral formalisms that are true facts about the world as they are endorsed by some agents but weighted differently by different agents. If you could account for all moral formalisms and how they are weighted by how many agents, would this constitute some sort of universal utility function and its equilibrium equal a world-state that could be called right?
I'm afraid that I am still not being understood. Firstly, the concepts of universalism and moral realism still make sense even if agent preferences have absolutely no impact on morality. Secondly, the notion that 'moral formalisms' can be true or false makes me squirm with incomprehension. Third, the notion that true formalisms get weighted in some way by agents leads me to think that you fail to understand the terms "true" and "false". Let me try a different example. Someone who claims that correct moral precepts derive their justification from the Koran is probably a moral realist. He is not a universalist though, if he says that Allah assigns different duties and obligations to men and women - to believers and non-believers.
What do you mean by "agent-independent"?
That two agents can differ in their behavior and perception of actions but that any fundamental difference about a set of moral laws can be considered a failure-mode as those laws are implied by the lower levels of the universe the two agents are part of. I thought that moral realism proposes that 'Killing: generally wrong' is on the same level as 'Faster than light travel: generally wrong', that moral laws are intersubjectively verifiable and subject to empirical criticism. I didn't think that anyone actually believes that 'Killing: generally wrong' can be derived as a universal and optimal strategy.
I'm pretty sure I don't understand anything you just said. Sorry.
Could you elaborate on your reasoning behind the proposition 'Killing: generally wrong'? Maybe that would allow me to explain myself and especially reformulate my question if there is anyone who thinks that killing is wrong regardless of an agent's preferences.
Persons have a right not to be killed; persons who have waived or forfeited that right, and non-persons, are still entities which should not be destroyed absent adequate reason. Preferences come in with the "waived" bit, and the "adequate reason" bit, but even if nobody had any preferences (...somehow...) then it would still be wrong to kill people who retain their right not to be killed (this being the default, assuming the lack of preferences doesn't paradoxically motivate anyone to waive their rights), and still be wrong to kill waived-rights or forfeited-rights persons, or non-persons, without adequate reason. I'm prepared to summarize that as "Killing: generally wrong".
Fascinating. This view is utterly incomprehensible to me. I mean, I understand what you are saying, but I just can't understand how or why you would believe such a thing. The idea of "rights" as things that societies enact makes sense to me, but universal rights? I'd be interested to know on what basis you believe this. (A link or other reference is fine, too.)
I derived my theory by inventing something that satisfied as many of my intuitive desiderata about an ethical theory as possible. It isn't perfect, or at least not yet (I expect to revise it as I think of better ways to satisfy more desiderata), but I haven't found better.
What's the justification for taking your intuitive desiderata as the most (sole?) important factor in deciding on an ethical theory? As opposed to any of many other strategies, such as finding the theory which if followed would result in the greatest amount of (human?) fun, or find the theory that would be accepted by the greatest number of people who are almost universally (> 99%) regarded as virtuous people, or ...
Unless "theory which would maximize human fun" or "theory that would be well-received among people popularly understood to be virtuous" are desiderata of mine, why in the world should I use them? It would be circular to use them to craft my ethical theory because my ethical theory says to, incomprehensible to use them because somebody else's says to, and unmotivated to use them for any other reason.
Yes, obviously. The question was in the first paragraph, not the second, which you seem to have gotten hung up on. The question, again, was: what's the justification for taking your intuitive desiderata as the most (sole?) important factor in deciding on an ethical theory? I gave examples of some strategies for choosing an ethical theory that some other people might choose only to show that it's not obviously clear that your strategy is the sole or best strategy. So the question, again, is why do you think that particular strategy is the best one (assuming you considered others and you do believe that's best)?
I'm not clear on what you're suggesting. Are you asking why I used my intuitive desiderata, as opposed to someone else's desiderata, or desiderata I picked out of a hat, or evitanda just to be contrary, or not seeking an ethical theory at all, or...? What's the salient alternative here that I'm meant to justify dismissing?
I'm asking why you decided that "choose the theory that best satisfies my intuitive desiderata" was the best method of choosing a theory. What justifies that method of "choosing a theory", if there is a justification and you did in fact think about it beforehand? If you did think about it, presumably you decided that was the best method of choosing a theory for some reason(s), and I'm asking what those reasons might be. One alternative, for example, might be for me to critically analyze my intuitions beforehand and be skeptical that all my intuitions are good for me (in the sense that acting on those intuitions best furthers all my interests weighted accordingly), and I might then choose to discard some of my intuitive desiderata or weight them in some way before proceeding with whatever else I've decided on as a method of choosing. I might decide to just accept the theory that is most respected by my parents, or my priest, or the ethics professors that I most admire. I might decide to accept a theory on the basis of anticipating the results that believing in the theory will have on me and choosing the theory with the best anticipated effect. I haven't given the justifications here, because these are just examples, but if I were to follow one of those strategies, I would almost certainly have reasons for thinking that strategy was better than others I considered. Those reasons are what I was asking you about. Just to head off another potential misunderstanding, I'm not suggesting that you should have considered any of these or that any of these are better strategies. They're just given as evidence of the fact that your strategy is not the only one. I'm very curious what was so vague or poorly expressed or confusing in my original post if you (or anybody else) can identify something in particular.
Are you looking for a causal history or a theoretical justification...? Meh, I'll just summarize both together. Trying to unite my desiderata into a single theory that doesn't eat itself proved a good means of reconciling or prioritizing my intuitions where they conflicted. (For instance, I had warring intuitions over whether to privilege the null action or commit myself to moral luck, and chose the former because my intuition against moral luck was stronger than my wariness of the doing-allowing distinction.) I find having reconciled/prioritized desiderata more comfortable and actionable, and codifying them into a decision procedure makes them easier to act on consistently. I found all the theories I'd run across in academic puttering around to be deeply unsatisfactory in one or more ways; no authority figures I respected enough to consider emulating put forth coherent theories of their own. (One of my undergrad professors, I admired enough that I might have considered doing this, explicitly or just implicitly by letting him argue it to me in real time before I was equipped to argue back very well, but he didn't talk about his personal views during ethics class and didn't specialize in the field so I never found a paper on it by him or anything.) That meant I had to either not have one (which would lead to awkward silences when people in grad school asked me for my ethical opinions, and an uncomfortable lack of opinion when writing ethics papers, and no decision procedure to follow when I was uncertain of some real-life choice's moral status), or make up my own. To make one up with "the best anticipated effect" would presuppose consequentialism, which I rejected pretty much as soon as I heard it. I wanted the ethical theory that would lead to me giving the right answers according to me, in a principled way, to ethical dilemmas where I already had a right answer in mind (e.g. let's not murder homeless people for their organs thankyouverymuch), and let me pick my w
Thanks for the explanation. I was looking more for theoretical justification (if theoretical justification played a part for you in deciding how to choose an ethical theory). What I had in mind was, if you were going to try to convince other people that they should choose an ethical theory for the same reasons that you chose yours and should adopt the same theory you did, what would be the arguments that you would use to persuade them (limited to good-faith arguments that you actually believe rather than rhetorical strategies aimed primarily at convincing)? And there's a little of that in your answer here. Thanks for your time.
Because she wanted to (where 'wanted to' indicates after fully reflecting on all relevant factors). Doing anything other than what she wanted to do would basically be signalling bullshit. Those are all things that might be included if they are intuitive desiderata of Alicorn's or she believes they are instrumentally useful in creating a theory that satisfies said desiderata. Either that or she is lying to signal naivety or submission.
I might have disagreed with this a few months ago, so, just in case people with brains similar enough to mine are reading this, I will make this as clear as possible. She had to do what she wanted to do. As in deterministically had to. There is no physical object other than her brain that makes her decisions. There is no chain of causality that could cause her to make a decision that does not start with the desires in her brain. EDIT: Eliezer has a better one for this: "Mr. Potter, in the end people all do what they want to do. Sometimes people give names like 'right' to things they want to do, but how could we possibly act on anything but our own desires?"
Yep! But I would stop short of saying that "people all do what they want to do". People tend not to reflect enough on their desires; they may act out of habit; they may not act on them even when they know what they are; and people may have hierarchies or communities of conflicting desires so that there isn't even a clear answer to "what do I want?"
Yes, I agree with this. The quote seemed wrong to me the first time I read it, which is why I forgot about it and had to add it to my post afterward. This seems like part of the reason why.
Have Tourettes.
If I had Tourettes, I would not call the part of my brain with Tourettes "me".
Kind of the point. Our actions are not directly determined by our desires.
I would not call an action that I do not decide to bring about "my action". What are we disagreeing on apart from wording? One can only do what is right if one desires to do what is right. There are many barriers between that and what actually gets done (which is why FAI is a good idea). A brain with Tourettes and one without Tourettes but with the same desires are effectively the same decision making process in different environments, up to the approximation that brains are decision making processes.
If only the courts accepted that as a defense. "If I say it ain't me you must set free!"
If my body were prone to murdering people and I were unable to stop this, I would consent to being jailed. I would advocate some form of isolation or similar for anyone with this problem.
If we taboo for a sec the words "right", "wrong", "should" and "should not", how would I best approximate the concept of universal rights? Here's how: "Nearly everyone has a sense of personal sovereignty, in the sense that there exist elements of the universe that a person considers belonging to said person -- so that if another agent acts to usurp or wrest control of such elements, a strong emotion of injustice is provoked. This sense of personal sovereignty will often conflict with the sense of others, especially if the sense of injustice is inflated to include physical or intellectual property: but if we minimize the territories to certain natural boundaries (like persons' bodies and minds), we can aggregate the individual territories to a large map of the universe, so that it will have huge tons of grey disputed areas but also some bright areas clearly labelled 'Alex's body belongs to Alex's sovereignty' or 'Bob's body falls to Bob's sovereignty'."
What you say seems contrived to me. You could have uttered the exact opposite and it wouldn't change anything about the nature of reality as a whole but solely the substructure that is Alicorn.
Indeed, I have never claimed to have reality-altering superpowers such that I can make utterances that accomplish this. What's your point?
In my original comment I asked if anyone would (honestly) suggest that 'killing is wrong' is a moral imperative, that it is generally wrong. You asserted exactly that in your reply. I thought you misunderstood what I have been talking about. Now I am not so sure anymore. If that is really your opinion then I have no idea how you arrived at that belief.
2^10=1024. The fact that I chose this equation is not built into the universe in the same way as 'Faster than light travel: generally wrong'. In fact, I chose differently in other Everett branches. The equation is still true. The fact that Alicorn came to have these specific moral beliefs is similarly nonfundamental, but killing is still objectively Alicorn_wrong.
That is a way you can translate the use of should into a convenient logical model. But it isn't the way humans instinctively use the verbal symbol.

The standard debates ask the wrong questions, so there's little point in answering them: you'd spend all your time explaining your preferred ways of disambiguating the hopelessly convoluted standard words. Unsurprisingly, Eliezer's metaethics doesn't actually solve all of decision theory, so it makes a lot of steps in the right direction, while still necessarily leaving you confused even if you understood every step. You'd need to ask more specific questions, or ask for clarification of specific claims. I agree that regurgitating a body of knowledge usually helps it compost, but a mere summary probably won't do the trick.
