TL;DR: Consequentialism works as a compass for your actions, not as a judge of moral character.

The compass and the judge

A woman steps onto a crowded bus, trips on a seated man's outstretched foot, and breaks her arm. The Everett branches split: in one world, the man looks down and laughs evilly; in the other, he wakes with a start, looks down, gasps, and apologizes profusely for leaving his foot in the aisle as he slept.

There’s clearly a difference between the two men. Even though the consequence of their actions was the same (a broken arm), their intentions change the moral calculation dramatically. If I had to hang around one of these men, I'd prefer the latter.[1]

This intuition pump makes that obvious enough. But some people (as in "people I've met") think of a thought experiment like this and recoil from consequentialism, because they think it condemns both men equally. That misunderstands what consequentialism is for. It’s not supposed to judge how evil people are when they trip women. If you try feeding an event like this one into the consequentialist calculation machine, it will spit, sputter, and cough out a cloud of black smoke. Consequentialism is a compass; it points to [what it thinks is] the optimal moral direction.[2] The compass might tell you to remember to tuck your feet in on a crowded bus, because that'll reduce the probability of negative consequences. It won't tell you how moral someone who forgot to do so is.[3]

The consequences of someone's actions are nonetheless partial evidence of their morality. If you discover that embezzled funds have been building up in Bob's bank account, that's evidence Bob is an unethical guy, since most people who embezzle funds are unethical. But then you might discover that, before he was caught and the money confiscated, Bob was embezzling funds to build an orphanage. The consequences haven't changed, but Bob's final (unrealized) intentions are mitigating circumstances. If I had to hang around with either your typical fund-embezzler or Bob, I would pick Bob.

Takeaways

There's an asymmetry in ethics, where you judge your own decisions based on criteria you don't hold other people to. I'll just quote HPMoR: 

The boy didn't blink. "You could call it heroic responsibility, maybe," Harry Potter said. "Not like the usual sort. It means that whatever happens, no matter what, it's always your fault. Even if you tell Professor McGonagall, she's not responsible for what happens, you are. Following the school rules isn't an excuse, someone else being in charge isn't an excuse, even trying your best isn't an excuse. There just aren't any excuses, you've got to get the job done no matter what."

Chapter 75

Here Harry is placing all the responsibility mass onto his own shoulders. It's the natural conclusion of the compass/judge distinction: you're morally responsible for the future, and can judge yourself at any given moment based on whether you take the optimal path; but you don't judge others like that. Harry's system of ethics isn't merely asymmetric; it's as asymmetric as can be.

Mental-health-wise, that might seem dangerous at first glance ("all the responsibility??"). I don't think it's dangerous at all, if you do it right: the same courtesy you extend to others by not judging them on consequentialist grounds, you must extend to your past self. So you needn't blame yourself endlessly for past mistakes; instead, you should look toward the future and salvage what you can.[4] [5]

Thanks to Justis Mills for feedback on this post :)

  1. ^

    "Would I hang around with them" is a good heuristic for gut-level morality. 

  2. ^

    Actually, it doesn't even dare assert what the moral direction is. It merely reminds you to weigh the consequences of your actions, and it's up to you to establish your rank-order of the consequences.

  3. ^

    As Justis Mills pointed out, you could expect the evil woman-tripper to rack up more negative consequences than the snoozer in the long term. 

    Let's say the tripper is so prolific that 9 times out of 10, when a woman trips, someone did it on purpose. If that's true, then consequences become a more reliable moral heuristic; your prior will only incriminate an innocent 10% of the time. The more malevolence there is, the better a proxy "consequences" will be for moral judgment.

    But it's only a proxy. Ultimately, it's not the consequences that are the marker of a bad person, it's the intention to trip women.
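
To make the arithmetic in this footnote concrete, here is a minimal Python sketch. The function name `innocent_blame_rate` is made up for illustration, and the sketch assumes every trip is either fully deliberate or a pure accident, which is a simplification rather than a claim about the real world.

```python
def innocent_blame_rate(p_deliberate: float) -> float:
    """Share of trips where blaming the foot's owner hits an innocent person.

    p_deliberate is the (assumed) fraction of trips caused on purpose.
    If you judge by consequences alone, you blame everyone whose foot
    tripped someone, so the innocents you blame are the accidental share.
    """
    if not 0.0 <= p_deliberate <= 1.0:
        raise ValueError("p_deliberate must be between 0 and 1")
    return 1.0 - p_deliberate


if __name__ == "__main__":
    # The more malevolence there is, the better "consequences" works as a proxy.
    for p in (0.1, 0.5, 0.9):
        rate = innocent_blame_rate(p)
        print(f"deliberate share {p:.0%} -> innocents blamed {rate:.0%}")
```

With 9 deliberate trips out of 10, the accidental share is the remaining 1 in 10, matching the 10% figure above. The number only measures how good the proxy is; it says nothing about any individual tripper's intentions.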

  4. ^

    So consequentialism is future-facing and not past-facing. See Zvi's Asymmetric Justice to see how bad past-facing consequentialism looks.

  5. ^

    There's a lot of this spirit in Replacing Guilt.

6 comments

Upvoted, but I think I disagree on a tangent.

The consequences of someone's actions are nonetheless partial evidence of their morality. If you discover that embezzled funds have been building up in Bob's bank account, that's evidence Bob is an unethical guy, since most people who embezzle funds are unethical. But then you might discover that, before he was caught and the money confiscated, Bob was embezzling funds to build an orphanage. The consequences haven't changed, but Bob's final (unrealized) intentions are mitigating circumstances. If I had to hang around with either your typical fund-embezzler or Bob, I would pick Bob.

An orphanage is sort of a funky example, because I don't intuitively associate it with cost-effectiveness, but I don't know much about it. If it's not cost-effective to build an orphanage, then what logic does Bob see in it? Under ordinary circumstances, I associate non-cost-effective charity with just doing what you've cached as good without thinking too much about it, but embezzlement doesn't sound like something you'd cache as good, so that doesn't sound likely. Maybe he's trying to do charity to build reputation that he can leverage into other stuff?

Anyway, if I don't fight the hypothetical, and assume Bob's embezzling for an orphanage was cost-effective, then that's evidence that he's engaging in fully unbounded consequentialism, aspiring to do the globally utility-maximizing action regardless of his personal responsibilities, his attention levels and his comparative advantages.

This allows you to predict that in the future, he might do similar things, e.g. secretly charge ahead with creating AI that takes over the world 0.1% more quickly and 0.1% more safely than its competitors even if there's 99.8% chance everyone dies, in order to capture the extra utility in that extra sliver he gains. Or that he might suppress allegations of rape within his circles if he fears the drama will push his group off track from saving the world.

If, on the other hand, someone was embezzling funds to spend on parties for himself and his friends, then while that's still criminal, it's a much more limited form of criminality, where he still wouldn't want to be part of the team that destroys the world, and wouldn't want to protect rapists. (I mean, he might still want to protect rapists if he's closer friends with the person who is raping than with the victims, but the point is he's trying to help at least some of the people around himself.)

Honestly the one who embezzles funds for unbounded consequentialist purposes sounds much more intellectually interesting, and so I would probably still prefer to hang around him, but the one who embezzles funds for parties seems much safer, and so I think a moral principle along the lines of "unbounded consequentialists are especially evil and must be suppressed" makes sense. You know, the whole thing where we understand that "the ends justify the means" is a villainous thing to say.

I think this is actually pretty cruxy for consequentialism. Of course, you can try to patch consequentialism in various ways, but these problems show up all over the place and are subject to a lot of optimization pressure because resources are useful for many things, so one needs a really robust solution in order for it to be viable. I think the solution lies in recognizing that healthy systems follow a different kind of agency that doesn't aspire to have unbounded impact, and consequentialists need to develop a proper model of that to have a chance.

You know, I considered "Bob embezzled the funds to buy malaria nets" because I KNEW someone in the comments would complain about the orphanage. Please don't change. 

Actually, the orphanage being a cached thought is precisely why I used it. The writer-pov lesson that comes with "don't fight the hypothetical" is "don't make your hypothetical needlessly distracting". But maybe I miscalculated and malaria nets would be less distracting to LWers. 

Anyway, I'm of course not endorsing fund-embezzling, and I think Bob is stupid. You're right in that failure modes associated with Bob's ambitions (eg human extinction) might be a lot worse than those of your typical fund-embezzler (eg the opportunity cost of buying yachts). I imagined Bob as being kind-hearted and stupid, but in your mind he might be some cold-blooded brooding "the price must be paid" type consequentialist. I didn't give details either way, so that's fair. 

If you go around saying "the ends justify the means" you're likely to make major mistakes, just like if you walk around saying "lying is okay sometimes". The true lesson here is "don't trust your own calculations, so don't try being clever and blowing up TSMC", not "consequentialism has inherent failure modes". The ideal of consequentialism is essentially flawless; it's when you hand it to sex-obsessed murder monkeys as an excuse to do things that shit hits the fan.

In my mind then, Bob was a good guy running on flawed hardware. Eliezer calls patching your consequentialism by making it bounded "consequentialism, one meta-level up". For him, refusing to embezzle funds for a good cause because the plan could obviously turn sour is just another form of consequentialism. It's like belief in intelligence, but flipped; you don't know exactly how it'll go wrong, but there's a good chance you're unfathomably stupid and you'll make everything worse by acting on "the ends justify the means". 

From a practical standpoint though, we both agree and nothing changes: both the cold-hearted Bob and the kind Bob must be stopped. (And both are indeed more likely to make ethically dubious decisions because "the ends justify the means".) 

Post-scriptum:

Honestly the one who embezzles funds for unbounded consequentialist purposes sounds much more intellectually interesting

Yeah, this kind of story makes for good movies. When I wrote Bob I was thinking of The Wonderful Story of Henry Sugar, by Roald Dahl, adapted by Wes Anderson for Netflix. It's at least vaguely EA-spirited, and is kind of in that line (although the story is wholesome, as the name indicates, and isn't meant to warn against dangers associated with boundless consequentialism at all).[1]

 

  1. ^

    Let's wait for the SBF movie on that one

I think your position here is approximately-optimal within the framework of consequentialism.

It's just that I worry that consequentialism itself is the reason we have problems like AI x-risk, in the sense that the thing that drives x-risk scenarios may be the theory of agency that is shared with consequentialism.

I've been working on a post - actually I'm going to temporarily add you as a co-author so you can see the draft and add comments if you're interested - where I discuss the flaws and how I think one should approach it differently. One of the major inspirations is Against responsibility, but I've sort of taken inspiration from multiple places, including critics of EA and critics of economics.

The ideal of consequentialism is essentially flawless; it's when you hand it to sex-obsessed murder monkeys as an excuse to do things that shit hits the fan.

I've come to think that isn't actually the case. E.g. while I disagree with Being nicer than clippy, it quite precisely nails how consequentialism isn't essentially flawless:

Now, of course, utilitarianism-in-theory was never, erm, actually very tolerant. Utilitarianism is actually kinda pissed about all these hobbies. For example: did you notice the way they aren't hedonium? Seriously tragic. And even setting aside the not-hedonium problem (it applies to all-the-things), I checked Jim's pleasure levels for the trashy-TV, and they're way lower than if he got into Mozart; Mary's stamp-collecting is actually a bit obsessive and out-of-balance; and Mormonism seems too confident about optimal amount of coffee. Oh noes! Can we optimize these backyards somehow? And Yudkowsky's paradigm misaligned AIs are thinking along the same lines – and they've got the nano-bots to make it happen.

Unbounded utility maximization aspires to optimize the entire world. This is pretty funky for just about any optimization criterion people can come up with, even if people are perfectly flawless in how well they follow it. There's a bunch of attempts to patch this, but none have really worked so far, and it doesn't seem like any will ever work.

I've come to think that isn't actually the case. E.g. while I disagree with Being nicer than clippy, it quite precisely nails how consequentialism isn't essentially flawless:

I haven't read that post, but I broadly agree with the excerpt. On green did a good job imo in showing how weirdly imprecise optimal human values are. 

It's true that when you stare at something with enough focus, it often loses that bit of "sacredness" which I attribute to green. As in, you might zoom in enough on the human emotion of love and discover that it's just an endless tiling of Schrödinger's equation.

If we discover one day that "human values" are eg 23.6% love, 15.21% adventure and 3% embezzling funds for yachts, and decide to tile the universe in exactly those proportions...[1] I don't know, my gut doesn't like it. Somehow, breaking it all into numbers turned humans into sock puppets reflecting the 23.6% like mindless drones. 

The target "human values" seems to be incredibly small, which I guess encapsulates the entire alignment problem. So I can see how you could easily build an intuition from this along the lines of "optimizing maximally for any particular thing always goes horribly wrong". But I'm not sure that's correct or useful. Human values are clearly complicated, but so long as we haven't hit a wall in deciphering them, I wouldn't put my hands up in the air and act as if they're indecipherable. 

Unbounded utility maximization aspires to optimize the entire world. This is pretty funky for just about any optimization criterion people can come up with, even if people are perfectly flawless in how well they follow it. There's a bunch of attempts to patch this, but none have really worked so far, and it doesn't seem like any will ever work.

I'm going to read your post and see the alternative you suggest. 

  1. ^

    Sounds like a Douglas Adams plot

Thanks for the interesting post! I basically agree with what you're saying, and it's mostly in line with the version of utilitarianism I'm working on refining. Check out a write-up on it here.


Interesting! Seems like you put a lot of effort into that 9,000-word post. May I suggest you publish it in little chunks instead of one giant post? You only got 3 karma for it, so I assume that those who started reading it didn't find it worth the effort to read the whole thing. The problem is, that's not useful feedback for you, because you don't know which of those 9,000 words are presumably wrong. If I were building a version of utilitarianism, I would publish it in little bursts of 2-minute posts. You could do that right now with a single section of your original post. Clearly you have tons of ideas. Good luck!