At the risk of re-hashing some things which have already been covered a lot, I want to outline some of my current thinking on ethics/morality/meta-ethics. I haven't yet found a distinction between these terms which feels to me like more than a distinction of the moment. What I'm talking about, for the purposes of this post, is game-theoretic reasoning which tends to promote cooperation and get good results. I'll call this "game-theoretic morality" here.

I suspect there isn't something like an objectively correct game-theoretic morality. The pragmatically best approach depends too much on what universe you're in. Players can enforce weird equilibria in iterated Prisoner's Dilemma. If you find yourself in a strange playing field, all sorts of irrational-looking strategies may be optimal. That being said, we can try to capture working principles for the situations we find we tend to get into, and hope they don't generalize too badly.

Coalition Dynamics

I think of game-theoretic morality largely in terms of coalition dynamics. In some sense, the ideal outcome is for everyone to be on the same team, maximizing a combined utility function. Unfortunately, that's not always possible. A pure altruist, who values everyone equally, is exploitable; unqualified altruism isn't a winning strategy (or rather, isn't always a winning strategy), even from the perspective of global coordination. A more pragmatic strategy is to give consideration to others in a way which incentivizes joining your coalition. This often allows you to convince selfish agents to "grow the circle of empathy", creating overall better outcomes through coordination.

This line of thinking leads to things like Nash bargaining and the Shapley value. Everyone in the coalition coordinates to provide maximum value to the coalition as a whole, but without treating everyone equally: members of the coalition are valued based on their contribution to the coalition. If you want the setup to be more egalitarian, that's a matter of your values (which your coalition partners should take into account), but it's not part of the ideal game-theoretic morality.
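
To make "valued based on their contribution" a bit more concrete, here is a minimal sketch of a Shapley-value computation for a hypothetical three-player coalition game. The characteristic function below is made up purely for illustration; nothing in this post pins down this particular formalization.

```python
from itertools import permutations

# Hypothetical characteristic function: the value each subset of players can
# create on its own. These numbers are invented purely for illustration.
v = {
    frozenset(): 0,
    frozenset({"A"}): 10, frozenset({"B"}): 10, frozenset({"C"}): 0,
    frozenset({"A", "B"}): 30, frozenset({"A", "C"}): 20, frozenset({"B", "C"}): 20,
    frozenset({"A", "B", "C"}): 50,
}
players = ["A", "B", "C"]

def shapley_values(players, v):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            totals[p] += v[coalition | {p}] - v[coalition]
            coalition = coalition | {p}
    return {p: totals[p] / len(orders) for p in players}

print(shapley_values(players, v))  # {'A': 20.0, 'B': 20.0, 'C': 10.0}
# C contributes less to every coalition it joins, so C is valued less,
# even though everyone coordinates on the joint outcome.
```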

This point is similar to Friendship is Utilitarian. Even if you have egalitarian altruistic goals, and even if your potential allies don't, it can be overall better to form alliances in which you help the people who you expect will help you most in return.

If this sounds a little too cold-hearted, there's probably a good reason for that. When I said "things like Nash bargaining and Shapley value", I was purposefully leaving things open to interpretation. I don't know what the right formal model is for what I'm talking about. I suspect there's some use for extending benefit of the doubt, in general. For example, in the Prisoner's Dilemma, if your strategy is to estimate the probability p that the other person will cooperate and then cooperate with probability p yourself, the result is unstable when playing with other similar agents. However, if you cooperate with probability p+0.001, then both people are trying to be a little more cooperative than the other. You'll cooperate 100% of the time with others following the same strategy, while sacrificing very little in other situations. Common knowledge that you'll extend a little more trust than is "strictly justified" can go a long way!
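
Here is a toy simulation of the p+0.001 idea, under the deliberately simplified assumption that each player's estimate of the other is just the other's current cooperation probability (so this is not a serious opponent-modelling setup, just an illustration of how the extra trust compounds):

```python
EPSILON = 0.001

def respond(p_other_cooperates):
    """Cooperate with probability (estimated opponent cooperation) + epsilon, capped at 1."""
    return min(1.0, p_other_cooperates + EPSILON)

# Two such agents, each starting from an arbitrary mutual estimate of 0.5.
p1, p2 = 0.5, 0.5
for _ in range(1000):
    p1, p2 = respond(p2), respond(p1)

print(p1, p2)  # both reach 1.0: a little extra benefit of the doubt ratchets up to full cooperation
```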

By the way, in one sense, the "True Prisoner's Dilemma" is impossible between agents of the sort I'm imagining. They see the game set-up and the payoff table, and immediately figure out the Nash bargaining solution (or something like it), and re-write their own utility function to care about the other player. From this perspective, the classical presentation of Prisoner's Dilemma as a game between humans doesn't provide such bad intuitions after all.
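
For reference, the two-player Nash bargaining solution these agents would be computing picks the feasible payoff pair that maximizes the product of gains over the disagreement point d (in the Prisoner's Dilemma reading, d would naturally be the mutual-defection payoffs, though the post doesn't commit to this exact solution concept):

$$(u_1^*, u_2^*) \;=\; \arg\max_{(u_1, u_2) \in F,\ u_i \ge d_i} \; (u_1 - d_1)(u_2 - d_2)$$

where F is the set of jointly achievable payoff pairs.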

Preference Utilitarianism

Preference utilitarianism makes a lot more sense within this kind of coalition than alternatives like hedonic utilitarianism. We help allies in what they care about, not what we think they ideally should care about. You're allowed to care about the happiness of others. Allies in your coalition will support your wishes in this respect, to the extent that you've earned it (and perhaps a little more). But, as with egalitarianism, that's a matter of your personal preference, not a matter of game-theoretic morality.

Deontology

Another wrinkle in the story is timeless decision theory, which gives something more like rule utilitarianism than the more common act utilitarianism. This is quite close to deontology, if not identical. In particular, it sounds quite close to Kant's categorical imperative to me.

Arguably, timeless decision theory does not exactly give rule utilitarianism: trying to take the action which you would want relevantly similar decision makers to take in relevantly similar situations is not necessarily the same as trying to act according to the set of rules which are highest-utility. Creating rules (such as "do not kill", "do not lie") risks over-generalizing in a way which trying to follow the best policy doesn't. However, this over-generalization is good for humans: we can't expect to work out all the instances correctly on the spot, especially accounting for biases. Furthermore, clear rules are going to be better for coalition coordination than just generally trying to take the best actions (although there's room for both).

A common objection is that deontology is about duty, not about consequences; that even if rule utilitarians do arrive at the same conclusions, they do it for different reasons. However, from a coalition perspective, I'm not sure "duty" is such a bad way of describing the reason for following the rules.

Contractualism

The kind of reasoning here has some similarities to Scott Alexander's attempt to derive utilitarianism from contractualism.

Now, I won't try to pretend that I understand contractualism all that well, but I think orthodox contractualism (as opposed to Scott Alexander's version) does something more like a "min" operation than summing utilities. From the SEP article:

Since individuals must be objecting on their own behalf and not on behalf of a group, this restriction to single individuals' reasons bars the interpersonal aggregation of complaints; it does not allow a number of lesser complaints to outweigh one person's weightier complaint.

I surprise myself by thinking that something similar applies to ideal coalition dynamics.

Harsanyi's Utilitarian Theorem is a very strong argument for the utilitarian practice of aggregating utilities by summing them. However, when we take coalition dynamics into account, we see that there's a need to keep everyone in the coalition happy. Utilitarianism will happily kill a few group members or expose them to terrible suffering for the greater good. If coalition members can foresee this fate, they will likely leave the coalition.
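
For context, Harsanyi's theorem says (roughly) that if each individual's preferences and the social preferences all satisfy the von Neumann-Morgenstern axioms, and the social preferences respect a Pareto condition, then social welfare must be (up to an additive constant) a weighted sum of the individual utilities:

$$W(x) \;=\; \sum_{i=1}^{n} w_i \, u_i(x), \qquad w_i \ge 0.$$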

This situation is somewhat improved if the coalition members are using something like timeless decision theory, since they will have a greater tendency to commit to beneficial arrangements. However, assuming a typical "veil of ignorance" seems too strong -- this is like assuming that all the agents come from the same timeless perspective (a position where they're ignorant of which agent they'll become). This would allow a perfect Harsanyi coordination, but only because everyone starts out agreeing by assumption.

If there's a great degree of honor in the coalition, or other commitment mechanisms which enforce what's best for the group overall in the Harsanyi sense, then this isn't a concern. However, it seems to me that some sort of compromise between optimizing the minimum and optimizing the average will be needed. Perhaps it'd be more like optimizing the average subject to the constraint that no one is so badly off that they will leave, or optimizing the average in some way that takes into account that some people will leave.
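
Purely as a sketch of that compromise: writing r_i for what member i expects to get by leaving (a parameter I'm introducing here, not something pinned down in the post), the constrained version would be something like

$$\max_{x} \ \frac{1}{n}\sum_{i=1}^{n} u_i(x) \quad \text{subject to} \quad u_i(x) \ge r_i \ \text{ for all } i.$$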

Population Ethics

Perhaps the most famous objection to utilitarianism is the repugnant conclusion. However, from the coalition-dynamics perspective, the whole line of reasoning relies on improper comparison of utility functions of differing coalitions. You determine whether to expand a coalition by checking the utility of that act with respect to the current coalition. A small coalition with a high average preference satisfaction isn't better or worse than a large one with a medium average preference satisfaction; the two are incomparable. There's no difference between total utilitarianism and average utilitarianism if applied in the right way. A member is added to a coalition if adding that member benefits the existing coalition (from an appropriate timeless-rule perspective); adding members in this way can't result in lives barely worth living (at least, not in expectation).

This conclusion is likely weakened by benefit-of-the-doubt style reasoning. Still, the direct argument to the repugnant conclusion is blocked here.
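
Schematically, the membership rule described above might look like the following sketch. The names are mine, and coalition_utility stands in for whatever aggregation (e.g. a bargaining solution, evaluated from an appropriate timeless-rule perspective) the coalition actually uses:

```python
def should_admit(current_members, candidate, coalition_utility):
    """Admit a candidate only if doing so benefits the existing coalition.

    coalition_utility(members, evaluators) is a placeholder for however the
    coalition scores an arrangement from the standpoint of `evaluators`.
    """
    before = coalition_utility(current_members, evaluators=current_members)
    after = coalition_utility(current_members | {candidate}, evaluators=current_members)
    # The candidate's own welfare only matters insofar as existing members care
    # about it, so repeatedly applying this rule doesn't (in expectation) push
    # the coalition toward lives barely worth living.
    return after > before
```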

Conclusion

The rules say we have to use consequentialism, but good people are deontologists, and virtue ethics is what actually works.

-Eliezer Yudkowsky

Part of what I'm trying to get at here is that every major candidate for normative ethics makes points which are importantly true, and that they seem easier to reconcile than (I think) is widely recognized.

On the other hand, I'm trying to argue for a very specific version of utilitarianism, which I haven't even fully worked out. I think there's a lot of fertile ground here for investigation.

Comments

However, if you cooperate with probability p+0.001, then both people are trying to be a little more cooperative than the other. You'll cooperate 100% of the time with others following the same strategy, while sacrificing very little in other situations.

Really haven't thought much about this, but my brain wants to say that this strategy must clearly be exploitable somehow.

I won't try to argue that this strategy in particular is ideal. (FYI, this is the strategy called "nicerbot".) However, the general pattern I'm using it to point at, where you give just a little benefit of the doubt, is only slightly exploitable as a rule. This will often be worth it due to a number of cases where the strategy helps force good equilibria.

By the way, in one sense, the "True Prisoner's Dilemma" is impossible between agents of the sort I'm imagining. They see the game set-up and the payoff table, and immediately figure out the Nash bargaining solution (or something like it), and re-write their own utility function to care about the other player.

This seems strange to me. My intuitions about agent design say that you should practically never rewrite your own utility function. The thing that "re-write their own utility function" points to here seems to be something more accurately described as "making an unbreakable commitment", which seems like it could be done via a mechanism separate from literally rewriting your utility function. Humans seem to do something in that space (i.e. we have desires and commitments, both of which feel quite different and separate from the inside).

I agree, that's a more accurate description. The sense in which "true prisoner's dilemma" is impossible is the sense in which your utility function is the cooperative one you commit to. It makes sense to think in terms of your "personal" (original) utility function and an "acting" utility function, or something like that.

I still think this undermines the point of the "true prisoner's dilemma", since thinking of humans gives decent intuitions about this sort of reasoning.

I very much agree with the broad gist of the post, but also have many specific points that I disagree with. This feels like a post for which inline commenting, or a special content block that allows a comment thread to start from that place, would be extremely useful.

In the absence of that, I will write multiple replies to separate parts of the post, so that we can keep the discussion threads apart.