Hey LW. I've just written a draft of an article, which I'm now editing to link to useful prior write-ups of various concepts. One of those concepts is the idea that we evolved to "punish the non-punishers", in order to ensure the costs of social punishment are shared by everyone.

The thing is, I vividly remember reading an article from the rationalsphere that introduced me to this idea many years ago, but I cannot seem to find it anywhere. The idea is mentioned in many places where I could have seen the original article, including Overcoming Bias, Slate Star Codex, and Less Wrong, but all my searches turn up only passing mentions in comments, or throwaway mentions in articles that assume you already know what it means.

Since my article will otherwise continue this apparent tradition of assuming everyone knows what it means and not linking to an introduction, can anyone point me to the article I'm recalling, or perhaps suggest another good non-technical introduction to the idea (i.e. not this paper), so readers of the article I'm writing who aren't yet familiar with the idea can learn about it? Thanks!

2 Answers

Grr, this was harder than I expected.

I think your best bet may be Eliezer's Tolerating Tolerance, with the key quote below.

Cooperation is unstable, in both game theory and evolutionary biology, without some kind of punishment for defection.  So it's one thing to subtract points off someone's reputation for mistakes they make themselves, directly.  But if you also look askance at someone for refusing to castigate a person or idea, then that is punishment of non-punishers, a far more dangerous idiom that can lock an equilibrium in place even if it's harmful to everyone involved.
The danger of punishing nonpunishers is something I remind myself of, say, every time Robin Hanson points out a flaw in some academic trope and yet modestly confesses he could be wrong (and he's not wrong).  Or every time I see Michael Vassar still considering the potential of someone who I wrote off as hopeless within 30 seconds of being introduced to them.  I have to remind myself, "Tolerate tolerance!  Don't demand that your allies be equally extreme in their negative judgments of everything you dislike!"

Note that this is only a citation about what 'punishing non-punishers' means and why it's bad (causes bad equilibria to be sticky), not a mechanism for how such a meta-norm would come about in a society.

There is also a short Overcoming Bias post by Robin called Why Towns Conform.

In general, social norms are enforced via two key informal mechanisms:
1. When norms are usually followed, rare violators are often undesirable in objective ways. They may lack intelligence or self-control, for example. So people avoid violating such norms to avoid sending bad signals about themselves.
2. Meta-norms often require observers of norm violations to punish violators, such as by refusing to associate with them. This includes observers of a failure to punish a failure to punish, and so on.

But mostly the post is pretty oblique and doesn't give a mechanism for how such norms would have evolved.

(Robin also talks about meta-norms a lot more generally, for example his excellent series on automatic norms.)

Then you've also got Scott's epic Meditations on Moloch. I'd have to re-read all the examples to see if they fit the specific worry of punishing non-punishers, but it does introduce them with the following:

Bostrom makes an offhanded reference of the possibility of a dictatorless dystopia, one that every single citizen including the leadership hates but which nevertheless endures unconquered. It’s easy enough to imagine such a state. Imagine a country with two rules: first, every person must spend eight hours a day giving themselves strong electric shocks. Second, if anyone fails to follow a rule (including this one), or speaks out against it, or fails to enforce it, all citizens must unite to kill that person. Suppose these rules were well-enough established by tradition that everyone expected them to be enforced.
So you shock yourself for eight hours a day, because you know if you don’t everyone else will kill you, because if they don’t, everyone else will kill them, and so on. Every single citizen hates the system, but for lack of a good coordination mechanism it endures. From a god’s-eye-view, we can optimize the system to “everyone agrees to stop doing this at once”, but no one within the system is able to effect the transition without great risk to themselves.
And okay, this example is kind of contrived. So let’s run through – let’s say ten – real world examples of similar multipolar traps to really hammer in how important this is.
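
The lock-in described in this passage can be sketched as a tiny best-response check. The payoff numbers below are made up purely for illustration (the post gives none), and `payoff` and `best_response` are hypothetical helper names. The only point is that "everyone complies" and "nobody complies" are both self-sustaining once expectations are fixed.

```python
# Hypothetical disutilities; only their ordering matters for the argument.
SHOCK_COST = 8.0      # cost of shocking yourself for eight hours
DEATH_COST = 1000.0   # cost of being killed for breaking a rule


def payoff(my_choice, others_comply):
    """Payoff to one citizen, given their own choice and whether
    everyone else is following (and enforcing) the rules."""
    if my_choice == "comply":
        return -SHOCK_COST
    # A deviator is only killed if the others are actually enforcing.
    return -DEATH_COST if others_comply else 0.0


def best_response(others_comply):
    """The choice that maximizes one citizen's payoff."""
    return max(("comply", "deviate"), key=lambda c: payoff(c, others_comply))


for others_comply in (True, False):
    print("others comply:", others_comply, "-> best response:",
          best_response(others_comply))
```

Under these assumptions, complying is the best response when everyone else complies, and deviating is the best response when nobody else does. No single citizen can move the group from the first state to the second, which is the coordination failure the quote describes.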

The only place I found that tried to explain how such norms would come about is Eliezer's story Three Worlds Collide, which explains a possible history that would lead to... well, I'll just say (content warning: infant death, cannibalism) and leave the full quote below.

"It's a truism in evolutionary biology that group selection can't work among non-relatives.  The exception is if there are enforcement mechanisms, punishment for defectors - then there's no individual advantage to cheating, because you get slapped down.  That's what happened with the Babyeaters.  They didn't restrain their individual reproduction because the more children they put in the tribal pen, the more children of theirs were likely to survive.  But the total production of offspring from the tribal pen was greater, if the children were winnowed down, and the survivors got more individual resources and attention afterward.  That was how their species began to shift toward a k-strategy, an individual survival strategy.  That was the beginning of their culture.

"And anyone who tried to cheat, to hide away a child, or even go easier on their own children during the winnowing - well, the Babyeaters treated the merciful parents the same way that human tribes treat their traitors.

"They developed psychological adaptations for enforcing that, their first great group norm.  And those psychological adaptations, those emotions, were reused over the course of their evolution, as the Babyeaters began to adapt to their more complex societies.  Honor, friendship, the good of our tribe - the Babyeaters acquired many of the same moral adaptations as humans, but their brains reused the emotional circuitry of infanticide to do it.


The Lady Sensory spoke up.  "I don't suppose... we could convince them they were wrong about that?"


"Even if you could persuade them, it might not be a good idea," said the Xenopsychologist.  "If you convinced the Babyeaters to see it our way - that they had committed a wrong of that magnitude - there isn't anything in the universe that could stop them from hunting down and exterminating themselves.  They don't have a concept of forgiveness; their only notion of why someone might go easy on a transgressor, is to spare an ally, or use them as a puppet, or being too lazy or cowardly to carry out the vengeance.  The word for wrong is the same symbol as mercy, you see."  The Xenopsychologist shook her head.  "Punishment of non-punishers is very much a way of life, with them.  A Manichaean, dualistic view of reality.  They may have literally believed that we ate babies, at first, just because we didn't open fire on them."

I also managed to find an old comment about punishing non-punishers by you, PJEby.

Yeah, those are the things I found, but none of them are the thing I remember, which was something that explained how punishment is costly (risky) for the punisher due to free-riding by non-punishers, so we evolved the desire to punish non-punishers in order to ensure nobody gets away with free-riding. None of these articles cover that, which is a surprise to me since I had to have read that idea somewhere, and it feels to me like something that's part of the rationalsphere zeitgeist, yet I can't seem to place where I actually read it.

Anyway, I'm thinking ...
Ben Pace (4y):
That does sound like an interesting post. And yeah, linking to this question indeed seems like a fine substitute for now :)

I think the thing you're looking for is traditionally called "third-party punishment" or "altruistic punishment", cf. https://en.wikipedia.org/wiki/Third-party_punishment . Wikipedia cites Bendor, Jonathan; Swistak, Piotr (2001). "The Evolution of Norms". American Journal of Sociology. 106 (6): 1493–1545. doi:10.1086/321298, which seems at least moderately non-technical at a glance.


I think I first encountered this in my Moral Psychology class at MIT (syllabus at http://web.mit.edu/holton/www/courses/moralpsych/home.html ), and I believe the citation was E. Fehr & U. Fischbacher 'The Nature of Human Altruism' Nature 425 (2003) 785-91. The bottom of the first paragraph on page 787 in https://www.researchgate.net/publication/9042569_The_Nature_of_Human_Altruism ("In fact, it can be shown theoretically that even a minority of strong reciprocators suffices to discipline a majority of selfish individuals when direct punishment is possible.") seems related but not exactly what you're looking for.
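
As a rough back-of-the-envelope version of the quoted claim (not the model from the Fehr and Fischbacher paper), suppose a defector in a public goods game keeps an endowment instead of contributing it, and each punisher imposes a fixed fine. All the numbers and the helper name `punishers_needed` are assumptions for illustration, and the sketch assumes the multiplier is smaller than the group size, so that defecting pays when nobody punishes.

```python
import math


def punishers_needed(group_size, endowment=10.0, multiplier=1.6, fine=3.0):
    """Smallest number of punishers whose combined fines exceed
    the gain from defecting (all parameters are hypothetical)."""
    # A defector keeps the endowment but gives up their own share of
    # what their contribution would have added to the common pool.
    gain_from_defecting = endowment - multiplier * endowment / group_size
    return math.floor(gain_from_defecting / fine) + 1


k = punishers_needed(group_size=10)
print(k, "punishers out of 10 make defection unprofitable")
```

With these made-up numbers the threshold is well under half the group, though the sketch says nothing about why anyone would pay the cost of punishing in the first place.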

6 comments
One of those concepts is the idea that we evolved to "punish the non-punishers", in order to ensure the costs of social punishment are shared by everyone.

Before thinking of how to present this idea, I would study carefully whether it's true. I understand there is some disagreement regarding the origins of third-party punishment. There is a big literature on this. I won't discuss it in detail, but here are some examples of perspectives which deviate from that taken in the quoted passage.

Joe Henrich writes:

This only makes sense as cultural evolution. Not much third party punishment in many small-scale societies.

So in Henrich's view, we didn't even (biologically) evolve to punish wrong-doers (as third parties), let alone non-punishers. Third-party punishment is a result of cultural, not biological, evolution, in his view.

Another paper of potential relevance by Tooby and Cosmides and others:

A common explanation is that third-party punishment exists to maintain a cooperative society. We tested a different explanation: Third-party punishment results from a deterrence psychology for defending personal interests. Because humans evolved in small-scale, face-to-face social worlds, the mind infers that mistreatment of a third party predicts later mistreatment of oneself.

Another paper by Pedersen, Kurzban and McCullough argues that the case for altruistic punishment is overstated.

Here, we searched for evidence of altruistic punishment in an experiment that precluded these artefacts. In so doing, we found that victims of unfairness punished transgressors, whereas witnesses of unfairness did not. Furthermore, witnesses’ emotional reactions to unfairness were characterized by envy of the unfair individual's selfish gains rather than by moralistic anger towards the unfair behaviour. In a second experiment run independently in two separate samples, we found that previous evidence for altruistic punishment plausibly resulted from affective forecasting error—that is, limitations on humans’ abilities to accurately simulate how they would feel in hypothetical situations. Together, these findings suggest that the case for altruistic punishment in humans—a view that has gained increasing attention in the biological and social sciences—has been overstated.

Before thinking of how to present this idea, I would study carefully whether it's true.

I'm probably referring to the idea in a much narrower context, specifically our inclination to express outrage (or even just mild disapproval) as a form of low-cost, low-risk social punishment, and for that inclination to apply just as well to people who appear insufficiently disapproving or outraged.

The targets of this inclination may vary culturally, and it might be an artifact or side-effect of the hardware, but I'd be surprised if there were societies where people never disapproved of others for failing to disapprove of something. Disapproving of the same things is a big part of what draws societies together in the first place, so failing to disapprove of the common enemy seems like something that automatically makes you "probably the enemy".

(But my reasons and evidence for thinking this way will probably be clearer in the actual article, as it's about patterns of motivated reasoning that seem to reliably pop up in certain circumstances... but then again my examples are not terribly diverse, culturally speaking.)

A different Cosmides-and-Tooby (and Michael E. Price) take:

Unfortunately, these results do not make the evolution of adaptations for collective action any less mysterious. Because punishing a free rider would generally have entailed some nontrivial cost, each potential punisher has an incentive to defect—that is, to avoid this cost by not punishing acts of free riding. Thus, the provision of punishment is itself a public good: Each individual has an incentive to free ride on the punishment activities of others. Hence, second-order free riders should be fitter (or better off) than punishers. Without a way of solving this second-order free rider problem, cooperation should unravel, with nonparticipation and nonpunishment the equilibrium outcome. Even worse, this problem reappears at each new level, revealing an infinite regress problem: Punishment needs to be visited on free riders on the original public good, and on those who do not punish free riders, and on those who do not punish those who do not punish free riders, and so on.
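
To make the second-order free rider problem in the passage above concrete, here is a minimal sketch of a one-round public goods game with three strategies. Everything in it (the function name `payoffs`, the endowment, multiplier, punishment cost and fine) is an assumption chosen for illustration, not something taken from the paper.

```python
def payoffs(n_punishers, n_nonpunishers, n_defectors,
            endowment=10.0, multiplier=1.6,
            punish_cost=1.0, fine=3.0, meta_punish=False):
    """Per-person payoff for each strategy in one round (hypothetical numbers).

    punisher:    contributes, and pays punish_cost per person sanctioned
    nonpunisher: contributes, but never sanctions anyone
    defector:    keeps the endowment and still shares in the public good
    If meta_punish is True, punishers also sanction the nonpunishers.
    """
    n = n_punishers + n_nonpunishers + n_defectors
    contributors = n_punishers + n_nonpunishers
    # Contributions are multiplied and split equally among everyone.
    share = multiplier * endowment * contributors / n
    # How many people each punisher has to sanction.
    targets = n_defectors + (n_nonpunishers if meta_punish else 0)
    return {
        "punisher": share - punish_cost * targets,
        "nonpunisher": share - (fine * n_punishers if meta_punish else 0.0),
        "defector": endowment + share - fine * n_punishers,
    }


first_order = payoffs(5, 4, 1)
second_order = payoffs(5, 4, 1, meta_punish=True)
print("punish defectors only:  ", first_order)
print("also punish nonpunishers:", second_order)
```

With these made-up numbers, a contributor who declines to punish out-earns one who punishes, which is the free-riding incentive the passage describes. Turning on `meta_punish` reverses the ordering, but only by adding a further punishment cost that someone could again decline to pay, which is the regress the quote ends on.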

I've referred to this as super cooperators and super defectors, but those terms aren't used in the literature. I remember Joshua Greene citing something interesting in this space in his EAG talk in like 2015 or so, but I can't find much. The video of his talk doesn't have slides. The vivid memory I have is the surprising discovery of the super defectors, i.e. people who enforce against cooperators, and the tidbit that it only takes a small number of supers in either direction to flip the whole network over to the other equilibrium.

Yes, I managed to find posts about people who punish cooperators: the LW post by Martin Sustrik called "Anti-Social Punishment" and this old OB post by Robin called "Looking Too Good".

It might fall under stuff on "the free rider problem".
