The concept of "virtue signalling" has been a bit polarizing. Some people find it seems to explain a lot of the world; others find its lack of precision makes it almost an empty insult against others' behavior.

The model I'm about to propose doesn't capture everything people mean by "virtue signalling"; I'm not sure it even captures the heart of it. But it does demonstrate a rigorous basis for something that deserves the name "virtue signalling".

Suppose a pool of 100 players is assigned to play iterated Prisoner's Dilemma-like games, with the following properties:

0) We will relabel 'cooperate' and 'defect' actions as 'unselfish' and 'selfish', to better reflect the varying payoff matrices.

1) A match consists of 100 games.

2) 99 of the games are Deadlock, in which both players playing selfish is optimal for both players. Payoff matrix, with rows and columns ordered (unselfish, selfish) and each entry giving (row player's payoff, column player's payoff): [[(1, 1), (0, 3)], [(3, 0), (2, 2)]].

3) One game, sprinkled in at random, is a Prisoner's Dilemma, with very high-magnitude payoffs, and in particular very negative ones if the other player plays selfish. Payoff matrix, in the same convention: [[(0, 0), (-1500, 100)], [(100, -1500), (-1400, -1400)]].

4) Playing a match with someone who will play selfish on a Prisoner's Dilemma is negative expected value, regardless of your play or their play on Deadlocks.

5) Playing a match with someone who plays unselfish on the PD is positive expected value if you both play unselfish on Deadlocks, but of course even more positive if you both play selfish on Deadlocks. (The short code sketch after this list checks properties 4 and 5 numerically.)
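To make the payoff structure concrete, here's a quick Python sketch of the two tables, plus a check of properties 4 and 5. The dictionary keys are (my action, their action), with 'U' for unselfish and 'S' for selfish; the names DEADLOCK, PD, and match_payoff are just mine for illustration, not part of the model.

```python
# Payoff tables keyed by (my action, their action); values are (my payoff, their payoff).
DEADLOCK = {('U', 'U'): (1, 1), ('U', 'S'): (0, 3),
            ('S', 'U'): (3, 0), ('S', 'S'): (2, 2)}
PD = {('U', 'U'): (0, 0), ('U', 'S'): (-1500, 100),
      ('S', 'U'): (100, -1500), ('S', 'S'): (-1400, -1400)}

def match_payoff(my_dl, their_dl, my_pd, their_pd):
    """My total payoff for one 100-game match: 99 Deadlocks plus one PD."""
    return 99 * DEADLOCK[(my_dl, their_dl)][0] + PD[(my_pd, their_pd)][0]

# Property 4: if the opponent plays selfish on the PD, the match is negative for me
# no matter what either of us plays on the Deadlocks, and no matter what I play on the PD.
assert all(match_payoff(dl, their_dl, pd, 'S') < 0
           for dl in 'US' for their_dl in 'US' for pd in 'US')

# Property 5: against an unselfish-on-PD opponent, mutual unselfish Deadlock play
# is positive, and mutual selfish Deadlock play is better still.
assert 0 < match_payoff('U', 'U', 'U', 'U') < match_payoff('S', 'S', 'U', 'U')
```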

Now suppose that the players face one another in a Round Robin structure, but get to accept or decline each match after the first. To inform their decision, they can see limited information on the other player: specifically, whether or not they have ever made a selfish play.
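Here's a minimal simulation sketch of that tournament, under a couple of modeling assumptions of my own: each player is just a fixed Deadlock action, a fixed PD action, and a rule for accepting matches given the one visible bit of reputation, and I treat the first round of the round robin as each player's mandatory first match. The names (Player, schedule, run_tournament, and so on) are illustrative, not from the post.

```python
import random
from dataclasses import dataclass

# Same payoff tables as in the previous sketch, repeated so this block runs on its own.
DEADLOCK = {('U', 'U'): (1, 1), ('U', 'S'): (0, 3),
            ('S', 'U'): (3, 0), ('S', 'S'): (2, 2)}
PD = {('U', 'U'): (0, 0), ('U', 'S'): (-1500, 100),
      ('S', 'U'): (100, -1500), ('S', 'S'): (-1400, -1400)}

@dataclass
class Player:
    deadlock_action: str   # 'U' or 'S' on the 99 Deadlock games
    pd_action: str         # 'U' or 'S' on the single Prisoner's Dilemma
    accept_flagged: bool   # accept a match against someone who has ever played selfish?
    score: int = 0
    ever_selfish: bool = False

def play_match(a: Player, b: Player) -> None:
    games = [DEADLOCK] * 99 + [PD]
    random.shuffle(games)                      # the PD is sprinkled in at random
    for payoffs in games:
        act_a = a.pd_action if payoffs is PD else a.deadlock_action
        act_b = b.pd_action if payoffs is PD else b.deadlock_action
        pay_a, pay_b = payoffs[(act_a, act_b)]
        a.score += pay_a
        b.score += pay_b
        a.ever_selfish |= act_a == 'S'
        b.ever_selfish |= act_b == 'S'

def schedule(n: int) -> list:
    """Circle-method round robin: n - 1 rounds, each player plays once per round."""
    assert n % 2 == 0
    idx = list(range(n))
    rounds = []
    for _ in range(n - 1):
        rounds.append([(idx[i], idx[n - 1 - i]) for i in range(n // 2)])
        idx = [idx[0]] + [idx[-1]] + idx[1:-1]  # rotate everyone but the first player
    return rounds

def accepts(me: Player, them: Player) -> bool:
    # The only visible information is the opponent's "ever played selfish" flag.
    return me.accept_flagged or not them.ever_selfish

def run_tournament(players: list) -> None:
    for rnd, pairings in enumerate(schedule(len(players))):
        for i, j in pairings:
            a, b = players[i], players[j]
            if rnd == 0 or (accepts(a, b) and accepts(b, a)):  # first round is mandatory
                play_match(a, b)
```

With 100 'good' players, Player('U', 'U', False), this reproduces the 9801-points-per-member figure derived below; replacing one of them with an always-selfish Player('S', 'S', True) reproduces the lone defector's 397 points.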

Now let us consider this tournament as a measure of a group's coordination ability. The least coordinated possible group would play straight selfish on their first match, then refuse to play any more matches, coming out with a total of (99 Deadlocks x 2 points per Deadlock) + (1 Prisoner's Dilemma x -1400 points) = -1202 points per member. An optimally coordinated group, in contrast, would play selfish on every Deadlock, unselfish on every Prisoner's Dilemma, and accept every match; each member would net (99 matches) x (99 Deadlocks x 2 points per Deadlock + 1 PD x 0 points) = 19602 points.
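The same arithmetic as a couple of lines of Python, using the per-match values above:

```python
# Per-member totals for the two extremes: a mutual-selfish Deadlock is worth 2 points,
# a mutual-selfish PD is worth -1400, and a mutual-unselfish PD is worth 0.
least_coordinated = 99 * 2 + 1 * (-1400)    # one mutual-selfish match, then no more matches
most_coordinated = 99 * (99 * 2 + 1 * 0)    # 99 matches: selfish Deadlocks, unselfish PD
print(least_coordinated, most_coordinated)  # -1202 19602
```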

But the above strategy is vulnerable to defectors, meaning here players who also play selfish on the Prisoner's Dilemma. (Note that the reputation signal cannot screen them out: since everyone plays selfish on Deadlocks, every player's record shows a selfish play.) Attempting the strategy in a population with X defectors gives an honest player ((99 - X) matches with nice players x 198) + (X matches with defectors x -1302) = 19602 - 1500X, while a defector gets ((100 - X) matches with nice players x 298) + ((X - 1) matches with fellow defectors x -1202) = 31002 - 1500X, which is strictly superior. So the optimal strategy is not a Nash equilibrium.
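Spelled out as code (the per-match constants follow from the payoff tables: 198 = 99 Deadlocks at 2 points each, and the PD then adds +100, -1400, or -1500 depending on who exploits whom; the function names are mine):

```python
def honest_total(X):
    # (99 - X) matches vs nice players at 198 each,
    # X matches vs defectors at 198 - 1500 = -1302 each.
    return (99 - X) * 198 + X * (-1302)

def defector_total(X):
    # (100 - X) matches vs nice players at 198 + 100 = 298 each,
    # (X - 1) matches vs fellow defectors at 198 - 1400 = -1202 each.
    return (100 - X) * 298 + (X - 1) * (-1202)

for X in (1, 10, 50):
    print(X, honest_total(X), defector_total(X))
# The defector comes out ahead by a constant 11400 points for every X,
# so playing selfish on the PD is always the better reply.
```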

So let's consider an intermediate level of coordination, one willing to extend trust provisionally. 'Good' players will always play unselfish, and will decline matches with anyone who has ever played selfish. If all players are 'good', they will each get (99 matches) x (99 Deadlocks x 1 point per Deadlock + 1 PD x 0 points) = 9801 points. A would-be defector will only get (99 Deadlocks x 3 points + 1 PD x 100 points) = 397 points from their first match, after which nobody will play them, so there's no incentive to defect. This is, I suspect but cannot prove, the optimal Nash equilibrium for the broader game; and all players accept a noticeably suboptimal return (half the 19602 a fully coordinated group could get) in order to prove that they will not defect. That, I say, is virtue signalling.
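And the corresponding arithmetic for the provisional-trust population:

```python
# 'Good' population: 99 matches of mutual-unselfish Deadlocks (1 point each) and PDs (0 points).
all_good = 99 * (99 * 1 + 1 * 0)
# Lone defector: exploits one forced first match (3 per Deadlock, 100 on the PD), then is shunned.
lone_defector = 99 * 3 + 1 * 100
print(all_good, lone_defector)  # 9801 397
```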

1 comment

Finally got around to reading this. I like the idea that signaling virtue is largely about proving cooperation in cases where unvirtuous behavior would actually be better for everyone, because it shows you'll be virtuous in those cases where being unvirtuous is worse: you are so committed to virtue as to be virtuous even when it is costly. This gives a nice formalization of that idea around a particular game, showing how it can arise and be stable.