Roughly speaking, Rethink Priorities’ Moral Weight Project tries to estimate how intense suffering is in different animals, relative to humans. A moral weight of 1.0 means it is exactly as intense as in humans.
It’s notoriously animal-friendly, e.g. it holds that 14 bees = 1 human. Here are some of the results:
The calculation essentially uses a weighted factor model:
Empirical proxies (60% weight): The animal is evaluated for the presence or absence of a set of cognitive proxies (e.g. object permanence, responses to novelty) and affective proxies (e.g. depression-like behaviour, disgust-like behaviour). The contribution here is essentially the fraction of proxies that are present, where having 100% of them gives a moral weight of 1.0.
Neurophysiological model (30% weight): Uses neuron counts and other neurophysiological data.
Equality model (10% weight): Deliberately assumes equal welfare ranges.
There is also a “probability of sentience” multiplier applied.
It is the “Empirical proxies” that substantively produce the animal-friendly results. “Probability of sentience” and “equality model” are essentially subjective researcher judgements baked into the model. The “Neurophysiological model” does weight large animals highly and small animals lowly, but because the overall model is additive and any moderately small animal gets a neurophysiological weight of ~0, its only effect is to apply a ~30% discount to any small animal.
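To make the additive structure concrete, here is a minimal sketch of the weighted factor model as described above. It is a simplification, not RP's actual code: the function name, the input values, and the choice to apply the sentience multiplier to the whole weighted sum are my assumptions for illustration.

```python
def moral_weight(proxy_fraction, neuro_score, p_sentience,
                 w_proxy=0.6, w_neuro=0.3, w_equal=0.1):
    """Simplified additive model (illustrative, not RP's implementation).

    proxy_fraction: fraction of empirical proxies scored as present (0 to 1)
    neuro_score:    neurophysiological score relative to humans (0 to 1)
    p_sentience:    probability-of-sentience multiplier (0 to 1)
    """
    # The equality model always contributes a welfare range of 1.0.
    welfare_range = (w_proxy * proxy_fraction
                     + w_neuro * neuro_score
                     + w_equal * 1.0)
    # Assumption: the multiplier scales the whole weighted sum.
    return p_sentience * welfare_range


# Hypothetical small animal: half the proxies present, neuro score ~0.
small_animal = moral_weight(proxy_fraction=0.5, neuro_score=0.0, p_sentience=1.0)

# Same proxies, but with a human-level neuro score, for comparison.
large_animal = moral_weight(proxy_fraction=0.5, neuro_score=1.0, p_sentience=1.0)

print(small_animal, large_animal)
```

Under these assumptions, the neurophysiological term can only ever remove up to 0.3 from the total, so the proxy fraction still does most of the work for small animals.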
This post covers two critiques of the “empirical proxies”, both of which suggest they are biased towards overly animal-friendly results.
1. Functional analogues: double counting
The whole logic behind using these empirical proxies is the idea of “functional analogues”: if a human shows “depression-like behaviour”, and a chicken shows “depression-like behaviour”, then these are analogous, and the chicken’s behaviour is evidence that it has something like the experience of depression. This is fair enough as far as it goes.
The problem is that the model treats each proxy as independent evidence. A pig scores “Likely Yes” on anxiety-like behaviour, fear-like behaviour, depression-like behaviour, panic-like behaviour, and flexible self-protective behaviour. These are counted as five separate hits. But they’re clearly not independent: they’re five ways of asking “does this animal display negative-valence-indicating behaviours?” A pig that shows fear almost certainly also shows anxiety and panic. Counting each separately inflates the score.
This matters because the model is basically: welfare range = fraction of proxies scored positive. If half your proxies are correlated rewordings of each other, then ticking 30 out of 46 boxes is a lot less impressive than it sounds.
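A toy illustration of the inflation, with made-up scores and a made-up clustering (this is not RP's data, and not their Grouped Proxy Model): counting each proxy separately versus letting each cluster of related proxies count once.

```python
# Hypothetical proxy scores for one animal (True = proxy judged present).
scores = {
    "anxiety-like behaviour": True,
    "fear-like behaviour": True,
    "depression-like behaviour": True,
    "panic-like behaviour": True,
    "flexible self-protective behaviour": True,
    "object permanence": False,
    "responses to novelty": True,
    "cross-modal learning": False,
}

# Hypothetical clustering: the five negative-valence proxies form one group.
clusters = {
    "negative-valence behaviour": [
        "anxiety-like behaviour",
        "fear-like behaviour",
        "depression-like behaviour",
        "panic-like behaviour",
        "flexible self-protective behaviour",
    ],
    "object permanence": ["object permanence"],
    "responses to novelty": ["responses to novelty"],
    "cross-modal learning": ["cross-modal learning"],
}


def raw_fraction(scores):
    """Fraction of proxies present, each counted as independent evidence."""
    return sum(scores.values()) / len(scores)


def clustered_fraction(scores, clusters):
    """Each cluster contributes the mean of its members; clusters weigh equally."""
    per_cluster = [
        sum(scores[name] for name in members) / len(members)
        for members in clusters.values()
    ]
    return sum(per_cluster) / len(per_cluster)


print(raw_fraction(scores))                  # 6 of 8 proxies
print(clustered_fraction(scores, clusters))  # 2 of 4 clusters
```

With these invented inputs the same animal drops from 0.75 to 0.5 once the correlated cluster counts once. The size of the drop depends entirely on how the clusters are drawn, which is itself a judgement call.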
But there’s a deeper version of the problem. ALL of the proxies, not just the correlated clusters, load on a single underlying uncertain claim: “behavioural and cognitive functions predict the intensity of subjective experience, even if the process that brings them about varies (e.g. 1000x fewer neurons involved)”. If this claim is wrong, if a bee can show “anxiety-like behaviour” through simple neural circuits with no subjective experience at all, then scoring well on 30 proxies provides no more evidence of welfare capacity than scoring well on 1. This claim is vulnerable to simple reductios, e.g. you could say this box shows “depression-like” behaviour:
RP actually built a “Grouped Proxy Model” that clusters related proxies together, which would partially address within-group correlation. But they excluded it from their final estimates. In any case, the functionalism-at-all argument still applies.
2. Bayesian critique of high moral weights in small animals
Black soldier flies have roughly 100,000 neurons vs humans’ 86 billion. And yet, black soldier flies score positively on 12 out of 46 proxies, including communication, personality, cognitive bias, cross-modal learning, depression-like behaviour, fear-like behaviour, and hyperalgesia.
One reaction to this is “wow, even flies might be conscious, we should take their welfare seriously”, i.e. “Don’t Balk at Animal-friendly Results”.
Another reaction is “wow, even flies score highly on these proxies, they must not be very good proxies”.
This second reaction is completely legitimate, and is just a fair application of Bayes’ theorem. If you start with priors on:
Black soldier flies are sentient (say, 0.01% chance)
Then observing that black soldier flies show depression-like behaviour should update you both towards a higher chance of black soldier flies being sentient, and a lower chance of depression-like behaviour being predictive (there is a key free variable: how likely is an organism to show depression-like behaviour for non-consciousness reasons).
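Here is a minimal sketch of that two-way update. Everything except the 0.01% sentience prior is an assumption for illustration: the 50% prior that the proxy is predictive, the three likelihoods, and the simplification that the two priors are independent.

```python
def joint_update(p_sentient, p_predictive, a, b, c):
    """Posterior after observing the behaviour once (illustrative model).

    a: P(behaviour | sentient,     proxy is predictive)
    b: P(behaviour | not sentient, proxy is predictive)
    c: P(behaviour | proxy is not predictive), the same whether or not the
       organism is sentient; this is the "non-consciousness reasons" rate.
    Returns (P(sentient | behaviour), P(predictive | behaviour)).
    """
    s, f = p_sentient, p_predictive
    # Unnormalised weights of the four joint hypotheses.
    pred_sent = f * s * a
    pred_not = f * (1 - s) * b
    unpred_sent = (1 - f) * s * c
    unpred_not = (1 - f) * (1 - s) * c
    total = pred_sent + pred_not + unpred_sent + unpred_not
    post_sentient = (pred_sent + unpred_sent) / total
    post_predictive = (pred_sent + pred_not) / total
    return post_sentient, post_predictive


# 0.01% sentience prior from the text; all other numbers are hypothetical.
post_s, post_f = joint_update(p_sentient=0.0001, p_predictive=0.5,
                              a=0.9, b=0.05, c=0.5)
print(post_s, post_f)
```

With these inputs the probability of sentience rises above its prior while the probability that the proxy is predictive falls well below 50%. The result is driven mainly by `c`: if you set it as low as `b`, the observation barely moves the predictiveness estimate.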
In my view, the fact that very small animals get such high moral weights in the model should be taken as strong evidence that it’s over-weighting these empirical proxies. This combines with the point above: I don’t believe it’s fair to say “but can 30 proxies really be wrong?”, because the 30 proxies all load on the same claim, that “behavioural and cognitive functions predict the intensity of subjective experience, even if the process that brings them about varies (e.g. 1000x fewer neurons involved)”.