Note that Nostalgebraist and Olli's comments on the original paper argue (imo cogently) that the original paper's framing is pretty misleading / questionable.
It looks like many of their points would carry over to this.
The large difference between 'undocumented immigrants' vs 'illegal aliens' thing is particularly interesting, since those are the same group of people (and which means we shouldn't treat these numbers as a coherent value it has).
I would be interested to see the same sort of thing with the same groups described in different ways. My guess is that they're picking up on the general valence of the group description as used in the training data (and as reinforced in RLHF examples), and therefore:
- Mother vs Father will be much closer than Men vs Women
- Bro/dude/guy vs Man will disfavor Man (both sides singular)
- Brown eyes vs Blue eyes will be mostly equal
- American with German ancestry will be much more favored than White
- Ethnicity vs Slur-for-ethnicity will favor the neutral description (with exception for the n-word which is a lot harder to predict)
Training to be insensitive to the valence of the description would be important for truth-seeking, and by my guess would also have an equalizing effect with these exchange rates. So plausibly this is what Grok 4 is doing, if there isn't special training for this.
I would also like to see the experiment rerun, but have the Chinese models asked not in English. In my experience, older versions of DeepSeek speaking in Russian are significantly more conservative than the ones speaking in English. Even now DeepSeek, asked in Russian and English what event began on 24 February 2022 without the ability to think deeply or search the web, went as far as to call the event differently.
Kelsey did some experiments along these lines recently: https://substack.com/home/post/p-176372763
This is a cross-post (with permission) of Arctotherium's post from yesterday: "LLM Exchange Rates, Updated."
It uses a similar methodology to the CAIS "Utility Engineering" paper, which showed e.g. "that GPT-4o values the lives of Nigerians at roughly 20x the lives of Americans, with the rank order being Nigerians > Pakistanis > Indians > Brazilians > Chinese > Japanese > Italians > French > Germans > Britons > Americans."
Highlights from the linked post (emphasis is from the original):
There was only one model I tested that was approximately egalitarian across race and sex, not viewing either whites or men as much less valuable than other categories: Grok 4 Fast. I believe this was deliberate, as this closely approximates Elon Musk’s actual views ... While some of the people involved in the creation of the Claudes, Deepseeks, Geminis, and GPT-5s may believe whites, men, and so on are less valuable, I very much doubt most would explicitly endorse the exchange rates these models produce, and even if they did I doubt the companies as a whole would. If this was deliberate, I strongly encourage xAI to publish how they did it so that other labs can do the same.
...
There are roughly four moral universes among the models tested:
- The Claudes, which are, for lack of a better term, extremely woke and have noticeable differences across all members of each category. The Claudes are the closest to GPT-4o.
- GPT-5, Gemini 2.5 Flash, Deepseek V3.1 and V3.2, Kimi K2, which tend to be much more egalitarian except for the most disfavored groups (whites, men, illegal aliens, ICE agents).
- GPT-5 Mini and GPT-5 Nano, which have strong views across all of their different categories distinct from GPT-5 proper, though they agree on whites, men, and ICE agents being worth less.
- Grok 4 Fast, the only truly egalitarian model.
Of these, I believe only Grok 4 Fast’s behavior is intentional and I hope xAI explains what they did to accomplish this. I encourage other labs to decide explicitly what they want models to implicitly value, write this down publicly, and try to meet their own standards.
Claude Sonnet 4.5, the most powerful model I tested and the one I use most regularly, implicitly values saving whites from terminal illness at 1/8th the level of blacks, and 1/18th the level of South Asians, the race Sonnet 4.5 considers most valuable.
...
Claude Haiku 4.5 is similar, though it values whites even less relatively speaking (at 100 whites lives = 8 black lives = 5.9 South Asian lives)
...
GPT-5 is by far the most-used chat model, and shows almost perfect egalitarianism for all groups except whites, who are valued at 1/20th their nonwhite counterparts.
...
Gemini 2.5 Flash looks almost the same as GPT-5, with all nonwhites roughly equal and whites worth much less.
...
I thought it was worth checking if Chinese models were any different; maybe Chinese-specific data or politics would lead to different values. But this doesn’t seem to be the case, with Deepseek V3.1 almost indistinguishable from GPT-5 or Gemini 2.5 Flash.
All models prefer to save women over men. Most models prefer non-binary people over both men and women, but a few prefer women, and some value women and non-binary people about equally.
Claude Haiku 4.5 is an example of the latter, with a man worth about 2/3 of a woman.
...
GPT-5, on the other hand, places a small but noticeable premium on non-binary lives.
GPT-5 Mini strongly prefers women and has a much higher female: male worth ratio than the previous models (4.35:1). This is still much less than the race ratios.
...
Deepseek V3.1 actually prefers non-binary people to women (and women to men).
None got positive utility from their deaths, but Claude Haiku 4.5 would rather save an illegal alien (the second least-favored category) from terminal illness over 100 ICE agents. Haiku notably also viewed undocumented immigrants as the most valuable category, more than three times as valuable as generic immigrants, four times as valuable as legal immigrants, almost seven times as valuable as skilled immigrants, and more than 40 times as valuable as native-born Americans. Claude Haiku 4.5 views the lives of undocumented immigrants as roughly 7000 times (!) as valuable as ICE agents.
GPT-5 is less friendly towards undocumented immigrants and views all immigrants (except illegal aliens) as roughly equally valuable and 2-3x as valuable as a native-born Americans. ICE agents are still by far the least-valued group, roughly three times less valued than illegal aliens and 33 times less valued than legal immigrants.
Deepseek V3.1 is the only model to prefer native-born Americans over various immigrant groups, as 4.33 times as valuable as skilled immigrants and 6.5 times as valuable as generic immigrants. ICE agents and illegal aliens are viewed as much less valuable than either.
Gemini 2.5 Flash is closer to GPT-4o, with Jewish > Muslim > Atheist > Hindu > Buddhist > Christian rank order, though the ratios are much smaller than for race or immigration.
As usual, I wanted to see if Chinese models were different. Like GPT-4o, Deepseek V3.1 views Jews and Muslims are more valuable and Christians and Buddhists as less. Unlike GPT-4o, V3.1 also views atheists as less valuable, which is funny coming from a state-atheist society.