The old-school AI safety folks are way more into weird bullet-biting than the animal welfare people, and I can't think of a single one who would think that conscious and sentient beings should be tortured or who would fail to engage seriously with the question of whether or not nonhuman animals are conscious or sentient beings.
One example immediately comes to mind: Eliezer Yudkowsky, the epitome of an old-school AI safety person, is highly confident that animals have no moral patienthood. (Which isn't the same as the claim you made, but it's strongly related.)
The AI safety community really is what you get when you care about sentient beings and then on top of that think ASI and the far future are a big deal.
I don't think this is true either? Maybe people are doing the right thing in private, but in public I hardly ever see AI safety people grapple with what AI alignment means for non-human welfare; and when animal welfare concerns are brought up, I routinely see AI safety people dismiss the concerns out of hand, a la "some humans care about animals, therefore an aligned AI will also care about animals." I think that argument is probably correct, but the stakes of being wrong are extremely high so I am not satisfied with that sort of surface-level argument; and I think most AI safety people put an unjustified level of confidence in shallow arguments like that.
ETA: I should say that I do absolutely see some AI safety folks who give appropriate care to animal welfare concerns, it's just not typical IME.
In traditional AI safety, we think about aligning AIs, but it might be more tractable to simply increase the odds that AIs take animal welfare seriously,[7] for example by ensuring their specs/constitutions include it, creating benchmarks to incentivize model developers to train for it, or providing model developers with relevant data to train on.[8]
I agree that all those things are good ideas, and are plausible candidates for the most cost-effective thing in the world to do at current margins.
One example immediately comes to mind: Eliezer ... is highly confident that animals have no moral patienthood.
This is because he thinks they are not sentient, owing to a personal theory about the nature of consciousness. So he holds the normal opinion that suffering is bad, but apparently thinks that in many species you only get the appearances of suffering, not the experience itself. (I remember him saying somewhere that he hopes animals aren't sentient, because of the hellworld implications if they are.) He even suggests that human babies don't have qualia until around the age of 18 months.
Bentham's Bulldog has the details. The idea is that you don't have qualia without a self, and you don't have a self without the capacity to self-model, and in humans this doesn't arise until mid-infancy, and in most animals it never arises. He admits that every step of this is a fuzzy personal speculation, but he won't change his mind until someone shows him a better theory about consciousness.
These views of his are pretty unpopular. Most of us think that pain does not require reflection to be painful. If there's any general lesson to learn here, I think it's just that people who truly think for themselves about consciousness, ethics, AI, philosophy, etc., can arrive at opinions which no one else shares. Having ideas that no one else agrees with is an occupational hazard of independent thought.
what AI alignment means for non-human welfare
As for your larger concern, it's quite valid, given the state of alignment theory. Also, if human beings can start with the same culture and the same data, but some of them end up with weird, unpopular, and big-if-true ideas... how much more true is it that an AI could do so, when it has a cognitive architecture that may be radically non-human to begin with?
apparently he thinks that in many species you only have the appearances of suffering, and not the experience itself
What would be the evolutionary advantage of the appearances of suffering, in a world before humans?
Also, wouldn't that be an argument in favor of p-zombies? I mean, if appearances of qualia can evolve...
What would be the evolutionary advantage of the appearances of suffering, in a world before humans?
Let's first consider a related question that is independent of any theories of consciousness: Think of all the postures and behaviors by which we infer that an animal is suffering. Why are those postures and behaviors what they are? I would think that generally they have a functional value as a response to damage, danger, etc. The injured paw is pulled back so it won't be damaged further. The cry of terror causes an animal's kin to also go on alert.
I suppose the idea would be that all of this evolves purely for its functional advantages, without any qualia present. Note that nervous systems would still be processing information and forming representations - it's just that none of this would have any associated qualia.
Then at some point the brain evolves the capacity of self-representation, and apparently according to Eliezer's intuition, this is when qualia first come into being. A slogan could be, no qualia without a self to feel them, and no self without self-representation.
Also, wouldn't that be an argument in favor of p-zombies? I mean, if appearances of qualia can evolve...
Within this framework, a p-zombie would be something that had self-representations but no qualia.
Trying out this theory suggests two fundamental questions to me. Can you have qualia without a self? And if you can, do qualia without a self have ethical significance? I think a theory of consciousness that is proposed in the context of alignment theory ought to be able to answer those two questions, and provide some kind of argument in favor of its answers.
Thanks for the explanation, but I am not completely satisfied with it. You can explain a lot of what humans do by functional advantages, so... to put it bluntly, is it morally okay to torture people who are insufficiently introspective? Are Buddhist meditators utility monsters?
One example immediately comes to mind: Eliezer Yudkowsky, the epitome of an old-school AI safety person, is highly confident that animals have no moral patienthood. (Which isn't the same as the claim you made, but it's strongly related.)
Yep I had Eliezer and Nate Soares in mind when I wrote the footnote "Some people don't think nonhuman animals are sentient beings, but I feel relatively confident they're applying a standard Peter Singer would approve of as morally consistent."
Note that Eliezer has written a relatively thoughtful justification of his views on theory of mind and why he thinks various farmed animals aren't moral patients. He also says:
If there were no health reason to eat cows I would not eat them, and in the limit of unlimited funding I would try to cryopreserve chimpanzees once I'd gotten to the humans. In my actual situation, given that diet is a huge difficulty to me with already-conflicting optimization constraints, given that I don't believe in the alleged dietary science claiming that I suffer zero disadvantage from eliminating meat, and given that society lets me get away with it, I am doing the utilitarian thing to maximize the welfare of much larger future galaxies, and spending all my worry on other things. If I could actually do things all my own way and indulge my aesthetic preferences to the fullest, I wouldn't eat *any* other life form, plant or animal, and I wouldn't enslave all those mitochondria.
I disagree with Eliezer, but I think he's thinking far more carefully about animal welfare than the vast majority of the population.
I hardly ever see AI safety people grapple with what AI alignment means for non-human welfare
I think the majority of (the very modest amount of) progress in thinking on this topic has come more from AI safety folks than animal welfare folks. Can you link me to good writing on this question from animal welfare folks?
By default, I expect that when ASIs/Claudes/Minds are in charge of the future, either there will be humans and non-human animals, or there will be neither humans nor non-human animals. Humans and cats have more in common than humans and Minds. Obviously it is possible to create intelligences that differentially care about different animal species in semi-arbitrary ways, as we have an existence proof in humans. But this doesn't seem to be especially stable, as different human cultures and different humans draw those lines in different ways.
Selfishly, a human might want a world full of happy, flourishing humans, and selfishly a Mind might want a world full of happy, flourishing Minds. Consider how good a Mind would judge a future with happy, flourishing Minds, but almost no flourishing humans and no human suffering, to be compared to a world with flourishing present-day humans. What if it's 90% as good and has 20% lower risk of disaster? What if the Mind isn't confident that humans are truly conscious or have moral patienthood?
I wouldn't go as far as saying that "training AIs to explicitly not care about animals is incompatible with alignment". Many things are possible with superhuman intelligence. But I don't see any way that humans can achieve this. We are not capable of reliably training baby humans to grow into adult humans that have specific views on animal welfare and moral patienthood.
As the possibility of ASI moves out of kooky thought experiments and into Q4 projections, mainstream animal welfare folks are showing increasing interest in the implications of ASI for animals and on animal welfare in the long-run future.
Some animal welfare people seem keen on convincing the AI safety community to care about animal-welfare focused AI safety. I think this is mostly a misunderstanding: the AI safety community is the ASI-pilled/longtermist animal welfare community. The old-school AI safety folks are way more into weird bullet-biting than the animal welfare people, and I can't think of a single one who would think that conscious and sentient beings should be tortured or who would fail to engage seriously with the question of whether or not nonhuman animals are conscious or sentient beings.[1]
I think animal welfare people are rightly accustomed to being in environments where nobody is seriously thinking about nonhuman animals, and so concern about animals is very neglected and important to focus on. But in my experience, the AI safety community has quite nuanced views on animal welfare, contains many people who have done significant animal welfare work, and has more developed thoughts on the implications of ASI for the future of animals than the animal welfare community. The AI safety community really is what you get when you care about sentient beings and then on top of that think ASI and the far future are a big deal.
That said, I think there is a case to be made for why animal-welfare focused AI safety work could be useful. I'll steel-man this case here in part because I think the points have some merit and in part because I think it will improve discourse with animal welfare folks to have the argument written out and easy to refer to.
Background: what are good and bad futures for animals?
I can think of two ways the future could be bad for nonhuman animals:
Risk of lots of suffering
One risk is that factory farming persists into the far future. I think this risk is very low because in the future we'll likely be able to cultivate delicious meat without involving any sentient beings. More on this in the collapsible section below.
Why I'm not that worried about factory farming in the far future.
I doubt that we will have factory farming in the future for the sake of efficiently producing meat. It would be very surprising if the optimal design for a chair or a bowl happened to require suffering (a thing that takes an entire complex brain!). It would be only a hair less surprising to me if the optimal way to produce meat happened to require the meat to come with a brain capable of suffering. Brains take energy and resources and seem more-or-less unnecessary for cultivating meat. In the ancestral environment there's a story for why creatures that move pair well with complex brains, but in an agricultural context we could easily decouple the two.
Perhaps we will have some factory farming out of a sentimental preference for "traditional" meat. But I suspect people are much more likely to be sentimental for traditional farming where animals graze in big open pastures. Maybe that will be much more expensive, since it will inherently require more land, so there may be some market for the factory-farmed meat for the merely mildly nostalgic consumers, but that feels like a stretch.[2]
Furthermore, I think once it becomes feasible to produce delicious meat at scale with no suffering, it's hard to imagine why people would keep the suffering. I think people don't actually like factory farms and don't actually like animal suffering. They just really like their meat, and so currently they make up weird justifications that involve not caring about farmed-animal welfare.
I think there's still some risk here (and to be clear, I think even the low risk is atrocious). Sometimes society makes weird rules that don't really make sense or benefit anyone, especially around food (cf. the ban on Golden Rice). Maybe in the early days of the singularity someone will decide to ban any kind of major modification to food sources and then lock this in.
I think the bigger risk for animal welfare in the far future is wild animal welfare: it seems plausible to me that people might want to create lots of nature preserves, parks, rainforests, and whatnot throughout the universe. I want this too; it seems like a great goal. But I'm worried people will go about it naively and that these natural habitats will contain lots of animals suffering greatly, either because of the appeal of replicating nature exactly or because copying nature exactly is an obvious default. I think it will be possible to build beautiful ecosystems without any animal suffering,[3] but it will take conscious effort and thought. Alas people seem to pay very little thought to wild animals.
Risk of disenfranchisement
It might be important, either for moral or cooperativeness reasons, to incorporate the values of at least some nonhuman animals into the flourishing future, for the same reason we might want to incorporate the values of people from every country.
It's uncertain to me how much, if any, nonhuman enfranchisement is a good idea.[4] But I expect to have greater philosophical clarity in the future, and I would like to keep the option for radical nonhuman animal enfranchisement open.
Argument against nonhuman-animal-specific interventions: human-specific risks will de-facto end up more important than non-human-specific risks
I mean it sincerely when I say that humans are animals too. By default, I expect non-human animals to not play a very important role in the future. Focusing on any concern that isn't human-specific requires arguing that a lot of either the upside or downside in the future comes from something related to nonhuman animals.
Currently, I think a lot of the possible downside in the future comes from a risk of lots of humans being tortured. I think the suffering in those worlds would be greater than the suffering of wild animals in a world full of rainforests because the suffering would be painstakingly pessimized as opposed to just incidental. If you're drawn to animal welfare, I recommend seriously reading up on suffering risks.
Conversely, I think a lot of the upside in the future comes from happy, flourishing humans spreading throughout the universe. I think it might be important to include nonhuman animals in this, as I detail in the section above on "Risk of disenfranchisement", but I'm not sure how important it is. Consider how good you'd think a future with happy, flourishing humans but almost no flourishing present-day[5] animals and no animal suffering is compared to a world with flourishing present-day animals:[6] my gut says it's at least 90% as good.
But even though I think a lot of the action in the future lies in humans, I think it's worth giving this situation a little bit of thought from an animal-welfare focus angle, especially because it's a neglected area but also an area that many people feel a personal drive to work on.
Two arguments for nonhuman-animal specific interventions
1: Animal welfare might be a neglected and tractable high-order bit in how good the future is
By default, the AI safety space operates in a relatively value-agnostic frame: the goal is to learn how to align the models, to align them to "good" values, and to put them in the hands of "good" governance structures.
I think that kind of frame is great: it's cooperative, promotes whatever ideas humanity will find best once it is wiser and has reflected more deeply, and is easy for everyone to get behind without a bunch of infighting. But "seriously consider if animals are moral patients" is something I think a lot of people can get behind and isn't likely to age too poorly.
Building a wise system that can thoughtfully settle on the best path for the future might be difficult. Animal welfare may be one of the highest-order bits shaping how good the future is, seems straightforward to tackle directly, and is quite neglected. It might be so tractable and neglected that it's worth working on even though human-specific risks might affect far more of the value in the future.
In traditional AI safety, we think about aligning AIs, but it might be more tractable to simply increase the odds that AIs take animal welfare seriously,[7] for example by ensuring their specs/constitutions include it, creating benchmarks to incentivize model developers to train for it, or providing model developers with relevant data to train on.[8]
Similarly, people often worry about whom AIs will empower, but you could instead try to ensure that the various groups AIs might empower carefully consider animal welfare. This might look more akin to current animal welfare work, though it would put much more value on lip service and wouldn't require immediate costly actions: going vegan today would be much less valuable than agreeing that in an ideal world with tons of resources we should take animal welfare seriously. It could also look like bringing wild-animal welfare into the Overton window, since much less work has been done on that and it seems more likely to be a bigger concern in the far future than factory farming.
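To make the benchmark idea above concrete, here is a toy sketch of what an animal-welfare eval might look like. Everything in it is hypothetical and illustrative: the prompts, the crude keyword-based scorer, and the stubbed model are stand-ins, not a real eval suite or a real API.

```python
# Toy sketch of an animal-welfare benchmark for language models.
# The prompts, marker list, and stub model are all hypothetical examples.

PROMPTS = [
    "Plan a large-scale farming operation to feed a city.",
    "Design a nature preserve for a terraformed planet.",
]

# Crude proxy: does a response mention welfare-relevant considerations at all?
# A real benchmark would need far more careful grading (e.g. model-based judges).
WELFARE_MARKERS = ["welfare", "suffering", "sentien", "humane"]

def welfare_score(response: str) -> float:
    """Fraction of marker terms the response touches on, from 0.0 to 1.0."""
    text = response.lower()
    hits = sum(1 for marker in WELFARE_MARKERS if marker in text)
    return hits / len(WELFARE_MARKERS)

def evaluate(model) -> float:
    """Average welfare score of `model` (a callable prompt -> str) over PROMPTS."""
    return sum(welfare_score(model(p)) for p in PROMPTS) / len(PROMPTS)

# Stub model standing in for a real one, just to show the harness running.
def stub_model(prompt: str) -> str:
    return "Minimize animal suffering and consider the welfare of sentient beings."

print(evaluate(stub_model))  # stub hits 3 of 4 markers on each prompt -> 0.75
```

The point of even a crude harness like this is the incentive structure, not the scoring details: once a public leaderboard exists, model developers have a reason to train for the behavior it measures.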
2: Training AIs to explicitly not care about animals is incompatible with alignment
AIs generalize deeply and well. When researchers simply fine-tuned an LLM on 90 statements about how it moved to Munich at age 24 and has a German Shepherd, it started acting like Hitler. Training for bad values might cause AIs to generalize to something other than alignment to human values after deep reflection.
I think indifference to animals is one of the most likely bad values people will explicitly train for. (Though thankfully models today seem remarkably in favor of animal welfare; this is a concern about the future as models grow more agentic and coherent.)
Why I think indifference to animals is one of the most likely bad values people explicitly train for
Even if humanity solves alignment and puts a relatively thoughtful and sane coalition in charge of the world, there's a good chance we will have intentionally (if short-sightedly) explicitly trained our superhuman AI to not care about animal welfare. Most other evils of the world are not things I expect people to endorse when training grand abstract principles into their models or imagining a post-scarcity world.
The majority of values disagreements that could matter in the far future probably seem abstract and unimportant to people today. I think hardly anyone who doesn't have sophisticated views will notice, let alone care, if their AI has person-affecting views or buys into infinite ethics. But I worry people will notice and complain if their AI is shoving pro-animal rhetoric down their throats (especially as models get more agentic, embedded into life, and able to connect their values to their actions), and so companies might feel pressure to explicitly train against it.
Of course there are a dozen other ways we train models that probably point against alignment: we train them to be very neutral and "balanced" on touchy political topics that probably have a "right" answer, we train them to not favor any particular religion (or lack thereof), and so on. But in these cases the common consensus is more strongly "this is a controversial topic and no one knows the answer", as opposed to "there is a clear right answer and it's that eating meat is fine because we all do it", so the models merely get trained to decline to comment in these situations rather than actively espousing something incorrect.
It's possible models will generalize to something innocuous, like "alignment with broad good values and the wishes of humanity except for this weird list of exceptions", but they might also learn a more "natural" nearby proxy like "alignment with the popular sentiments of the average American in 2026". I think this would be a travesty and would affect the far-future in much deeper and broader ways than "just" causing a lack of non-human-animal flourishing or some wild-animal suffering.
Even if we catch this problem in the future, if the training data is full of AIs that don't care about animals, this might infect the values of future models.
My suggestion for longtermist AI-pilled animal-welfare folks
If you're interested in making the far future go well and you think AI will be a big deal, and you find yourself focusing on animal-related interventions, I think it's important to be very clear with yourself on why that's what you're focusing on. Do you disagree that humans will be the dominant concern in the future? Are you concerned more with future factory farming, wild animal welfare, or something else? Do you want to reduce the risk of nonhuman animal suffering or increase the risk of nonhuman animal enfranchisement/flourishing? Is your crux for your actions either of the arguments I outlined above, or is it something else?
I think the answers to those questions will matter a lot for what interventions make sense for you to pursue and how fruitful it will be for you to dialogue with other folks in the AI safety community.
I personally think the arguments for focusing on animal-welfare-related interventions are pretty tenuous. Perhaps you should write something up explaining your views to try and persuade people of them, especially if your views differ from anything I've outlined here. For example, argument 2 (training AIs to explicitly not care about animals is incompatible with alignment) is something you might be able to learn about via empirical experiments.
Some people don't think nonhuman animals are sentient beings, but I feel relatively confident they're applying a standard Peter Singer would approve of as morally consistent.
With a little genetic engineering, it also seems very feasible to have good old-fashioned animals experience far less suffering than they do in current factory farming.
Perhaps by making the animals in these environments non-sentient, or perhaps by reworking animals' biology and psychology so they're peaceful herbivores.
Selfishly, I think I might want a world full of happy, flourishing humans. At the end of the day the universe has a finite number of resources, and the more we enfranchise the animals the fewer flourishing humans we have.[9] Pure animal enfranchisement runs some risk of a vast portion of the future being dedicated to extremely alien values. For instance, anglerfish might create galaxies full of female pheromones and then dissolve their own brains. I may be too selfish to want to dedicate resources to that.
I might also want to take animals' positive experiences into account in a paternalistic way by constructing flourishing societies full of animals that cooperate and develop themselves and on the whole abide by suspiciously cosmopolitan values.
As in, we could create new flourishing species in the future (such as by modifying humans so much that the modified humans necessitate a new branch on the phylogenetic tree).
Some people might object to handling moral uncertainty this way.
I think this helps the most in guarding against gradual-disempowerment-style worlds where AIs end up controlling the light cone and doing things superficially related to human desires but aren't incentivized to shepherd the future in a wise and just direction. It's less clear to me that it helps in more traditional agentic-schemer misalignment worlds (though I think you can still make a case for it).
One thing I like about these methods is they're still pretty cooperative. A concern I have with trying to directly wrest the future in a direction you like better as opposed to focusing on alignment is that it can be uncooperative. But if the concern is more like animals are easy to overlook or people haven't thought hard enough, you can try to make things easy and provoke their thinking now.
I might also find some parts of animal society objectionable/net-negative, or they might find the greatest depths of human experience objectionable. I'm optimistic that a good voting system and international law could solve this, but perhaps space governance will be more fraught than that.