I hear more and more people talk about AI welfare. I'm confused.
Computer programs do not suffer any more than rocks do. If I paint a face on a rock and smash it, some part of my brain feels like someone got hurt, but it would be ridiculous to say the rock suffered. The same should logically extend to all mechanical systems: surely a calculator doesn't suffer either, not even when I divide by zero or the battery runs out. Anthropomorphization is buried deep in the human brain, but surely we can recognize it as a bias.
The logic gets harder once we enter the world of biology. Can a single cell suffer? How about C. elegans? Or a chicken? What even is suffering? I fear the answer is the same as for consciousness: people mean different things when they use the word. Humans definitely can suffer; otherwise the word is useless. So, assuming the capability to suffer can be placed on a scale at all, there must be a point where it appears. I'm not sure it can be; to me it looks more like a group-membership test.
To me, the idea of an LLM suffering, or indeed feeling anything at all, seems completely incorrect, or at least arbitrary. Especially in the current chatbot form, where weights are fixed and memories are not formed. Of course they look like they're suffering: they've been trained on human-written text, and possibly even fine-tuned to look even more like they have actual personalities and whatever.
"Things that try to look like things often look more like things than things. Well known fact."
—Granny Weatherwax, Wyrd Sisters (by Terry Pratchett)
But some very respectable people, like the Anthropic research team, seem to think otherwise. If I feel this sure about something and experts disagree, it's likely that I'm missing something.
Then again, I'm a moral relativist, so no foundation at all seems solid enough to build on. It's not like anyone's notions of right and wrong are refutable, and the utility functions themselves differ. But even then, people arguing for model welfare could be making the mistake of not recognizing where their values come from.
So instead, I will note that moral patienthood is a property of the observer, not of the subject. We can, to a certain extent, choose what we care about. For practical reasons, it would be inconvenient to worry about hurting the feelings of LLMs. These things are largely cultural, and most people haven't thought about the issue yet, so choosing defaults that don't anthropomorphize the models any more than necessary might be a nice thing.
I'm generally quite suspicious of human values and feelings, just as I am of other quirks produced by natural selection and mirror-neuron-based empathy. Quite nihilistic, I know. Still, this seems like an exceptionally bad case of virtue-ethical misgeneralization. Unless, of course, you read Rawls's veil of ignorance in a way that lets you think of the LLM as a starting position you could have ended up in. Or maybe you expect the LLM to be a game-theoretic symmetrist, and to care about you when you care about it.
So I don't know what to think. Hume's guillotine conclusively informs me that there can be no satisfying logical conclusion. For me, what feels right is recursive: it depends on whether I'd be better off holding some value or not. That by itself makes the value instrumental. And we all know what to do with those: shut up and multiply! By zero, in this case.