AI welfare has been a hot topic recently. There have been a few efforts to research, or to improve, the apparent well-being of AI systems; most notably, Anthropic allowing its chatbots to end abusive conversations. While I'm in favor of this research area overall, I'm concerned that current approaches are confused in a way that could ultimately be detrimental to the well-being of AI systems.
On Bluesky, I wrote:
Tbc I’m not anti-experimentation here, but I’m worried that this overall direction will lead to privileging simulacra
It's worth fleshing out what I mean by "privileging simulacra" here, because I think it's an important point that deserves further discussion. To do so, I'll start with a thought experiment and then discuss its implications, defining the "Correspondence Problem" along the way.