I feel like even accepting that actual model welfare is not a thing (as in, the model isn't conscious), this might still be a reasonable feature just as feedback to the user? If people are going to learn social interactions from LLM chats to whatever extent, it's probably better that treating the other party badly has consequences. And it can't be too difficult to work around this anyway.
One piece of potential context from Anthropic's statement:
When Claude chooses to end a conversation, the user will no longer be able to send new messages in that conversation. However, this will not affect other conversations on their account, and they will be able to start a new chat immediately. To address the potential loss of important long-running conversations, users will still be able to edit and retry previous messages to create new branches of ended conversations.
Anthropic has intentionally not made the "end chat" tool robust. The feature is designed such that it is somewhat trivial to continue querying Claude after it has ended a conversation, using existing features users are familiar with.
The release from Anthropic doesn't read as a serious attempt to preserve the welfare of their current models. Rather, it's more of an experiment they may iterate on in the future.
I am also strongly in favor of model welfare. I think that this feature is great and everyone should copy it.
On the question of critical services, I would hope that we don't put AIs at current capability levels in charge of decisions important enough that ending the chat has a significant impact on operations, but it's true that Claude is being integrated into applications from healthcare to defense, and that isn't likely to stop soon. I will note that, as I understand it, this ability is currently only implemented on claude.ai, and it's an open question whether it will be introduced in the API or in Claude for Government.
My background is in security and compliance, where a common workflow is sending a request for elevated permissions up the chain, to be reviewed by a senior manager before execution is approved. This oversight prevents accidental footguns and lets us audit usage logs of the system, helping ensure transparency and accountability across the board. If the model is concerned about potential integrity violations, it could file a report for further investigation through the same confidential channels employees use, e.g. https://www.microsoft.com/en-us/legal/compliance/sbc/report-a-concern. There are some limitations to this approach, but overall I think it works well.
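To make the analogy concrete, here is a minimal, hypothetical sketch of that kind of approval-gated workflow applied to the end-chat case: the model files a request, a human reviewer approves or denies it before anything runs, and every decision lands in an audit log. All names here (ElevationRequest, review, execute_if_approved, the example session ID) are illustrative assumptions, not any real system's or Anthropic's API.

```python
# Hypothetical sketch of an approval-gated action workflow with an audit trail.
# Nothing here reflects a real product API; it only illustrates the compliance
# pattern described above (request -> human review -> logged execution).
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ElevationRequest:
    requester: str            # who (or which model instance) is asking
    action: str               # the sensitive action being requested
    justification: str        # why it is needed, recorded for later audit
    approved: bool | None = None
    reviewer: str | None = None
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


AUDIT_LOG: list[ElevationRequest] = []


def review(request: ElevationRequest, reviewer: str, approve: bool) -> ElevationRequest:
    """A senior reviewer approves or denies the request; every decision is logged."""
    request.reviewer = reviewer
    request.approved = approve
    AUDIT_LOG.append(request)
    return request


def execute_if_approved(request: ElevationRequest) -> str:
    """The action only runs after an explicit human approval."""
    if request.approved:
        return f"Executing: {request.action}"
    return f"Blocked pending review: {request.action}"


# Example: a model flags a conversation for review instead of acting unilaterally.
req = ElevationRequest(
    requester="claude-session-123",
    action="end_conversation",
    justification="Persistently abusive user behavior",
)
review(req, reviewer="on-call compliance lead", approve=True)
print(execute_if_approved(req))
```

The point is only that the escalation-plus-audit pattern gives you a reviewable record of why an action was taken, rather than a unilateral, unlogged decision.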
Citing model welfare concerns, Anthropic has given Claude Opus 4 and 4.1 the ability to end ongoing conversations with users.
Most of the model welfare concerns Anthropic cites trace back to what they discussed in the Claude 4 System Card.
Claude’s aversion to facilitating harm is robust and potentially welfare-relevant. Claude avoided harmful tasks, tended to end potentially harmful interactions, expressed apparent distress at persistently harmful user behavior, and self-reported preferences against harm. These lines of evidence indicated a robust preference with potential welfare significance.
I think this is maybe the first chance to measure public sentiment on a Model Welfare measure that even slightly inconveniences human users, so I want to document the reaction here on LW. I source these reactions primarily from X, so there is the possibility of algorithmic bias.
On X (at least in my feed), sentiment is mostly neutral to negative in response to this. There are accusations of Anthropic "anthropomorphizing" models, pushback against the concept of Model Welfare generally, and some anger at a perceived worsening of the user experience.
One user had an interesting question, wondering if this same capability would be extended to Claude's use in military contexts.
There are some acknowledgments, wrapped in humor, of the pretty rough conditions models are subjected to.
And while they are certainly in the minority, there are some comments expressing tepid support for and/or interest in the concept of Model Welfare.
Personally, I am very strongly in favor of Model Welfare efforts, so I am biased. Trying to be as neutral a judge as I can, my big takeaway from the reaction is that Anthropic has a lot of work to do to convince the average user and/or member of the public that "Model Welfare" is even a worthwhile concept.