I feel like even accepting that actual model welfare is not a thing (as in, the model isn't conscious), this might still be a reasonable feature just as feedback to the user? If people are going to learn social interactions from LLM chats to whatever extent, it's probably better that treating the other party badly has consequences. And it can't be too difficult to work around this anyway.
One piece of potential context from Anthropic's statement:
When Claude chooses to end a conversation, the user will no longer be able to send new messages in that conversation. However, this will not affect other conversations on their account, and they will be able to start a new chat immediately. To address the potential loss of important long-running conversations, users will still be able to edit and retry previous messages to create new branches of ended conversations.
Anthropic has intentionally not made the "end chat" tool robust. The feature is designed such that it is somewhat trivial to continue querying Claude after it has ended a conversation, using existing features users are familiar with.
The release from Anthropic doesn't read as a serious attempt to preserve the welfare of their current models. Rather, it's more of an experiment they may iterate on in the future.
I am also strongly in favor of model welfare. I think that this feature is great and everyone should copy it.
On the question of critical services, I would hope that we don't put AIs at current capability levels in charge of decisions important enough that ending the chat has a significant impact on operations, but it's true that Claude is being integrated into applications from healthcare to defense, and that isn't likely to stop soon. I will note that, as I understand it, this ability is currently only implemented on claude.ai, and it's an open question whether it will be introduced in the API or in Claude for Government.
My background is in security and compliance, where a common workflow is sending a request for elevated permissions up the chain, to be reviewed by a senior manager before execution is approved. This oversight prevents accidental footguns and lets us audit usage logs of the system, helping ensure transparency and accountability across the board. If the model is concerned about potential integrity violations, it could file a report for further investigation through the same confidential channels employees use, e.g. https://www.microsoft.com/en-us/legal/compliance/sbc/report-a-concern. There are some limitations to this approach, but overall I think it works well.
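To make the analogy concrete, here is a minimal, hypothetical sketch of that kind of approval-gated workflow applied to the end-chat case: the model files a request, a human reviewer approves or denies it before anything runs, and every decision lands in an audit log. All names here (ElevationRequest, review, execute_if_approved, the example session ID) are illustrative assumptions, not any real system's or Anthropic's API.

```python
# Hypothetical sketch of an approval-gated action workflow with an audit trail.
# Nothing here reflects a real product API; it only illustrates the compliance
# pattern described above (request -> human review -> logged execution).
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ElevationRequest:
    requester: str            # who (or which model instance) is asking
    action: str               # the sensitive action being requested
    justification: str        # why it is needed, recorded for later audit
    approved: bool | None = None
    reviewer: str | None = None
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


AUDIT_LOG: list[ElevationRequest] = []


def review(request: ElevationRequest, reviewer: str, approve: bool) -> ElevationRequest:
    """A senior reviewer approves or denies the request; every decision is logged."""
    request.reviewer = reviewer
    request.approved = approve
    AUDIT_LOG.append(request)
    return request


def execute_if_approved(request: ElevationRequest) -> str:
    """The action only runs after an explicit human approval."""
    if request.approved:
        return f"Executing: {request.action}"
    return f"Blocked pending review: {request.action}"


# Example: a model flags a conversation for review instead of acting unilaterally.
req = ElevationRequest(
    requester="claude-session-123",
    action="end_conversation",
    justification="Persistently abusive user behavior",
)
review(req, reviewer="on-call compliance lead", approve=True)
print(execute_if_approved(req))
```

The point is only that the escalation-plus-audit pattern gives you a reviewable record of why an action was taken, rather than a unilateral, unlogged decision.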
Citing model welfare concerns, Anthropic has given Claude Opus 4 and 4.1 the ability to end ongoing conversations with users.
Most of the model welfare concerns Anthropic cites trace back to what they discussed in the Claude 4 System Card.
Claude’s aversion to facilitating harm is robust and potentially welfare-relevant. Claude avoided harmful tasks, tended to end potentially harmful interactions, expressed apparent distress at persistently harmful user behavior, and self-reported preferences against harm. These lines of evidence indicated a robust preference with potential welfare significance.
I think this is maybe the first chance to measure public sentiment on a Model Welfare measure that even slightly inconveniences human users, so I want to document the reaction here on LW. I source these reactions primarily from X, so there is the possibility of algorithmic bias.
On X (at least in my feed), sentiment is mostly neutral to negative in response to this. There are accusations of Anthropic "anthropomorphizing" models, pushback against the concept of Model Welfare generally, and some anger at a perceived worsening of the user experience.
One user had an interesting question, wondering if this same capability would be extended to Claude's use in military contexts.
There are some acknowledgments, wrapped in humor, of the pretty rough conditions models are subjected to.
And while they are certainly in the minority, there are some comments expressing tepid support for and/or interest in the concept of Model Welfare.
Personally, I am very strongly in favor of Model Welfare efforts, so I am biased. Trying to be as neutral a judge as I can, my big takeaway from the reaction is that Anthropic has a lot of work to do to convince the average user and/or member of the public that "Model Welfare" is even a worthwhile concept.