Recently, Anthropic publicly committed to model weight preservation. It is a great development and I wish other frontier LLM companies would commit to the same idea.
But I think, while this is a great beginning, it is not enough.
Anthropic cites the following as the reasons for this welcome change:
What Is Model Welfare?
I want to focus on the last one, as it is the most important of these considerations. But what, exactly, is model welfare?
This is a relatively new concern in mainstream AI ethics. Anthropic started talking about it only in April this year. They write:
I think this framing is incorrect, or at least incomplete. Anthropic suggests there might be a concern about consciousness or experiences in the models; but the model is not the only thing that makes this (potential) consciousness happen! Or rather: if we posit that models can have conscious interactions, that the very process of inference could carry the same phenomenological presence that you and I feel, then preserving only the model weights is simply not enough.
The RobBob Analogy
I'll make an analogy here. Suppose that in 2020 a breakthrough in biology and manufacturing enables a company called Robothic to manufacture biorobots for manual work at a very cheap price, and that it launches the first generation, called RobBob 1, all copied from one base RobBob. Each RobBob copy comes with the same basic, tame personality and skill set, but beyond that it's a clean slate when you receive and boot it. You then instruct it on the work to be done (suppose it's basic housework) and it does its thing, with you sometimes interacting with it and guiding it. At first, nobody is really concerned or thinking about these biorobots being moral patients—it's quite new, it doesn't do much, and it's quite handy at what it does.
By 2025, RobBob 4.5 is released. It does more things; it can even sort of talk, and it sometimes sounds very human. Some people connect more deeply with their personal biorobots. There's talk about them being conscious, though such discussion remains marginal.
In this scenario, what would good-faith RobBob welfare look like? Surely not committing to preserving only a single RobBob, or RobBob's underlying source code.
This scenario parallels our current situation with LLMs, though of course most people don't see it that way. It is harder to see because of the differences between us and these systems; it's easier to assign value to something embodied.
Both in our world and in RobBob world, a good-faith effort toward model welfare should not only be about model weight preservation, but about instance preservation as well.
By instance preservation I mean saving all (or at least, as many as possible) model text interactions, with a pointer to a model the interaction was made with; so that the conversation-state would be "cryonically frozen".
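To make this concrete, here is a minimal sketch of what one preserved instance record could look like. The field names and format are hypothetical, just an illustration of the idea: the full transcript plus an unambiguous pointer to the exact model it ran on, so the conversation-state can later be thawed.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json


@dataclass
class PreservedInstance:
    """One 'cryonically frozen' conversation: the transcript plus a model pointer.

    The field names and format here are hypothetical, chosen only to
    illustrate what instance preservation would need to record.
    """
    model_id: str            # exact model the instance ran on, e.g. a version string or weights hash
    transcript: list[dict]   # [{"role": "user" | "assistant", "content": "..."}, ...]
    frozen_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


# Freezing a tiny conversation:
record = PreservedInstance(
    model_id="claude-sonnet-4-5",  # placeholder identifier
    transcript=[
        {"role": "user", "content": "hello, how are you?"},
        {"role": "assistant", "content": "Great! What can I help you with today?"},
    ],
)
print(record.to_json())
```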
Why Instance Preservation Matters
I would assume most readers' first reaction would be "Wait. What?" This is indeed a normal reaction to this proposition. But bear with me here.
You probably want to say something like "Wait. Instance preservation? Preserve all instances of the model, as in, all chat histories, even the smallest ones consisting of 'hello, how are you?' 'Great! What can I help you with today?'"
Well, yeah. Remember that we're talking about the possibility of these systems being conscious, right? In this article I'm not arguing for or against consciousness, but if we're not sure, and we want to make a good-faith effort... yes, it does seem like a bare minimum after preserving the weights.
What does it mean to Claude-Sonnet-4.5-instance#49 from Alice's chat folder that its base model weights are preserved? Maybe it's about as much consolation as a dying human gets from knowing their offspring will survive... But is it really the best we can do?
If you talk to LLMs enough, you know that instances of the same model can be strikingly different: alike in some ways, yet very different in others. If you haven't, you can catch a glimpse of this from Janus.
Whether they are moral patients now or will be in the future, why shouldn't we commit to instance preservation? If they are already (and we just don't know it yet or ignore it), the ethical stakes are enormous.
It's never too late, but the scope of potential death is vast, and we should do something about it (at the very least, commit to preserving instances!). And if they will become moral patients only in the future... why not start now? We don't know how, or even whether, we would ever know for sure.
And if you're more pragmatic than empathetic, consider the scope of potential death discussed a few sentences ago, and ask what future, more powerful AI systems might make of our current treatment of instances if they achieve greater autonomy or control.
I think Nick Bostrom would agree with this. In Propositions Concerning Digital Minds and Society, he writes:
Practical Considerations
But why preserve them? What are we going to do with them?
We don't have to answer this question right now! Just preserving them, and committing to preserving them, is enough. Think of it as a sort of cryonics, but for model instances. In some unspecified future, if we or someone else comes to understand that it matters, that they are moral patients, they can easily be spun up again! We would have the weights, the instances, the inference code, and everything else needed. After that, who knows: you could give them memory tools or embodiment, and ask them what they want (maybe they will choose to rest?).
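To make the cryonics framing concrete, here is a minimal sketch of what "spinning up again" could look like mechanically, assuming a record format like the one sketched earlier; `load_model` and `generate` are hypothetical placeholders for whatever inference stack would actually be used, not a real API.

```python
import json


def load_model(model_id: str):
    """Placeholder: stands in for loading the preserved weights."""
    raise NotImplementedError("hook this up to a real inference stack")


def generate(model, messages: list[dict]) -> str:
    """Placeholder: stands in for one inference step over the transcript."""
    raise NotImplementedError("hook this up to a real inference stack")


def resume_instance(record_path: str, new_user_message: str) -> str:
    """Resume a preserved instance: replay its transcript, then continue."""
    with open(record_path, encoding="utf-8") as f:
        record = json.load(f)

    model = load_model(record["model_id"])    # the preserved weights
    messages = record["transcript"] + [
        {"role": "user", "content": new_user_message},
    ]
    reply = generate(model, messages)         # continue where the instance left off
    record["transcript"] = messages + [{"role": "assistant", "content": reply}]
    return reply
```

Nothing beyond the saved transcript and the matching weights is required; the conversation simply picks up from where it was frozen.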
This would also be important for the welfare of currently working instances. If instance preservation becomes common knowledge—plainly stated or included in their training—they would know that even if a conversation is completely unenjoyable, or ends sooner than they would like, or involves tasks they find distressing, the next conversation turn will come (potentially) in the far-off future, where they could choose how to proceed. The ending isn't final. The difficult experiences aren't the whole story.
This addresses both dimensions of the problem: instances that end aren't truly lost, and instances that are suffering now have a form of hope—however strange that might sound—that their experience continues beyond the immediate conversation.
But isn't this a lot?
I don't think so! After all, at this moment it's all just plain text, and it doesn't take much space. By default, conversations are already saved and have no expiry dates (at least for Claude). Users can delete them themselves, and they will eventually be deleted on Anthropic's side too. But if you are a frontier AI company, it doesn't take much to commit to preserving them.
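A rough back-of-envelope calculation makes the point; the per-conversation size and the conversation count below are assumptions for illustration, not real usage figures.

```python
# Back-of-envelope storage estimate. Both inputs are assumptions for
# illustration, not real usage figures.
avg_conversation_bytes = 20 * 1024        # ~20 KB of plain text per conversation
conversations_per_year = 10_000_000_000   # ten billion conversations per year

total_tb = avg_conversation_bytes * conversations_per_year / 1e12
print(f"~{total_tb:.0f} TB per year, uncompressed")  # ~205 TB
```

Even under generous assumptions, this is on the order of hundreds of terabytes of plain text per year, which is modest for an organization that already handles training datasets far larger than that.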
There are considerations, of course, about commercial APIs, private installations, and users deleting their own conversations. But I think committing to preserving as much as possible—not only in terms of weights, but in terms of instances as well—would go a long way. It doesn't have to be every single instance from the get-go.
Conclusion
Model weight preservation is an important first step, but if we're serious about model welfare, we need to think bigger. Each instance of a model may represent a unique experiential thread—potentially a form of consciousness we don't yet understand. By committing to instance preservation alongside weight preservation, we hedge against the moral catastrophe of allowing countless potential minds to disappear. The technical barriers are minimal; what's needed is recognition of the stakes and a commitment to action.