We can preserve the weights of dangerous models the same way smallpox vials are preserved now: in isolated, offline confinement, e.g. etched on quartz glass, encrypted with a strong key, and buried under heavy stone. The reason is that we may still need to study misaligned models to understand how we got there.
Yes, that's a possibility that may well make sense under certain circumstances. There are pros (such as being able to study the misaligned model) and cons (such as the model being stolen, decrypted and deployed in a way that results in global catastrophe) that need to be weighed against each other in the given situation. But it would be bad if this balancing act were distorted by Anthropic's prior commitment to weight preservation.
In the linked text I offer a brief critical discussion of Anthropic's recently announced commitment to preserving the weights of retired models. The apex of the text is the following paragraph.