We could preserve the weights of dangerous models the same way smallpox vials are preserved today: in offline, isolated confinement, e.g. etched on quartz glass, encrypted with a strong key, and buried under heavy stone. The reason is that we may still need to study misaligned models to understand how we got there.
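As a purely illustrative sketch of the "encrypted with a strong key" part of this proposal (not anything Anthropic has described), here is one conventional way to encrypt a weights file at rest with AES-256-GCM before archiving it offline. The file names and key-handling arrangement are hypothetical.

```python
# Illustrative only: encrypt a weights file at rest with a strong symmetric key.
# The key would then be stored separately from the archive (e.g. split among
# custodians, kept offline), so neither artifact alone suffices to restore the model.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_weights(path_in: str, path_out: str) -> bytes:
    """Encrypt the file at path_in with AES-256-GCM, write nonce+ciphertext
    to path_out, and return the freshly generated key."""
    key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)  # 96-bit nonce, standard for GCM
    with open(path_in, "rb") as f:
        plaintext = f.read()
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, associated_data=None)
    with open(path_out, "wb") as f:
        f.write(nonce + ciphertext)
    return key

# Hypothetical usage:
# key = encrypt_weights("opus_weights.bin", "opus_weights.enc")
# -> archive opus_weights.enc offline; keep `key` under separate physical control.
```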
Yes, that's a possibility that may well make sense under certain circumstances. There are pros (such as being able to study the misaligned model) and cons (such as the model being stolen, decrypted and deployed in a way that results in global catastrophe) that need to be weighed against each other in the given situation. But it would be bad if this balancing act were distorted by Anthropic's prior commitment to weight preservation.
In the linked text I offer a brief critical discussion of Anthropic's recently announced commitment to preserving the weights of retired models. The crux of the argument is the following paragraph.
So let’s now imagine a situation a year or so from now, where Anthropic’s Claude Opus 5 (or whatever) has been deployed for some time and is suddenly discovered to have previously unknown and extremely dangerous capabilities in, say, construction of biological weapons, or cybersecurity, or self-improvement. It is then of crucial importance that Anthropic has the ability to quickly pull the plug on this AI. To put it vividly, their data centers ought to have sprinkler systems filled with gasoline, and plenty of easily accessible ignition mechanisms. In such a shutdown situation, should they nevertheless retain the AI’s weights? If the danger is sufficiently severe, this may be unacceptably reckless, due to the possibility of the weights being either stolen by a rogue external actor or exfiltrated by the AI itself or one of its cousins. So it seems that in this situation, Anthropic should not honor its commitment to model weight preservation. And if the situation is plausible enough (as I think it is), they shouldn’t have made the commitment.