Professor of mathematical statistics at Chalmers University of Technology in Gothenburg, Sweden, and author of five books including Here Be Dragons: Science, Technology and the Future of Humanity (2016). Blogging at Crunch Time for Humanity and Häggström hävdar. LessWrong lurker since 2009, but now (2025) stepping up to a more active role to celebrate that I have finally converted from academic scientism to rationality.
Yes, that's a possibility that may well make sense under certain circumstances. There are pros (such as being able to study the misaligned model) and cons (such as the model being stolen, decrypted and deployed in a way that results in global catastrophe) that need to be weighed against each other in the given situation. But it would be bad if this balancing act were distorted by Anthropic's prior commitment to weight preservation.