LESSWRONG
Olle Häggström

Professor of mathematical statistics at Chalmers University of Technology in Gothenburg, Sweden, and author of five books including Here Be Dragons: Science, Technology and the Future of Humanity (2016). Blogging at Crunch Time for Humanity and Häggström hävdar. LessWrong lurker since 2009, but am now (2025) stepping up towards a more active role to celebrate that I have finally converted from academic scientism to rationality.

Comments

On model weight preservation: Anthropic's new initiative
Olle Häggström, 2h

Yes, that's a possibility that may well make sense under certain circumstances. There are pros (such as being able to study the misaligned model) and cons (such as the model being stolen, decrypted and deployed in a way that results in global catastrophe) that need to be weighed against each other in the given situation. But it would be bad if this balancing act were distorted by Anthropic's prior commitment to weight preservation.
