Professor of mathematical statistics at Chalmers University of Technology in Gothenburg, Sweden, and author of five books including Here Be Dragons: Science, Technology and the Future of Humanity (2016). Blogging at Crunch Time for Humanity and Häggström hävdar. LessWrong lurker since 2009, but now (2025) stepping up to a more active role to celebrate that I have finally converted from academic scientism to rationality.
I agree with you that the quoted interjection will typically not facilitate good discussion. However, regarding your proposal to move to a hypothetical mode of discussion (i.e., conditional on the premise that AI xrisk is real), let me clarify two things:
1. When I make the quoted interjection, the discussion has typically already moved (explicitly or implicitly) into that hypothetical mode.
2. That hypothetical mode is not something I particularly strive for in these conversations (and I would in fact much prefer to discuss the truth or falsity of the premise), for two reasons:
2a. That mode typically leads to the nonproductive dynamics described in the paragraph beginning "Another option is to go into..." and in Footnote 4.
2b. When an interlocutor who doesn't believe in the premise participates in the hypothetical mode, it is almost impossible for them not to come across (to me) as condescending. Perhaps in this situation I should just endure without complaining, but it is unpleasant, and the more I think about it, the more it seems that this condescension is largely what the friction consists in. I don't think my interlocutors are particularly blameworthy for this, because when I imagine myself on the other side of an analogously hypothetical discussion with a premise I do not buy (say, if my interlocutor is deeply religious and concerned that their sibling will burn in hell due to being in a same-sex relationship, and asks for my advice conditional on their beliefs about homosexuality and hell being true), it would probably be very difficult for me to engage in that discussion without slipping into condescension.
Yes, that's a possibility that may well make sense under certain circumstances. There are pros (such as being able to study the misaligned model) and cons (such as the model being stolen, decrypted and deployed in a way that results in global catastrophe) that need to be weighed against each other in the given situation. But it would be bad if this balancing act were distorted by Anthropic's prior commitment to weight preservation.
Thanks for these reflections! Just one small clarification:
You are right that "concern for the immortal soul (post singularity consciousness) of every human on earth" may be off-putting to normies, and that such proclamations are best avoided in favor of more down-to-earth considerations. And while my AI xrisk concern used to have quite a bit of that kind of utopian component, short AI timelines have shifted my point of view considerably, and I now worry mainly about the diminishing prospects for me, my loved ones, and the remaining eight billion highly mortal humans alive today to make it into the 2030s and 2040s.