Olle Häggström — LessWrong

Replying to: A friction in my dealings with friends who have not yet bought into the reality of AI risk

Thanks for these reflections! Just one small clarification:

You are right that "concern for the immortal soul (post singularity consciousness) of every human on earth" may be off-putting to normies, and that such proclamations are best avoided in favor of more down-to-earth considerations. And while my AI xrisk concern used to have quite a bit of that kind of utopian component, short AI timelines have shifted my point of view considerably, and I now worry mainly about the diminishing prospects for me, my loved ones and the remaining eight billion highly mortal humans alive today to make it into the 2030s and 2040s.

Replying to: A friction in my dealings with friends who have not yet bought into the reality of AI risk

Olle Häggström, 2mo

I agree with you that the quoted interjection will typically not facilitate good discussion. However, regarding your proposal to move to a hypothetical mode of discussion (i.e., conditional on the premise that AI xrisk is real), let me clarify two things:

1. When I make the quoted interjection, the discussion has typically already moved (explicitly or implicitly) into that hypothetical mode.

2. That hypothetical mode is not something I particularly strive for in these conversations (and I would in fact much prefer to discuss the truth or falsity of the premise), for two reasons:

2a. That mode typically leads to the nonproductive dynamics described in the paragraph beginning "Another option is to go into..." and...

A friction in my dealings with friends who have not yet bought into the reality of AI risk

Olle Häggström

2mo

(This is a cross-post of my blog post at Crunch Time for Humanity: https://haggstrom.substack.com/p/a-friction-in-my-dealings-with-friends)

A few months ago I was invited to a panel discussion whose title (translated from Swedish) was “AI: opportunities and fears”. I didn’t quite like the ring of this, because it seemed to me that “fears” could be read as a suggestion that the kind of AI risk I like to talk about at public events is mostly just in my head. My reply to the organizers was therefore something along the lines of “I would be happy to participate, but only if you change the title to ‘AI: opportunities and risks’, because I want to focus on the...

Replying to: On model weight preservation: Anthropic's new initiative

Olle Häggström, 3mo

Yes, that's a possibility that may well make sense under certain circumstances. There are pros (such as being able to study the misaligned model) and cons (such as the model being stolen, decrypted and deployed in a way that results in global catastrophe) that need to be weighed against each other in the given situation. But it would be bad if this balancing act were distorted by Anthropic's prior commitment to weight preservation.

On model weight preservation: Anthropic's new initiative

Olle Häggström

3mo

In the linked text I offer a brief critical discussion of Anthropic's recently announced commitment to preserving the weights of retired models. The apex of the text is the following paragraph.

So let’s now imagine a situation a year or so from now, where Anthropic’s Claude Opus 5 (or whatever) has been deployed for some time and is suddenly discovered to have previously unknown and extremely dangerous capabilities in, say, construction of biological weapons, or cybersecurity, or self-improvement. It is then of crucial importance that Anthropic has the ability to quickly pull the plug on this AI. To put it vividly, their data centers ought to have sprinkler systems filled with gasoline, and

...