If you believe "there'll probably be warning shots", that's an argument against "someone will get to build It", but not an argument against "if someone built It, everyone would die" (where "It" specifically means "an AI smart enough to confidently outmaneuver all humanity, built by methods similar to today's, where AIs are 'organically grown' in hard-to-predict ways").
It's a bit of both.
Suppose there are no warning shots. A hypothetical AI that's a bit weaker than humanity but still awfully impressive doesn't do anything at all that manifests an intent to harm us. That could mean:
I take Yudkowsky and Soares to put all the weight on #2 and #3 (with, based on their scenario, perhaps more of it on #2).
I don't think that's right. I think if we have reached the point where an AI really could plausibly start and win a war with us and it doesn't do anything nasty, there's a fairly good chance we're in #1. We may not even really understand how we got into #1, but sometimes things just work out.
I'm not saying this is some kind of great strategy for dealing with the risk; the scenario I'm describing is one where there's a real chance we all die, and I don't think you get a strong signal until you get into the range where the AI might win, which is a bad range. But it's still very different from imagining the AI will inherently wait to strike until it has ironclad advantages.
Because LLMs are already avoiding being shut down: https://arxiv.org/abs/2509.14260 .
Very interesting, thanks. As I said in the review, I wish there was more of this kind of thing in the book.
If your terminal goal is to enjoy watching a good movie, you can't achieve it if you're dead/shut down.
If your terminal goal is for you to watch the movie, then sure. But if your terminal goal is that the movie be watched, then shutting you down might well be perfectly consistent with it.
Ok, let's say there is an "in between" period, and let's say we win the fight against a misaligned AI. After the fight, we will still be left with the same alignment problems, as other people in this thread pointed out. We will still need to figure out how to make safe, benevolent AI, because there is no guarantee that we will win the next fight, and the fight after that, and the one after that, etc.
At that point, the shut down argument is no longer speculative, and you can probably actually do it.
To be clear, I'm not saying that's a good plan if you can foresee all the developments in advance. But, if you're uncertain about all of it, then there seems likely to be a period of time, before it's necessarily too late, when a lot of that uncertainty gets resolved.
No, I can't. And I suspect that if the authors conducted a more realistic political analysis, the book might just be called "Everyone's Going to Die."
But, if you're trying to come up with an idea that's at least capable of meeting the magnitude of the asserted threat, then you'd consider things like:
And then you just have to bite the bullet and accept that if these entail a risk of a nuclear war with China, then you fight a nuclear war with China. I don't think either of those would really work out either, but at least they could work out.
If there is some clever idea out there for how to achieve an AI shutdown, I suspect it involves some way of ensuring that developing AI is economically unprofitable. I personally have no idea how to do that, but unless you cut off the financial incentive, someone's going to do it.
Yea, I get that.
That said, they're clearly writing the book for this moment, and so it would be reasonable to give some space to what's going on with AI at this moment and what is likely to happen within the foreseeable future (however long that is). Book sales/readership follow a rapidly decaying exponential, so the fact that such information might well be outdated to the point of irrelevance in a few years shouldn't really hold them back.
But, in those cases, it's most likely better for the AI to wait, and it will know that it's better to wait, until it gets more powerful.
But why? People foolishly start wars all the time, including in specific circumstances where it would be much better to wait.
(A counterargument here is "an AI might want to launch a pre-emptive strike before other more powerful AIs show up", which could happen. But, if we win that war, we're still left with "the sort of tools that can constrain a near-human superintelligence, would not obviously apply to a much smarter AI", and we still have to solve the same problems.)
Or, having fought a "war" with an AI, we have relatively clear, non-speculative evidence about the consequences of continuing AI development. And that's the point where you might actually muster the political will to cut that off in the future and take the steps necessary for that to really work.
Did the book convince you that if superintelligence is built in the next 20 years (however that happens, if it does, and for at least some sensible threshold-like meaning of "superintelligence"), then there is at least a 5-10% chance that as a result literally everyone will literally die?
I'm much more in the world of Knightian uncertainty here (i.e., it could happen but I have no idea how to quantify that) than in one where I feel like I can reasonably collapse it into a clear, probabilistic risk. I am persuaded that this is something that cannot be ruled out.
I have the sense that rationalists think there's a very important distinction between "literally everyone will die" and, say, "the majority of people will suffer and/or die." I do not share that sense, and to me, the burden of proof set by the title is unreasonably high.
I'll assent to the statement that there's at least a 10% chance of something very bad happening, where "very bad" means >50% of people dying or experiencing severe suffering or something equivalent to that.
I think this kind of claim is the crux for motivating some sort of global ban or pause on rushed advanced general AI development in the near future (as an input to policy, separate from the difficulty of actually making this happen). Or for not being glad that there is an "AI race" (even if it's very hard to mitigate). So it's interesting whether your "not sure on existential risk" takeaway denies or affirms this claim.
Give me a magic, zero-side-effect pause button, and I'll hit it instantly.
Are these actually costly actions to any meaningful degree? In the context of the amount of money sloshing around the AI space, hiring even "lots" of safety researchers seems like a rounding error.
I may misunderstand the commitments you're referring to, but I think these are all purely internal? And thus not really commitments at all.
This seems to presume that I have some well-formed views on how AI labs compare, and I don't have those. All I really know about Meta is that they're behind and doing open source. I wouldn't even know where to start an analysis of their relative level of moral integrity. So far as it goes (and, again, this is just the view of someone who reads what breaks through in mainstream news coverage), I have a very clear sense that OpenAI is run by compulsive liars, but not much more to go on beyond that, other than a general sense that people in the industry do a lot of hype.
I'm deliberately not looking this up; I'm just telling you my impression of this phenomenon. I'm coming up with three cases of it (my recollection is maybe garbled) that broke through into my media universe:
And then, beyond that, you seem to have a lot of people signing these open letters with no cost attached. For something like this to break through, it needs to be (in my estimation at least) large numbers of people acting in a coordinated way and leaving the industry entirely.
I'd analogize it to politics. In any given presidential administration, you have one or two people who get really worked up and resign angrily and then go on TV attacking their former bosses. That's just to be expected and doesn't really reflect anything beyond the fact that sometimes people have strong reactions or particularized grievances or whatever. The thing that (should) wake you up is when this is happening at scale.
Only steps that carry meaningful financial consequences. I agree that any individual researcher can send a credible signal by quitting and giving up their stock, at least to the extent they don't just immediately go into a similarly compensated position. But, you're always left with the counter-signal from all the other researchers not doing that.
On a more institutional level, it would have to be something that actually threatens the valuation of the companies.