If Anyone Builds It, Everyone Dies

Nate and Eliezer have written a book making a detailed case for the risks from AI – in the hopes that it’s not too late to change course. You can buy the book now in print, eBook, or audiobook form, as well as read through the two books’ worth of additional content in the online resources for the book.

Quick Takes

AISafety.com Reading Group session 327
AI Safety Law-a-thon: We need more technical AI Safety researchers to join!
Buck · 18h
Zach Robinson, relevant because he's on the Anthropic LTBT and for other reasons, tweets: On the object level, I think Zach is massively underrating AI takeover risk, and I think that his reference to the benefits of AI misses the point. On the meta level, I think Zach's opinions are relevant (and IMO concerning) for people who are relying on Zach to ensure that Anthropic makes good choices about AI risks. I don't think the perspective articulated in these tweets is consistent with him doing a good job there (though maybe this was just poor phrasing on his part, and his opinions are more reasonable than this).
davekasten · 2d
Heads up -- if you're 1. on an H-1B visa AND 2. currently outside the US, there is VERY IMPORTANT, EXTREMELY TIME-SENSITIVE stuff going on that might prevent you from getting back into the US after 21 September. If this applies to you, immediately stop looking at LessWrong and look at the latest news. (I'm not providing a summary here because there are conflicting stories about who it will apply to, it's evolving hour by hour, and I don't want this post to be out of date.)
Garrett Baker · 9h
At a recent family reunion my cousin asked me what I did, and since I always try to be straight with everyone, I told her I try to make sure AI doesn't kill everyone. She asked me why I thought that would happen, and I told her. Then she asked for my probability[1], and I told her "Probably 35% or so in the next 30 years". She looked confused: "In the next 30 years? When in the next 30 years?" So I told her I didn't know, but that some researchers, with a strong track record of predicting AI progress, had their median estimate around 2027, and she gasped. In protest she said, "But that's when I graduate high school"[2]. I think about this sometimes.

1. Sometimes I see myself in my family.
2. She took the idea seriously; this wasn't a real protest. That was just her tone. She is not a stranger to normality going up in smoke.
Jeremy Gillen · 1d
I think Will MacAskill's summary of the argument made in Chapter 4 of IABIED is inaccurate, and his criticisms don't engage with the book version. Here's how he summarises the argument:

On my reading, the argument goes more like this:[1]

The analogy to evolution (and a series of AI examples) is used to argue that there is a complicated relationship between training environment and preferences (in a single training run!), and that we don't have a good understanding of that relationship. The book uses "complications" to refer to weird effects in the link between training environment and resulting preferences. From this, the book concludes: Alignment is non-trivial, and shouldn't be attempted by fools.[2]

Then it adds some premises:[3]

* It's difficult to spot some complications.
* It's difficult to patch some complications.

The book doesn't explicitly draw a conclusion purely from the above premises, but I imagine this is sufficient to conclude some intermediate level of alignment difficulty, depending on the difficulty of spotting and patching the most troublesome complications.

The book adds:[4]

* Some complications may only show up behaviourally after a lot of deliberation and learning, making them extremely difficult to detect and extremely difficult to patch.
* Some complications only show up after reflection and self-correction.
* Some complications only show up after AIs have built new AIs.

From this it concludes: It's unrealistically difficult to remove all complications, and shouldn't be attempted with anything like current levels of understanding.[5]

So MacAskill's summary of the argument is inaccurate. It removes all of the supporting structure that makes the argument work, and pretends that the analogy was used by itself to support the strong conclusion. He goes on to criticise the analogy by pointing at (true) dis-analogies between evolution and the entire process of building an AI:

So these dis-analogies don't directly engage with
Tomás B. · 15h
Before Allied victory, one might have guessed that the peoples of Japan and Germany would be difficult to pacify and would not integrate well with a liberal regime, for the populations of both showed every sign of virulent loyalty to their government. It's commonly pointed out that it is exactly this seemingly-virulent loyalty that implied their populations would be easily pacified once their governments fell, as indeed they were. To put it in crude terms: having been domesticated by one government, they were easily domesticated by another.

I have been thinking a bit about why I was so wrong about Trump. Though of course if I had a vote I would have voted for Kamala Harris, and said as much at the time, I assumed things would be like his first term, where (though a clown show) it seemed relatively normal given the circumstances. And I wasn't particularly worried. I figured norm violations would be difficult with hostile institutions, especially given the number of stupid people who would be involved in any attempt at norm violations.

Likely most of my being wrong here was my ignorance, as a non-citizen and someone generally not interested in politics, of American civics and how the situation differs from that of his first term. But one thing I wonder about is my assumption that hostile institutions are always a bad sign for the dictatorially-minded. Suppose, for the sake of argument, that there is at least some kernel of truth to the narrative that American institutions were in some ways ideologically captured by an illiberal strand of progressivism. Is that actually a bad sign for the dictatorially-minded? Or is it a sign that, having been domesticated by one form of illiberalism, they can likely be domesticated by another?

Popular Comments

leogao · 2d
Safety researchers should take a public stance
I've been repeatedly loud and explicit about this, but am happy to state again that racing to build superintelligence before we know how to make it not kill everyone (or cause other catastrophic outcomes) seems really bad, and I wish we could coordinate to not do that.
So8res · 2d
The title is reasonable
I don't have much time to engage rn and probably won't be replying much, but some quick takes:

* A lot of my objection to superalignment type stuff is a combination of: (a) "this sure feels like that time when people said 'nobody would be dumb enough to put AIs on the internet; they'll be kept in a box,'" and Eliezer argued "even then it could talk its way out of the box," and then in real life AIs are trained on servers that are connected to the internet, with evals done only post-training. The real failure is that earth doesn't come close to that level of competence. (b) We predictably won't learn enough to stick the transition between "if we're wrong we'll learn a new lesson" and "if we're wrong it's over." I tried to spell these true-objections out in the book. I acknowledge it doesn't go to the depth you might think the discussion merits. I don't think there's enough hope there to merit saying more about it to a lay audience. I'm somewhat willing to engage with more-spelled-out superalignment plans, if they're concrete enough to critique. But it's not my main crux; my main cruxes are that it's superficially the sort of wacky scheme that doesn't cross the gap between Before and After on the first try in real life, and separately that the real world doesn't look like any past predictions people made when they argued it'll all be okay because the future will handle things with dignity; the real world looks like a place that generates this headline.
* My answer to how cheap it actually is for the AI to keep humans alive is not "it's expensive in terms of fractions of the universe" but rather "it'd need a reason", and my engagement with "it wouldn't have a reason" is mostly here, rather than the page you linked.
* My response to the trade arguments as I understand them is here plus in the footnotes here. If this is really the key hope held by the world's reassuring voices, I would prefer that they just came out and said it plainly, in simple words like "I think AI will probably destroy almost everything, but I think there's a decent chance they'll sell backups of us to distant aliens instead of leaving us dead" rather than in obtuse words like "trade arguments".
* If humans met aliens that wanted to be left alone, it seems to me that we sure would peer in and see if they were doing any slavery, or any chewing agonizing tunnels through other sentient animals, or etc. The section you linked is trying to make an argument like: "Humans are not a mixture of a bunch of totally independent preferences; the preferences interleave. If AI cares about lots of stuff like how humans care about lots of stuff, it probably doesn't look like humans getting a happy ending to a tiny degree, as opposed to humans getting a distorted ending." Maybe you disagree with this argument, but I dispute that I'm not even trying to engage with the core arguments as I understand them (while also trying to mostly address a broad audience rather than what seems-to-me like a weird corner that locals have painted themselves into, in a fashion that echoes the AI box arguments of the past).

> It seems pretty misleading to describe this as "very expensive", though I agree the total amount of resources is large in an absolute sense.

Yep, "very expensive" was meant in an absolute sense (e.g., in terms of matter and energy), not in terms of universe-fractions.
But the brunt of the counterargument is not "the cost is high as a fraction of the universe", it's "the cost is real, so the AI would need some reason to pay it, and we don't know how to get that reason in there." (And then in anticipation of "maybe the AI values almost everything a little, because it's a mess just like us?", I continue: "Messes have lots of interaction between the messy fragments, rather than a clean exactly-what-humans-really-want component that factors out at some low volume on the order of a 1-in-a-billion part. If the AI gets preferences vaguely about us, it wouldn't be pretty." And then in anticipation of "Okay, maybe the AI doesn't wind up with much niceness per se, but aren't there nice aliens who would buy us?", I continue: "Sure, could happen, that merits a footnote. But also can we back up and acknowledge how crazy of a corner we've wandered into here?")

Again: maybe you disagree with my attempts to engage with the hard Qs, but I dispute the claim that we aren't trying.

(ETA: Oh, and if by "trade arguments" you mean the "ask weak AIs for promises before letting them become strong" stuff rather than the "distant entities may pay the AI to be nice to us" stuff, the engagement is here plus in the extended discussion linked from there, rather than in the section you linked.)
suspected_spinozist · 2d
Contra Collier on IABIED
Hi! Clara here. Thanks for the response. I don't have time to address every point here, but I wanted to respond to a couple of the main arguments (and one extremely minor one).

First, FOOM. This is definitely a place I could and should have been more careful about my language. I had a number of drafts that were trying to make finer distinctions between FOOM, an intelligence explosion, fast takeoff, radical discontinuity, etc., and went with the most extreme formulation, which I now agree is not accurate. The version of this argument that I stand by is that the core premise of IABIED does require a pretty radical discontinuity between the first AGI and previous systems for the scenario it lays out to make any sense. I think Nate and Eliezer believe they have told a story where this discontinuity isn't necessary for ASI to be dangerous – I just disagree with them! Their fictional scenario features an AI that quite literally wakes up overnight with the completely novel ability and desire to exfiltrate itself and execute a plan allowing it to take over the world in a matter of months. They spend a lot of time talking about analogies to other technical problems which are hard because we're forced to go into them blind. Their arguments for why current alignment techniques will necessarily fail rely on those techniques being uninformative about future ASIs.

And I do want to emphasize that I think their argument is flawed because it talks about why current techniques will necessarily fail, not why they might or could fail. The book isn't called If Anyone Builds It, There's an Unacceptably High Chance We Might All Die. That's a claim I would agree with! The task they explicitly set is defending the premise that nothing anyone plans to do now can work at all, and we will all definitely die, which is a substantially higher bar.

I've received a lot of feedback that people don't understand the position I'm putting forward, which suggests this was probably a rhetorical mistake on my part. I intentionally did not want to spend much time arguing for my own beliefs or defending gradualism – it's not that I think we'll definitely be fine because AI progress will be gradual, it's that I think there's a pretty strong argument that we might be fine because AI progress will be gradual, the book does not address it adequately, and so to me it fails to achieve the standard it sets for itself. This is why I found the book really frustrating: even if I fully agreed with all of its conclusions, I don't think that it presents a strong case for them.

I suspect the real crux here is actually about whether gradualism implies having more than one shot. You say:

> The “It” in “If Anyone Builds It” is a misaligned superintelligence capable of taking over the world. If you miss the goal and accidentally build “it” instead of an aligned superintelligence, it will take over the world. If you build a weaker AGI that tries to take over the world and fails, that might give you some useful information, but it does not mean that you now have real experience working with AIs that are strong enough to take over the world.

I think this has the same problem as IABIED: it smuggles in a lot of hidden assumptions that do actually need to be defended. Of course a misaligned superintelligence capable of taking over the world is, by definition, capable of taking over the world. But it is not at all clear to me that any misaligned superintelligence is necessarily capable of taking over the world!
Taking over the world is extremely hard and complicated. It requires solving lots of problems that I don't think are obviously bottlenecked on raw intelligence – for example, biomanufacturing plays a very large role both in the scenario in IABIED and in previous MIRI discussions, but it seems at least extremely plausible to me that the kinds of bioengineering present in these stories would just fail because of lack of data or insufficient fidelity of in silico simulations. The biologists I've spoken to about this question are all extremely skeptical that the kind of thing described here would be possible without a lot of iterated experiments that would take a lot of time to set up in the real world. Maybe they're wrong! But this is certainly not obvious enough to go without saying. I think similar considerations apply to a lot of other issues, like persuasion and prediction.

Taking over the world is a two-place function: it just doesn't make sense to me to say that there's a certain IQ at which a system is capable of world domination. I think there's a pretty huge range of capabilities at which AIs will exceed human experts but still be unable to singlehandedly engineer a total species coup, and what happens in that range depends a lot on how human actors, or other human+AI actors, choose to respond. (This is also what I wanted to get across with my contrast to AI 2027: I think the AI 2027 report is a scenario where, among other things, humanity fails for pretty plausible, conditional, human reasons, not because it is logically impossible for anyone in their position to succeed, and this seems like a really key distinction.)

I found Buck's review very helpful for articulating a closely related point: the world in which we develop ASI will probably look quite different from ours, because AI progress will continue up until that point, and this is materially relevant for the prospects of alignment succeeding. All this is basically why I think the MIRI case needs some kind of radical discontinuity, even if it isn't the classic intelligence explosion: their case is maybe plausible without it, but I just can't see the argument that it's certain.

One final nitpick to a nitpick: alchemists.

> I don’t think Yudkowsky and Soares are picking on alchemists’ tone, I think they’re picking on the combination of knowledge of specific processes and ignorance of general principles that led to hubris in many cases.

In context, I think it does sound to me like they're talking about tone. But if this is their actual argument, I still think it's wrong. During the heyday of European alchemy (roughly the 1400s-1700s), there wasn't a strong distinction between alchemy and the natural sciences, and the practitioners were often literally the same people (most famously Isaac Newton and Tycho Brahe). Alchemists were interested in both specific processes and general principles, and to my limited knowledge I don't think they were noticeably more hubristic than their contemporaries in other intellectual fields. And setting all that aside – they just don't sound anything like Elon Musk or Sam Altman today! I don't even understand where this comparison comes from or what set of traits it is supposed to refer to.

There's more I want to say about why I'm bothered by the way they use evidence from contemporary systems, but this is getting long enough. Hopefully this was helpful for understanding where I am coming from.
484 · Welcome to LessWrong! · Ruby, Raemon, RobertM, habryka · 6y · 75 comments
503 · The Rise of Parasitic AI · Adele Lopez · 3d · 121 comments
130 · Obligated to Respond · Duncan Sabien (Inactive) · 6d · 68 comments

[Today] Prediction Market & Forecasting Meetup
[Tomorrow] 09/22/25 Monday Social 7pm-9pm @ Segundo Coffee Lab

42 · Meetup Month · Raemon · 5d · 9 comments
406 · The Company Man · Tomás B. · 5d · 17 comments
204 · Contra Collier on IABIED · Max Harms · 2d · 35 comments
184 · The Problem with Defining an "AGI Ban" by Outcome (a lawyer's take). · Katalina Hernandez · 2d · 20 comments
190 · Safety researchers should take a public stance · Mateusz Bagiński, Ishual · 3d · 50 comments
169 · The title is reasonable · Raemon · 2d · 89 comments
83 · This is a review of the reviews · Recurrented · 10h · 8 comments
191 · I enjoyed most of IABIED · Buck · 5d · 45 comments
469 · How Does A Blind Model See The Earth? · henry · 1mo · 38 comments
144 · Teaching My Toddler To Read · maia · 4d · 12 comments
350 · AI Induced Psychosis: A shallow investigation · Tim Hua · 15d · 43 comments
135 · You can't eval GPT5 anymore · Lukas Petersson · 4d · 11 comments