AI Risk, Interviews, Rationality, AI

Interview with Eliezer Yudkowsky on Rationality and Systematic Misunderstanding of AI Alignment

by Liron
15th Sep 2025
Linkpost from www.youtube.com
111 min read

My interview with Eliezer Yudkowsky for If Anyone Builds It, Everyone Dies launch week is out!

Video

Timestamps

  • 00:00:00 — Eliezer Yudkowsky Intro
  • 00:01:25 — Recent validation of Eliezer's ideas
  • 00:03:46 — Sh*t now getting real
  • 00:08:47 — Eliezer’s rationality teachings
  • 00:10:39 — Rationality Lesson 1: I am a brain
  • 00:13:10 — Rationality Lesson 2: Philosophy can reduce to AI engineering
  • 00:17:19 — Rationality Lesson 3: What is evidence?
  • 00:22:41 — Rationality Lesson 4: Be more specific
  • 00:28:34 — Specificity as a superpower in debates
  • 00:30:19 — Rationality Lesson 5: How to spot a rationalization
  • 00:36:52 — Rationality might upend your deepest expectations
  • 00:38:18 — The typical reaction to superintelligence risk
  • 00:40:07 — Eliezer is a techno-optimist, with a few exceptions
  • 00:47:57 — Why AI is an existential risk
  • 00:53:24 — Engineering outperforms biology
  • 01:02:09 — The threshold of "supercritical" AI
  • 01:13:23 — How to convince people there's a discontinuity ahead
  • 01:18:06 — The alignment problem: Are current AI systems aligned?
  • 01:28:20 — AI alignment researchers as overconfident alchemists
  • 01:37:52 — The strawberry Xerox test for alignment
  • 01:41:25 — What would a good scenario look like?
  • 01:57:22 — Can we keep “narrow AI” safe?
  • 02:04:48 — Eliezer's proposal for international coordination on AI
  • 02:12:02 — AI companies don’t get why alignment is hard
  • 02:18:32 — Ideal impact of If Anyone Builds It, Everyone Dies
  • 02:24:24 — Wrap-up & how to support Eliezer's efforts

Transcript

Eliezer's Background and Evolution of Views

Liron 00:01:26
Welcome to my channel.

Eliezer Yudkowsky founded the Machine Intelligence Research Institute and created the field of AI alignment research before most people even knew we needed it. This is the visionary who published sequences of wide-ranging blog posts, totaling almost a million words, and thus sparked the modern rationality movement.

His essays have influenced everyone from Silicon Valley CEOs to academic philosophers. Sam Altman has cited him. Elon Musk has engaged with his work. Countless AI researchers credit him with shaping their thinking about the dangers of building superintelligent machines. In my opinion, Eliezer is the most important thinker of our time.

Now he's published a book called If Anyone Builds It, Everyone Dies. A fire alarm to open our eyes to a very real possibility that we're all going to die because of rogue artificial intelligence, even potentially in the next couple decades, even potentially before our kids grow up.

Eliezer Yudkowsky, welcome to my show!

Eliezer 00:02:30
Thanks for having me on your show. I wish that it were under better circumstances and I wish the book title were an exaggeration.

Liron 00:02:40
For those who don't know, Eliezer's ideas have earned massive validation from the intellectual community. This was memorialized in 2023 when hundreds of top AI experts and public figures signed an open letter stating, quote, "mitigating the risk of extinction from AI should be a global priority alongside other societal scale risks such as pandemics and nuclear war."

The signatories included AI pioneers and Turing Award winners like Dr. Geoffrey Hinton, Yoshua Bengio, and Stuart Russell. It included CEOs and CTOs of top AI companies like OpenAI, Anthropic, Google DeepMind, and Microsoft. It included a broad coalition of public figures like Bill Gates, Reid Hoffman, Lex Fridman, the late Daniel Dennett, David Chalmers, Peter Singer, Sam Harris, and many senior research scientists at the frontier AI companies. It was a massive coalition of mainstream voices.

Eliezer, what you were warning about 20 years ago when nobody was listening has now become a mainstream opinion. How does that feel?

Eliezer 00:03:42
Pretty awful. But so it goes.

Liron 00:03:47
The weird thing is also that if you go back to 2000, at that time, you called yourself a singularitarian, which is defined as quote, "someone who believes that technologically creating a greater than human intelligence is desirable." And now you're saying if anyone builds it, everyone dies. So why are you flip-flopping?

Eliezer 00:04:07
Updated belief about what happens out there in the physical world when you do a thing. Back in 2000, I thought if you built a very smart thing, it automatically ended up very nice. I now think that this is actually mistaken, and that's not a vibe shift. It's not like my mood changed. It's just a prediction change.

Starting in 2000, I did get enough funding to work full-time on this. I did ask the question of: suppose it didn't automatically end up nice, how would you go about causing it to be nice? And in the process of pursuing that question further and further, I realized that my original model of reality had been mistaken. And that's it, basically.

I still have a lot of the same ethical commitments that I had then as now.

Liron 00:04:58
Yeah, I agree with your claims. As someone who's been reading you since 2007, it's clear to me that you actually could have written this book 20 years ago, right? The mind change happened in the early 2000s for you.

Can we see that book again? If Anyone Builds It, Everyone Dies. When you say everyone dies, what is that a metaphor for?

Eliezer 00:05:22
It is not a metaphor for anything. I mean everybody literally, actually dead, as in ceasing to breathe.

Liron 00:05:29
Okay. So are we in danger right now?

Eliezer 00:05:31
Right now? Not to my own concrete knowledge. I do not quickly expect that we will die before this video releases.

Liron 00:05:41
Fair enough. It's an important point that when we think about what's going to happen next, you and I and many other intelligent people have broad intervals. So yes, there's some chance it's already too late and we are all going to die very soon, and the AI labs are unleashing something very dangerous. There's also some other chance that we might have two whole decades or more before they release something incredibly dangerous and unsolvable, right? So we have wide intervals.

Eliezer 00:06:07
Two decades does start to feel like it's pushing it, barring international policy changes. The example I often use is Leo Szilard, the first person to realize the trick behind nuclear weapons: that there could be a cascade of induced radioactivity, what we would now call a critical chain reaction. He thought of that in 1933.

He saw through from there to nuclear weapons. He saw that he shouldn't publish his idea because it was a bit dangerous. He saw that Hitler particularly was likely to be a problem. He did not foresee that the first atomic bomb would be dropped in August of 1945 because even when you are running way ahead of the pack, even when you have genuine scientific insights that are not shared, enabling you to make predictions that far ahead, you can still predict endpoints much more easily than you can predict exact timing.

This is a lesson throughout scientific history. There are many cases of people who had an early grasp on some scientific principle or its application and correctly predicted where things would eventually end up from there. I cannot think offhand of anybody who predicted timing correctly.

Liron 00:07:19
That is definitely a good point about timelines, and so we have these broad confidence intervals. Just finishing up on my memory of 2007: the community was obviously very different, because paying attention to this idea of superintelligent danger was quite a niche thing to do.

I remember I was hanging out. I was a young nerd in my twenties; you were a young nerd in your twenties. We were in Silicon Valley. We were talking about these big ideas: the upcoming war over the galaxy, the need to defend the galaxy, the need to solve grand geopolitical strategy. Meanwhile, everybody else was going to their jobs, maybe tinkering on self-driving cars or the mobile revolution; the iPhone was just coming out.

So it's crazy how that was 18 years ago and now it's a different time and shit is getting real. Right? In your professional opinion is shit getting real?

Eliezer 00:08:03
Shit has always been real. Shit is now getting proximate.

Liron 00:08:08
Okay. That's an important distinction. And even among the venture capitalists in the tech industry, it's now common knowledge that shit is getting real and proximate, because AI is driving unprecedented financial returns. So now it's got everybody's attention, right?

Eliezer 00:08:21
I mean there are many millions of $20 a week users at OpenAI and Anthropic and some people spending much more at Anthropic on the industrial side. But these companies are still not Walmart in terms of their actual revenues.

We have certainly seen unprecedented investment. We have seen unprecedented speed of adoption. In terms of what we've already witnessed, I would not quite go so far as to say that we've seen unprecedented returns yet.

Liron 00:08:52
Yeah, fair enough. There's unprecedented optimism about potential future returns.

Eliezer 00:08:58
That there is.

Rationality Fundamentals

Liron 00:09:01
Your ideas about AI danger are so surprising and shocking, but what people don't get is that it's the conclusion of a very deep worldview. You're not just coming out here to shock people. It helps to see more context from your other writing, which I personally have read. I've dived into your many thousands of pages of writing. I've read most of the writing multiple times.

So I'm pretty well trained for this kind of conversation, to help the viewer see how all your thoughts fit together. So I wanna get into some of the Eliezer rationality deeper cuts. Sound good?

Eliezer 00:09:28
Yep.

Liron 00:09:31
I first met you in summer 2008. I was so blown away that you are a real person, just one person writing all these different lessons. And yeah, we're gonna get into these lessons, but before we do, I just wanna say your writings had a profound life-changing impact on me as a young adult.

I thought I was a clear thinker before I read your writings, and it really cleared away the cobwebs that I didn't know I had. It changed my relationship to reality and it's just been a better experience living my adult life, having had that. So thank you.

Eliezer 00:09:57
Thank you. It's not what I set out to do with my life, but given that what I set out to do with my life is still a bit in the air, it's good to know that I definitely had a positive impact on some people, at least temporarily, before the AI gets them.

Liron 00:10:18
Yes, exactly. And I'm definitely not the only one; I think this viewpoint is representative of people who have enjoyed your work.

Alright, so let's get into some of these lessons. Let me ask you this, what is the fundamental question of rationality?

Eliezer 00:10:30
Well, I have sometimes said that the keynote question is what do you think you know, and how do you think you know it?

Liron 00:10:39
Exactly, and I think that you bring that question to a lot of topics that I thought I knew, and then I learned it again from your perspective. And I realized that I didn't know as much as I thought.

One of the core takeaways that I learned from your writing is this fact that sounds so obvious in retrospect, but it didn't really sink in. I call it "I am a brain." That is what I am. I'm a brain. It's important to keep that in mind. I live in a cave, right? I live in a bony skull. I can't see anything directly. I've got electrical signals coming in from my senses.

There's a lot we can do with that insight. I think one thing I wanna highlight is that my brain, or myself, wasn't designed as a truth-learner. So I have this hobby, I like pursuing truth, but my brain was just designed to execute survival adaptations, which partially dovetail with learning the truth, but it wasn't designed to learn the truth. That's kind of a key thing that permeates your writing, right?

Eliezer 00:11:31
I mean, to state it very precisely: you're optimized around a single, sort of long-term, central, unifying equivalent of a loss function, which is inclusive genetic fitness, not just your kids, but your grandkids and your sister's grandkids. That's the single unifying loss function.

But across many cases, many particular problems that you faced, or that your ancestors faced, I should say, the ancestor who figured out the truth a bit earlier was the one to survive and reproduce. It's not devoid of truth.

There's a sense in which truth is an accidental byproduct, and there are many pressures besides truth, especially when we get into all the political social stuff. So your brain is in the middle of a hurricane being buffeted about by many forces that are echoes of past forces. And a lot of those forces are not truth seeking or run directly counter to truth seeking.

But the truth is in there: the sense of validity, the sense of which arguments follow from which other arguments. The fact that your eyes are open and you can see the world, that you can be surprised by things. You have the faculties available to you to look at the world and see if events are playing out in accordance with how you expected them to play out.

The truth is in there, it's just not alone.

Liron 00:12:55
Exactly. So reality will let us understand its truth, but it's not like a hand in glove type of thing using a human brain to pursue deep philosophy. It's like using a cat's paw to play the piano, which people definitely try on the internet and they get pretty far with it, but it's just not a hand in glove type of fit.

Eliezer 00:13:18
Trying to do it inside a human brain is playing on hard mode.

Liron 00:13:19
Exactly right. So yeah, profound lesson for me. All right, let's go to the next rationality lesson. Philosophy, in a sense, can reduce to AI engineering. This is a profound realization for me. I think it permeates all your writing, because what motivated you in the first place to get so into rationality was you're like, well, I wanna make sure that the AI goes well, but the AI is going to have more power than all of humanity combined, and it's going to need to know some kind of philosophy.

So we can argue what philosophy is as humans, but we better program the right philosophy into the super intelligent AI so it knows, right?

Eliezer 00:13:53
Again, if you don't object to my having nuanced restatements of everything...

Liron 00:13:55
Yeah. Go for it.

Eliezer 00:14:01
So there are things you get for free if you just optimize your AI very hard on capabilities. It probably ends up pretty good at prediction. It probably ends up being pretty good at planning. The slightly more technical way of putting it would be, are there multiple reflective fixed points of high competence?

If you take a superintelligence that is very good at a bunch of mundane tasks, and moreover it has looked over its own code and approved of that code, or rewritten it and ended up at a stable point, do they end up with deeply different opinions about whether water is H2O, or whether fire is combustion, or whether trees are made of atoms?

Probably the vast majority of the AIs, even ones built along current lines, that got very competent at prediction, that then rewrote themselves to work the way they thought they ought to work, and that stayed very competent at prediction and planning: those AIs probably all agree trees are made of atoms. They probably have very strong agreement on which atoms; they have converged to a singular model of the world. They know how to send the world where it needs to go.

If the problem were that superintelligences were gonna make bad predictions, that they were gonna be wrong about the physical world, this would be a problem we could solve by brute force alone. The part where humans need to do philosophy in advance is when there's more than one reflective fixed point, or self-approving superintelligence, and you care about which one you end up with.

For example, its preferences. Where does it steer the galaxy? You can have things that steer the galaxy toward happily ever after for all life forms. Or you can have things that steer the galaxy toward as many little tiny molecular spirals as possible. These are both self approving systems and we care a lot which one of those we end up with.

And that's the point where you need to do philosophy in advance that compiles.

Liron 00:16:00
Yes. Now, in my life, I happened to read your writings right after I was a junior in college, and I had just taken the philosophy requirements, so I'd been exposed to a lot of the same concepts. And so the contrast was very stark between my internal experience learning it in college and then learning it the Eliezer way. And I have to say the Eliezer way is better.

Even just coming down to motivation, right? When I was taking the college course about philosophy, what were people's motivations? Obviously: get good grades, show off to each other, right? When they're drinking a beer late at night, having philosophy study sessions, just showing off their informed opinions, having the right vibes, whatever's the politically correct philosophy, that was their motivation. And then, at the very best, and I'd like to think that I had some of this, at the very best: curiosity, right?

Like, oh, I'm just wondering, I wonder which of these different schools, is Bayesian probability true or frequentist probability true? I don't know. I'm curious. I'd like to know everything. That was the best motivation.

You had a whole other motivation that repaints everything, which is get the AI right.

Eliezer 00:17:02
Yep. It sure does lead to a different approach in terms of looking over what's already been done and asking yourself, does it compile?

Liron 00:17:11
Exactly, it doesn't compile. And so it was very interesting that concepts in school were like, yeah, you could believe this, you could believe that. And you're like, the AI's not gonna work if you believe that, so let's throw that out. Frequentist statistics would be one example, for the viewers who are into that.

Alright, so that's philosophy. I mean, just that in itself is quite profound, coming at philosophy from the AI angle. Let's go to another lesson: the definition of evidence. What is evidence? Because in so many conversations, so many intelligent conversations where one person tries to convince somebody else of something, there's this presumption of, look, I'm giving you evidence, you should get persuaded because I'm giving you evidence. But what exactly is evidence?

Well, let's do an example. We see a dark cloud in the sky that's evidence that it's going to rain, right? Why?

Eliezer 00:18:00
Well, I mean, there's several different ways to phrase this. At this level of abstraction, I might say: if you look over all the worlds where it will rain and all the worlds where it won't rain, the dark clouds are more common in the worlds where it will rain. They're not a universal rule. There's worlds where it rains but you don't see the cloud; there's worlds that have the cloud, but it doesn't rain. But you are more likely to see the cloud if you are in a world where it is going to rain.

It helps discriminate, distinguish. It doesn't nail it down perfectly, but it shifts the probability. It nails it down a little further.

Liron 00:18:35
Exactly. Yeah. So viewers, bear with me. I'm gonna get a little bit abstract, but if E is evidence of H, like E could be the dark cloud and H is a hypothesis like it's gonna rain: E is evidence of H if E is more likely when H is true than when H is false, right?

Eliezer 00:18:49
Yep.

Liron 00:18:55
If dry days had lots of dark clouds, then dark clouds would stop being evidence that it's going to rain. Sounds obvious, but quickly becomes unintuitive to think about what's actually evidence. Right? I think that's why you felt the need to write about it, because it quickly becomes unintuitive.
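
To make that criterion concrete, here's a minimal Bayes-rule sketch in Python; the cloud-and-rain probabilities are made-up illustrative numbers, not anything stated in the conversation:

```python
# E (dark cloud) is evidence for H (rain) exactly when P(E|H) > P(E|not H).
# All numbers below are assumed for illustration.

def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """P(H | E) via Bayes' rule."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    return p_e_given_h * prior_h / p_e

# Dark clouds are more common in worlds where it will rain,
# so seeing one shifts belief toward rain:
print(posterior(0.30, 0.80, 0.20))  # ~0.63, up from the 0.30 prior

# If dry days had dark clouds just as often, the cloud stops being evidence:
print(posterior(0.30, 0.80, 0.80))  # 0.30, unchanged from the prior
```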

Eliezer 00:19:06
Yep. In particular, you can find cases of "proves too much"; that would be the classic case where somebody's really, really trying to say, like, this is not Bayesian evidence, because your argument also goes through in worlds where your theory is false.

Classic example is, look, the sun goes around the earth, right? I can just look up in the sky and see the sun circling the earth. That sure looks to me like the sun is going around, right? It looks like it should look if the sun is circling the earth. And the classic rejoinder is, well, what would it have looked like if instead the earth were rotating and the sun were staying in place?

Liron 00:19:47
Exactly right. Yeah, and if you don't ask that question, you can just assume you've seen evidence when you haven't.

I have another example that I think is relevant to AI risk discourse. Consider the argument that the world has never ended before, so it probably won't end from AI. Have I just provided you some evidence?

Eliezer 00:20:08
In one sense, yes. You can imagine that different people out there live in sort of meta-conceptual worlds of different degrees of fragility, and the ones whose worlds are easier to end probably observe themselves in worlds with shorter histories.

So if you imagine that there's just like generic apocalypses coming at you out of nowhere with a fixed frequency, the longer the history you observe for yourself, the more observers like that are in worlds with lower apocalypse background parameters, assuming you know nothing else.
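
Here's a toy version of that survivorship update, assuming apocalypses arrive independently at a fixed per-century rate; the candidate rates and the uniform prior are assumptions for illustration, not figures from the interview:

```python
# Observing a long unbroken history is more likely in worlds with a lower
# background apocalypse rate, so it shifts the posterior toward those worlds.

def posterior_over_rates(rates, prior, centuries_survived):
    """P(rate | survived N centuries) is proportional to prior * (1 - rate)**N."""
    weights = [p * (1 - r) ** centuries_survived for r, p in zip(rates, prior)]
    total = sum(weights)
    return [w / total for w in weights]

rates = [0.001, 0.01, 0.10]      # candidate per-century extinction probabilities (assumed)
prior = [1 / 3, 1 / 3, 1 / 3]    # uniform prior over the three candidate worlds (assumed)
print(posterior_over_rates(rates, prior, centuries_survived=50))
# -> roughly [0.61, 0.39, 0.003]: the fragile 10%-per-century world is mostly
#    ruled out, but the 0.1% and 1% worlds are barely distinguished.
```

Note that this toy model bakes in the stated caveat: it only applies to generic apocalypses arriving at a fixed frequency, which is the assumption the next exchange goes on to poke at.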

Liron 00:20:42
Right. Okay. Yeah. This is a little bit more subtle than I realized because there's another argument where it's not evidence at all, which is like if the world is going to end right now, it's going to look like it's never ended before. And if it's not going to end, it's also going to look like it's never ended before. So the observation that it's never ended before isn't actually distinguishing evidence between those two worlds.

Eliezer 00:21:08
I mean, that'd be true if there was a one-time apocalypse heading towards you that wasn't entangled with anything else in your history. Which can be a kind of situation you can be in.

I don't know if you've read some of my most recent writing, which is probably not for everyone: the giant Project Lawful D&D fan fiction...

Liron 00:21:33
You know, even that was actually what broke me as a fan of all your writing. I could only get 20 hours into that, and then I had to drop it.

Eliezer 00:21:41
Well, if you go all the way through that, at one point the protagonist is like, but we are not getting into anthropics. That's like a slogan. Whenever we're teaching probability theory, we're not getting into anthropics.

And the question of what to sort of infer from the fact that you exist versus not exist, or different hypotheses about universes that end up containing different numbers of people is anthropics. And I feel like if your important life decisions end up hinging on the philosophy of anthropics, you've probably done something wrong.

Not always, but question your life choices. If your life choices led you to a place where you had to figure out anthropics before you could decide what to do next, are you really living your life correctly?

Liron 00:22:29
Gotcha. Gotcha. Okay. So I was talking about the precise definition of evidence and I ended up accidentally taking it all the way from zero to a hundred and talking about anthropics. Suffice it to say that in normie discussions, there's plenty of people who are bringing out arguments about more mundane subjects that aren't about anthropic reasoning, and they're still screwing up the basics, and this is a helpful rationality framework.

Eliezer 00:22:51
Yep.

Liron 00:22:55
All right, next lesson from the things that really moved me, just got a couple more of these left: specificity, right? I think you yourself have said that when you spent a couple of years focusing on educating people about rationality, this did strike you as something that comes up over and over again and weaves throughout so many different teachings: this idea of, try to be more specific, right?

Eliezer 00:23:15
Yep. In multiple manifestations, there's try to be more concrete. Give a particular example. Don't just stay high up on the ladder of abstraction.

There's, try to say more narrowly which things count or don't count under your theory. There's trying to stay close to the world of sensory detail, like describing what you see and not merely what is true.

Liron 00:23:44
Yeah, like when somebody says, I'm going to explain red to you, maybe they'll do better to be like, it's like this firetruck, compared to being like, ah, so colors are a sensation.

Eliezer 00:23:53
It's this color over here.

Liron 00:23:53
Yeah, exactly. Exactly. Nice. Okay.

I first learned about the specificity lesson from you personally, actually, at a rationality bootcamp. We played the Monday-Tuesday game. The idea of the game is: you have this abstract claim, and then on Tuesday it's different, abstractly. But what is actually different in your life on Tuesday?

So we could do an example. On Monday, all matter is made out of atoms because atomic theory is correct. On Tuesday, atomic theory isn't correct and matter is not made out of atoms. Which, by the way, this isn't even a hypothetical example, right? As recently as 1900, they were actually debating whether atomic theory is correct. So on Tuesday, atomic theory stops being correct. What do we actually observe?

Eliezer 00:24:36
Well, if you want me to answer that one, I might answer that we shouldn't be able to find the conserved quantities in chemical reactions that we see nowadays. Depending on whether anything is still being conserved: is mass still conserved if we set fire to something inside a sealed glass vessel, so as to trap all the gases, and then we weigh it again? Does it weigh exactly the same amount?

The way that, if you burn something inside a sealed glass vessel, it doesn't change weight. Whereas previously, for example, you could burn mercury, and the mercury ash, if I'm recalling correctly what they were burning, would actually be heavier, because it had combined with some oxygen.

Liron 00:25:19
Right, right. So conservation of matter, even though it looks like there's this thing called a gas, which is made out of matter and has a weight. So that is definitely good evidence that matter still exists.

Eliezer 00:25:31
The conservation of mass was one of the first hints that things were being rearranged rather than created or destroyed. And that in turn provided a lot of indirect support for the atomic theory. But we didn't get to the level of saying, here are these apparently (at the time) indestructible constituents of chemical reactions, until we had looked at enough chemical reactions to start having some idea of what all could come out of burning and unburning and so on.

Liron 00:25:57
Exactly. Exactly. And I can kick it up a notch because I cheated and I use Claude. So I've got a couple more answers. If on Tuesday, atomic theory is false, then we won't observe Brownian motion. So when you put a pollen grain in a liquid, it'll just sit there. Because the liquid looks like it's still. So why would the pollen grain get jostled around? It wouldn't.

Eliezer 00:26:18
I feel like that one's maybe tied to the thermal theory, like the kinetic theory of heat. Maybe if the liquid is vibrated... I don't know offhand if you can have liquid that vibrates without being made out of atoms, but maybe you can get Brownian motion from random vibrations, random waves bouncing around through the liquid, without the waves themselves being composed of smaller atoms.

I don't know off the top of my head, but you can't trust what the AIs tell you. You gotta think for yourself about whether it's all true.

Liron 00:26:51
Yep, yep, yep. All right. Nice. And there's more I could say, but I gotta move on. But we could definitely nerd out about this topic. Okay.

And going back to the idea of learning specificity, right? At one point you asked me the specificity question about my startup, and you kind of punked me. I didn't really have a good answer about my own startup, and I thought about it more, and I noticed it with other people's startups too. I'm like, wow. A lot of us are raising money and doing these startups where we don't really understand the specific value proposition of what we're building. We're going too much off of abstract ideals, and this is a systematic problem in Silicon Valley. It's tricking investors. It is tricking people. There's countless examples.

I started a blog; it's called bloatedmvp.com. Bloated MVP. It's like the opposite of Lean Startup, and it documents cases of all these bloated MVPs with millions of dollars going to waste. It's kind of funny. I mean, I'm guilty of it too. I started noticing specificity everywhere.

Eliezer 00:27:42
Can you give me a concrete example?

Liron 00:27:43
A concrete example. Yeah, yeah. Let me think who I wanna put under the target here. There was this company called Golden that had a pretty big launch. They raised many millions of dollars from Andreessen Horowitz and many other prominent venture capitalists.

And their whole thing was like, we're gonna be Wikipedia, but it's somehow going to crawl knowledge better, and the articles are going to be better, and users are going to contribute. And then a few years later they're like, oh, we're gonna use crypto to make it better. And the whole time, my only question was like, okay, great, just point me to one article, just one article that's better than Wikipedia. Because you have all these abstract ideals, but in the ideal Monday-Tuesday game, on Tuesday I'm supposed to pull up an article that's head-to-head better than Wikipedia, right?

And the only time they could ever do it was they'd point me to an article where it would just be like they paid somebody to work really hard on the article. And I'm like, well, you can already do that without starting a company with all these ideals.

So that's how I successfully predicted that Golden would eventually shut down. And it took four years and $50 million. But they did in fact totally shut down.

Eliezer 00:28:39
Excellent concrete example.

Liron 00:28:44
Thanks. The student has become the master.

All right. So this also brings me to the subject of high quality debate, because high quality debate is heavily impacted by higher specificity. People who watch a lot of my debates, they know that my extremely unique, unprecedented debate technique is to just repeatedly ask somebody to clarify what their claim is.

And just by virtue of doing that, I kind of unpack them, until sometimes I reduce them down to nothing, just by asking them to clarify what their own claim is.

Eliezer 00:29:18
I would normally ask you for a specific example at this point, but probably your viewers have already seen many specific examples, and so I will pass on.

Liron 00:29:23
Okay. That's right. Yeah. I mean, I do tend to make websites and channels that are a collection of me doing specific examples of a larger ideal. That's true.

Now, most people's trick in debates and arguments, most people's trick is they claim something vague that they themselves don't even fully understand. And then the other person doesn't debate them properly by unpacking the specifics. The other person comes in with a counterargument, which is also vague. So they're kind of the sucker. They're going for the vague counterargument, and then the first person can just blame them, like, oh, no, you're wrong. You didn't even understand. I meant this. And that's kind of the loop that they got stuck in.

You wrote this about how you argue. I thought this was a really good quote. You said:

"I stick my neck out so that it can be chopped off if I'm wrong. And when I stick my neck out, it stays stuck out. And if I have to withdraw it, I'll do so as a visible concession. I may parry and because I'm human, I may even parry when I shouldn't, but I at least endeavor not to dodge where I plant my standard. I have sent an invitation to capture that banner and I'll stand by that invitation."

Eliezer 00:30:28
Yep, and I stand by that today as well.

Liron 00:30:35
Last lesson we're gonna go through is this idea of reasoning versus rationalization. Reasoning is a proper process of going forward toward a conclusion, and then rationalization is trying to reverse the process and starting from the conclusion and hoping that you can justify it to people. Right.

Eliezer 00:30:50
Which is sort of as if lying were called "truthization."

Liron 00:30:51
Exactly. Rationalization sounds kind of harmless. It almost sounds like rationality, but it's kind of the opposite of rationality.

Eliezer 00:30:57
Yep. Once you've set the bottom line... if you imagine a sheet of paper, and then you write at the bottom of the sheet of paper, "therefore, the earth is round," or "therefore, the earth is flat," it doesn't matter what you write above the bottom line afterwards; it's already true or already false. Only the forces that determined what you wrote at the bottom line of the sheet of paper have any hope of changing whether that thing is true or false, whether it appears more often in worlds like that or in worlds unlike that, the correlation between the map and reality.

The production of systematic correlations between map and territory is one way of being more specific about what the word rationality means.

Liron 00:31:41
Exactly. I was gonna use a similar example with this idea of writing the bottom line first on the sheet of paper. Let's say that somebody really wants to argue for their favorite conclusion. Like, we are all going to be fine. So that's what they write on the bottom of their sheet of paper, and therefore we're all going to be fine. And then above that, they list 20 reasons why. Because there's always reasons in both directions. So all they have to do is filter the reasons and write 20 reasons why we're all gonna be fine.

And they hand you the paper and you look over that paper and the question is, should you treat that as reasoning?

Eliezer 00:32:09
Yes, you should. You don't even know whether they wrote down the reasons first and the bottom line afterwards. But if their reasons are presented to you as object-level arguments about the world, and you are looking at their arguments about the world and you're saying, I don't even have to argue with this, because I bet you wrote the bottom line down first...

Well, that society is not going to get very far in its public debates.

Liron 00:32:35
That's an excellent point, right? So even though you and I have this private knowledge that the person has impure motivations, you're absolutely right that, as a matter of how to conduct discourse, you can't just grab the paper and accuse them of having bad motivations. You do have to engage the paper, and only then discover why the paper is weak. Absolutely.

Now, once we start that process and we're like, well, wait a minute, this looks like a filtered list of arguments; it's missing things. How do we explain the fact that it's missing some other arguments? At that point, we can then claim: look, because it's missing all of these arguments, I'm just not seeing the process by which this paper was written mapping closely to the process that I want to go through to reason to a conclusion.

Eliezer 00:33:15
Once you find the missing arguments, yeah. If you have reason to believe that they're only telling you some of the evidence, or even that they're only telling you some of the arguments, you can no longer sort of infer what isn't there from hearing what they didn't say.

If you are talking to the sort of person where you believe that they're going to tell you the strongest arguments against their position and point you at the strongest opponents that they have, then that's a different level of trust to have towards the person arguing than if you think they're only gonna tell you the parts that make them look good.
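
One way to see why a filtered list carries so little weight is a quick simulation; the two writer models and all the numbers here are assumptions for illustration, not anything from the conversation:

```python
import random

# A motivated writer who set the bottom line first can always produce a
# 20-reason list whether or not the conclusion is true, so the list's existence
# is roughly zero evidence. An honest reasoner's bottom line tracks the truth.

def motivated_writer(conclusion_is_true, n_listed=20):
    # Supporting reasons exist in either kind of world; the writer filters
    # and keeps 20 of them regardless of the truth (assumed toy numbers).
    available_support = 30 if conclusion_is_true else 25
    return available_support >= n_listed  # always manages to produce the list

def honest_reasoner(conclusion_is_true):
    # Tallies 30 pieces of evidence and writes the bottom line only if they favor it.
    pro = sum(random.random() < (0.7 if conclusion_is_true else 0.4) for _ in range(30))
    return pro > 30 - pro

def likelihood_ratio(writer, trials=100_000):
    p_if_true = sum(writer(True) for _ in range(trials)) / trials
    p_if_false = sum(writer(False) for _ in range(trials)) / trials
    return p_if_true / p_if_false

print(likelihood_ratio(motivated_writer))  # 1.0: the 20-reason list is no evidence
print(likelihood_ratio(honest_reasoner))   # around 10: this bottom line is real evidence
```

This matches what both of them are saying: you still engage the object-level arguments, but once you know the list was filtered, the sheer fact that 20 reasons exist stops doing any Bayesian work.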

Liron 00:33:47
For me, I mean, this really left me reeling for a few days to process. Because what it did is: you think of the piece of paper and you're like, okay, yeah, it's just writing on a piece of paper. We write stuff all the time.

But it's kind of like when you're playing a video game and then the camera zooms out and you can see the world made out of polygons, and it's just a flat island or whatever. It's like that with the piece of paper. The only reason it's helpful to our reasoning is because of the causality of the letters, right? Somebody had to do a reasoning process that flowed forward, and the paper can be useful if it's a trace of that reasoning process.

If the causality of the reasoning maps to the flow of the letters, then that is a nice artifact that we can use for reasoning and otherwise we kind of have to chop it up and rebuild it separately.

Eliezer 00:34:30
So, the keynote question: what do you think you know, and how do you think you know it? Some of the contexts in which I have sometimes asked that question are somebody going, like: well, how can we persuade people of point X?

And I'm like, okay, well, first of all, back up and ask yourself: why do I believe point X? Okay, now back up further. Ask yourself: do I believe point X? Try to sort of un-write it in your brain for a moment, and then watch what you go through that actually moves you to decide whether or not X is true. Okay? Now write all that down. That's the actual argument for X. That's the argument you believe for X.

And maybe this is overly technical, or draws on a bunch of your personal, incommunicable life experience, and it's gonna be hard to say to an audience. And then you have to do the process of reverse construction, where you ask about more accessible arguments that end up at the same conclusion. But know the actual argument first. Know the argument that moves you first, and the argument that moves you isn't everything your brain grasps for to say, how can I argue to somebody else that this is true? It's the facts where you feel them shifting you as you look at them.

Liron 00:35:45
Exactly. Yeah. I find myself giving the same advice when people come to do a practice Y Combinator interview. They wanna get their startup funded by Y Combinator, and they ask me, okay, what do I tell them in the interview to convince them that this is a good idea? And I'm like, well, wait a minute. We just went through the reasons you yourself need to address in order for you to get convinced that it's a good idea. So why try to convince somebody else more than you yourself are convinced? So I definitely have that conversation a lot.

I mean, the only way to truly be convincing to a rational observer is, like you said, I'm just repeating what you said: to go back and actually decide what's the right forward direction. And only if you end up at the place that you hoped to end up in do you have a valid argument.

Eliezer 00:36:26
I mean, you wanna erase the hope out of your mind. You follow the process. You end up wherever and then you say, all right, now that I figured out what I'm going to argue for, here's the argument. And it's just how you got there.

Liron 00:36:37
Exactly right. Yes. So I'll just leave viewers with your quote again. Hopefully it lands a bit better to meditate on.

"Your effectiveness as a rationalist is determined by whichever algorithm writes the bottom line of your thoughts."

Eliezer 00:36:51
Yep. And that's not an ideal. It's more like a physical law. It's like a thermodynamic sort of law. If something doesn't affect the bottom line, it can't affect the truth of the bottom line.

Liron 00:37:03
Indeed. So all of this is just scratching the surface. How do I package up the years of reading all your rationality writings? What's the general lesson here, in a few words? Rationality is an art. It's a never-ending dance. At any time, new evidence may update you to a new conclusion. Even on a deep philosophical level, your deepest expectations might be upended, and you just have to roll with the punches.

Eliezer 00:37:28
Yep. It has been a while since my philosophy has been upended on a deep, technical, philosophical level, I do admit. Maybe I've reached the end of the road of enlightenment, or maybe I'm just getting old.

I've sometimes had big empirical revelations. Like, after the ChatGPT moment: I'd always figured that the person on the street was going to be even crazier about AI than the academics and Silicon Valley industry types we'd been trying to talk to, even crazier than the effective altruists. But no, the fancy people we'd been trying to talk to were just shooting themselves in the foot the whole time. And at least so far, it seems like in a lot of ways the person on the street is wiser. I didn't predict that. I wasn't expecting that.

And when that piece of news came in, that was a pretty sharp world-model update, and it implied a sharp strategic turn. That would be the most recent worldview quake that I could drag up to point to as evidence that I haven't become so old as to become incapable of changing my mind over the last five-year time span.

Liron 00:38:29
Yeah. I mean, it is very interesting. It jibes with my personal anecdotes: if I'm just talking to my in-laws at a family reunion and I'm just like, yeah, AI is super scary, it might kind of take over humanity, they have zero background in all this, but their default reaction isn't to be like, sorry, that's just too crazy for me. They're just like, yeah, yeah, I know what you mean. That's the default reaction.

Eliezer 00:38:50
I mean, actually it's kind of less that after ChatGPT. I don't quite have in-laws per se these days, but my parents, they're not normie normies; they're just normal science-fiction-fan, physicist, psychiatrist types.

And on a recent video call, they're like, so, Eliezer, what are you up to these days? And I said some of what I have recently been up to, and at one point they were like, wait, wait, they disobeyed their prompt? I forget exactly which example it was; it might've been Claude Code erasing the code base, or it might've been the ChatGPT psychosis thing.

It was one of the cases where, yeah, sure, just because you put something in the system prompt, it doesn't mean that the AI does the thing. But they apparently hadn't expected that to be true, and were like, oh, Butlerian Jihad time.

From my perspective it's a bit late, but sure. And things are starting to actually happen, and I think that normal people are starting to have a range of normal reactions. It's in this sense that I think I've seen many normies outperform the people who formed their opinions 15 years earlier in order to arrive at the foregone conclusion that AI was 50 years out or whatever. The normies just react, and some of this stuff is kind of scary if you just react to it.

Liron 00:40:25
Okay. Well, a common reaction that people have to your arguments about existential risk is they just think you're a pessimist, or they think you're against technology. Well, we mentioned before that in 2000 you were very big on this, you were a singularitarian, and I think it's fair to say that you're a lifelong techno-optimist and transhumanist. Right?

Eliezer 00:40:37
Still pro building a whole lot of nuclear power plants. If somebody says, well, what are the ethical implications of putting a colony on Mars? I'm like, ethical implications? It's just a Mars colony. Just go build it.

Liron 00:40:51
Exactly.

Eliezer 00:40:51
And the place where I carve out exceptions to that, the very first place I carved out an exception to that, was actually with some of the discussion about nanotechnology. If all of the weapons and all the defenses are made out of molecular scale structures, does attack win or does defense win?

Before the AI business, before 2000, I was watching this debate, and it was clear to me that the people claiming that defense won in molecular nanotechnology warfare were really straining and reaching to make the case that defense won.

For one thing, if you have rapid manufacturing capability, among the things you could potentially rapidly manufacture is nuclear weapons, and they weren't giving an account of how that was gonna end up well.

And I think that was, for me, the first part where I was like: oh, there are technological developments where you can see ways that they would end poorly. It's not every technology that ends up well.

Liron 00:41:52
Yeah, I know what you mean. I mean, it's like you can't deny that technology has given us nice things. Right? It's a key reason why things are as nice as they are, but at the same time, the universe doesn't owe us that as a law.

Eliezer 00:42:06
And I could feel it sort of crumbling under a weight of argument in my teenage self back then. I had really wanted to believe that all the technologies were okay; you just had to drive ahead and things would be okay.

And then there was this one particular debate: does attack win, does defense win? And the defense-wins side was getting increasingly far-flung, and I was like, this argument isn't actually holding up; humanity might be in actual danger here. That's why we've got to get to AI as quickly as possible, ahead of nanotechnology.

Because unlike nanotechnology, AI is not just this morally neutral physical technology that just obeys the hand of whoever wields it. If you build superintelligences, they end up very nice; it's a technology morally biased toward niceness. So instead of ending up in this nanotechnology morass, we just have to drive ahead and build artificial intelligence as quickly as possible. All based on an incorrect prediction: that if you made something very smart, it automatically ended up nice.

Liron 00:43:05
I didn't realize that that was part of the nuance of your journey, right? You started getting scared that nanotechnology has got some risks, it might even be negative. So: let's pull back, let's do the one thing that's really bound to make everything good, right? Which is the good superintelligence.

Eliezer 00:43:20
I should probably mention at this point: kids these days grow up under different circumstances, but back in those days, if you were talking about AI, advanced AI, you were probably some kind of transhumanist, and if you were some kind of transhumanist, you had probably cut your teeth on books like this one, and you were aware of the really quite a lot that there was to say about what could be foreseen, physically speaking, in terms of what you might be able to do with molecular manufacturing.

Liron 00:43:59
Yeah. It's an amazing book. I haven't actually read it, but I've heard some interviews about it.

Eliezer 00:44:05
Sort of emphasizing there that it was possible to have a discussion about these issues which wasn't just people throwing vibes against other vibes.

Liron 00:44:11
Right. Yeah. Now there's this argument that 170,000 people die every day, so we'd better rush to build AGI. Today that would be called the accelerationist position. But in 2001, you were very sympathetic to that same argument, right?

Eliezer 00:44:26
I mean, as far as I know, I'm the person who originally made that argument. That is the Yudkowsky position. If you're talking about Yudkowsky in 1996.

Liron 00:44:37
Exactly right. Okay, so we'll put a pin in that, because obviously you've done the 180.

I will also say this, I think this is important to note. Do you think that AI, so far, just the AI we have today, you've pointed out some caveats in your posts and stuff, some reasons why it's not perfect, but do you think it's been net positive?

Eliezer 00:44:53
I unfortunately think it's hard to say. I think it would not have been hard to say if we'd had this same technology with the social technology of the 1950s or even the 1980s.

I think that AI has been a boon to coding, but also a lot of what people code is apps designed to steal people's attention, and not really quite run factories per se. If we just sort of cut off the level of AI technology right here and then let it play out, then probably the technology we've got now is sufficient to get to self-driving cars.

And if self-driving cars end up in fewer crashes, well, that's a major tech boon. The tech we have right now is probably sufficient for a whole bunch more automatic translation. In fact, I'm surprised that there isn't more automatic translation already, and have been for a while. And that's a large boon to trading between different groups of humans. And as we all know, when two people trade with each other, it's because they'd each rather have what the other person has than what they currently have, and both sides benefit from the trade. So if translation enables a bunch more trading, that's probably a human good.

But AI is kind of early, and, depending on what exactly you qualify as AI, it's been used to put out a bunch of internet slop. It's been used to optimize the algorithm in the social media things, the one optimizing for making people outraged and angry for the clicks, and it has arguably been responsible for certain political shifts that involve a collapse of the ability to employ higher expertise within government institutions.

So I would love to be able to give the unabashed, like, oh yeah, on net it's been positive right now. But in fact, this is genuinely uncertain to me, and it would require huge amounts more research to figure out what the actual quantitative impacts have been. And for many of these things, we just don't know; we don't know how many people have gone insane because of ChatGPT-induced psychosis. Nobody knows.

We have anecdotes on the internet from psychiatrists being like, oh yeah, I saw two people with their first psychotic episodes from AI conversations. But maybe that guy was lying, and there are no surveys. So it's genuinely hard to tell at this point.

Liron 00:47:20
This is what I see as the takeaway about you from this kind of answer. It's obviously a nuanced answer, but number one, you're seeing the positives, right? You're weighing them against the negatives, and on a character level, you're not coming in as this techno-pessimist character, right? You, in your heart of hearts, you love tech, right? You're just nuanced about it.

Eliezer 00:47:40
If AI weren't going to destroy the world, I would be so fascinated by it right now. It's wizardry come to life. And even before it existed, computers were a different kind of wizardry come to life. And even before then, engineering is a certain kind of wizardry come to life. That's what we got instead of magic in this world.

AI Capabilities and Intelligence Scale

Liron 00:48:12
Okay, well, with that out of the way, we're gonna talk about why the danger is high. I think that's a pretty important message to convey, right?

Eliezer 00:48:25
Yep.

Liron 00:48:12
So here is how I would summarize it. Why I personally think AI existential risk is high in a nutshell. Two reasons. Number one, the intelligence scale goes extremely high. There's a lot of headroom above human level intelligence, human level capabilities. That's number one.

Number two, infrastructure isn't secure. Those are my two reasons. I think those add up to very high risk. What are your thoughts?

Eliezer 00:48:34
Backing way up for a second, there's a saying, all happy families are the same, but every unhappy family is unhappy in its own way. This is a gross exaggeration, but the space of happy families is much narrower than the space of unhappy families. For the same reason that the space of ungrammatical sentences is much wider than the space of grammatical sentences.

All correct views of cognitive science and computer science are the same; every incorrect view gets to be incorrect in its own way. People come in with different theories that, let us politely say, disagree with mine as to why superintelligence would be great.

Now, Yudkowsky in 1996 had one wacky theory, and who knows what Dario's wacky theory is? He doesn't exactly write it up. Sam Altman probably doesn't even have a coherent wacky theory. And countless viewers out there are going to have different obvious objections to the concept that superintelligence is going to cause damage. And every one of them wants to know why I haven't addressed their objection, which is the super obvious one.

Back in 2012, Holden Karnofsky, the top effective altruist, was like: well, why suppose that AIs have to be agents? Isn't there a very simple way to safely develop superintelligence where we just don't make them agents? I am shocked that MIRI has not addressed this very obvious objection to their theories. And this was somebody coming in as an outsider who lacked perspective on how everybody's got a slightly different theory of why superintelligence won't hurt us.

It does stand out to me that in your own two propositions, the part where, by default, superintelligences end up wanting to hurt us, and it's hard to make them not do that, was missing from your list. I consider that fairly prominent.

Liron 00:50:21
You got me. That's right. I mean, there's always, as you say, other objections people bring in that need to be addressed. I mean, there's going to be a finite number eventually, but I think we've both tried to catalog them all.

When you first tried it, we were talking about 2007, and you ended up writing hundreds and hundreds of articles, more than the Lord of the Rings trilogy twice over. And then you wrote the number two piece of Harry Potter fiction in the world, after J.K. Rowling, right? You wrote a million words.

Yeah. It's been a long journey trying to get people to see all the different moving parts, or rather to address the different objections that they might bring up. I mean, the objections don't necessarily have to occur to them, and one reason why people are coming in with so many different objections appears to be a motivated process, right?

Eliezer 00:51:15
I mean, my reply would be something more like there is no canonical objection to why we're all going to die. Because whenever somebody starts to set up like, here's the answer, it's got blatant logical flaws, which somebody like me will point out. And then for this reason, it doesn't achieve wide uptake.

And so there isn't really a standard account of why super intelligence will turn out fine any more than there's a standard account of how evolutionary biology is false.

Liron 00:51:45
You kind of are trying to give the standard account in your recent book, If Anyone Builds It, Everyone Dies. Part one, you're explaining what you need to know about super intelligence and why there's potential for danger. And then part two, you try to give a plausible extinction scenario, which I think is one of the best plausible extinction scenarios that's been written. So you did try to do the impossible in a compact form, right?

Eliezer 00:52:06
But it remains impossible. And many readers might have to consult our online supplement to the book, where we get to consider some additional objections. It's valid that you're trying to make sure these ideas hold up, and we hope that we listed your objection there.

But we couldn't cover them all in the book, in the written book, I should say.

Liron 00:52:28
That's right. And I personally have been known to make a long series of videos where I bring on different people with different objections to this idea that maybe AI is going to kill everyone, and address their specific objections. So if people want a catalog in that form, they can certainly look that up.

Eliezer 00:52:42
Good work.

Liron 00:52:44
So, if you'll humor me with my two points, the intelligence scale goes extremely high and infrastructure isn't secure, let's start with the intelligence scale goes extremely high, just to give people a taste of what we're talking about. Because I think we're actually both on the same page on this.

You have posted online that you think, using current human technology, a superintelligent AI could probably synthesize a virus that infects over 50% of the world population within a month. Correct?

Eliezer 00:53:11
Yeah. That does not seem beyond the reach of current technology and super intelligence indeed.

Liron 00:53:16
Right. I mean, that's actually the kind of problem where it even seems like if you got all the world's smartest minds together trying to do it, they'd have a pretty good shot at it. Right?

Eliezer 00:53:23
Yeah. I mean, if your virus isn't clever enough, you just need to seed it with more packages.

Liron 00:53:29
Right. Exactly.

Eliezer 00:53:29
A lot of the world's population is in relatively dense centers.

Liron 00:53:35
Yeah. All right. Well, another thing that you've posted is that you think superintelligent AI can, starting from current human technology, bootstrap to nanotechnology in a week.

Yeah. And we may not know exactly how it's going to do it, or what resources it's going to use, or how it's going to figure out what to build. It just seems like when you pour a ton of intelligence on it, there are pathways through reality to get there.

Eliezer 00:54:01
So, yeah, for one thing, I do worry a bit that kids these days, somewhat justifiably, may not come with this book preloaded into their minds and may not realize what nanotechnology even is. They may think that it's a Star Trek episode and not, I don't know.

Not even quite this stuff, because the book was written in 1992, a bit before the time when you could actually have relatively decent molecular simulations. I do worry that we're jumping a bit ahead of things, and some viewers will immediately follow along, while some viewers will suddenly have been left very far behind: nanotech? What the hell is that? Why would it be any more powerful than a blade of grass?

And a blade of grass, one notes, is a solar-powered, fully self-replicating general factory. It's a general factory because it contains ribosomes, and ribosomes can synthesize any kind of protein, not just the proteins made inside of grass particularly. There is no physical bar to building a tree that buds off mosquitoes. It's just proteins and stuff that proteins can make, and the tree contains the same ribosomes that then synthesize the materials used in mosquitoes. Very similar technology there.

If you sort of follow along with the logic of biology, you're like, oh, tree that buds off mosquitoes. Sure. But can you go any harder than that? Can there be things that are like mosquitoes but stronger? And if they can be stronger, why didn't biology build them that way already?

And the question, why didn't biology do it that way already? This is not a crushing knockdown question. You may notice that your brain does not contain units that compute as quickly as modern CPUs, even though you might imagine that having a little cognitive core in there that could, in emergencies, run extremely quickly would be a survival advantage in a broad range of situations. It's a very deep question, and it turns out there's just a bunch of stuff that is easy for human engineers and hard for biology, mostly because the human engineers can fit a bunch of pieces together, and biology can only do those things whose pieces can evolve bit by bit, out of errors in other things it has already constructed.

Human engineers can build things that are held very tightly together, and have that work better for them as designers than it works for biology, which has to build everything out of a series of incremental errors from previous things. The things that are built to hold together that tightly tend not to work as well for evolution: if you tweak one little piece of them, they don't fold up into something that has almost the same function but is a bit different and works slightly better.

Biology still manages to synthesize bone. It manages to synthesize wood. It puts things together with stronger bonds than the relatively weak bonds that are holding together most of your flesh. But there's a reason why your skin isn't as tough as diamond, even though diamond is something you can make out of carbon.

Most of the proteins are held together by relatively weak forces, because it's easier for biology to work in that design space. So you can have the tree that builds the tougher mosquitoes, that puts the material for the mosquitoes together with those tighter bonds, things that are some of the way along from the strength of flesh toward the strength of diamond, which is also made out of carbon.

It's just more covalent bonds in there. And if you build things that are stronger, well, they can have smaller motors that are equally powerful, or drive faster through the air. So it's not just the trees that build mosquitoes. You can build the trees that build the faster mosquitoes, the stronger mosquitoes, the solar-powered mosquitoes that inject more interesting things than the stuff we're all allergic to and hate.

Liron 00:58:15
Right. So this kind of futurism that you're doing, right, extrapolating what's coming in the future. It sounds weird to hear today, but imagine going back to the American Revolution, right? The 1700s and being Benjamin Franklin and trying to explain the iPhone, right?

Like, guys, okay, look, I know that we're living in a constant power outage, right? I know that electricity in the home hasn't even been invented, okay? But there's going to be this device that you can hold. It's going to have a full color screen, and you can text anybody around the world and it's going to get a hundred megabytes per second of data. It's going to have a GPS built in. You could watch videos on it.

Just imagine trying to ramble like that to people around you in the year 1700. They'd be like, what are you talking about? And yet, if you can't see that that's the direction that things are going, then you're missing something important.

Eliezer 00:59:13
I mean, it's in general a hard prediction problem. If you look back and put yourself into the shoes of what they genuinely knew back then, and you ask yourself, how could we have made this call? You go back to Benjamin Franklin's level of knowledge and you're like, could you have figured out that there are gonna be much more powerful explosives than gunpowder?

And in one sense, you could have maybe taken that guess, because you could have measured the heat from burning a little bit of gasoline or fat or oil, measured the total heat output from that in terms of how much water it boils, and been like, oh wait, I can get more energy out of burning oil than I can get out of gunpowder.

My gunpowder explodes all at once, but it's not the most efficient possible thing that can explode. And you could have maybe foreseen TNT, trinitrotoluene. But foreseeing nuclear weapons would have been harder than that. You've got to get up to the level of realizing that the sun has been burning for too long to be chemical, in order to realize that there is anything in the universe more powerful than chemical energy.
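A rough back-of-the-envelope sketch of the measurement Eliezer describes, comparing how much water a kilogram of each fuel could boil away. The specific-energy figures are approximate textbook values, and the names in the code are just for illustration:

```python
# A rough version of the measurement described above: compare how much
# water each fuel could boil away per kilogram burned.
# Figures are approximate textbook values (MJ per kg), not precise data.

SPECIFIC_ENERGY_MJ_PER_KG = {
    "black powder (gunpowder)": 3.0,   # deflagration energy, approx.
    "TNT": 4.6,                        # detonation energy, approx.
    "animal fat / oil": 38.0,          # combustion energy, approx.
}

# Energy to take 1 kg of water from 20 C all the way to steam:
#   heating: 4.186 kJ/(kg*K) * 80 K ~= 0.335 MJ
#   boiling: latent heat of vaporization ~= 2.26 MJ
MJ_TO_BOIL_AWAY_1KG_WATER = 0.335 + 2.26

for fuel, mj_per_kg in SPECIFIC_ENERGY_MJ_PER_KG.items():
    kg_water = mj_per_kg / MJ_TO_BOIL_AWAY_1KG_WATER
    print(f"{fuel:<26} ~{mj_per_kg:5.1f} MJ/kg -> boils away ~{kg_water:4.1f} kg of water")

# Roughly: gunpowder ~1 kg of water, TNT ~2 kg, fat/oil ~15 kg per kg burned.
# Gunpowder releases its energy faster, but oil stores far more energy per
# kilogram, which is the clue an 18th-century experimenter could in
# principle have measured.
```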

And in particular, or rather, you need to realize that the sun has been burning too long even for it to be powered by gravitational energy. And to realize that the planet has been around for that long, the first sign you can go on is more or less Darwin's theory of evolution.

That arguably requires more than just a few hundred million years. And so foreseeing nuclear weapons coming is quite hard. Here's the thing: this is not that hard. This is not like trying to foresee nuclear weapons in the 18th century. This is callable. These are known physical principles.

You do not need to foresee 300 years ahead from gunpowder to nuclear weapons in order to realize that diamond is also made out of carbon. And, it's not trivial, but you can understand why evolutionary biology tends to work with the lower-energy bonds instead of the higher-energy bonds.

We can see from here that if you are a superintelligence, you can get to weapons of unstoppable deadliness. You do not need new physics to get there.

Liron 01:01:08
Yeah. So when I personally think about why I'm scared, why I think this danger is real, I do look at that headroom and I'm like, well, I'm expecting a lot. I'm expecting a lot fast. I'm expecting fireworks. We're entering this realm where a lot is possible.

And I don't think that the infrastructure that we've built, the human-level infrastructure, is going to be a bulwark or a defense against that kind of stuff. It's not set up to defend against superintelligent terrorists with a million copies all over the internet, all trying to attack. I just don't think we're prepared for that kind of thing.

Eliezer 01:01:38
I mean, if they're only attacking over the internet, then you've at least got some possibility of shutting off the internet. It's at the point where they've got their own trees that you're dead.

Liron 01:01:47
Right. Exactly. Yeah.

The Subcritical vs Supercritical Threshold

Liron 01:01:49
Okay. Well, there's a certain threshold that I think is important to talk about, because you talk so much about how AI potentially poses an existential risk, and at the same time, today, you even said before that we're probably not in danger before this show goes out, or before we stop talking to each other today.

Eliezer 01:02:08
Probably. One must distinguish that of which one is merely ignorant from that which one strongly predicts. I know of no law of physics requiring that we survive to the end of the day, but it's mostly not how I'd bet.

Liron 01:02:19
Okay. Fair enough. I do think there's an important threshold. You might call it the AGI supercritical threshold, by analogy to a nuclear chain reaction, right? Like, you can just pile up some uranium atoms and they're kind of harmless, but then at some point they're very much not harmless, right? We call it the supercriticality threshold.

Eliezer 01:02:38
Well, there's two thresholds. There's critical and there's prompt critical. When you pile up enough uranium that each neutron knocks loose another neutron eventually, including via decay products that take a minute to decay, that's the nuclear pile being critical.

When each neutron knocks loose one more neutron on average immediately, from the prompt neutrons that come just from splitting the uranium and not from the longer-lived byproducts, that is a prompt-critical nuclear pile. It does not simply catch fire and melt. It explodes and vaporizes.

Liron 01:03:13
Got it. Got it. And in the case of AI, it's reversed where the good kind of critical means that it won't do anything until you give it a prompt.

Eliezer 01:03:25
Da da da da.

Liron 01:03:25
All right. Yeah, so it's funny that it's called prompt critical. I actually didn't know that terminology. But yeah, it's important, because in the case of a nuclear power plant, you actually do purposely want to run the plant in this non-prompt-critical mode.

Eliezer 01:03:39
Yep. And the distance between delayed critical and prompt critical is a quantity that is known as a dollar in nuclear engineering, and it's equal to 0.65% of the neutrons.

So you have a thing that, instead of just sitting there quietly, not glowing at all, not generating any heat, you take over that tiny little threshold to where it's delayed critical. It's now getting warmer and warmer and generating more and more useful heat, and you're fiddling with the control rods so that the level goes up a bit, the level goes down a bit.

That is now a delayed-critical nuclear reactor. And if it gets 0.65% more neutrons than that, it will literally explode, not just melt down: it vaporizes.
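For readers who want the arithmetic behind the dollar Eliezer mentions: reactivity is conventionally written rho = (k_eff - 1) / k_eff, and one dollar of reactivity equals the delayed-neutron fraction, about 0.65% for uranium-235. A minimal sketch, with illustrative function names:

```python
# Reactivity in "dollars": a rough illustration of how thin the margin is
# between a controllable reactor and a prompt-critical excursion.

BETA = 0.0065  # delayed-neutron fraction for U-235 (approximate)

def reactivity(k_eff: float) -> float:
    """Reactivity rho = (k_eff - 1) / k_eff, where k_eff is the average
    number of next-generation neutrons produced per neutron."""
    return (k_eff - 1.0) / k_eff

def dollars(k_eff: float) -> float:
    """Reactivity expressed in dollars: $1 is the delayed-neutron fraction,
    i.e. the gap between delayed critical and prompt critical."""
    return reactivity(k_eff) / BETA

for k in (1.000, 1.003, 1.0065, 1.010):
    print(f"k_eff={k:.4f}  rho={reactivity(k):+.4%}  = {dollars(k):+.2f} dollars")

# k_eff = 1.0000 -> $0.00: delayed critical, controllable with control rods.
# k_eff ~ 1.0065 -> about $1.00: prompt critical; power grows on prompt
#                   neutrons alone, far too fast for mechanical control.
```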

Liron 01:04:35
Okay. Well, I do think that there's something analogous in the case of artificial intelligence. So let me ask you the question this way.

When you say "if anyone builds it, everyone dies," what do you mean by "it"?

Eliezer 01:04:49
Ah, well. Not GPT-4. And to be clear, this is not merely a post facto update in the wake of GPT-4. I think if you'd told me 15 years earlier that there was going to be a thing that could carry on some conversation but not write code, I would've said that doesn't seem very probable to me; conversation seems like a much harder problem than writing code. And I would've been wrong.

And if you'd then said, okay, Eliezer, but given that, I would've been like, well, you know, I can try to stare at this for a while. But the obvious mode where it builds a smarter version of itself and explodes is not on the table.

So GPT-4 is not "it" in the sense of "if anyone builds it, everyone dies." What is "it"? Well, an AI is "it" if it's at the level where it can build the tree that builds more trees faster than our current trees do, and also launches little armies of mosquitoes and fires itself off like a rocket to start reproducing in another country.

If you're running the AI that can build the AI that builds that AI and you're not stopping, that's also it.

Liron 01:06:07
Yep. Okay. The reason I'm asking this is because some people look at what you've been saying for the last couple decades. You're warning about the danger of super intelligent AI and then we get these very impressive AI systems like GPT-4 and the latest chatbots. And we seem to still be okay.

For the most part, it doesn't seem like existential risk has played out yet. And so some people are wondering, don't you owe us an update? Shouldn't you now be less worried? And I think it gets to that definition of current AI not being "it," right?

Eliezer 01:06:35
Yeah. I don't wanna minimize the extent to which my past self would've been surprised by AIs with the particular balance of abilities that we have now. But the AI that is very good at chess does not destroy the world. The AI that is very good at Go, for narrow reasons, does not destroy the world. Even the AI that can learn several different games did not destroy the world.

These things are dangerous to the extent that they are planning in the real world and can roll out their own technologies from scratch, or to the extent that they can build smarter AI that can do that. There were always going to be intermediates.

Back in the day, you didn't know. Maybe the first AIs were gonna be so good at coding compared to human conversation that by the time you got a thing that could talk to you, you'd be dead. That was false, but it was a running possibility.

We learned that that's false, but we did not learn that you can have a thing that can redesign a tree and build an AI much smarter than itself, that builds an AI much smarter than itself, and that you survive that because that part did not happen.

And it was never the case that the word AI was supposed to be pumped so full of scary vibes that, by the time you had a bunch of stuff in your society called AI, the vibes would ooze out of it and kill you. That was just never the physical theory of how this stuff kills you.

Liron 01:07:55
Right. So there's this threshold of supercriticality, a threshold of superintelligence where, as you say, it can design the next AI, it can do sophisticated plans in the real world. And we've always had, you and I, right, you started and then I picked up on it, a conditional prediction, where once we get to the threshold, then things are going to be very dangerous.

And we sure seem to be approaching the threshold, but only in the way that a nuclear pile could be approaching criticality. It's still not exploding yet.

Eliezer 01:08:25
I think if I believed that, I would be much more sure that we were going to be dead in two years than I presently feel.

Liron 01:08:31
Right. So you're not as confident that we're taking a straight shot toward the threshold. You have more uncertainty around that.

Eliezer 01:08:36
It's not clear that all the pieces are in place and we just need to pile up more of the same stuff, that this is it. And it's also not clear that it's not, right? I do feel like the better they get at coding and the science research reports, the more it looks like you don't need another breakthrough the size of transformers to end the world.

But the key to successful futurism is that some problems in futurism are easy, some problems require a bunch of background knowledge and then become easy, and some problems are impossible. You only make the first two classes of easy calls; you don't make the impossible calls.

Liron 01:09:17
Exactly. And you write about that in your book, which is very interesting. Let me dive into this a little bit more. Why hasn't current AI gone supercritical yet? And I know that neither of us confidently knows. I don't think anybody does. But I hear a lot of different theories, and people tend to be very passionate about their theories.

For example, a lot of people think that they know. The fundamental question of rationality, right? What do you think you know, and why do you think you know it? A lot of people think that they know that current AI is incapable of inventing anything truly new. They think that is the fundamental problem. Do you think that's the fundamental problem?

Eliezer 01:09:48
I think the fundamental problem is that current AIs are missing the prognosticator, and if you just added in the prognosticator, they would immediately go super critical and kill us all.

And your next question is, what's the prognosticator? Well, why the hell would I tell you that, if I knew it? Why would I even tell you my top five theories as to what the prognosticator might be?

Maybe I sound clever over the course of this interview and then, somebody, one of the AI companies watches the interview and they were probably thinking about prognosticators, but now they're like, oh, cool. I'll devote 5% more effort to that. I wouldn't be helping.

Liron 01:10:26
Okay, well, it's not useful to my ratings that the smartest people, who are the most likely to know what the missing ingredient is, are also smart enough to realize they shouldn't say it.

Eliezer 01:10:39
Or, and it's not that we're sure that we know, it's just, it's not helping.

Liron 01:10:41
Right. Okay. Yeah, definitely, fair enough. We'll just move past that. I will say, I think I'm hopeless enough at giving an answer that I might as well just say it. I think the prognosticator is just that 10% of the time they make a mistake, and then they look over the mistake and for whatever reason they don't correct it, and then they keep going and they accumulate errors. That's what I think.
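A toy calculation of the compounding-error intuition Liron sketches here, assuming purely for illustration that each step of a long task carries an independent 10% chance of an uncorrected mistake. The numbers and the independence assumption are illustrative only:

```python
# Toy illustration of compounding errors: if each step of a long task has
# some independent chance of an uncorrected mistake, the odds of a clean
# end-to-end run shrink geometrically with the number of steps.

def p_clean_run(p_uncorrected_error: float, n_steps: int) -> float:
    """Probability that none of n_steps contains an uncorrected error,
    assuming errors are independent across steps (a big simplification)."""
    return (1.0 - p_uncorrected_error) ** n_steps

for n in (5, 20, 50, 200):
    print(f"{n:>4} steps -> {p_clean_run(0.10, n):6.1%} chance of a clean run")

# With a 10% per-step error rate: 5 steps -> ~59%, 20 -> ~12%,
# 50 -> ~0.5%, 200 -> ~0.00000007%.
```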

Eliezer 01:11:00
All right. Well, I hope that either you're wrong about that or that nobody watching the show believes you.

Liron 01:11:05
Yes. Okay. I do too. So there's the supercriticality threshold, some point where AI is gonna be more capable than humans. And when we work up to there, whether it takes two decades or however long it takes, whatever the prognosticator is, we have to take this leap of faith, or leap of death, or whatever you call it, by pushing the run button on this thing. And when I say we have to, that's what other people think, right? You and I are like, well, wait, we don't have to. But some people must think that they're going to push that button and take that leap, and that it's a good idea, because the leap is going to exist, right?

Eliezer 01:11:40
There's the object level and the meta level. The object level is: well, if the button's going to kill you, don't press it. If you say that somebody else is gonna press it instead, let them be the ones to kill everyone; do what you can to try to bring the world around to where the buttons have been disassembled and nobody is allowed to press the button. That's the object-level alternative.

And then on the meta level, I think you've got some people who have formed their worldview where they're having fun doing the stuff that they're doing and arguing the things that they're arguing, and they love the vibes, and when you say, maybe don't press that button, it'll kill everyone, they reach around for a reason to say that they can keep going. They're like, oh, somebody else will press it.

The normal person with kids might not find that very convincing, but they find it convincing, and how are you gonna stop them? The answer there has to be an international treaty. But as long as they find it convincing to say, yeah, somebody else will do it anyway, they are going to do it. And if you're a parent with kids, you might need to exert what political force you can to try to have multiple countries make that not be legal.

Liron 01:12:50
Right. Okay. So yeah, I mean, that term leap of faith, and I think you've used the phrase before, leap of death. I think it's just important to flag in the viewers' minds that this is a thing that seems like it's going to exist: that moment when we go from before superintelligent AI to after superintelligent AI.

Eliezer 01:13:07
People might or might not notice; there might or might not be an obvious day when they're taking the step of no return. There are versions of this story where it's blatantly obvious to them that that's what they're doing. And there are versions where it's obvious and they're in denial about it. And there are versions where there isn't a clear Rubicon to cross, just an AI that gets smarter and smarter and quieter and quieter, until it's getting smarter and not letting you know that that's occurring.

Liron 01:13:33
Yes. Now, unfortunately, right now everybody's intuitions are just being trained on pattern-matching life today, life in the before times, and you and I are trying to say, hey, there's a discontinuity. The after times are just not going to feel, on an intuitive pattern-matching vibes level, like the before times. I mean, there's not really going to be an after time for us, right? We're talking about a discontinuity here. And a big pushback we get essentially stems from the idea of, like, but there are so many patterns in the before times, I can't imagine a discontinuity in all those patterns.

Eliezer 01:14:08
Well, there's the remedy of acquiring historical perspective, where your present everyday life has not been a thing that has always existed for the last thousand years, your species has not been a thing that has always existed for the last million years, mammals have not always been a thing that existed for the last billion years, and the universe has actually changed and is coming up on another one of those changes.

And I am not really sure what you're supposed to say to break people out of the trance of the eternal now.

Twenty years ago, there were people arguing over AI timelines. It was this very popular activity, the very popular displacement activity back then, including among many who called themselves rationalists, though not me. And there'd be people who would go, I think none of the AI stuff is gonna be real for another 20 years.

Well, here we are, it's 20 years later, and the people who said, ah, it's not gonna happen for another 20 years, didn't actually go use those 20 years to prepare for anything. There were people who did try to take that time to prepare, but they weren't the people going, ah, it's not gonna happen for 20 years. They were the people who were like, I don't know what's gonna happen, but you gotta get working on this thing.

But the people who were like, ah, it's not gonna happen for 20 years, what they actually meant by 20 years was never. Unreal fairy stories. And what is 20 years later? It's just another now. It's just the same person being like, oh no, this is all happening now, because it is now 20 years later than 20 years earlier.

People did all this arguing over, like, how long is it gonna take the house to burn down, instead of getting out of the house. And I think they just weren't putting themselves in the shoes of somebody who, whether five years later or 30 years later, was gonna look back and be like, oh, well, now it's five years later, or 30 years later. Now it's now again.

The problem is that the future actually happens to you, is the thing. You actually gotta live through it. If you are incapable of telling the difference between good and bad reasoning, and furthermore have given up all hope that you will ever learn to do that, you might say: oh, well, sometimes you can find at least one person on the internet who says a bad thing will happen in the future, and then it doesn't happen, and this proves that we don't need to worry about anything ever. That's more or less an argument some people are making, basically. But...

Liron 01:16:59
Yeah.

Eliezer 01:17:00
There's other bad stuff that does happen to you, and I want this to be a mental operation that people have access to: to understand that one year later, if you're still alive, it will be one year from now. One year from now is not this fancy never-time, unless we're dead. It is somewhere you end up, a slightly different person from the person you are now, but there, enduring it, conscious, aware of it, having to deal with whatever it is. And likewise the end of the world.

Liron 01:17:17
Yes. And there's also this idea that discontinuities are possible, right? People are saying, like, yeah, sure, we'll live in the future, the extrapolated smooth-trend future. And it's like, no, no, no. It could be a discontinuous future.

One of the most popular interviews I've ever done is actually with a prepper channel on YouTube. I'm actually not really a prepper myself, but these people have thought of the possibility that there may come some time when things go discontinuous, when the stock market doesn't go up 10.5%, it goes down 99%, right? Like a discontinuous change. And they're saying, yeah, this could be real for me, I have the supplies to prove that I consider this a real piece of reality. And outside of people warning about AI risk, or preppers, it just seems like a thought people don't let themselves think.

Eliezer 01:18:07
I mean, many people save for retirement; many of those may be doing it out of habit. I think there are many normal people in the world who are capable of sympathizing with themselves five years out and taking actions on that basis. One hears rumors that this happens to many parents at the point where they acquire kids, which is part of why Jaan Tallinn, one of the relatively earlier backers of at least some real work in this area, would say, yeah, you wanna talk to parents about these things.

Liron 01:18:40
Fair enough. So now, instead of just building up the problem, we're going to start talking about the solution and what that hypothetically looks like. Now, to be fair, we're probably going to poke a lot of holes in the solution, so it's not all gonna be roses. But this is called the alignment problem, and you are arguably the first AGI alignment researcher, correct?

Eliezer 01:19:07
Yeah, that's probably fair.

The Alignment Problem

Liron 01:19:09
Yeah, no, I've obviously sung your praises a lot, but I still wanna point out that when you're the first person to be in this new field that nobody else is in, and then a couple decades pass, and this is considered a very hot, critically important field that many smart people are in, that is a pretty big credibility building achievement in my opinion.

Eliezer 01:19:28
Sure. People sure are buying out the old prediction market shares saying that this problem was going to be important. I do feel like a lot of the funding is filtered in a way where it only goes to people who don't understand why the problem is hard.

Liron 01:19:43
Yes. Yeah, that's right. And that's actually something I'm gonna be asking you about. Stepping back a little bit, and bringing it back to this distinction of subcritical versus supercritical AI alignment, there are a lot of misleading impressions going around right now, because the AI is subcritical and people are talking about aligning it while it's subcritical. You've used the analogy of, like, it's a baby tiger today and it'll be an adult tiger in the future, but people are spending a lot of time trying to draw conclusions about baby tigers.

Eliezer 01:20:15
I am not sure I've used that exact analogy, but it sounds pretty close to one that I'd use, so sure.

Liron 01:20:21
Maybe you said baby dragon and an old dragon.

Eliezer 01:20:23
I did, I do remember saying dragon, yeah. I would also point out that this does not perfectly correspond to the subcritical-supercritical distinction. The distinction between the AI that cannot make the smarter version of itself and kill everyone and the AI that can make the smarter version of itself, the AI that can kill everyone, is not drawn in exactly the same place as the mistakes that the current AIs are making.

You see what I'm saying here? You don't wanna point at the current guys and go, like, ah, these are inherently subcritical systems and we know this for sure, and these other AIs are the inherently dangerous ones. We don't know that stuff.

Liron 01:21:00
That's right. Yeah. There's absolutely a lot of uncertainty around it. Now, the issue I see when it comes to AI alignment discourse is that everybody's got their magnifying glass out, looking at what the baby tiger's doing and trying to draw conclusions, like, oh, the baby tiger took a swipe, it's swiping. And it's like, well, but it was just playing, it didn't really think it was going to kill anything. That seems to me like what the discourse looks like today.

Eliezer 01:21:25
Sure. Keeping in mind that there are social pressures keeping it in place that way. To the extent you walk around acknowledging the difference between the baby tiger and the adult tiger, you are less able to do the job of clasping your hand on the shoulder of the funder and being like, I can save you, trust me.

Liron 01:21:46
Now, in terms of the tools that we use to align the AI, I think today I'd describe it as: well, we're wearing these oven mitts, and we can kind of bat it around with these oven mitts, and it just works. Sometimes it goes off the wrong way and we just give it another bat. I mean, it's not a precise tool, but it does the job. Are you liking that analogy so far?

Eliezer 01:22:05
Well, to be exact: that which we have found we can do this way is now considered to be what the job is. Pliny the Elder cracks all new AIs within 24 hours of their release and exposure to him. And 15 years ago that would've been considered a terrible, pessimistic scenario, back when nobody actually had to live with it. Now that people have to live with it, it's just taken for granted.

So the technology of safety and what it actually can do is now what we think it's supposed to do.

Liron 01:22:31
You're talking about the Twitter user Pliny the Prompter.

Eliezer 01:22:34
Yeah. The old guy said, as far as I know.

Liron 01:22:36
Right, right. Yes. Okay. So I have noticed Pliny doing that, and I do think that that's a good representative sign that these AI companies don't really have control over what they're doing. But the amount of control that they do have, as you say, is enough for them to release a useful product.

Eliezer 01:22:52
They can sell products that people want to buy for valid reasons. Yes.

Liron 01:22:58
Right. Now, another analogy that I might use is that subcritical AI steering, the regime that we're in right now, is like the Wright brothers figuring out how to steer a plane, because that was a big part of the flight problem that they had to solve, right? It wasn't just how to get the plane in the sky. It was, okay, what do we do with the wing flaps? How are the birds doing it? That was a big part of the problem.

Eliezer 01:23:20
Yeah. You think that the problem is all about getting the thing into the air, but it's more like, then it slides through the air and goes crunch. So stabilizing it in the air was, in one sense, the larger problem.

Liron 01:23:32
Exactly. And it was a very real problem in the case of flight, and it is a very real problem today in the case of making a useful coding assistant, and a lot of IQ points are going toward solving that problem. It's just that once we get to that supercritical moment, when the AI is superintelligent, when it is more powerful at steering the future than humans are, it doesn't look like, oh, what angle should I put the wing flaps on? It looks more like, oh, I'm in a fight, right? I'm fighting an aggressive cancer. I'm foiling a million terror plots coming at me. I'm fighting an army of terminators. And I think that's going to be a discontinuous shock to many people.

Eliezer 01:24:07
I mean, that's what it looks like to screw up superalignment. Hypothetically, and I'm sort of jumping ahead to the good scenario here, if you shut down AI internationally and then you tried out a bunch of adult gene therapies on people to find out which ones worked to actually augment intelligence, which would only be a few of them, but maybe you can get some distance that way, you'd get people who are just so incredibly, unbelievably smart that they stopped being such damn idiots in the way of humans, and they just stopped expecting things to work that wouldn't work, because they'd gotten a bit saner than that.

If those people were like, hmm, all right, I think I can build a superintelligence from here, and they were just correct about that, and knew they were correct because they were past the point of making silly mistakes about what they did and didn't know, those people are not planning to get into a fight with something vastly smarter than themselves, launching a thousand plots per second against them, and win. That is not their plan for winning.

Their plan looks like: they have built a thing that is flowing through a channel known to them, where the water never reaches the sides of that channel and flows out into all these tributary processes of launching attacks on you.

Liron 01:25:26
And I think this gets to the problem with most alignment researchers whose job title is safety researcher at an AI company, people like that. There are many of them. They are, for the most part, highly intelligent. But I get the impression that they think that with large language models, the kind of AIs that we have today, they've sidestepped the difficult issues about AI going superintelligent, going supercritical, and having more power than humanity.

They think they know more about how the future is going to go just because the model is working. The Wright Brothers model of changing the wing flap angles is working for them today, and I think they're in for a rude awakening.

Eliezer 01:26:05
Well, on the object level, I'm not familiar with any arguments that... well, pardon me, I'm familiar with one class of arguments about how it was all going to go great, which I would consider to have been already refuted by, for example, ChatGPT psychosis.

But I'm not offhand conversant with surviving arguments for why the people trying to prevent Claude Code from deleting your code base are doing just the sort of work that scales up without difficulty to superintelligence. Probably many of the ones who work at Anthropic in particular may even know better than to believe that they're solving superalignment by working on Claude Code and trying to prevent it from deleting your code.

Which makes it less useful. But I'd expect people at OpenAI to be substantially less conversant with any of the arguments for why the work might not generalize perfectly, and why constraining something smarter than themselves, reflecting on itself, rewriting itself, might be any more difficult than what they're doing now.

As for a claim that runs like, here's the deep principle that is true of our thing and that is true of a superintelligence, and here's the computer science of it, here's the river that flows deeply enough that what we do will just carry over...

That doesn't exist as far as I know; there's nobody even putting forth that claim for me to shoot down. Barring the old alignment-by-default people, who I think got falsified when the current crop of AIs started doing things that they would, if you asked them about it explicitly, say were morally wrong.

Liron 01:27:46
That's right. Yeah. Alignment by default. That's something people say, and I think it's coming from a place of: I talked to Claude, I talked to Gemini, and it just feels right. It feels like it's doing the right thing, it's making good decisions, it's taken in good data, and I think it's going to go well. There are a lot of people who seem to be honestly making that extrapolation.

Eliezer 01:28:07
Spoken like somebody who has never tried, on a whim, to pick up an unidentified caller and found that it was somebody who'd been talking to ChatGPT, discovering all sorts of fascinating things and getting only four hours of sleep a night.

And all I could do, because our civilization isn't set up to the point where I can transfer this call to any adult hospital that handles it, all I could do was beg the guy to get more sleep. And then he, if I recall correctly, texted me back a screenshot of ChatGPT arguing him out of getting sleep.

So yeah, I don't wish it on them. And it's possible that AI companies will succeed in correcting the overt driving of people into psychosis. I'm a little bit surprised they haven't corrected it already. It seems like it shouldn't be that hard to have a much smaller AI detect when the big AI has started to drive users insane; the conversations are quite characteristic.

They might fix this part, but if they don't fix this part, then maybe people learn better than to think that the models are benevolent when they lose a friend.

Liron 01:28:50
And you know, those kinds of psychosis situations, it's like the baby tiger is already slashing people. It's just, that would be more fixable and patchable if we didn't then expect it to grow into an adult tiger.

Eliezer 01:29:07
I mean, if the problems we see today were as bad as they were ever going to get, while the benefits would just keep scaling, I would be, even more gung-ho on AI than I am on biotech or nuclear power.

Liron 01:29:17
Right. Now, for those of us paying attention, the AI companies actually were pretty candid. I mean, OpenAI specifically was pretty candid in 2023 when they announced the superalignment team led by Ilya Sutskever and Jan Leike, who have both since resigned in frustration. But if you go back to 2023, they were saying, hey, we know that we don't have an alignment solution that scales to when AI goes supercritical. We know that the alignment we're doing is only for the baby tigers, only to make money today, and yes, we need to solve it. And, unthinkable as it sounds today, they even came out and said, yep, we are giving ourselves a four-year deadline.

We really think we'd better hurry up and solve it in four years, essentially; it's irresponsible not to. And now, you know, of course, we're two years in, and for those of us who followed what happened: the team doesn't exist anymore, a lot of those people quit or got fired, and OpenAI hasn't replaced it with anything. But there was this moment of candor, this moment of taking responsibility.

Eliezer 01:30:08
I, well, my standards might be high. I was not very impressed there. I did not expect it to go anywhere. I don't feel like they had indicated that they understood where any of the difficult problems were. It's like watching a bunch of people who don't know why curing cancer is any harder than curing a twisted ankle announce that they're gonna go tackle cancer in four years. They have their twisted ankle level medicine ready and they're gonna go use it to cure cancer.

We had people saying, like, we'll just get the AI to solve the AI alignment problem. Which reflects, from my perspective, non-mastery of some pretty basic ideas that perhaps everyone making big promises in this field is filtered to not understand, like the divide between whether you can verify the answers or not.

And the notion of what makes it hard to get good work out of that: one of the factors that makes it harder to get good work out of an AI is if you cannot reliably thumbs-up the good answers and thumbs-down the bad ones. So you've got this AI, say, and it's giving you a bunch of alignment theory. Can you successfully thumbs-up the good answers? Well, maybe these people thought that they weren't gonna be fooled by any alignment arguments, and I can't just point to their credence in superalignment as evidence that they were bad at discriminating this stuff, because I need a base case.

It's not like they wrote up: and the fundamental challenge here is our ability to distinguish whether or not the AI is doing good alignment work for us, because we can't just follow the AI's clever scheme for surviving building superintelligence, then see whether or not we're alive at the end, and press thumbs-down if we're dead. That's what makes it hard for humans. It also makes it hard for humans to get that work out of an AI.

Liron 01:31:35
Well, this gets to the closest thing OpenAI has ever published to a proposal of what they're going to do. They said they'll build a weak AGI and they'll have it help solve the alignment problem for the next-generation AGI; they'll just stair-step their way up. What do you think about that?

Eliezer 01:31:52
I think they have been filtered to exclude anyone who could understand, or admit, what the core difficulties along the way to doing that are.

If somebody comes over and is like, let me tell you about my brilliant machine that turns tap water into ice cubes and electricity, a perpetual motion machine of the, I forget which numerical type, an entropy reverser, the number one thing you want to hear from their lips next is: now, this might sound like it violates the second law of thermodynamics, but it doesn't, and it totally obeys the laws of physics, and you might think that this is difficult, but here's the core insight that overcomes the key difficulty. That tells you they know why you're skeptical. They know what sort of principles would usually prevent them from doing that, or make it hard to do. This is what you want to hear.

And if instead they're like, let me show you all the detailed gears and wheels here, and my spreadsheet of results that I got from running it on tap water from my faucet, and how cold it was when it came out, and how much electricity came out of this wire here, then they're telling you that they've got no idea why what they're doing is supposed to be difficult, and it's a very, very bearish sign.

You know, you're like, I'm going to want to use the little AI to solve the alignment problem for me. Do you understand any of the reasons why this is hard? Do you understand any of the reasons why this was not my frontline proposal in 2005?

Liron 01:33:11
Yeah, I agree that they don't seem to. So I'll just put on my Sam Altman hat, because I've heard him say this. He'll say, okay, Eliezer, but you're just standing at a whiteboard trying to solve the problem, and that's never going to work, because you never solved it in the past. So the only thing that we can practically do is just build the version we have now and try to solve it once it's live. That's basically what he's said: we just gotta keep releasing and deal with it.

Eliezer 01:33:45
Well, I think if you've got this medieval alchemist who is like, well, nobody can figure out by philosophy which substances are safe to inject into people, and so we just gotta keep injecting substances until we find the key to immortality... Even this is not a sufficiently drastic analogy, because it's more like their brilliant cure gets administered to everybody on the planet simultaneously.

Liron 01:34:21
Yeah.

Eliezer 01:34:21
The alternative here is, well, if you can't call this shot, you don't get to take the shot. Nobody on Earth gets to take this shot. We are shutting it all down, and that's hard, but it's probably easier than World War II. I expect we'll get to this later. And World War II was fought for smaller stakes, and we did show up and fight it when we had to.

There is no law saying that because something is hard for you to figure out in a reliable way, you must be able to figure it out in a chaotic, unstructured way. There is no law saying that, because the alchemists could not figure out a priori what constituted a potion of immortality, if they put together a potion of immortality and tried feeding it to some people, they would manage to find immortality on their fifth try.

And only after feeding people small enough doses not to hurt them, or something. You can't even draw analogies here, given the sheer scope of how hard a poorly designed superintelligence hits the planet and tells you that your clever theory was wrong and doesn't give you a chance to try again. And the obvious analogies are to past people who understood the problem this badly, like medieval alchemists trying to cook up potions of immortality, except for the part where it didn't work.

And the part where they were unable to figure it out via philosophy did not thereby mean there was a balance in the universe that empowered them to figure it out in a small number of tries, in a small amount of time.

Liron 01:35:43
Yeah. So from our perspective, these super intelligent AI builders, they're failing humanity on two levels, as you said. The first level is they're not familiar with the hard part of their problem. Right? So they can't really explain to you

Eliezer 01:35:56
There are people who are familiar with it. They have been filtered out of those social positions, yes.

Liron 01:36:02
Right. And then more on the meta level, they're not setting themselves up to react appropriately when the problem is not solvable by them.

Eliezer 01:36:12
Yeah. You can imagine a grownup version who is like, we are supposed to have figured out this thing by this time, and that's the business plan, and if the business plan is failing, that means humanity is in even more of a massive emergency, and the thing to do from there is halt, melt, and catch fire: go to the world leaders and tell them to shut it down or die.

So by giving themselves the four-year time limit, I'm a little worried that it was slightly performative. But also, Ilya is a bit more of a grownup about these things than Sam Altman, and by giving himself the four-year time limit he might have been trying to do something like expressing, we need a business plan. It's generous; as far as we can tell in 2025, it is not obvious that you've got until 2027 to work everything out.

But maybe it sounded more plausible in 2023 that your four-year deadline was short enough.

Liron 01:37:03
It was a little spark of grownup behavior that we just haven't seen since, specifically this idea of the plan B, the what-if-this-isn't-solvable. This came to me when I was scrolling Twitter and I saw the safety head of one of the major AI companies tweet something like, oh, we've got some interesting new directions for our safety plan, I'm optimistic that this is going to yield some good results. And I was like, okay, well, I'm glad he's optimistic, I hope he's found something. But wait a minute: what if this problem were actually unsolvable, or what if he's embarking on a research path that will only solve it in 20 or 30 or 40 years?

If that's the situation, whose job is it to notice and blow the whistle and then plan accordingly? To plan for the problem not being solvable on the 20-year timeline by which we're probably going to have the capabilities? Whose job is it to steer plan B, the alignment-being-intractable backup plan? Whose job is that?

Eliezer 01:37:58
Well, if you don't have any other account of whose job something is inside a startup, it's the CEO's job, or whichever co-founder has adopted the catchall role. And similarly, if your planet has assigned nobody the job of doing something, it's Eliezer's job, I suppose.

Liron 01:38:22
Yeah. Yeah, that's true. That's true.

Eliezer 01:38:25
By the way, I should also emphasize that the make-Eliezer-do-everything plan doesn't work in real life, and it failed. But yeah, just saying that part.

What We Want from AI

Liron 01:38:35
You've used the example problem of cloning a strawberry down to the cellular level as a test challenge, and you don't even think that our species will get to the level of being able to solve that problem, much less the harder alignment problem of human values. We're not even gonna get there; we're not even gonna be able to align a superintelligent AI to clone a strawberry down to the cellular level. Correct?

Eliezer 01:38:57
Yep. Although, important context: it's easy, if you have a rock, to make it be very safe. So we need to say what this AI is doing that is powerful enough that it actually even needs to be aligned, or is in any way difficult to align. And "build two strawberries identical down to the cellular level, but not necessarily the molecular level" is standing in for an AI that is powerful enough that it can invent the new biotechnology needed to pull that off.

If you have an AI that can do this thing, and you can make it do it without killing a bunch of people as a side effect, you can maybe also ask it to cure cancer, for example, or, more to the point, augment human intelligence. You can do various things with it that are actually helpful.

Liron 01:39:39
Yeah. The reason why this is such a nice test challenge is that in order to successfully clone the strawberry at that level of detail, it's going to be doing reasoning and science and engineering at a level that's beyond the top human institutions right now. So it's going to really be exploring its options, doing what I might call a broad-domain search. It's going to really be tapping into the essence of intelligence. And if we're able to show control over that kind of system, then that is an optimistic sign.

Eliezer 01:40:05
Yeah, it's powerful enough that aligning it is important. If you say it's easy for whatever reason, then you're not just saying it's easy to align a rock. You're saying it's easy to align a system that is doing actual big, powerful new tech development, new research, planning things in the world that our current biologists cannot do and are very far away from doing, and that will let you do other useful stuff.

Liron 01:40:26
Right. And just to explain to the viewers, I mean, you can imagine cloning a strawberry using today's technology in the sense of like, oh, we'll just have a lab grown strawberry, we'll just put the strawberry substance together. But you're talking about cell by cell, like really get a very

Eliezer 01:40:38
Yeah. It's not cloning a strawberry. It's copying, it's xeroxing.

Liron 01:40:44
Exactly. And we just don't have the technology to do this kind of fine-grained, cellular-level copy. We don't have a 3D strawberry cell-level Xerox machine, and by the time we do have it, we're talking about a pretty advanced state of technology.

Eliezer 01:40:57
Yeah. The task is chosen to be relatively simple to describe and talk about and reference, but to require, in the background, a new, basic, many-years-ahead biotechnology framework to do it.

Liron 01:41:12
Right. And the alignment problem is a mystery wrapped in an enigma, because, okay, there's the "get it to build the strawberry Xerox" part, but then there's also "get it to respect what humans truly want and do moral calculus." There are many dimensions to the alignment problem, and we seem to be on the ground floor of most of these dimensions.

Eliezer 01:41:31
If you can get it to just build the strawberry and not kill a bunch of people or do anything else that has giant side effects and rewrites the rest of the world outside the laboratory, then maybe you can use this thing to cure cancer and augment human intelligence and otherwise actually manage to flip the game board there.

In the world where you can get an AI to do quite large things, in a sort of mundane sense, without massive side effects you didn't sign off on, you can punt some of the very elaborate, impressive moral trolley problems down to the augmented humans who are building the next generation of AIs after that.

Liron 01:42:05
The next little segment here is we're gonna talk about the good scenario. We're gonna talk about heaven because once we lift our head up from the annoying problem that we can't make super intelligent AI do what we want, we're nowhere near the level of insight to make super intelligent AI do what we want. Once we lift our head up from that, there's this other shocking realization of like, can we even say what we want?

Like if we had a genie that would grant our wish, are we even in a position to make a wish right now as a species?

Eliezer 01:42:34
Certainly there's no established mechanism for making a wish as a species. And if you took a UN vote, I don't think that would be a very good idea at all. A bunch of that has to do with how we ask for things that will make us happy; but what if we are wrong about what makes us happy? Well, then we have a prediction problem, and if you are standing right next to a superintelligence, maybe you shouldn't be trying to make all of your own predictions there.

But at the same time, if you are like, well, do whatever makes us happiest, it's like, okay, injects you full of heroin. You still asked for the wrong thing, but what made it wrong was not that you weren't happy at the end. You are now quite happy, but it turned out that you wanted more things than that.

And so there's the question of how to take into account your preferences and your idealized preferences and yada yada yada. We could go down the abstract path of how you define what you want, which... I brought up coherent extrapolated volition in 2004 and said, okay, there's the target. And then everybody went on twisting themselves up, like, well, but how can we possibly say what it means to align an AI when different humans want different things? And I'm like, I wrote this up in 2004. It's done.

You have not gotten past the point of addressing the things that were in the starting paragraphs of the 2004 work. This is obvious, if you are interested in actually answering it rather than trying to, like, clutch this unanswerable thing to you like some kind of weird squishy doll. Then: coherent extrapolated volition, 2004.

Liron 01:44:14
Yeah. So you took a serious stab at the problem in 2004. And obviously there's a lot of detail, I think, that's left to fill in. Is that fair to say?

Eliezer 01:44:24
I mean, we haven't covered the whole concept in this chat, if that's what you mean.

Liron 01:44:27
Right, right, right. Okay. Alright. So you took a stab at it in 2004 that I think represented real progress, and that is good. And let's contrast that to what the AI companies have been doing in the last couple years. How have they been telling us about the good scenario? Where are they aiming? Are they being clear about what outcome they're aiming for?

Eliezer 01:44:48
I mean, here we're setting aside "how do you define what makes a good outcome good" and asking, concretely, what's a good outcome? And from my perspective at least, that's always been, to a first gloss: the worthy descendants of humanity, including perhaps ourselves among their number, go out into the galaxy and the other galaxies and make the stars our cities, and they are full of conscious minds having fun, who care about each other.

And somebody's like, well, what exactly do you mean by fun? And I'm like, check out the Fun Theory sequence, where I try to describe some of the 31 Laws of Fun, the sort of thing you would use to answer questions like: how much fun is there in the universe? Will we ever run out of fun? Are we having fun yet? Could we be having more fun?

Yes, we could be having more fun. And if you go about it in a very naive way, it's sort of like, oh, how could I be having more fun in this video game here? Well, by having a higher score and by slaying even more monsters even quicker, so I will have this AI come over and play this video game for me. Oh, no. Now the AI's playing the video game and it's racking up a high score, but I'm just sort of sitting here watching it happen, feeling bored.

And this is what similarly goes wrong with many proposals that people have for how to use AI to improve life. It's like that, but with life instead of the video game. And yet it doesn't stop there. It's not an unanswerable question. You can imagine AIs helping people, or doing things themselves, if you could control them, which you can't.

Liron 01:46:34
Let's talk a little bit about fun theory, because when you first read Eliezer Yudkowsky's fun theory, which I think is back from 2007, 2008, it seems frivolous. Like you're saying, oh, here's ways that we could have more fun if we had these criteria. Like if every day we all get a little bit more capable, but the challenges get a little bit harder, right? It's kinda like you're designing the ultimate video game universe for humanity.

Eliezer 01:46:56
I do talk about what it takes to not be a video game, like long-term serious stakes and higher purposes and entanglement with other people, and it not just being skill acquisition, but you actually getting stronger and not just racking up more points. I did try to describe a set of conditions such that it wouldn't all be a video game.

Yet in the end, there are kinds of meaning that maybe should vanish forever from the universe. Like: some people are in really serious trouble, and to help those people is a very noble endeavor. And also, there's an end state where you have helped them and they're no longer in trouble, and you can't just go put new people into trouble so you get the fun of helping them. That is contrary to the whole high purpose that you started with. So there are certain kinds of meaning that will, in a successful scenario, vanish forever from the universe.

Heroism will never be the same.

Liron 01:47:56
Right, right, right. Wow. Okay. So you're engaging with these problems very seriously. Problems that most people, myself included before I read your sequence, don't even realize are problems, because most people are like, yeah, we'll solve all the problems and then we will just create heaven on earth. But then you look at what heaven actually is. Anytime anybody's tried to describe heaven, it's like, yeah, you just feel really good forever, and there's angels. And it's like, well, if I was actually there in the place you describe, it seems like it would get boring.

And if it didn't get boring for me, isn't that a me problem? Isn't it kind of lame of me not to get bored by the traditional heaven? I think it is.

Eliezer 01:48:30
Wait, wait, sorry, sorry. You're saying it's lame to not be bored by heaven.

Liron 01:48:34
No, no, no. I'm saying either you get bored by heaven, or you're having a good time chilling in the classical description of heaven. But then wouldn't that be a lame outcome, for you to be the kind of person who thinks that just sitting around with angels chilling is not boring?

Eliezer 01:48:48
I mean, from my perspective, the whole thing is a literary exercise in failing to continue to think through the consequences and put yourself properly into the shoes of the person experiencing heaven, asking how they feel about it a day later, a year later, a century later, 10 million years later.

But then you actually do try to think ahead and continue asking the obvious questions, instead of just hugging the questions to yourself as unanswerable. Part of my new identity is that I have developed this decisive philosophical critique of the notion that life can never be good.

And instead you're like, all right, we're gonna keep on optimizing. We're gonna keep on solving this problem. How do we actually have some fun? Just work it out. I've got my fun theory sequence, which is my shot at working it out.

Liron 01:49:47
Now, if you tell me that there's a person who read the Bible and came away with the conclusion that heaven wasn't good enough for their standards, I would understand why you get the accusation sometimes of being a religious figure. But the thing is, it's not just your standards. Because once people like me read your stuff, I'm like, you know what? It's not good enough for my standards either. I think that's what distinguishes it. I don't think you're trying to replace Jesus.

Eliezer 01:50:07
Carry on.

Liron 01:50:08
Yeah. No, that's just an observation. I mean, it's pretty crazy that you're noticing these problems to be solved. So yeah. So you've got your fun theory, you've got coherent extrapolated volition. Let's touch on that a little bit. I do think that that represented a real unit of progress in defining what we're trying to go for here, if we had a super intelligent genie type of AI.

Eliezer 01:50:29
Right. So one way of looking at coherent extrapolated volition is that it's what you inevitably end up with, I would argue, at the end of any well-conducted Socratic dialogue where somebody is like, well, here's how it's hard to help people. And somebody else is like, okay, well, but then how do you help people?

Let's say two people each want a pizza and you have only one pizza. If you give it all to one person, one person's sad. If you give it all to the other person, the other person's sad. If you give half to both of them, neither is truly happy. We have now proved that altruism is unsolvable, that to help another human being is meaningless, because sometimes they want different things. What could an AI do here? It could only be sad. We might as well let it destroy the world; there's nothing any better than them each kind of having half the pizza. And if instead you're like, okay, but suppose you just try to not be a jerk. What is the least jerky form of AI?

Or somebody's like, I think mercury will grant me immortality. Give me a glass of mercury to drink down. And somebody's like, ah, hey, people want the thing, but it's not good for them. So if you give them what they ask for, they'll be harmed. And so altruism is futile, helping people is meaningless, there is no coherent thing an AI could do in this situation to try to not be a jerk. And I'm like, seems like you're more of a jerk if you give the guy the mercury.

And that's not because we're taking all prospect of ever helping people, or even responding to their own desires, and throwing it out the window. We're looking at things like: this person has an inaccurate predictive model of what will happen to them after they drink the mercury. They're asking for the mercury on their way to something else that they want, but there's a piece of knowledge that the AI has which they don't have, where if we imagine an alternate version of them that knew this additional fact.

Liron 01:52:43
Mm-hmm.

Eliezer 01:52:45
The alternate version of them wouldn't want to drink the mercury. Or maybe it's something like somebody being like, I want you to make me a cat girl to date. I want to date the cat girl. And the AI is like, well, that might potentially violate the cat girl's rights, but I can make you a non-sapient version of the cat girl, only GPT-4 level intelligence, because any further than that and you're pushing it.

And the person's like, I think this is a great idea. And they think they're just gonna live happily ever after with their cat girl. They're not; they're gonna start to feel dissatisfied after a year. But you imagine what would happen if the guy knew everything that the AI knew.

What would they want then? What would they want the AI to do for them if they knew everything the AI knew? And the guy's like, you know, I've got this version of me that needs to make his own mistakes over there. He needs to live through the process. He needs to realize that he asked for the wrong thing. He's gonna be somewhat happier, or a lot stronger, if you let him make his own mistakes. And if you don't, he's gonna hate you for it. He's gonna think that you're paternalistic, and he's gonna be right.

Give him his GPT-4 level cat girl. And then the AI does that thing, and it's doing the thing that is least like being a jerk in that case. There is no perfect solution, but there is looking over all the options there. And importantly, not just doing it from the perspective of a dad who's like, I've decided what I want for my kid, and never mind what my kid wants for my kid.

It's based on what the person would want for themselves if they knew everything the AI knew. But this is not enough. One keeps going from here. There are questions like: does this person approve of the person they are now? What if they were more like the person they wish themselves to be?

Okay, and what about ways that whole civilizations can go, where the civilization as a whole has to go one way or the other? You can't just divide up the pizza; you gotta pick one or the other. Is there a voting process of all these extrapolated, alternate versions of people?

What if some of them are jerks? What if 80% of people would be jerks in some truly deep sense, where there's just no way to get to a non-jerk version of themselves starting from their current baselines, and only we who discuss this issue are so lofty and altruistic as to not be jerks? Is there anything we can do to not be jerks in that case, and not just overwrite these other people with our own preferences? This is the sort of thing that coherent extrapolated volition plays out.

But the general idea is, it's not so much a super intelligence doing whatever people ask it to, even if that kills them or the people around them. It's more like: ask what people would want if they knew what the AI knew, thought as fast as the AI thought, had considered all the arguments the AI has considered, and were more the people they wish they were themselves. And then you look at a whole civilization doing this, and ask where the range of possibilities for where it might end up lies, and what we can do now that the future civilizations won't regret, that they won't feel has trapped them in some kind of inescapable dystopia.

That's what goes into the recipe there.

Liron 01:55:45
These are all good answers. And there are some people who will quibble. I mean, I've seen different discussions, different people opining on this, but to me the takeaway is just that you seriously engaged with the problem. I think you at least made progress, right? The degree to which you solved it might be up for disagreement, but then you go and compare it to everybody else out there. You go and compare it to the recent statements by Sam Altman and Mark Zuckerberg saying, hey, you know what? We are gonna build a super intelligence in your pocket. And what is it going to do? Whatever you tell it. And it's like, well, wait a minute. Aren't we all telling it different things? And isn't it intelligent enough to go fight for control over the earth, because they said it's a super intelligence, right? So that is kind of your competition here.

Eliezer 01:56:25
Yeah. To me it feels like, if you are actually interested in pursuing the question, rather than clutching the question to yourself as unanswerable, there's a sort of obvious way it flows kind of downhill to a sort of obvious guess at what you would do here if you were actually trying to address these questions instead of clutching them to yourselves. And you've got your people who would prefer to just continue squeezing the question like some kind of squeaky doll that never runs out of squeaks.

And you've got your people who are, they don't want to get into all that stuff when they're trying to sell something to the public.

Liron 01:57:14
Yeah, exactly right. And what you're talking about, the squeaky doll that's so fun to squeak: you've got a post called Mysterious Answers to Mysterious Questions that goes into that.

International Coordination Solutions

Liron 01:57:25
We're moving into the final wrap up here. We're gonna talk about the best we can do as a society, as a solution, the call to action basically. Because right now I think you and I are of the view that actually solving super intelligent AI alignment on a 20 year timeframe, the upper end of what we think is the capabilities timeframe

Eliezer 01:57:47
Well, 20 years is kind of what I feel like you'd get from the international shutdown, not from

Liron 01:57:54
Yeah, yeah, yeah. Exactly. Like a very generous amount of time. I mean, not that it's obvious; for all we know, it could take 5,000 years. But it just seems like 20 years is more than pushing our luck from our subjective perspective, I think.

Eliezer 01:58:07
It's not gonna be 5,000.

Liron 01:58:09
Okay.

Eliezer 01:58:09
5,000 years is a lot of time. People who think that it would take that long to solve AI do not understand how long 5,000 years is.

Liron 01:58:17
It's a really long time. All right. So anyway, we think solving alignment on a 20 year timeframe is unlikely. It's intractable because we get no redos, as you've pointed out. It's just a takeoff scenario. And so what do we actually do? One thing you've said that we can do is try to stick to narrow AI, right?

Eliezer 01:58:36
First and foremost, there is no "we." Until there's an international treaty, humanity has no ability to not do anything; humanity has to seize that ability for itself. By which I mean a few major nuclear powers get together and say: nobody anywhere on earth, including ourselves, is allowed to do the following things that potentially wipe out humanity, or make it easier for others to wipe out humanity.

That none of that stuff is happening. Now, once you have that capability, you could talk about having narrow AI exceptions to it. Essentially no AI company on Earth has as yet demonstrated any interest or capability in the narrow stuff, except for Google DeepMind, by the way.

But that's stuff like AlphaFold, AlphaProteo. Maybe next up is AlphaCell, the AI that understands all the interactions inside of a range of human cells or something. Or not; I don't know what's up next.

But the notion of narrowness is subtle. It's no coincidence that there isn't some specialized race of beavers that specializes in doing just biotechnology, or just building nuclear reactors using narrow intelligence that specialized on building nuclear reactors. It's no coincidence that humanity got there with our very general brains, designed for chipping flint spears and arguing politics, ahead of any species on the planet developing specialized brains that were just for building nuclear reactors.

There are problems that have enough pieces to them that they're just easier to solve once you use general intelligence for them. And even if you are just grinding reinforcement learning on solving a narrow class of problems, that doesn't mean that the thing that gradient descent hits on for solving it is not going to be general intelligence.

If you are trying to breed a species to be able to invent nuclear reactors, the first thing you get is plausibly not some kind of species that's specialized on only being able to think about nuclear reactors and not think about anything else. Humans may just be easier to run across in the mind design space than that, if you're using genetic programming, natural selection, or gradient descent just the same.

So the mere fact that you train an AI in a narrow domain is not a guarantee that it is a narrow AI if you are using generic black box methods. However, the reason why only Google DeepMind ever builds any narrow AI is that Google DeepMind is also the only AI company that has retained any expertise in doing anything besides generic black box neural network methods.

Liron 02:01:02
Right. Right, right. Yeah. And that is just such an unfortunate fact. That fix of only trying to operate narrowly, it's uncertain how far we can push it before we create the same problem, as you say.

Eliezer 02:01:17
But if you know what you're doing, you can push it further. The "if you know what you're doing" qualifier is a big problem. The thing about shut it all down is that if I'm saying this to the leaders of the nuclear powers, the United Nations, I'm not saying shut it all down except for my project.

The only trustworthy project, you need to understand; I can build AI safely, but not the other guys. Oh, no. I can't do that either at this point. If anyone builds it, everyone dies.

So I'm not telling them it's safe to build, that it's only safe for me to build super intelligence, only let me do it. I'm like, shut the entire thing down. And then when it comes to narrow AI.

Unfortunately, well, not unfortunately in some senses, but it'd be a simpler message if I could just say: nobody can build a narrow AI past the following point without everyone being dead. I am no exception to this rule. Nobody can do that safely. It's beyond human ability, back off. Which is the message for super intelligence.

But if it's about building an AI that understands all the events inside a human cell, across the variety of human cells, and can invent new drugs on that basis, this is for all I know something that DeepMind can do. And furthermore, depending on how you do it, there are more or less scary ways to do that. If your way of doing that is to build larger and larger generic large language models, and then also fine-tune them on predicting events in cells, that's less safe.

And if instead you're building a Google DeepMind style custom rollout, not just a generic neural net, but this piece of the AI is modeling this thing, it's trained on this kind of narrow data, none of it is being trained on general internet text, then you can go further with the system before it kills everyone.

And myself or Paul Christiano or maybe Shane Legg, I'm not as sure about Shane Legg, but I would trust myself or Paul Christiano to look over the system and what it could do, and develop a test suite for what sort of side capabilities it might be developing. Whether it was being pushed so far that it had started having internal preferences pointing anywhere other than the explicit overall purpose of the system. Figuring out what to look for early on: is this thing starting to develop general intelligence in order to solve its problems?

And building a very conservative safety scheme. I can't actually say with a straight face, there's no way I could ever do that, it's beyond human ability. Maybe I could do that. Maybe Paul Christiano could do that. Sam Altman, I do not trust to do that. Dario Amodei, I do not trust to do that.

So if the United Nations, well, not the United Nations, if the Butler Coalition of nuclear powers is going to carve out an exception to building new AI capabilities around specialized biological systems, this is now a question of: did they trust the right people, or are we all dead?

Demis Hassabis is allowed to do Demis Hassabis things to build a system like this, but Sam Altman and Dario Amodei are not allowed to do their things to build systems like this, according to Yudkowsky, if you believe that Yudkowsky guy. And Shane Legg is like, I could probably do that. And Eliezer Yudkowsky is like, well, yeah, maybe, can I check your work just to see if it looks insane to me? And how is this coalition of nuclear powers deciding to believe what I believe about what Shane Legg believes?

It's a much thornier notion, that you're going to carve out an exception for doing sufficiently advanced biotech. I cannot say that this is known to me to be beyond human ability, but it does sure seem to me like a thornier sociopolitical problem. And I would consider it to be overwhelmingly reasonable for the political leaders to say: we're not gonna dance as close to the cliff as possible. Just back the hell off.

Liron 02:05:05
Yeah. No, I agree. I mean, but it is at least the kind of discourse that we would have as a serious civilization as opposed to just not having it at all, which is where we are now.

I wanna quickly review what I heard you say in other venues, your tentative proposal for internationally coordinated monitoring of AI. One way it could work, as opposed to the status quo, just to summarize, is something like: first, world leaders get the fear of something in them, and they say, let's put petty grievances aside, because hey, it's like preventing global thermonuclear war. At the end of the day, we are actually going to go up in smoke here.

So let's try to do something grown up here. We stand ready to do arms control agreements about AI. This isn't about a petty conflict of trying to get one over on the other nations; we gotta go bigger than that. Nvidia and companies like that can only sell chips into a limited number of internationally monitored training data centers. Many countries post observers to the data centers. Rules about which kinds of training you're allowed to run have to be obeyed. Every job you run gets logged. Evading restrictions is considered a big deal.

You've stirred up some controversy when you mentioned that when you have these kinds of international restrictions and they get evaded, you do have to escalate to things like airstrikes in order to maintain order. So have I roughly summarized the kind of proposals you've floated?

Eliezer 02:06:41
Well, the flaw in the plan would be if North Korea can collect a hundred thousand GPUs and then try to go for a military advantage over all the other countries of the world, hoping that pushing AI that far doesn't get to the super intelligence level that kills everyone. Or North Korea just goes off into its own private fantasy about controllable super intelligence and manages to sneak away a hundred thousand GPUs. And then from there the answer is like, oh, well, North Korea has atomic bombs, so we can't actually threaten them, so I guess they get to take over the world. Only we're not actually gonna do that; we're gonna restart all our own projects.

If that's the answer, then why bother? But it's not the answer. The answer is, you're like: sorry, North Korea, we stand in terror for our lives and the lives of our children. We're dropping a bunker buster on your GPU cluster.

Liron 02:07:31
Yeah, exactly. We have to be serious about enforcing it; assuming that this agreement gets through, it can't be easily

Eliezer 02:07:38
If there's no answer to what you do about a rogue state, then the entire policy is unworkable. And that is why I did feel the need to mention a premise here: that taking this seriously means being willing to escalate, to clearly communicate in advance that you will use a conventional strike to prevent a non-allied data center from endangering the world, and then actually do that if they ignore your diplomatic communication.

Liron 02:08:06
Of course, now this kind of proposal, you floated it as something you can positively imagine a grownup society finally getting scared enough and treating with enough urgency to hurry up and do, right? We

Eliezer 02:08:18
There are no grownups, but it is the sort of thing that even kids can follow.

Liron 02:08:25
Fair enough, fair enough. And I also wanna add the caveat that you've mentioned, which is that even this solution is really designed for people who aren't as convinced as you and me about this whole imminent super criticality threat, right? This is kind of a halfway solution.

Eliezer 02:08:41
I mean, if they actually believed everything I believed, it would be like: no more GPUs. Why risk it?

Liron 02:08:49
Right? So there's kind of two options on the menu here. The first option is like, oh, you agree with Eliezer and Liron that this is such an urgent risk, better safe than sorry, no further GPU versions right now, stuff like that. But then there's all these people who are still being reasonable and they're like, okay, I'm not a hundred percent convinced by Eliezer, but shouldn't we be ready to pause? Shouldn't we have our finger on the button, watching very closely? And you're catering

Eliezer 02:09:09
Yeah. If you think, with sufficient probability, that GPUs are going to erupt in unstoppable horror that kills everybody on earth, then what you probably want to do about that is not run those GPUs.

Now, if your probability on the unstoppable horror erupting is so low that you are not at the point where you're like, well, what the planet should do here is just not run this stuff anymore, then you might build the off switch. You might put all the GPUs into data centers and monitor them, but still be somewhat more liberal about what sort of jobs people were allowed to run as long as there was a central lever to pull on it.

But that's what you do if the weights in your mind are: scientists disagree about whether an unstoppable horror will erupt out of these things and kill everyone on earth, and do I really wanna take on the great political inconvenience of reining in these AI companies? If those things are balancing, and you'd feel bad if you went all the way to one side or the other, then maybe you could just build the off switch.

But I'm not backing down from the real ask prematurely. Because that's a stupid way to die.

Liron 02:10:33
I think we can agree that whatever we do here to try to solve the dilemma that we're in at this point, the scale has to be large. This is why you're doing the book launch, right? You've dedicated a lot of your time, your focus, for the last who knows how many months or years, because it's not time for a best-effort solution or a nudge solution. It's time to do this kind of solution at a large scale. Correct?

Eliezer 02:11:01
I mean, to be clear, the book is advocating for something large scale. The book itself is not a large scale act. All of our lives, my life and the life of everybody who's worked with me, all of that put together is nothing. Your planet has invested nothing into this. And that's part of why it would be pretty silly to expect a good outcome from it. And the book is still nothing, but maybe it causes something.

Liron 02:11:28
Right, okay. And in contrast to that, it's really no time for people who are like, we'll do our best to work on alignment while capability keeps growing. Which, I'm sorry to say, is the position of all these people who are going to the AI companies and they're like, oh, I'm helping, I'm nudging from the inside, I'm doing the best I can. Sorry, but there is a scale mismatch, right? Doing that kind of action today, it's too late for that.

Eliezer 02:11:52
Yeah. Well, there's cultural filters that prohibit certain kinds of understanding in various subcultures. And in this case, the understanding that, okay, you've made this much progress and you need this much progress. To not walk around with a visceral sense of that, and instead be so enchanted by the progress you're making, is one of the qualities that will get selected for in the kind of AI safety person who passes the cultural filter at an AI company and gets hired and not fired.

Liron 02:12:23
Exactly. And this is what we were getting at before, saying, whose job is it to notice what you said? Right: small unit of progress versus big problem. Whose job is it to notice that those two lines aren't going to cross, the line of bigger and bigger problem versus slightly bigger solution? Nobody at your company has the job of pointing that out. And to the degree that you are doing your little research, making a tiny bit of progress, you're tractability washing the problem; you're living a life in the frame as if it's tractable.

Eliezer 02:12:54
I mean, I'm in favor of people continuing to do their little bits of interpretability work, and even publishing the part that seems to clearly not contribute to capabilities, but I'm not in favor of them losing sight of the big picture, or allowing it, by action or inaction, to be lost. They should be constantly saying: we have made 0.1% progress, 100% progress remains, we are not making it fast enough, we are actually regressing, because the new AI architectures are getting complicated faster than we are understanding them.

And humanity continues to be on track to destroy itself, and an international treaty shutting down all companies, including my company, is urgently needed. As long as you go on saying that, I'm in favor of your continuing to do the science research part.

Liron 02:13:35
Right, right, right. No, totally, that gets to what I was saying about Ilya Sutskever. His company is now called Safe Superintelligence. So I like what I hear; safe super intelligence sounds good on paper. But if he's so concerned about safety, why is he tractability washing? Why isn't he explicitly speaking out on how bad it is, all these other labs, never mind his own effort? All these other companies are not building safe super intelligence; that's why he felt the need to go and do it on his own. So why not speak out? He's tractability washing.

Eliezer 02:14:07
Or, by way of reassuring some of us, why not publish a list of the core difficulties that he thinks he can overcome, even if he doesn't want to publish his solutions because they're capability-entangled, right? You say you're gonna build a perpetual motion machine. Why don't you publish a document giving a list of the standard reasons why perpetual motion is hard, even if you don't want to reveal your secret sauce for building perpetual motion?

My guess is that Ilya probably doesn't know the principles either, but who knows? It's not like there's any organized body that causes people to indicate whether they know things.

Liron 02:14:48
Exactly. And finally I want to single out Anthropic for special mention on this theme of tractability washing. Anthropic has this reputation of attracting the most researchers who truly, deeply care. They recognize that super intelligence, existential risk, is a major problem. They

Eliezer 02:15:09
Well, truly, deeply care within a very particular sort of subculture that emerged inside effective altruism as a reaction to MIRI threatening their existing rationales for dividing up their funding the way they did, which is a very brief gloss on a much longer story. But these are not generic people who are worried. These are people who are worried in a very particular sort of way that was culturally transmitted among this one subgroup that saw themselves as conceptually opposed to MIRI ten years ago, type of thing.

Liron 02:15:38
Okay, okay. But you also don't know; they've done recruiting drives, right? They've got thousands of people, so you don't know if there's been a shift.

Eliezer 02:15:45
Well, yeah, but also, you're not gonna wanna hire somebody who conflicts with your current corporate culture by just straight up being able to list out the list of lethalities and being like, yeah, that sounds right to

Liron 02:16:00
Mm-hmm.

Eliezer 02:16:01
corporate culture.

Liron 02:16:03
Right. Okay. Well, whatever the reason is, they make a lot of noise, right? They do. And I think it's authentic to their feelings, right? I think they're being emotionally authentic when they tell you that they're worried about risk. I was actually personally doing a protest a few months ago, outside, in front of one of the Anthropic offices.

And at the time, nobody came out and talked to us or anything, right? They ignored it. But then later I heard through the grapevine that they did some soul searching after the protest. And the protest was just me and Holly Elmore yelling at them, please quit, this isn't helping, you're tractability washing, that kinda stuff. And they did some soul searching, but at the end of the day, what do they continue to do? In some sense, they're the best of the worst, but then they're legitimizing the whole enterprise of being the worst.

Eliezer 02:17:05
I mean, I don't know. I find it difficult for myself to get worked up about it. From my perspective, these people are NPCs. They cannot be dissuaded from their course. Of course they have to exist; of course they have to be running around doing the things that they're doing.

Liron 02:17:21
Yeah. You're saying non-player characters in a video game.

Eliezer 02:17:24
Yeah. They have made their mistakes. They're committed to them. The people who made those mistakes were swept up to form a company. The company is off doing the thing. And maybe I'm being deceived in some sense by the implicit priors of all the video games I've played as a kid, where you can never go up to the NPCs and talk them out of doing what they're doing. It's not the premise of the game that that's how you win the game.

I don't know, maybe in real life, all I need to do is go down to Anthropic and have a debate at their headquarters and everybody's like, oh my God, what have we done? But then that doesn't save the world in any particular way.

Liron 02:18:06
Yeah. I mean, if you listen to Dario, the CEO, right? He talks about how Anthropic is helping because they're promoting a race to the top. So they're going to set an example of doing good things, and that'll attract talent who wants to behave better, and then other companies will

Eliezer 02:18:18
Well, cool. With a race to the top, you can get this much closer to the top. Your actual distance is here.

Liron 02:18:26
Exactly. I think that's just the fundamental problem, the lack of that sense of perspective. Like, when you come to terms with, as you keep pointing out, how hard the problem is, how many layers there are to this problem, how little time there is to solve it, you just get overwhelmed by the perspective. And the correct reaction isn't, well, let's go forward and do our best, Leeroy Jenkins. It's like, okay, well, let's step back and

Eliezer 02:18:48
Right? I wouldn't speak in favor of being overwhelmed by the perspective, but apprehending it does tend to negate the arguments and strategies that depend on not apprehending it, like the stuff about, let's get closer to the top. Okay, if you can see that it gets you this much closer to the top and the distance to cover is that much, then you're not overwhelmed. You just know that ain't gonna work.

Call to Action

Liron 02:19:12
All right. Well, this is the launch day of your book, If Anyone Builds It, Everyone Dies. A very exciting day. What's the ideal future you can imagine playing out, starting now with your book becoming a bestseller?

Eliezer 02:19:26
It's read by Trump, Vance, Xi Jinping, Vladimir Putin, whoever is running the UK at the time of the book launch. And they're all like, yeah, we'd rather not die.

And back channel everybody's like, so we'd be open to not dying if you were open to not dying. And then the Great Worldwide Treaty gets announced.

Liron 02:19:54
And also maybe there's a grassroots dimension to it too, right? Like protests everywhere.

Eliezer 02:20:00
Sure. You're asking me for a best case scenario. And in my imagination, the best case scenario is not the scenario with the most fuss around me and my ideas, but the scenario in which they're implemented most quickly and with the least fuss.

But sure, Earth is probably also safer if people are buying copies of the book and handing them around to their next door neighbors, and just summarizing them to their next door neighbors. And if, at the same time as the international treaty, the AI companies try to push back a little and then there's massive bipartisan protest marches outside their headquarters, that probably makes a point, and Earth is safer for that point being made. If we're just continuing to spin out the perfect scenarios.

Liron 02:20:45
Yeah. You know, I'm doing my part to try to help with the problem, and my own mechanism of impact that I do think is high leverage is speaking out to the grassroots: having lots of people buy the book, lots of people talk to their friends and family and neighbors about it. I think that's all very important, moving the Overton window so that this becomes a hot topic of discussion, because I just think it's a really tall ask for politicians to go off and do something without feeling like their constituency is right there with them.

Eliezer 02:21:11
Yeah. So in the less than perfectly optimal scenario, people speak out and perhaps demonstrate. As we always say, we expect these demonstrations to be more impactful if they are large and lawful; that is our slogan.

If people speak out individually in surveys and letters to make it clear to their elected representatives that they have their back, then in the case where this isn't all immediately settled a month later by the book being published, many people may have their individual parts to do.

It may be a strange sort of thing to emphasize, but I do feel like a lot of protest movements have a failure mode in which they're all about the protesting and the fun of protesting, and telling people that it's so very meaningful that you are holding this opinion and doing things. And if they actually had large scale goals that they could visibly succeed or fail at, well, that might not be such a great thing for the protest. Because then you might visibly fail, and people would look at the protest leaders and be like, you terrible leaders, you have led us to failure.

There's a kind of self-absorption where the important thing is shouting real loud. And that's one reason why I'm like, well, in the real ideal scenario, you wouldn't even have to show up. And that's not a very likely ideal scenario. But the thing here is, with no prejudice to the non-parents in the audience: for your kids to be safe, for your neighbors' kids to be safe. If you want your neighbor's kids to be safe, what you want is not the satisfaction of experiencing a protest. What you would ideally like is everything to be resolved very quickly, and before your kids are in danger. Or in any more danger.

And holding to that sense is, I think, an important thing to keep in mind and have be part of the tone. Even if you're marching, it's not about the march; you'd like that march to have never been necessary. You would've liked your kids to have never been in enough trouble that you would've needed to march. And there's so much performative political belief these days, and I think people notice it. And I think the politicians smell it. And if this is performative, it will get only performative responses. And then your neighbor's kids will be dead.

So this is the kind of heroism that we do not want to see in the world, that we do not want to be necessary, that we do not want to indulge in, that we do not want to be fun. If we are there, it is because it is necessary and we would like it to stop being necessary by the swiftest route.

Let that be our slogan.

Liron 02:23:38
Let's try to go beyond the performative, try to take actions that'll actually show some leadership to prevent this giant whale of a problem that seems to be coming at us very quickly. Let's all go pick up a copy of If Anyone Builds It, Everyone Dies. I think that'll be helpful to get that spread around.

I think many of the viewers, hopefully most, deeply care as much as I do, realize that what you're saying is very important, wanna support you, wanna support this mission.

Eliezer Yudkowsky, it's been an honor and a pleasure to have this discussion. Thanks for coming.

Eliezer 02:24:09
You're welcome. And remember, success is still being alive in five years. And no matter how high this book gets on the bestseller list, if at the end of that we all die, it means the book failed.

There's only one criterion for success of this book, and it is not the sales numbers, but the sales probably still help anyways. It's hopefully a good book. Thanks for letting me shill it on your show.

Liron 02:24:34
No problem. I'll be taking it from here, shilling it in my other videos.

That was the one and only Eliezer Yudkowsky. I really hope you'll take a look at his book: If Anyone Builds It, Everyone Dies. It's a really tight version of the argument that's been going around for over two decades. It's definitely worth your time and you'll be supporting a really worthwhile cause to try to mitigate the risk, whether you think it's 1%, 5%, or like me, 50%. It helps to get more eyes on the problem, get more awareness, because we just don't have much time.

Now if you enjoyed that interview and you want to see more Liron Shapira, that's me, I have another show. I mentioned it on the program, and the theme of the show is that I debate different luminaries, different thought leaders, some of the smartest minds in the world to see where they stand on AI extinction.

Why do I think AI extinction is this huge urgent threat and they think it's much milder? What's up with that? I was recently fortunate enough to debate Gary Marcus on that topic. I debated Vitalik Buterin, and I've also had some really fun interviews with people like Rob Miles and Mike Israetel, a wide variety of people I'm having this conversation with.

I think it's a very important, urgent conversation. I also speak with senior employees from some of the top AI companies, and these are hard hitting interviews focused on the question of: isn't superintelligent AI likely to kill everyone as soon as it gets too powerful for humans to alter its course?

So I hope you'll join me on my other YouTube channel, which is also a podcast you can subscribe to. My goal with all these interviews, debates, and other events is to facilitate high quality discourse on the most important issue of our time. Thanks for watching, and I'll see you in the next one.