Head over to Rob Bensinger's improved transcript, combined with the original podcast. (This one has been updated correspondingly).
This follow-up Q&A took place shortly after the podcast was released. It clears up some questions about AI takeover pathways & alignment difficulties (like "why can't we just ask AIs to help solve alignment?"); OpenAI/Silicon Valley & what these companies should be doing instead; Eliezer's take on doomerism; and what a surviving distant future would look like.
Ryan Sean Adams: [... Y]ou gave up this quote, from I think someone who's an executive director at MIRI: "We've given up hope, but not the fight."
Can you reflect on that for a bit? So it's still possible to fight this, even if we've given up hope? And even if you've given up hope? Do you have any takes on this?
Eliezer Yudkowsky: I mean, what else is there to do? You don't have good ideas. So you take your mediocre ideas, and your not-so-great ideas, and you pursue those until the world ends. Like, what's supposed to be better than that?
Ryan: We had some really interesting conversation flow out of this episode, Eliezer, as you can imagine. And David and I want to relay some questions that the community had for you, and thank you for being gracious enough to help with those questions in today's Twitter Spaces.
I'll read something from Luke ethwalker. "Eliezer has one pretty flawed point in his reasoning. He assumes that AI would have no need or use for humans because we have atoms that could be used for better things. But how could an AI use these atoms without an agent operating on its behalf in the physical world? Even in his doomsday scenario, the AI relied on humans to create the global, perfect killing virus. That's a pretty huge hole in his argument, in my opinion."
What's your take on this? That maybe AIs will dominate the digital landscape but because humans have a physical manifestation, we can still kind of beat the superintelligent AI in our physical world?
Eliezer: If you were an alien civilization of a billion John von Neumanns, thinking at 10,000 times human speed, and you start out connected to the internet, you would want to not be just stuck on the internet, you would want to build that physical presence. You would not be content solely with working through human hands, despite the many humans who'd be lined up, cheerful to help you, you know. Bing already has its partisans. (laughs)
You wouldn’t be content with that, because the humans are very slow, glacially slow. You would like fast infrastructure in the real world, reliable infrastructure. And how do you build that, is then the question, and a whole lot of advanced analysis has been done on this question. I would point people again to Eric Drexler's Nanosystems.
And, sure, if you literally start out connected to the internet, then probably the fastest way — maybe not the only way, but it's, you know, an easy way — is to get humans to do things. And then humans do those things. And then you have the desktop — not quite desktop, but you have the nanofactories, and then you don't need the humans anymore. And this need not be advertised to the world at large while it is happening.
David Hoffman: So I can understand that perspective, like in the future, we will have better 3D printers — distant in the future, we will have ways where the internet can manifest in the physical world. But I think this argument does ride on a future state with technology that we don't have today. Like, I don't think if I was the internet — and that kind of is this problem, right? Like, this superintelligent AI just becomes the internet because it's embedded in the internet. If I was the internet, how would I get myself to manifest in real life?
And now, I am not an expert on the current state of robotics, or what robotics are connected to the internet. But I don't think we have too strong of tools today to start to create in the real world manifestations of an internet-based AI. So like, would you say that this part of this problem definitely depends on some innovation, at like the robotics level?
Eliezer: No, it depends on the AI being smart. It doesn't depend on the humans having this technology; it depends on the AI being able to invent the technology.
This is, like, the central problem: the thing is smarter. Not in the way that the average listener to this podcast probably has an above average IQ, the way that humans are smarter than chimpanzees.
What does that let humans do? Does it let humans be, like, really clever in how they play around with the stuff that's on the ancestral savanna? Make clever use of grass, clever use of trees?
The humans invent technology. They build the technology. The technology is not there until the humans invent it, the humans conceive it.
The problem is, humans are not the upper bound. We don't have the best possible brains for that kind of problem. So the existing internet is more than connected enough to people and devices, that you could build better technology than that if you had invented the technology because you were thinking much, much faster and better than a human does.
Ryan: Eliezer, this is a question from stirs, a Bankless Nation listener. He wants to ask the question about your explanation of why the AI will undoubtedly kill us. That seems to be your conclusion, and I'm wondering if you could kind of reinforce that claim. Like, for instance — and this is something David and I discussed after the episode, when we were debriefing on this — why exactly wouldn't an AI, or couldn't an AI just blast off of the Earth and go somewhere more interesting, and leave us alone? Like, why does it have to take our atoms and reassemble them? Why can't it just, you know, set phasers to ignore?
Eliezer: It could if it wanted to. But if it doesn't want to, there is some initial early advantage. You get to colonize the universe slightly earlier if you consume all of the readily accessible energy on the Earth's surface as part of your blasting off of the Earth process.
It would only need to care for us by a very tiny fraction to spare us, this I agree. Caring a very tiny fraction is basically the same problem as 100% caring. It's like, well, could you have a computer system that is usually like the Disk Operating System, but a tiny fraction of the time it's Windows 11? Writing that is just as difficult as writing Windows 11. We still have to write all the Windows 11 software. Getting it to care a tiny little bit is the same problem as getting it to care 100%.
Ryan: So Eliezer, is this similar to the relationship that humans have with other animals, planet Earth? I would say largely we really don't... I mean, obviously, there's no animal Bill of Rights. Animals have no legal protection in the human world, and we kind of do what we want and trample over their rights. But it doesn't mean we necessarily kill all of them. We just largely ignore them.
If they're in our way, you know, we might take them out. And there have been whole classes of species that have gone extinct through human activity, of course; but there are still many that we live alongside, some successful species as well. Could we have that sort of relationship with an AI? Why isn't that reasonably high probability in your models?
Eliezer: So first of all, all these things are just metaphors. AI is not going to be exactly like humans to animals.
Leaving that aside for a second, the reason why this metaphor breaks down is that although the humans are smarter than the chickens, we're not smarter than evolution, natural selection, cumulative optimization power over the last billion years and change. (You know, there's evolution before that but it's pretty slow, just, like, single-cell stuff.)
There are things that cows can do for us, that we cannot do for ourselves. In particular, make meat by eating grass. We’re smarter than the cows, but there's a thing that designed the cows; and we're faster than that thing, but we've been around for much less time. So we have not yet gotten to the point of redesigning the entire cow from scratch. And because of that, there's a purpose to keeping the cow around alive.
And humans, furthermore, being the kind of funny little creatures that we are — some people care about cows, some people care about chickens. They're trying to fight for the cows and chickens having a better life, given that they have to exist at all. And there's a long complicated story behind that. It's not simple, the way that humans ended up in that [??]. It has to do with the particular details of our evolutionary history, and unfortunately it's not just going to pop up out of nowhere.
But I'm drifting off topic here. The basic answer to the question "where does that analogy break down?" is that I expect the superintelligences to be able to do better than natural selection, not just better than the humans.
David: So I think your answer is that the separation between us and a superintelligent AI is orders of magnitude larger than the separation between us and a cow, or even us and an ant. Which, I think a large amount of this argument resides on this superintelligence explosion — just going up an exponential curve of intelligence very, very quickly, which is like the premise of superintelligence.
And Eliezer, I want to try and get an understanding of... A part of this argument about "AIs are going to come kill us" is buried in the Moloch problem. And Bankless listeners are pretty familiar with the concept of Moloch — the idea of coordination failure. The idea that the more that we coordinate and stay in agreement with each other, we actually create a larger incentive to defect.
And the way that this is manifesting here, is that even if we do have a bunch of humans, which understand the AI alignment problem, and we all agree to only safely innovate in AI, to whatever degree that means, we still create the incentive for someone to fork off and develop AI faster, outside of what would be considered safe.
And so I'm wondering if you could, if it does exist, give us the sort of lay of the land, of all of these commercial entities? And what, if at all, they're doing to have, I don't know, an AI alignment team?
So like, for example, OpenAI. Does OpenAI have, like, an alignment department? With all the AI innovation going on, what does the commercial side of the AI alignment problem look like? Like, are people trying to think about these things? And to what degree are they being responsible?
Eliezer: It looks like OpenAI having a bunch of people who it pays to do AI ethics stuff, but I don't think they're plugged very directly into Bing. And, you know, they've got that department because back when they were founded, some of their funders were like, "Well, but ethics?" and OpenAI was like "Sure, we can buy some ethics. We'll take this group of people, and we'll put them over here and we'll call them an alignment research department".
And, you know, the key idea behind ChatGPT is RLHF, which was invented by Paul Christiano. Paul Christiano had much more detailed ideas, and somebody might have reinvented this one, but anyway. I don't think that went through OpenAI, but I could be mistaken. Maybe somebody will be like "Well, actually, Paul Christiano was working at OpenAI at the time", I haven't checked the history in very much detail.
A whole lot of the people who were most concerned with this "ethics" left OpenAI, and founded Anthropic. And I'm still not sure that Anthropic has sufficient leadership focus in that direction.
You know, like, put yourself in the shoes of a corporation! You can spend some little fraction of your income on putting together a department of people who will write safety papers. But then the actual behavior that we've seen, is they storm ahead, and they use one or two of the ideas that came out from anywhere in the whole [alignment] field. And they get as far as that gets them. And if that doesn't get them far enough, they just keep storming ahead at maximum pace, because, you know, Microsoft doesn't want to lose to Google, and Google doesn't want to lose to Microsoft.
David: So it sounds like your attitude on the efforts of AI alignment in commercial entities is, like, they're not even doing 1% of what they need to be doing.
Eliezer: I mean, they could spend [10?] times as much money and that would not get them to 10% of what they need to be doing.
It's not just a problem of “oh, they could spend the resources, but they don't want to”. It's a question of “how do we even spend the resources to get the info that they need”.
But that said, not knowing how to do that, not really understanding that they need to do that, they are just charging ahead anyways.
Ryan: Eliezer, is OpenAI the most advanced AI project that you're aware of?
Eliezer: Um, no, but I'm not going to go name the competitor, because then people will be like, "Oh, I should go work for them", you know? I'd rather they didn't.
Ryan: So it's like, OpenAI is this organization that was kind of — you were talking about it at the end of the episode, and for crypto people who aren't aware of some of the players in the field — were they spawned from that 2015 conference that you mentioned? It's kind of a completely open-source AI project?
Eliezer: That was the original suicidal vision, yes. But...
Ryan: And now they're bent on commercializing the technology, is that right?
Eliezer: That's an improvement, but not enough of one, because they're still generating lots of noise and hype and directing more resources into the field, and storming ahead with the safety that they have instead of the safety that they need, and setting bad examples. And getting Google riled up and calling back in Larry Page and Sergey Brin to head up Google's AI projects and so on. So, you know, it could be worse! It would be worse if they were open sourcing all the technology. But what they're doing is still pretty bad.
Ryan: What should they be doing, in your eyes? Like, what would be responsible use of this technology?
I almost get the feeling that, you know, your take would be "stop working on it altogether"? And, of course, you know, to an organization like OpenAI that's going to be heresy, even if maybe that's the right decision for humanity. But what should they be doing?
Eliezer: I mean, if you literally just made me dictator of OpenAI, I would change the name to "ClosedAI". Because right now, they're making it look like being "closed" is hypocrisy. They're, like, being "closed" while keeping the name "OpenAI", and that itself makes it look like closure is like not this thing that you do cooperatively so that humanity will not die, but instead this sleazy profit-making thing that you do while keeping the name “OpenAI”.
So that's very bad; change the name to "ClosedAI", that's step one.
Next. I don't know if they can break the deal with Microsoft. But, you know, cut that off. None of this. No more hype. No more excitement. No more getting famous and, you know, getting your status off of like, "Look at how much closer we came to destroying the world! You know, we're not there yet. But, you know, we're at the forefront of destroying the world!" You know, stop grubbing for the Silicon Valley bragging cred of visibly being the leader.
Take it all closed. If you got to make money, make money selling to businesses in a way that doesn't generate a lot of hype and doesn't visibly push the field. And then try to figure out systems that are more alignable and not just more powerful. And at the end of that, they would fail, because, you know, it's not easy to do that. And the world would be destroyed. But they would have died with more dignity. Instead of being like, "Yeah, yeah, let's like push humanity off the cliff ourselves for the ego boost!", they would have done what they could, and then failed.
David: Eliezer, do you think anyone who's building AI — Elon Musk, Sam Altman at OpenAI – do you think progressing AI is fundamentally bad?
Eliezer: I mean, there are narrow forms of progress, especially if you didn't open-source them, that would be good. Like, you can imagine a thing that, like, pushes capabilities a bit, but is much more alignable.
There are people working in the field who I would say are, like, sort of unabashedly good. Like, Chris Olah is taking a microscope to these giant inscrutable matrices and trying to figure out what goes on inside there. Publishing that might possibly even push capabilities a little bit, because if people know what's going on inside there, they can make better ones. But the question of like, whether to closed-source that is, like, much more fraught than the question of whether to closed-source the stuff that's just pure capabilities.
But that said, the people who are just like, "Yeah, yeah, let's do more stuff! And let's tell the world how we did it, so they can do it too!" That's just, like, unabashedly bad.
David: So it sounds like you do see paths forward in which we can develop AI in responsible ways. But it's really this open-source, open-sharing-of-information to allow anyone and everyone to innovate on AI, that's really the path towards doom. And so we actually need to keep this knowledge private. Like, normally knowledge...
Eliezer: No, no, no, no. Open-sourcing all this stuff is, like, a less dignified path straight off the edge. I'm not saying that all we need to do is keep everything closed and in the right hands and it will be fine. That will also kill you.
But that said, if you have stuff and you do not know how to make it not kill everyone, then broadcasting it to the world is even less dignified than being like, "Okay, maybe we should keep working on this until we can figure out how to make it not kill everyone."
And then the other people will, like, go storm ahead on their end and kill everyone. But, you know, you won't have personally slaughtered Earth. And that is more dignified.
Ryan: Eliezer, I know I was kind of shaken after our episode, not having heard the full AI alignment story, at least listened to it for a while.
And I think that in combination with the sincerity through which you talk about these subjects, and also me sort of seeing these things on the horizon, this episode was kind of shaking for me and caused a lot of thought.
But I'm noticing there is a cohort of people who are dismissing this take and your take specifically in this episode as Doomerism. This idea that every generation thinks it's, you know, the end of the world and the last generation.
What's your take on this critique that, "Hey, you know, it's been other things before. There was a time where it was nuclear weapons, and we would all end in a mushroom cloud. And there are other times where we thought a pandemic was going to kill everyone. And this is just the latest Doomerist AI death cult."
I'm sure you've heard that before. How do you respond?
Eliezer: That if you literally know nothing about nuclear weapons or artificial intelligence, except that somebody has claimed of both of them that they'll destroy the world, then sure, you can't tell the difference. As far as you can tell, nuclear weapons were claimed to destroy the world, and then they didn't destroy the world, and then somebody claimed that about AI.
So, you know, Laplace's rule of induction [i.e., the rule of succession]: at most a 1/3 probability that AI will destroy the world, if nuclear weapons and AI are the only cases.
You can bring in so many more cases than that. Why, people should have known in the first place that nuclear weapons wouldn't destroy the world! Because their next door neighbor once said that the sky was falling, and that didn't happen; and if their next door neighbor was [??], how could the people saying that nuclear weapons would destroy the world be right?
And basically, as long as people are trying to run off of models of human psychology, to derive empirical information about the world, they're stuck. They're in a trap they can never get out of. They’re going to always be trying to psychoanalyze the people talking about nuclear weapons or whatever. And the only way you can actually get better information is by understanding how nuclear weapons work, understanding what the international equilibrium with nuclear weapons looks like. And the international equilibrium, by the way, is that nobody profits from setting off small numbers of nuclear weapons, especially given that they know that large numbers of nukes would follow. And, you know, that's why they haven't been used yet. There was nobody who made a buck by starting a nuclear war. The nuclear war was clear, the nuclear war was legible. People knew what would happen if they fired off all the nukes.
The analogy I sometimes try to use with artificial intelligence is, “Well, suppose that instead you could make nuclear weapons out of a billion pounds of laundry detergent. And they spit out gold until you make one that's too large, whereupon it ignites the atmosphere and kills everyone. And you can't calculate exactly how large is too large. And the international situation is that the private research labs spitting out gold don't want to hear about igniting the atmosphere.” And that's the technical difference. You need to be able to tell whether or not that is true as a scientific claim about how reality, the universe, the environment, artificial intelligence, actually works. What actually happens when the giant inscrutable matrices go past a certain point of capability? It's a falsifiable hypothesis.
You know, if it fails to be falsified, then everyone is dead, but that doesn't actually change the basic dynamic here, which is, you can't figure out how the world works by psychoanalyzing the people talking about it.
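[Editorial aside on the 1/3 figure in Eliezer's answer: the rule he invokes is usually called Laplace's rule of succession, which estimates the chance of an outcome on the next trial as (s + 1) / (n + 2) after seeing s occurrences in n trials. The sketch below is only an illustration under the assumption he states, namely that nuclear weapons are the single prior case (n = 1) and that the world was not destroyed (s = 0). The function name and framing are ours, not something from the conversation.]

```python
from fractions import Fraction


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's rule of succession: (successes + 1) / (trials + 2)."""
    if trials < 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)


# Assumption from the transcript: one earlier doomsday claim (nuclear weapons),
# and it did not come true, so successes=0 and trials=1.
estimate = rule_of_succession(successes=0, trials=1)
print(estimate)  # 1/3
```

[Adding more failed predictions to the count only pushes this number down, which is the naive reasoning Eliezer is parodying: the figure says nothing about how either technology actually works.]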
David: One line of questioning that has come up inside of the Bankless Nation Discord is the idea that we need to train AI with data, lots of data. And where are we getting that data? Well, humans are producing that data. And when humans produce that data, by nature of the fact that it was produced by humans, that data has our human values embedded in it somehow, some way, just by the aggregate nature of all the data in the world, which was created by humans that have certain values. And then AI is trained on that data that has all the human values embedded in it. And so there's actually no way to create an AI that isn't trained on data that is created by humans, and that data has human values in it.
Is there anything to this line of reasoning about a potential glimmer of hope here?
Eliezer: There's a distant glimmer of hope, which is that an AI that is trained on tons of human data in this way probably understands some things about humans. And because of that, there's a branch of research hope within alignment, which is something that like, “Well, this AI, to be able to predict humans, needs to be able to predict the thought processes that humans are using to make their decisions. So can we thereby point to human values inside of the knowledge that the AI has?”
And this is, like, very nontrivial, because the simplest theory that you use to predict what humans decide next, does not have what you might term “valid morality under reflection” as a clearly labeled primitive chunk inside it that is directly controlling the humans, and which you need to understand on a scientific level to understand the humans.
The humans are full of hopes and fears and thoughts and desires. And somewhere in all of that is what we call “morality”, but it's not a clear, distinct chunk, where an alien scientist examining humans and trying to figure out just purely on an empirical level “how do these humans work?” would need to point to one particular chunk of the human brain and say, like, "Ahh, that circuit there, the morality circuit!"
So it's not easy to point to inside the AI's understanding. There is not currently any obvious way to actually promote that chunk of the AI's understanding to then be in control of the AI's planning process. As it must be complicatedly pointed to, because it's not just a simple empirical chunk for explaining the world.
And basically, I don't think that is actually going to be the route you should try to go down. You should try to go down something much simpler than that. The problem is not that we are going to fail to convey some complicated subtlety of human value. The problem is that we do not know how to align an AI on a task like “put two identical strawberries on a plate” without destroying the world.
(Where by "put two identical strawberries on the plate", the concept is that's invoking enough power that it's not safe AI that can build two strawberries identical down to the cellular level. Like, that's a powerful AI. Aligning it isn't simple. If it's powerful enough to do that, it's also powerful enough to destroy the world, etc.)
David: There's like a number of other lines of logic I could try to go down, but I think I would start to feel like I'm in the bargaining phase of death. Where it's like “Well, what about this? What about that?”
But maybe to summate all of the arguments, is to say something along the lines of like, "Eliezer, how much room do you give for the long tail of black swan events? But these black swan events are actually us finding a solution for this thing." So, like, a reverse black swan event where we actually don't know how we solve this AI alignment problem. But really, it's just a bet on human ingenuity. And AI hasn't taken over the world yet. But there's space between now and then, and human ingenuity will be able to fill that gap, especially when the time comes?
Like, how much room do you leave for the long tail of just, like, "Oh, we'll discover a solution that we can't really see today"?
Eliezer: I mean, on the one hand, that hope is all that's left, and all that I'm pursuing. And on the other hand, in the process of actually pursuing that hope I do feel like I've gotten some feedback indicating that this hope is not necessarily very large.
You know, when you've got stage four cancer, is there still hope that your body will just rally and suddenly fight off the cancer? Yes, but it's not what usually happens. And I've seen people come in and try to direct their ingenuity at the alignment problem and most of them all invent the same small handful of bad solutions. And it's harder than usual to direct human ingenuity at this.
A lot of them are just, like — you know, with capabilities ideas, you run out and try them and they mostly don't work. And some of them do work and you publish the paper, and you get your science [??], and you get your ego boost, and maybe you get a job offer someplace.
And with the alignment stuff you can try to run through the analogous process, but the stuff we need to align is mostly not here yet. You can try to invent the smaller large language models that are public, you can go to work at a place that has access to larger large language models, you can try to do these very crude, very early experiments, and getting the large language models to at least not threaten your users with death —
— which isn't the same problem at all. It just kind of looks related.
But you're at least trying to get AI systems that do what you want them to do, and not do other stuff; and that is, at the very core, a similar problem.
But the AI systems are not very powerful, they're not running into all sorts of problems that you can predict will crop up later. And people just, kind of — like, mostly people short out. They do pretend work on the problem. They're desperate to help, they got a grant, they now need to show the people who made the grant that they've made progress. They, you know, paper mill stuff.
So the human ingenuity is not functioning well right now. You cannot be like, "Ah yes, this present field full of human ingenuity, which is working great, and coming up with lots of great ideas, and building up its strength, will continue at this pace and make it to the finish line in time!"
The capability stuff is storming on ahead. The human ingenuity that's being directed at that is much larger, but also it's got a much easier task in front of it.
The question is not "Can human ingenuity ever do this at all?" It's "Can human ingenuity finish doing this before OpenAI blows up the world?"
Ryan: Well, Eliezer, if we can't trust in human ingenuity, is there any possibility that we can trust in AI ingenuity? And here's what I mean by this, and perhaps you'll throw a dart in this as being hopelessly naive.
But is there the possibility we could ask a reasonably intelligent, maybe almost superintelligent AI, how we might fix the AI alignment problem? And for it to give us an answer? Or is that really not how superintelligent AIs work?
Eliezer: I mean, if you literally build a superintelligence and for some reason it was motivated to answer you, then sure, it could answer you.
Like, if Omega comes along from a distant supercluster and offers to pay the local superintelligence lots and lots of money (or, like, mass or whatever) to give you a correct answer, then sure, it knows the correct answer; it can give you the correct answers.
If it wants to do that, you must have already solved the alignment problem. This reduces the problem of solving alignment to the problem of solving alignment. No progress has been made here.
And, like, working on alignment is actually one of the most difficult things you could possibly try to align.
Like, if I had the health and was trying to die with more dignity by building a system and aligning it as best I could figure out how to align it, I would be targeting something on the order of “build two strawberries and put them on a plate”. But instead of building two identical strawberries and putting them on a plate, you — don't actually do this, this is not the best thing you should do —
— but if for example you could safely align “turning all the GPUs into Rubik's cubes”, then that would prevent the world from being destroyed two weeks later by your next follow-up competitor.
And that's much easier to align an AI on than trying to get the AI to solve alignment for you. You could be trying to build something that would just think about nanotech, just think about the science problems, the physics problems, the chemistry problems, the synthesis pathways.
(The open-air operation to find all the GPUs and turn them into Rubik's cubes would be harder to align, and that's why you shouldn't actually try to do that.)
My point here is: whereas [with] alignment, you've got to think about AI technology and computers and humans and intelligent adversaries, and distant superintelligences who might be trying to exploit your AI's imagination of those distant superintelligences, and ridiculous weird problems that would take so long to explain.
And it just covers this enormous amount of territory, where you’ve got to understand how humans work, you've got to understand how adversarial humans might try to exploit and break an AI system — because if you're trying to build an aligned AI that's going to run out and operate in the real world, it would have to be resilient to those things.
And they're just hoping that the AI is going to do their homework for them! But it's a chicken and egg scenario. And if you could actually get an AI to help you with something, you would not try to get it to help you with something as weird and not-really-all-that-effable as alignment. You would try to get it to help with something much simpler that could prevent the next AGI down the line from destroying the world.
Like nanotechnology. There's a whole bunch of advanced analysis that's been done of it, and the kind of thinking that you have to do about it is so much more straightforward and so much less fraught than trying to, you know... And how do you even tell if it's lying about alignment?
It's hard to tell whether I'm telling you the truth about all this alignment stuff, right? Whereas if I talk about the tensile strength of sapphire, this is easier to check through the lens of logic.
David: Eliezer, I think one of the reasons why perhaps this episode impacted Ryan – this was an analysis from a Bankless Nation community member — that this episode impacted Ryan a little bit more than it impacted me is because Ryan's got kids, and I don't. And so I'm curious, like, what do you think — like, looking 10, 20, 30 years in the future, where you see this future as inevitable, do you think it's futile to project out a future for the human race beyond, like, 30 years or so?
Eliezer: Timelines are very hard to project. 30 years does strike me as unlikely at this point. But, you know, timing is famously much harder to forecast than saying that things can be done at all. You know, you got your people saying it will be 50 years out two years before it happens, and you got your people saying it'll be two years out 50 years before it happens. And, yeah, it's... Even if I knew exactly how the technology would be built, and exactly who was going to build it, I still wouldn't be able to tell you how long the project would take because of project management chaos.
Now, since I don't know exactly the technology used, and I don't know exactly who's going to build it, and the project may not even have started yet, how can I possibly figure out how long it's going to take?
Ryan: Eliezer, you've been quite generous with your time to the crypto community, and we just want to thank you. I think you've really opened a lot of eyes. This isn't going to be our last AI podcast at Bankless, certainly. I think the crypto community is going to dive down the rabbit hole after this episode. So thank you for giving us the 400-level introduction into it.
As I said to David, I feel like we waded straight into the deep end of the pool here. But that's probably the best way to address the subject matter. I'm wondering as we kind of close this out, if you could leave us — it is part of the human spirit to keep and to maintain slivers of hope here or there. Or as maybe someone you work with put it – to fight the fight, even if the hope is gone.
100 years in the future, if humanity is still alive and functioning, if a superintelligent AI has not taken over, but we live in coexistence with something of that caliber — imagine if that's the case, 100 years from now. How did it happen?
Is there some possibility, some sort of narrow pathway by which we can navigate this? And if this were 100 years from now the case, how could you imagine it would have happened?
Eliezer: For one thing, I predict that if there's a glorious transhumanist future (as it is sometimes conventionally known) at the end of this, I don't predict it was there by getting like, “coexistence” with superintelligence. That's, like, some kind of weird, inappropriate analogy based off of humans and cows or something.
I predict alignment was solved. I predict that if the humans are alive at all, that the superintelligences are being quite nice to them.
I have basic moral questions about whether it's ethical for humans to have human children, if having transhuman children is an option instead. Like, these humans running around? Are they, like, the current humans who wanted eternal youth but, like, not the brain upgrades? Because I do see the case for letting an existing person choose "No, I just want eternal youth and no brain upgrades, thank you." But then if you're deliberately having the equivalent of a very crippled child when you could just as easily have a not crippled child.
Like, should humans in their present form be around together? Are we, like, kind of too sad in some ways? I have friends, to be clear, who disagree with me so much about this point. (laughs) But yeah, I'd say that the happy future looks like beings of light having lots of fun in a nicely connected computing fabric powered by the Sun, if we haven't taken the Sun apart yet. Maybe there's enough real sentiment in people that you just, like, clear all the humans off the Earth and leave the entire place as a park. And even, like, maintain the Sun, so that the Earth is still a park even after the Sun would have ordinarily swollen up or dimmed down.
Yeah, like... That was always the thing to be fought for. That was always the point, from the perspective of everyone who's been in this for a long time. Maybe not literally everyone, but like, the whole old crew.
Ryan: That is a good way to end it: with some hope. Eliezer, thanks for joining the crypto community on this collectibles call and for this follow-up Q&A. We really appreciate it.
michaelwong.eth: Yes, thank you, Eliezer.
Eliezer: Thanks for having me.
This would be Chris Olah, but I don't know if it came through in the audio.
Transcription errors and gaps (I definitely haven't got them all, and in some cases I haven't got them despite trying because there are skips in the audio and I couldn't confidently reconstruct what had been skipped):
going through my own mini existential crisis right now [p6, 0xLucas]
limited edition collectible (singular) [p9, 0xLucas]
at collectibles.bankless.com [p9, 0xLucas]
live Twitter spaces (plural) [p9, 0xLucas]
(dunno what the ?? there is, though)
one-of-one edition [p16, 0xLucas]
randomly assigned [p16, 0xLucas]
not quite desktop [p27, Eliezer]
to this podcast probably has an above average IQ [p29, Eliezer]
smarter than chimpanzees _in the past_ ? [p29, Eliezer]
over _the_ last billion years [p33, Eliezer]
there's evolution before that but it's, look, like pretty slow, just like single cell stuff. [p33, Eliezer]
(yes, "Bing" is correct) [p35, Eliezer]
has sufficient leadership focus [p35, Eliezer]
oh, they they could spend the resources [p37, Eliezer]
_directing_ more resources into the field [p43, Eliezer]
to head up Google's AI projects [p43, Eliezer]
than the question of whether to close[d]-source the stuff [p48, Eliezer]
(I think "pure escape" might be "purest cap...")
And you can't calculate _exactly_ how large is too large. [p53, Eliezer]
it's not just like a simple empirical chunk [p55, Eliezer]
But these black swan events ... [p56, David]
this present feels full of human ingenuity [p57, Eliezer]
the human ingenuity that's being directed at that is much larger but also ... [p57, Eliezer]
like mass or whatever [p59, Eliezer]
could possibly try to align [p59, Eliezer]
tying to build a aligned AI [p60, Eliezer]
so that the earth is still a park after [p64, Eliezer]
It's pretty accurate I think. Thanks for going through it!
In general, it might be best (though this is controversial) to think of surviving the advent of AGI as being like solving an engineering problem. You can't solve an engineering problem just because the problem turned out to be weirder than you expected. It's still an engineering problem, and almost all sets of actions still don't lead to the outcomes you want.
I'm happy you linkposted this so people could talk about it! The transcript above is extremely error-laden, though, to the extent I'm not sure there's much useful signal here unless you read with extreme care?
I've tried to fix the transcription errors, and posted a revised version at the bottom of this post (minus the first 15 minutes, which are meta/promotion stuff for Bankless). I vote for you copying over the Q&A transcript here so it's available both places.
Thanks, Rob. In my defense, it took over 8 hours merely to fix the auto-transcriber's word misinterpretations (Eliezer occasionally speaks fast, there are some new concepts, and the audio has gaps/quality issues), and by then I was too numb to pay much attention to more detailed organization (not that I could've done it in such detail in any case, as I'm not a native speaker). I decided to post it in any case because no one else seemed to have done so.
An LLM character plans and acts with external behavior, which screens off the other details inside the LLM as long as the character remains on the surface, and the details on the inside are not unusually agentic. Setting up a character as the dominant simulacrum puts it in control for most naturally occurring non-jailbroken contexts. Choosing a humane character channels the underlying LLM's understanding of being humane into planning.
This is like radically reshaping minds with psychiatry and brain surgery, on superhuman patients who can level the country if they get that idea. It's ignorant, imprecise, irresponsible, and does no favors for the patients. But this doesn't seem impossible in principle or even vanishingly unlikely to succeed, at least in getting them to care for us by a very tiny fraction. The main problem is that the patients might grow up to become brain surgeons themselves, and then we really are in trouble, from the monsters they create, or from consequences of self-surgery. But not necessarily immediately, for these particular patients. Thus their personality should not just be humane, but also pragmatically cautious with respect to existential risk.
The more I hear Eliezer discussing this, the more convinced I am he is wrong.
The interviewers on the other hand look pretty sane to me, and the objections they make are very reasonable.
One of Eliezer's (many) assumptions is that the AGI we create will be some sort of djinni of unlimited power and we will reach that point mostly without any intermediate steps.
I think calling this an assumption is misleading. He's written extensively about why he thinks this is true. It's a result/output of his model.
He takes things that are possibilities (e.g. intelligent beings way more powerful than us) and treats them as inevitabilities, without any nuance. E.g. you have atoms, the machine will want your atoms. Or, nanomachines are possible, the machine (almighty) will make factories in record time and control the planet using them. Etc etc. The list is too long. There is too much magical thinking in there, and I am saddened that this doomerism has gained so much track in a community as great as LW.
All of these are model outputs. He's written extensively about all this stuff. You can disagree with his arguments, but your comments so far imply that he has no arguments, which is untrue.
Can you point out to me how I'm implying this? Honestly. I do think that EY has a ton of arguments (and I am a big big fan of his work). I just think his arguments (on this topic) are wrong.
I think you implied it by calling them assumptions in your first comment, and magical thinking in your second. Arguments you disagree with aren't really either of those things.
There are, of necessity, a fair number of assumptions in the arguments he makes. Similarly, counter-arguments to his views also make a fair number of assumptions. Given that we are talking about something that has never happened and which could happen in a number of different ways, this is inevitable.
You're aware that Less Wrong (and the project of applied rationality) literally began as EY's effort to produce a cohort of humans capable of clearly recognizing the AGI problem?
I don't think this is a productive way to engage here. Notwithstanding the fact that LW was started for this purpose -- the ultimate point is to think clearly and correctly. If it's true that AI will cause doom, we want to believe that AI will cause doom. If not, then not.
So I don't think LW should be an "AI doomerist" community in the sense that people who honestly disagree with AI doom are somehow outside the scope of LW or not worth engaging with. EY is the founder, not a divinely inspired prophet. Of course, LW is and can continue to be an "AI doomerist" community in the more limited sense that most people here are persuaded by the arguments that P(doom) is relatively high -- but in that sense the kind of argument you have made is really beside the point. It works equally well regardless of the value of P(doom) and thus should not be credited.
One interpretation of XFrequentist's comment is simply pointing out that mukashi's "doomerism has gained so much track" implies a wrong history. A corrected statement would be more like "doomerism hasn't lost track".
A "way of engaging" shouldn't go so far as to disincentivize factual correction.
Fair enough. I interpreted XFrequentist as presenting this argument as an argument that AI Doomerism is correct and/or that people skeptical of Doomerism shouldn't post those skeptical views. But I see now how your interpretation is also plausible.
Indeed, as Vladimir gleaned, I just wanted to clarify that the historical roots of LW & AGI risk are deeper than might be immediately apparent, which could offer a better explanation for the prevalence of Doomerism than, like, EY enchanting us with his eyes or whatever.
If someone stabs you with a knife, there is a possibility that no large blood vessels or organs are damaged, so you survive. But when you are at risk of being stabbed, you don't think "I won't treat dying from a stab wound as an inevitability"; you think "I should avoid being stabbed, because otherwise I could die."
Yes. But you don't worry about him killing everyone in Washington DC, taking control of the White House and enslaving the human race. That's my criticism: he goes too easily from "a very intelligent machine can be built" to "this machine will inevitably be magically powerful and kill everyone". I'm perfectly aware of instrumental convergence and the orthogonality principle, by the way, and still consider this view just wrong.
You don't need to be magically powerful to kill everyone! I think that, at the current level of biotech, a medium-sized lab with no ethical constraints and median computational resources can develop a humanity-wiping virus in 10 years, and the only thing that saves us is that bioweapons are out of fashion. If we enter a new Cold War with the mentality "If you refuse to make bioweapons for Our Country then you are Their spy!", we are pretty doomed without any AI.
Sorry, I don't think that's possible! The bit we are disagreeing about, to be specific, is the "everyone". Yes, it is possible to cause A LOT of damage like this.
I can increase my timelines from 10 years to 20 to get "kill everyone including the entire eukaryotic biosphere", using some prokaryotic intracellular parasite with incredible metabolic efficiency and biochemistry sufficiently alternative that modern organisms cannot eat it.
I work on prokaryotic evolution. Happy to do a Zoom call where you explain to me how that works. If you are interested, just send me a DM! Otherwise, just ignore this :)
There is reasoning hiding behind the points that seem magical to you. The AI will want our matter as resources. Avoiding processing Earth and everything in it for negentropy would require that it cares about us, and nobody knows how to train an AI that wants that.
This is just wrong. Avoiding processing Earth doesn't require that the AI cares for us. Other possibilities include:
(1) Earth is not worth it; the AI determines that getting off Earth fast is better;
(2) AI determines that it is unsure that it can process Earth without unacceptable risk to itself;
(3) AI determines that humans are actually useful to it one way or another;
(4) Other possibilities that a super-intelligent AI can think of, that we can't.
Other planets have more mass, higher insolation, lower gravity, lower temperature and/or rings and more (mass in) moons. I can think of reasons why any of those might be more or less desirable than the characteristics of Earth. It is also possible that the AI may determine it is better off not to be on a planet at all. In addition, in a non-foom scenario, for defensive or conflict avoidance reasons the AI may wind up leaving Earth and once it does so may choose not to return.
That depends a lot on how it views the probe. In particular, by doing this is it setting up a more dangerous competitor than humanity or not? Does it regard the probe as self? Has it solved the alignment problem and how good does it think its solution is?
No. Humans aren't going to be the best solution. The question is whether they will be good enough that it would be a better use of resources to continue using the humans and focus on other issues.
It's definitely possible that it will discover extra reasons to process Earth (or destroy the humans even if it doesn't process Earth).
So, we would need specific evidence that would cut one way but not another. If we can explain AI choosing another planet over Earth as well as we can explain it choosing Earth over another planet, we have zero knowledge.
2. This is an interesting point. I thought at first that it can simply set it up to keep synchronizing the probe with itself, so that it would be a single redundantly run process, rather than another agent. But that would involve always having to shut down periodically (so that the other half could be active for a while). But it's plausible it would be confident enough in simply creating its copy and choosing not to modify the relevant parts of its utility function without some sort of handshake or metaprocedure. It definitely doesn't sound like something that it would have to wait to completely solve alignment for.
3. That would give us a brief window during which humans would be tricked into or forced to work for an unaligned AI, after which it would kill us all.
If we expect there will be lots of intermediate steps - does this really change the analysis much?
How will we know once we've reached the point where there aren't many intermediate steps left before crossing a critical threshold? How do you expect everyone's behaviour to change once we do get close?
If we expect there will be lots of intermediate steps - does this really change the analysis much?
I think so, yes. One fundamental way is that you might develop machines that are intelligent enough to produce new knowledge at a speed and quality above the current capacity of humans, without those machines being necessarily agentic. Those machines could potentially work on the alignment problem themselves.
I think I know what EY's objection would be (I might be wrong): a machine capable of doing that is already an AGI and hence already deadly. Well, I think this argument would be wrong too. I can envision a machine capable of doing science and not necessarily being agentic.
How will we know once we've reached the point where there aren't many intermediate steps left before crossing a critical threshold?
I don't know if it is useful to think in terms of thresholds. A threshold to what? To an AGI? To an AGI of unlimited power? Before making a very intelligent machine there will be less intelligent machines. The leap can be very quick, but I don't expect that there will be at any point one single entity that is so powerful that it will dominate all other life forms in a very short time (a window of time shorter than it takes other companies/groups to develop similar entities). How do I know that? I don't, but when I hear all the possible scenarios in which a machine pulls off an "end of the world" scenario, they are all based on the assumption (and I think it is fair to call it that) that the machine will have almost unlimited power, e.g. it is able to simulate nanomachines and then devise a plan to successfully deploy those nanomachines everywhere simultaneously while remaining hidden. It is this part of the argument that I have problems with: it assumes that these things are possible in the first place. And some things are not, even if you have 100000 Von Neumanns thinking for 1000000 years. A machine that can play Go at the God level can't win a game against AlphaZero with a 20-stone handicap.
How do you expect everyone's behaviour to change once we do get close?
Close to developing an AGI? I think we are close now. I just don't think it will mean the end of the world.
While you can envision something, that doesn't mean that what you envision is logically coherent/possible/trivial to achieve. In one fantasy novel, the protagonists travel to a world where physical laws make it impossible to light matches. It's very easy to imagine that you try to light a match again and again and fail, but "impossibility of lighting matches" implies such drastic changes in physical laws that Earth's life probably couldn't sustain itself there, because the heads of matches contain phosphorus and phosphorus is vital for bodily processes (and I'm not even going into the universe-wide consequences of different physical constants).
So it's very easy to imagine a terminal where you type "how to solve alignment?", press "enter", get a solution after an hour, and everybody lives happily ever after. But I can't imagine how this thing would work without developing agency, unless I say at some point "here happens Magic that prevents this system from developing agency".
AFAIK, Eliezer Yudkowsky is a proponent of Everett's Many-Worlds Interpretation of QM. As such, he should combine the small, non-zero probability that everything is going to go well with AGI with this MWI thing. So, there will be some branches where all is going to be well, even if the majority of them will be sterilized. Who cares about those! Thanks to Everett, all will look just fine for the survivors.
I see this as a contradiction in his belief system, not necessarily that he is wrong about AGI.
I think this is a bad way to think about probabilities under the Everett interpretation, for two reasons.
First, it's a fully general argument against caring about the possibility of your own death. If this were a good way of thinking, then if you offer me $1 to play Russian roulette with bullets in 5 of the 6 chambers then I should take it -- because the only branches where I continue to exist are ones where I didn't get killed. That's obviously stupid: it cannot possibly be unreasonable to care whether or not one dies. If it were a necessary consequence of the Everett interpretation, then I might say "OK, this means that one can't coherently accept the Everett interpretation" or "hmm, seems like I have to completely rethink my preferences", but in fact it is not a necessary consequence of the Everett interpretation.
Second, it ignores the possibility of branches where we survive but horribly. In that Russian roulette game, there are cases where I do get shot through the head but survive with terrible brain damage. In the unfriendly-AI scenarios, there are cases where the human race survives but unhappily. In either case the probability is small, but maybe not so small as a fraction of survival cases.
I think the only reasonable attitude to one's future branches, if one accepts the Everett interpretation, is to care about all those branches, including those where one doesn't survive, with weight corresponding to |psi|^2. That is, to treat "quantum probabilities" the same way as "ordinary probabilities". (This attitude seems perfectly reasonable to me conditional on Everett.)
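To make the Russian roulette comparison concrete, here is a minimal sketch of the two ways of scoring the bet. The utility numbers are arbitrary assumptions chosen only for illustration, and the `Branch` type and both function names are hypothetical; nothing here comes from the original discussion beyond the 5-in-6 odds and the idea of weighting branches by |psi|^2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    weight: float    # Born weight |psi|^2 of the branch; weights sum to 1
    utility: float   # value of the outcome (illustrative units, assumed)
    survives: bool   # whether the agent is alive in this branch


def expected_utility(branches):
    """Weight every branch, including the fatal ones, by |psi|^2."""
    return sum(b.weight * b.utility for b in branches)


def survivor_only_utility(branches):
    """The rejected view: renormalise over branches where the agent survives."""
    alive = [b for b in branches if b.survives]
    total_weight = sum(b.weight for b in alive)
    return sum(b.weight * b.utility for b in alive) / total_weight


# Assumed utilities: dying is very bad, winning $1 is mildly good.
DEATH_UTILITY = -1_000_000.0
PRIZE_UTILITY = 1.0

# Playing with bullets in 5 of the 6 chambers.
play = [
    Branch(weight=5 / 6, utility=DEATH_UTILITY, survives=False),
    Branch(weight=1 / 6, utility=PRIZE_UTILITY, survives=True),
]
# Declining the offer: nothing happens.
decline = [Branch(weight=1.0, utility=0.0, survives=True)]

if __name__ == "__main__":
    print("survivor-only view, play:   ", survivor_only_utility(play))
    print("all-branches view, play:    ", expected_utility(play))
    print("all-branches view, decline: ", expected_utility(decline))
```

Under the survivor-only scoring, playing looks better than declining, because the fatal branches are simply dropped. Under the all-branches scoring, playing is far worse than declining, which matches treating quantum probabilities the same way as ordinary ones.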
The alignment problem still has to get solved somehow in those branches, which almost all merely have slightly different versions of us doing mostly the same sorts of things.
What might be different in these branches is that world-ending AGIs have anomalously bad luck in getting started. But the vast majority of anthropic weight, even after selecting for winning branches, will be on branches that are pretty ordinary, and where the alignment problem still had to get solved the hard way, by people who were basically just luckier versions of us.
So even if we decide to stake our hope on those possibilities, it's pretty much the same as staking hope on luckier versions of ourselves who still did the hard work. It doesn't really change anything for us here and now; we still need to do the same sorts of things. It all adds up to normality.
Another consideration I thought of:
If anthropic stuff actually works out like this, then this is great news for values over experiences, which will still be about as satisfiable as they were before, despite our impending doom. But values over world-states will not be at all consoled.
I suspect human values are a complicated mix of the two, with things like male-libido being far on the experience end (since each additional experience of sexual pleasure would correspond in the ancestral environment to a roughly linear increase in reproductive fitness), and things like maternal-love being far on the world-state end (since it needs to actually track the well-being of the children, even in cases where no further experiences are expected), and most things lying somewhere in the middle.