This podcast has gotten a lot of traction, so we're posting a full transcript of it, lightly edited with ads removed, for those who prefer reading over audio.
Eliezer Yudkowsky: I think that we are hearing the last winds start to blow, the fabric of reality start to fray. This thing alone cannot end the world, but I think that probably some of the vast quantities of money being blindly and helplessly piled into here are going to end up actually accomplishing something.
Ryan Sean Adams: Welcome to Bankless, where we explore the frontier of internet money and internet finance. This is how to get started, how to get better, how to front run the opportunity. This is Ryan Sean Adams. I'm here with David Hoffman, and we're here to help you become more bankless. Okay, guys, we wanted to do an episode on AI at Bankless, but I feel like David...
David: Got what we asked for.
Ryan: We accidentally waded into the deep end of the pool here. And I think before we get into this episode, it probably warrants a few comments. I'm going to say a few things I'd like to hear from you too. But one thing I want to tell the listener is, don't listen to this episode if you're not ready for an existential crisis. Okay? I'm kind of serious about this. I'm leaving this episode shaken. And I don't say that lightly. In fact, David, I think you and I will have some things to discuss in the debrief as far as how this impacted you. But this was an impactful one. It sort of hit me during the recording, and I didn't know fully how to react. I honestly am coming out of this episode wanting to refute some of the claims made in this episode by our guest, Eliezer Yudkowsky, who makes the claim that humanity is on the cusp of developing an AI that's going to destroy us, and that there's really not much we can do to stop it.
David: There's no way around it.
Ryan: Yeah. I have a lot of respect for this guest. Let me say that. So it's not as if I have some sort of big brain technical disagreement here. In fact, I don't even know enough to fully disagree with anything he's saying. But the conclusion is so dire and so existentially heavy that I'm worried about it impacting you, listener, if we don't give you this warning going in. I also feel like, David, as interviewers, maybe we could have done a better job. I'll say this on behalf of myself. Sometimes I peppered him with a lot of questions in one fell swoop, and he was probably only ready to synthesize one at a time. I also feel like we got caught flat-footed at times. I wasn't expecting his answers to be so frank and so dire, David. It was just bereft of hope. I appreciated very much the honesty, as we always do on Bankless, but I appreciated it almost in the way that a patient might appreciate the honesty of their doctor telling them that their illness is terminal. It's still really heavy news, isn't it?
So that is the context going into this episode. I will say one thing. In good news, our failings as interviewers in this episode might be remedied, because at the end of this episode, after we hit the record button to stop recording, Eliezer said he'd be willing to provide an additional Q&A episode with the Bankless community. So if you guys have questions, and if there's sufficient interest for Eliezer to answer, tweet us to express that interest. Hit us in Discord. Get those messages over to us and let us know if you have some follow-up questions. He said that if there's enough interest in the crypto community, he'd be willing to come on and do another episode with follow-up Q&A. Maybe even a Vitalik and Eliezer episode is in store. That's a possibility that we threw to him. We've not talked to Vitalik about that either, but I just feel a little overwhelmed by the subject matter here. And that is the basis, the preamble through which we are introducing this episode. David, there's a few benefits and takeaways I want to get into. But before I do, can you comment or reflect on that preamble? What are your thoughts going into this one?
David: Yeah, we approached the end of our agenda. For every Bankless podcast, there's an equivalent agenda that runs alongside of it. But once we got to this crux of this conversation, it was not possible to proceed in that agenda because what was the point?
Ryan: Nothing else mattered.
David: And nothing else really matters, which also just relates to the subject matter at hand. And so as we proceed, you'll see us kind of circle back to the same inevitable conclusion over and over and over again, which ultimately is kind of the punchline of the content. And so I'm of a specific disposition where stuff like this, I kind of am like, oh, whatever. Okay, just go about my life. Other people are of different dispositions and take these things more heavily. So Ryan's warning at the beginning is if you are a type of person to take existential crises directly to the face, perhaps consider doing something else instead of listening to this episode.
Ryan: I think that is good counsel. So, a few things. If you're looking for an outline of the agenda, we start by talking about ChatGPT. Is this a new era of artificial intelligence? Got to begin the conversation there. Number two, we talk about what an artificial superintelligence might look like. How smart exactly is it? What types of things could it do that humans cannot do? Number three, we talk about why an AI superintelligence will almost certainly spell the end of humanity and why it'll be really hard, if not impossible, according to our guest, to stop this from happening. And number four, we talk about if there is absolutely anything we can do about all of this. We are careening, maybe, towards the abyss. Can we divert direction and not go off the cliff? That is the question we ask Eliezer. David, I think you and I have a lot to talk about during the debrief. All right, guys, the debrief is an episode that we record right after the episode. It's available for all Bankless citizens. We call this the Bankless Premium Feed. You can access that now to get our raw and unfiltered thoughts on the topic. I think it's going to be pretty raw this time around, David.
David: I didn't expect this to hit you so hard.
Ryan: Oh, I'm dealing with it right now.
Ryan: And this is probably not too long after the episode. So, yeah, I don't know how I'm going to feel tomorrow, but I definitely want to talk to you about this. And maybe have you talk.
David: I'll put my side on.
Ryan: Please, I'm going to need some help. Guys, we're going to get right to the episode with Eliezer.
Ryan: Bankless Nation, we are super excited to introduce you to our next guest. Eliezer Yudkowsky is a decision theorist. He's an AI researcher. He's the creator of the Less Wrong Community blog, a fantastic blog, by the way. There's so many other things that he's also done. I can't fit this in the short bio that we have to introduce you to Eliezer. But most relevant probably to this conversation is he's working at the Machine Intelligence Research Institute to ensure that when we do make general artificial intelligence, it doesn't come kill us all. Or at least it doesn't come ban cryptocurrency because that would be a poor outcome as well. Eliezer, it's great to have you on Bankless. How are you doing?
Eliezer: Within one standard deviation of my own peculiar little mean.
Ryan: Fantastic. You know, we want to start this conversation with something that has jumped onto the scene, I think, for a lot of mainstream folks quite recently. And that is ChatGPT. So apparently over 100 million or so have logged on to ChatGPT quite recently. I've been playing with it myself. I found it very friendly, very useful. It even wrote me a sweet poem that I thought was very heartfelt and almost human-like. I know that you have major concerns around AI safety, and we're going to get into those concerns. But can you tell us in the context of something like a ChatGPT, is this something we should be worried about? That this is going to turn evil and enslave the human race? How worried should we be about ChatGPT and Bard and the new AI that's entered the scene recently?
Eliezer: ChatGPT itself? Zero. It's not smart enough to do anything really wrong or really right either for that matter.
Ryan: And what gives you the confidence to say that? How do you know this?
Eliezer: Excellent question. So every now and then, somebody figures out how to put a new prompt into ChatGPT. One time somebody found that it (well, not ChatGPT, but one of the earlier generations of technology) would sound smarter if you first told it it was Eliezer Yudkowsky. There's other prompts too, but that one's one of my favorites. So there's untapped potential in there that people haven't figured out how to prompt yet. But when people figure it out, it moves ahead sufficiently short distances that I do feel fairly confident that there is not so much untapped potential in there that it is going to take over the world. It's like making small movements, and to take over the world, it would need a very large movement. There's places where it falls down on predicting the next line that a human would say in its shoes that seem indicative of probably that capability just is not in the giant inscrutable matrices, or it would be using it to predict the next line, which is very heavily what it was optimized for. So there's going to be some untapped potential in there. But I do feel quite confident that the upper range of that untapped potential is insufficient to outsmart all the living humans and implement the scenario that I'm worried about.
Ryan: So even so though, is ChatGPT a big leap forward in the journey towards AI in your mind? Or is this fairly incremental? It's just for whatever reason, it's caught mainstream attention.
Eliezer: GPT-3 was a big leap forward. There's rumors about GPT-4, which, who knows? ChatGPT is a commercialization of the actual AI-in-the-lab giant leap forward. If you had never heard of GPT-3 or GPT-2, or the whole range of text transformers before ChatGPT suddenly entered into your life, then that whole thing is a giant leap forward. But it's a giant leap forward based on a technology that was published in, if I recall correctly, 2018.
David: I think that what's going around in everyone's minds right now, and the Bankless listenership and crypto people at large, are largely futurists. So everyone, I think, listening, understands that in the future, there will be sentient AIs perhaps around us, at least by the time that we all move on from this world. So we all know that this future of AI is coming towards us. And when we see something like ChatGPT, everyone's like, oh, is this the moment in which our world starts to become integrated with AI? And so, Eliezer, you've been tapped into the world of AI. Are we onto something here? Or is this just another fad that we will internalize and then move on from? And then the real moment of generalized AI is actually much further out than we're initially giving credit for. Where are we in this timeline?
Eliezer: Predictions are hard, especially about the future. I sure hope that this is where it saturates. This is like the next generation. It goes only this far, it goes no further, it doesn't get used to make more steel or build better power plants, first because that's illegal, and second because the basic vulnerability of the large language model technologies is that they're not reliable. It's good for applications where it works 80% of the time, but not where it needs to work 99.999% of the time. This class of technology can't drive a car because it will sometimes crash the car. So I hope it saturates there. I hope they can't fix it. I hope we get a 10-year AI winter after this. This is not what I actually predict. I think that we are hearing the last winds start to blow, the fabric of reality start to fray. This thing alone cannot end the world. But I think that probably some of the vast quantities of money being blindly and helplessly piled into here are going to end up actually accomplishing something, not most of the money. That just never happens in any field of human endeavor. But 1% of $10 billion is still a lot of money to actually accomplish something.
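[Editor's sketch, not part of the conversation: one way to see why the gap between "works 80% of the time" and "works 99.999% of the time" matters is to compound the per-use reliability over repeated uses. The function name `chance_all_succeed` is made up for this illustration, and the sketch assumes each use succeeds or fails independently, which real systems may not satisfy.]

```python
def chance_all_succeed(per_use_reliability: float, uses: int) -> float:
    """Probability that every one of `uses` attempts succeeds.

    Assumption (ours, not the speaker's): attempts are independent and
    share the same per-use reliability.
    """
    return per_use_reliability ** uses


if __name__ == "__main__":
    # Compare the two reliability levels mentioned in the conversation.
    for reliability in (0.80, 0.99999):
        for uses in (1, 10, 100):
            p = chance_all_succeed(reliability, uses)
            print(f"reliability={reliability}, uses={uses}: {p:.6f}")
```

[Under that independence assumption, a system that is right 80% of the time gets through ten uses in a row only about 11% of the time, while a 99.999% system is still almost always fine after a hundred uses.]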
Ryan: So I think, listeners, I think you've heard Eliezer's thesis on this, which is pretty dim with respect to AI alignment (and we'll get into what we mean by AI alignment), and he's very worried about AI safety-related issues. But I think for a lot of people to even worry about AI safety and for us to even have that conversation, I think they have to have some sort of grasp of what AGI looks like. I understand that to mean artificial general intelligence and this idea of a super intelligence. Can you tell us, if there was a super intelligence on the scene, what would it look like? Is this going to look like a big chat box on the internet that we can all type things into? It's like an Oracle-type thing? Or is it like some sort of a robot that is going to be constructed in a secret government lab? Is this like something somebody could accidentally create in a dorm room? What are we even looking for when we talk about the term AGI and super intelligence?
Eliezer: First of all, I'd say those are pretty distinct concepts. ChatGPT shows a very wide range of generality compared to the previous generations of AI. Not like very wide generality compared to GPT-3, not like literally the lab research that got commercialized. That's the same generation. But compared to stuff from 2018 or even 2020, ChatGPT is better at a much wider range of things without having been explicitly programmed by humans to be able to do those things. To imitate a human as best it can, it has to capture all of the things that humans can think about that it can, which is not all the things. It's still not very good at long multiplication unless you give it the right instructions, in which case it suddenly can do it.
It's significantly more general than the previous generation of artificial minds. Humans were significantly more general than the previous generation of chimpanzees or rather Australopithecus or last common ancestor. Humans are not fully general. If humans were fully general, we'd be as good at coding as we are at football, throwing things or running. Some of us are okay at programming, but we're not spec'd for it. We're not fully general minds. You can imagine something that's more general than a human. If it runs into something unfamiliar, it's like, okay, let me just go reprogram myself a bit and then I'll be as adapted to this thing as I am to anything else. So ChatGPT is less general than a human, but it's genuinely ambiguous, I think, whether it's more or less general than say, our cousins, the chimpanzees. Or if you don't believe it's as general as a chimpanzee, a dolphin or a cat.
Ryan: So this idea of general intelligence is sort of a range of things that it can actually do, a range of ways it can apply itself?
Eliezer: How wide is it? How much reprogramming does it need? How much retraining does it need to make it do a new thing? Bees build hives, beavers build dams, a human will look at a beehive and imagine a honeycomb shaped dam. That's like humans alone in the animal kingdom. But that doesn't mean that we are general intelligences, it means we're significantly more generally applicable intelligences than chimpanzees. It's not like we're all that narrow. We can walk on the moon. We can walk on the moon because there's aspects of our intelligence that are made in full generality for universes that contain simplicities, regularities, things that recur over and over again. We understand that if steel is hard on earth, it may stay hard on the moon. And because of that, we can build rockets, walk on the moon, breathe amid the vacuum. Chimpanzees cannot do that, but that doesn't mean that humans are the most general possible things. The thing that is more general than us, that figures that stuff out faster, is the thing to be scared of if the purposes to which it turns its intelligence are not ones that we would recognize as nice things, even in the most cosmopolitan and embracing senses of what's worth doing.
Ryan: And you said this idea of a general intelligence is different than the concept of superintelligence, which I also brought into that first part of the question. How is superintelligence different than general intelligence?
Eliezer: Well, because ChatGPT has a little bit of general intelligence. Humans have more general intelligence. A superintelligence is something that can beat any human and the entire human civilization at all the cognitive tasks. I don't know if the efficient market hypothesis is something where I can rely on the entire…
Ryan: We're all crypto investors here. We understand the efficient market hypothesis for sure.
Eliezer: So the efficient market hypothesis is of course not generally true. It's not true that literally all the market prices are smarter than you. It's not true that all the prices on earth are smarter than you. Even the most arrogant person who is at all calibrated, however, still thinks that the efficient market hypothesis is true relative to them 99.99999% of the time. They only think that they know better about one in a million prices. There might be important prices. Now the price of Bitcoin is an important price. It's not just a random price. But if the efficient market hypothesis was only true to you 90% of the time, you could just pick out the 10% of the remaining prices and double your money every day on the stock market. And nobody can do that. Literally nobody can do that. So this property of relative efficiency that the market has to you, that the price is an estimate of the future price, it already has all the information you have, not all the information that exists in principle, maybe not all the information that the best equity budget, but relative to you, it's efficient relative to you.
For you, if you pick out a random price, like the price of Microsoft stock, something where you've got no special advantage, that estimate of its price a week later is efficient relative to you. You can't do better than that price. We have much less experience with the notion of instrumental efficiency, efficiency in choosing actions, because actions are harder to aggregate estimates about than prices. So you have to look at, say, AlphaZero playing chess, or just, you know, like Stockfish, whatever the latest Stockfish number is, an advanced chess engine. When it makes a chess move, you can't do better than that chess move. It may not be the optimal chess move, but if you pick a different chess move, you'll do worse. That you'd call like a kind of efficiency of action. Given its goal of winning the game, there is, once you know its move, unless you consult some more powerful AI than Stockfish, you can't figure out a better move than that. A super intelligence is like that with respect to everything, with respect to all of humanity. It is relatively efficient to humanity. It has the best estimates, not perfect estimates, but the best estimates, and its estimates contain all the information that you've got about it. Its actions are the most efficient actions for accomplishing its goals. If you think you see a better way to accomplish its goals, you're mistaken.
Ryan: So you're saying this is super intelligence, we'd have to imagine something that knows all of the chess moves in advance. But here we're not talking about chess, we're talking about everything. It knows all of the moves that we would make and the most optimum pattern, including moves that we would not even know how to make, and it knows these things in advance. I mean, how would human beings sort of experience such a super intelligence? I think we still have a very hard time imagining something smarter than us, just because we've never experienced anything like it before. Of course, we all know somebody who's genius level IQ, maybe quite a bit smarter than us, but we've never encountered something like what you're describing, some sort of mind that is super intelligent. What sort of things would it be doing that humans couldn't? How would we experience this in the world?
Eliezer: I mean, we do have some tiny bit of experience with it. We have experience with chess engines, where we just can't figure out better moves than they make. We have experience with market prices, where even though your uncle has this really long, elaborate story about Microsoft stock, you just know he's wrong. Why is he wrong? Because if he was correct, it would already be incorporated into the stock price. And this notion, and especially because the market efficiency is not perfect, like that whole downward swing and then upward move in COVID. I have friends who made more money off that than I did, but I still managed to buy back into the broader stock market on the exact day of the low, basically coincidence. So the markets aren't perfectly efficient, but they're efficient almost everywhere. And that sense of deference, that sense that your weird uncle can't possibly be right because the hedge funds would know it, unless he's talking about COVID, in which case maybe he is right. If you have the right choice of weird uncle. I have weird friends who are maybe better at calling these things than your weird uncle. So among humans, it's subtle.
And then with super intelligence, it's not subtle, just massive advantage, but not perfect. It's not that it knows every possible move you make before you make it. It's that it's got a good probability distribution about that. And it has figured out all the good moves you could make and figured out how to reply to those. And what's that? I mean, in practice, what's that like? Well, unless it's limited, narrow super intelligence, I think you mostly don't get to observe it because you are dead, unfortunately. So, like, Stockfish makes strictly better chess moves than you, but it's playing on a very narrow board. And the fact that it's better than you at chess doesn't mean it's better than you at everything. And I think that the actual catastrophe scenario for AI looks like big advancement in a research lab, maybe driven by them getting a giant venture capital investment and being able to spend 10 times as much on GPUs as they did before, maybe driven by a new algorithmic advance like transformers, maybe driven by hammering out some tweaks in last year's algorithmic advance that gets the thing to finally work efficiently. And the AI there goes over a critical threshold, which most obviously could be like, can write the next AI.
That's so obvious that science fiction writers figured it out almost before there were computers, possibly even before there were computers. I'm not sure what the exact dates here are. But if it's better than you at everything, it's better than you at building AIs. That snowballs. It gets an immense technological advantage. If it's smart, it doesn't announce itself. It doesn't tell you that there's a fight going on. It emails out some instructions to one of those labs that'll synthesize DNA and synthesize proteins from the DNA and get some proteins mailed to a hapless human somewhere who gets paid a bunch of money to mix together some stuff they got in the mail in a vial. Like smart people will not do this for any sum of money. Many people are not smart. Builds the ribosome, but the ribosome that builds things out of covalently bonded diamondoid instead of proteins folding up and held together by Van der Waals forces, builds tiny diamondoid bacteria. The diamondoid bacteria replicate using atmospheric carbon, hydrogen, oxygen, nitrogen, and sunlight. And a couple of days later, everybody on earth falls over dead in the same second. That's the disaster scenario if it's as smart as I am. If it's smarter, it might think of a better way to do things. But it can at least think of that if it's relatively efficient compared to humanity because I'm in humanity and I thought of it.
Ryan: This is... I've got a million questions, but I'm gonna let David go first.
David: Yeah. So we've introduced, we sped run the introduction of a number of different concepts, which I want to go back and take our time to really dive into. There's the AI alignment problem. There's AI escape velocity. There is the question of what happens when AIs are so incredibly intelligent that humans are to AIs what ants are to us. And so I want to kind of go back and tackle these, Eliezer, one by one. We started this conversation talking about ChatGPT and everyone's up in arms about ChatGPT. It's like, oh, and you're saying like, yes, it's a great step forward in the generalizability of some of the technologies that we have in the AI world. All of a sudden ChatGPT becomes immensely more useful and it's really stoking the imaginations of people today. But what you're saying is it's not the thing that's actually going to be the thing to reach escape velocity and create super intelligent AIs that perhaps might be able to enslave us. But my question to you is, how do we know when that...
Eliezer: Not enslave. They don't enslave you, but sorry, go on.
David: Yeah, sorry.
Ryan: Murder, David. Kill all of us. Eliezer was very clear on that.
David: So if it's not ChatGPT, how close are we? Because there's this unknown event horizon where you kind of alluded to it, where we make this AI that we train it to create a smarter AI and that smart AI is so incredibly smart that it hits escape velocity and all of a sudden these dominoes fall. How close are we to that point? And are we even capable of answering that question?
Eliezer: How the heck would I know?
Ryan: Well, when you were talking, Eliezer, if we had already crossed that event horizon, a smart AI wouldn't necessarily broadcast that to the world. It's possible we've already crossed that event horizon, is it not?
Eliezer: I mean, it's theoretically possible, but seems very unlikely. Somebody would need inside their lab an AI that was much more advanced than the public AI technology. And as far as I currently know, the best labs and the best people are throwing their ideas to the world. They don't care. And there's probably some secret government labs with secret government AI researchers. My pretty strong guess is that they don't have the best people and that those labs could not create ChatGPT on their own because ChatGPT took a whole bunch of fine twiddling and tuning and visible access to giant GPU farms and that they don't have the people who know how to do the twiddling and tuning. This is just a guess.
David: Could you walk us through one of the big things that you spend a lot of time on is this thing called the AI alignment problem. Some people are not convinced that when we create AI, that AI won't really just be fundamentally aligned with humans. I don't believe that you fall into that camp. I think you fall into the camp of when we do create this super intelligent, generalized AI, we are going to have a hard time aligning with it in terms of our morality and our ethics. Can you walk us through a little bit of that thought process? Why do you feel disaligned?
Ryan: The dumb way to ask that question too is like, Eliezer, why do you think that the AI automatically hates us? Why is it going to- It doesn't hate you. Why does it want to kill us all?
Eliezer: The AI doesn't hate you, neither does it love you, and you're made of atoms that it can use for something else.
David: It's indifferent to you.
Eliezer: It's got something that it actually does care about, which makes no mention of you. And you are made of atoms that it can use for something else. That's all there is to it in the end. The reason you're not in its utility function is that the programmers did not know how to do that. The people who built the AI or the people who built the AI that built the AI that built the AI did not have the technical knowledge that nobody on earth has at the moment as far as I know, whereby you can do that thing and you can control in detail what that thing ends up caring about.
David: So this feels like humanity is hurtling itself towards what we're calling, again, an event horizon where there's this AI escape velocity, and there's nothing on the other side. As in, we do not know what happens past that point as it relates to having some sort of superintelligent AI and how it might be able to manipulate the world. Would you agree with that?
Eliezer: No. Again, the Stockfish chess playing analogy, you cannot predict exactly what move it would make, because in order to predict exactly what move it would make, you would have to be at least that good at chess, and it's better than you. This is true even if it's just a little better than you. Stockfish is actually enormously better than you to the point that once it tells you the move, you can't figure out a better move without consulting a different AI. But even if it was just a bit better than you, then you're in the same position. This kind of disparity also exists between humans. If you ask me, where will Garry Kasparov move on this chessboard? I'm like, I don't know, maybe here. Then if Garry Kasparov moves somewhere else, it doesn't mean that he's wrong, it means that I'm wrong. If I could predict exactly where Garry Kasparov would move on a chessboard, I'd be Garry Kasparov. I'd be at least that good at chess. Possibly better. I could also be able to predict him, but also see an even better move than that.
That's an irreducible source of uncertainty with respect to superintelligence or anything that's smarter than you. If you could predict exactly what it would do, you'd be that smart yourself. It doesn't mean you can predict no facts about it. With Stockfish in particular, I can predict it's going to win the game. I know what it's optimizing for. I know where it's trying to steer the board. I can predict that I can't predict exactly what the board will end up looking like after Stockfish has finished winning its game against me. I can predict it will be in the class of states that are winning positions for black or white or whichever color Stockfish picked, because, you know, it wins either way. And that's similarly where I'm getting the kind of prediction about everybody being dead, because if everybody were alive, then there'd be some state that the superintelligence preferred to that state, which is all of the atoms making up these people and their farms are being used for something else that it values more. So if you postulate that everybody's still alive, I'm like, okay, well, why is it you're postulating that Stockfish made a stupid chess move and ended up with a non-winning board position? That's where that prediction, class of predictions come from.
Ryan: Can you reinforce this argument, though, a little bit? So, why is it that an AI can't be nice, sort of like a gentle parent to us, rather than sort of a murderer looking to deconstruct our atoms and apply them for use somewhere else? What are its goals? And why can't they be aligned to at least some of our goals? Or maybe, why can't it get into a status which is somewhat like us and the ants, which is largely we just ignore them unless they interfere in our business and come in our house and raid our cereal boxes?
Eliezer: There's a bunch of different questions there. So first of all, the space of minds is very wide. Imagine this giant sphere and all the humans are in this one tiny corner of the sphere. We're all basically the same make and model of car, running the same brand of engine. We're just all painted slightly different colors. Somewhere in that mind space, there's things that are as nice as humans. There's things that are nicer than humans. There are things that are trustworthy and nice and kind in ways that no human can ever be. And there's even things that are so nice that they can understand the concept of leaving you alone and doing your own stuff sometimes instead of hanging around trying to be like obsessively nice to you every minute and all the other famous disaster scenarios from ancient science fiction ("With Folded Hands" by Jack Williamson is the one I'm quoting there). We don't know how to reach into mind design space and pluck out an AI like that. It's not that they don't exist in principle. It's that we don't know how to do it. And I now, like, hand back the conversational ball and figure out like which next question do you want to go down there?
Ryan: Well, I mean, why? Why is it so difficult to sort of align an AI with even our basic notions of morality?
Eliezer: I mean, I wouldn't say that it's difficult to align an AI with our basic notions of morality. I'd say that it's difficult to align an AI on a task like build two identical strawberries. Or no, let me take this strawberry and make me another strawberry that's identical to this strawberry down to the cellular level, but not necessarily the atomic level. So it looks the same under like a standard optical microscope, but maybe not a scanning electron microscope. Do that. Don't destroy the world as a side effect. Now, this does intrinsically take a powerful AI. There's no way you can make it easy to align by making it stupid. To build something that's cellularly identical to a strawberry, I mean, mostly I think the way that you do this is with like very primitive nanotechnology. We could also do it using very advanced biotechnology. And these are not technologies that we already have. So it's got to be something smart enough to develop new technology. Never mind all the subtleties of morality.
I think we don't have the technology to align an AI to the point where we can say, build me a copy of the strawberry and don't destroy the world. Why do I think that? Well, case in point, look at natural selection building humans. Natural selection mutates the humans a bit, runs another generation. The fittest ones reproduce more, their genes become more prevalent to the next generation. Natural selection hasn't really had very much time to do this to modern humans at all, but you know, the hominid line, the mammalian line, go back a few million generations. And this is an example of an optimization process building an intelligence. And natural selection asked us for only one thing, make more copies of your DNA. Make your alleles more relatively prevalent in the gene pool. Maximize your inclusive reproductive fitness, not just like your own reproductive fitness, but your two brothers or eight cousins, as the joke goes, because they've got on average one copy of your genes, two brothers, eight cousins. This is all we were optimized for, for millions of generations, creating humans from scratch, from the first accidentally self-replicating molecule. Internally, psychologically, inside our minds, we do not know what genes are. We do not know what DNA is. We do not know what alleles are. We have no concept of inclusive genetic fitness until it, you know, our genetic, our scientists figure out what that even is. We don't know what we were being optimized for.
For a long time, many humans thought they'd been created by God. And this is what happens when you use the hill-climbing paradigm and optimize for one single extremely pure thing: this is how much of it gets inside. In the ancestral environment, in the exact distribution that we were originally optimized for, humans did tend to end up using their intelligence to try to reproduce more. Put them into a different environment, and all the little bits and pieces and fragments of optimizing for fitness that were in us now do totally different stuff. We have sex, but we wear condoms. If natural selection had been a foresightful, intelligent kind of engineer that was able to engineer things successfully, it would have built us to be revolted by the thought of condoms. Men would be lined up and fighting for the rights to donate to sperm banks. And in our natural environment, the little drives that got into us happened to lead to more reproduction, but with distributional shift, you run the humans out of the distribution over which they were optimized, and you get totally different results.
And gradient descent would by default just do not quite the same thing. It's going to do a weirder thing because natural selection has a much narrower information bottleneck. In one sense, you could say that natural selection was at an advantage because it finds simpler solutions. You could imagine some hopeful engineer who just built intelligences using gradient descent and found out that they end up wanting these thousands and millions of little tiny things, none of which were exactly what the engineer wanted, and being like, well, let's try natural selection instead. It's got a much sharper information bottleneck. It'll find the simple specification of what I want. But we actually got there as humans. And then gradient descent may be even worse. But more importantly, I'm just pointing out that there is no physical law, computational law, mathematical or logical law saying that when you optimize using hill climbing on a very simple, very sharp criterion, you get a general intelligence that wants that thing.
Ryan: So just like natural selection, our tools are too blunt in order to get to that level of granularity to program in some sort of morality into these super intelligent systems?
Eliezer: Or build me a copy of a strawberry without destroying the world. Yeah, the tools are too blunt.
David: So I just want to make sure I'm following with what you were saying. I think the conclusion that you left me with is that my brain, which I consider to be at least decently smart, is actually a byproduct, an accidental byproduct of this desire to reproduce. And it's actually just like a tool that I have, and just like conscious thought is a tool, which is a useful tool in means of that end. And so if we're applying this to AI and AI's desire to achieve some certain goal, what's the parallel there?
Eliezer: I mean, every organ in your body is a reproductive organ. If it didn't help you reproduce, you would not have an organ like that. Your brain is no exception. This is merely conventional science and merely the conventional understanding of the world. I'm not saying anything here that ought to be at all controversial. I'm sure it's controversial somewhere, but within a pre-filtered audience, it should not be at all controversial. And this is like the obvious thing to expect to happen with AI, because why wouldn't it? What new law of existence has been invoked, whereby this time we optimize for a thing and we get a thing that wants exactly what we optimized for on the outside?
Ryan: So what are the types of goals an AI might want to pursue? What types of utility functions is it going to want to pursue off the bat? Is it just those it's been programmed with, like make it an identical strawberry?
Eliezer: Well, the whole thing I'm saying is that we do not know how to get goals into a system. We can cause them to do a thing inside a distribution they were optimized over using gradient descent. But if you shift them outside of that distribution, I expect other weird things start happening. When they reflect on themselves, other weird things start happening. What kind of utility functions are in there? I mean, darned if I know. I think you'd have a pretty hard time calling the shape of humans in advance by looking at natural selection, the thing that natural selection was optimizing for, if you'd never seen a human or anything like a human. If we optimize them from the outside to predict the next line of human text, like GPT-3, I don't actually think this line of technology leads to the end of the world, but maybe it does. And in, like, GPT-7, there's probably a bunch of stuff in there too that desires to accurately model things like humans under a wide range of circumstances, but it's not exactly humans, because ice cream didn't exist in the natural environment, the ancestral environment, the environment of evolutionary adaptedness.
There was nothing with that much sugar, salt, fat combined together as ice cream. We are not built to want ice cream. We were built to want strawberries, honey, a gazelle that you killed and cooked and had some fat in it and was therefore nourishing and gave you the all-important calories you need to survive, salt, so you didn't sweat too much and run out of salt. We evolved to want those things, but then ice cream comes along and it fits those taste buds better than anything that existed in the environment that we were optimized over. So a very primitive, very basic, very unreliable wild guess, but at least an informed kind of wild guess. Maybe if you train a thing really hard to predict humans, then among the things that it likes are tiny little pseudo things that meet the definition of human but weren't in its training data and that are much easier to predict, or where the problem of predicting them can be solved in a more satisfying way, where satisfying is not like human satisfaction, but some other criterion of thoughts like this are tasty because they help you predict the humans from the training data.
David: Eliezer, when we talk about all of these ideas about the ways that AI thought will be fundamentally incompatible or not able to be understood by the ways that humans think, and then all of a sudden we see this rotation by venture capitalists by just pouring money into AI, do alarm bells go off in your head? Like, hey guys, you haven't thought deeply about these subject matters yet? Does the immense amount of capital going into AI investments scare you?
Eliezer: I mean, alarm bells went off for me in 2015, which is when it became obvious that this is how it was going to go down. I sure am now seeing the realization of that stuff I felt alarmed about back then.
Ryan: Eliezer, please, is this view that AI is incredibly dangerous and that AGI is going to eventually end humanity and that we're just careening toward a precipice, would you say this is like the consensus view now, or are you still somewhat of an outlier? And why aren't other smart people in this field as alarmed as you? Can you steel man their arguments?
Eliezer: You're asking, again, several questions sequentially there. Is it the consensus view? No. Do I think that the people in the wider scientific field who dispute this point of view, do I think they understand it? Do I think they've done anything like an impressive job of arguing against it at all? No. If you look at the famous prestigious scientists who sometimes make a little fun of this view in passing, they're making up arguments rather than deeply considering things that are held to any standard of rigor, and people outside their own fields are able to validly shoot them down. I have no idea how to pronounce his last name. Francis C-H-O-L-L-E-T. He said something about, I forget his exact words, but it was something like, I never hear any good arguments for stuff. I was like, okay, here's some good arguments for stuff. You can read the reply from Yudkowsky to C-H-O-L-L-E-T and Google that, and that'll give you some idea of what the eminent voices versus the reply to the eminent voices sound like. And Scott Aaronson, who at the time was off on complexity theory, he was like, that's not how no free lunch theorems work, correctly. I think the state of affairs is we have eminent scientific voices making fun of this possibility, but not engaging with the arguments for it.
Now, if you step away from the eminent scientific voices, you can find people who are more familiar with all the arguments and disagree with me. And I think they lack security mindset. I think that they're engaging in the sort of blind optimism that many, many scientific fields throughout history have engaged in, where when you're approaching something for the first time, you don't know why it will be hard, and you imagine easy ways to do things. And the way that this is supposed to naturally play out over the history of a scientific field is that you run out and you try to do the things and they don't work, and you go back and you try to do other clever things and they don't work either, and you learn some pessimism and you start to understand the reasons why the problem is hard. The field of artificial intelligence itself recapitulated this very common ontogeny of a scientific field, where initially we had people getting together at the Dartmouth conference. I forget what their exact famous phrasing was, but it's something like, we are wanting to address the problem of getting AIs to, you know, like understand language, improve themselves. And I forget even what else was there, a list of what now sound like grand challenges. And we think we can make substantial progress on this using 10 researchers for two months. And I think that at the core is what's going on.
They have not run into the actual problems of alignment. They aren't trying to get ahead of the game. They're not trying to panic early. They're waiting for reality to hit them over the head and turn them into grizzled old cynics of their scientific field who understand the reasons why things are hard. They're content with the predictable life cycle of starting out as bright-eyed youngsters, waiting for reality to hit them over the head with the news. And if it wasn't going to kill everybody the first time that they're really wrong, it'd be fine. You know, this is how science works. If we got unlimited free retries and 50 years to solve everything, it'd be okay. We could figure out how to align AI in 50 years given unlimited retries. You know, the first team in with the bright-eyed optimists would destroy the world and people would go, oh, well, you know, it's not that easy. They would try something else clever. That would destroy the world. People would go like, oh, well, you know, maybe this field is actually hard. Maybe this is actually one of the thorny things like computer security or something. And so what exactly went wrong last time? Why didn't these hopeful ideas play out? Oh, like you optimize for one thing on the outside and you get a different thing on the inside.
Wow. That's really basic. All right. Can we even do this using gradient descent? Can you even build this thing out of giant inscrutable matrices of floating point numbers that nobody understands at all? You know, maybe we need different methodology. And 50 years later, you'd have an aligned AGI. Now, if we got unlimited free retries without destroying the world, it'd play out the same way that, you know, chatGPT played out. It's, you know, from 1956 or 55 or whatever it was to 2023. So, you know, about 70 years, give or take a few. And just like 70 years later we can do the stuff they wanted to do in the summer of 1955, 70 years later you'd have your aligned AGI. Problem is that the world got destroyed in the meanwhile. And that's the problem there.
David: So this feels like a gigantic Don't Look Up scenario. If you're familiar with that movie, it's a movie about like this asteroid hurtling to Earth, but it becomes popular and in vogue to not look up and not notice it. And Eliezer, you're the guy who's saying like, hey, there's an asteroid. We have to do something about it. And if we don't, it's going to come destroy us. If you had God mode over the progress of AI research and just innovation and development, what choices would you make that humans are not currently making today?
Eliezer: I mean, I could say something like shut down all the large GPU clusters. How long do I have God mode? Do I get to like stick around?
David: You have God mode for the 2020 decade.
Eliezer: For 2020 decade. All right. That does make it pretty hard to do things. I think I shut down all the GPU clusters and get all of the famous scientists and brilliant, talented youngsters, the vast, vast majority of whom are not going to be productive and where government bureaucrats are not going to be able to tell who's actually being helpful or not, but, you know, put them all on an island, large island, and try to figure out some system for filtering the stuff through to me to give thumbs up or thumbs down on that is going to work better than scientific bureaucrats producing entire nonsense. Because, you know, the trouble is, the reason why scientific fields have to go through this long process to produce the cynical oldsters who know that everything is difficult, it's not that the youngsters are stupid. You know, sometimes youngsters are fairly smart. You know, Marvin Minsky, John McCarthy back in 1955, they weren't idiots. You know, privileged to have met both of them. They didn't strike me as idiots. They were very old. They still weren't idiots. But, you know, it's hard to see what's coming in advance of experimental evidence hitting you over the head with it. And if I only have the decade of the 2020s to run all the researchers on this giant island somewhere, it's really not a lot of time. Mostly what you've got to do is invent some entirely new AI paradigm that isn't the giant inscrutable matrices of floating point numbers on gradient descent. Because I'm not really seeing what you can do that's clever with that, that doesn't kill you and that you know doesn't kill you and doesn't kill you the very first time you try to do something clever like that. You know, I'm sure there's a way to do it. And if you got to try over and over again, you could find it.
Ryan: Eliezer, do you think every intelligent civilization has to deal with this exact problem that humanity is dealing with now? How do we solve this problem of aligning with an advanced general intelligence?
Eliezer: I expect that's much easier for some alien species than others. There are alien species whose solution might arrive at this problem in an entirely different way. Maybe instead of having two entirely different information processing systems, the DNA and the neurons, they've only got one system. They can trade memories around heritably by swapping blood sexually. Maybe the way in which they confront this problem is that very early in their evolutionary history, they have the equivalent of the DNA that stores memories and processes, computes memories, and they swap around a bunch of it, and it adds up to something that reflects on itself and makes itself coherent, and then you've got a superintelligence before they have invented computers. And maybe that thing wasn't aligned, but how do you even align it when you're in that kind of situation? It'd be a very different angle on the problem.
Ryan: Do you think every advanced civilization is on the trajectory to creating a superintelligence at some point in its history?
Eliezer: Maybe there's ones in universes with alternate physics where you just can't do that. Their universe's computational physics just doesn't support that much computation. Maybe they never get there. Maybe their lifespans are long enough and their star lifespans short enough that they never get to the point of a technological civilization before their star does the equivalent of expanding or exploding or going out and their planet ends. Every alien species covers a lot of territory, especially if you talk about alien species in universes with physics different from this one.
Ryan: Well, talking about our present universe, I'm curious if you've been confronted with the question of, well, then why haven't we seen some sort of superintelligence in our universe when we look out at the stars? Sort of the Fermi paradox type of question. Do you have any explanation for that?
Eliezer: Oh, well, supposing that they got killed by their own AIs doesn't help at all with that because then we'd see the AIs.
Ryan: And do you think that's what happens? Yeah, it doesn't help with that. We would see evidence of AIs, wouldn't we?
Eliezer: Yeah.
Ryan: Yes. So why don't we?
Eliezer: I mean, the same reason we don't see evidence of the alien civilizations not with AIs. And that reason is, although it doesn't really have much to do with the whole AI thesis one way or another, because they're too far away, or so says Robin Hanson, using a very clever argument about the apparent difficulty of hard steps in humanity's evolutionary history to further induce the rough gap between the hard steps. And I can't really do justice to this. If you look up grabby aliens.
Ryan: Grabby aliens?
David: I remember this.
Eliezer: Grabby aliens, G-R-A-B-B-Y. You can find Robin Hanson's very clever argument for how far away the aliens are.
Ryan: There's an entire website, Bankless listeners, there's an entire website called grabbyaliens.com you can go look at.
Eliezer: And that contains, which is by far the best answer I've seen, to where are they? Answer, too far away for us to see, even if they're traveling here at nearly light speed. How far away are they? And how do we know that?
David: But yeah.
Ryan: This is amazing.
Eliezer: There is not a very good way to simplify the argument, any more than there is to simplify the notion of zero knowledge proofs. It's not that difficult, but it's just very not easy to simplify. But if you have a bunch of locks that are all of different difficulties, and a limited time in which to solve all the locks, such that anybody who gets through all the locks must have gotten through them by luck, all the locks will take around the same amount of time to solve, even if they're all of very different difficulties. And that's the core of Robin Hanson's argument for how far away the aliens are, and how do we know that?
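The lock analogy can be checked with a small simulation. The following is only a sketch under assumptions that are not in the conversation: each lock's solve time is modeled as an exponential random variable, the two mean solve times (10 and 100 deadlines) and the deadline of 1.0 are arbitrary illustrative values, and the function name `conditional_step_times` is hypothetical. It keeps only the rare runs in which every lock is opened before the deadline and compares how long each lock took in those lucky runs.

```python
import random


def conditional_step_times(mean_times, deadline, trials, seed=0):
    """Return per-lock durations for the runs that beat the deadline.

    Assumption: each lock's solve time is exponentially distributed
    with the given mean, and the locks are attempted one after another.
    """
    rng = random.Random(seed)
    wins = []
    for _ in range(trials):
        durations = []
        elapsed = 0.0
        for mean in mean_times:
            t = rng.expovariate(1.0 / mean)  # expovariate takes a rate
            elapsed += t
            if elapsed >= deadline:
                break  # ran out of time, discard this run
            durations.append(t)
        else:
            # Loop finished without a break: every lock was solved in time.
            wins.append(durations)
    return wins


# Illustrative values only: one lock is ten times harder than the other,
# and both are expected to take far longer than the deadline allows.
MEAN_TIMES = (10.0, 100.0)
wins = conditional_step_times(MEAN_TIMES, deadline=1.0, trials=1_000_000)

# Average time spent on each lock, restricted to the lucky runs.
averages = [
    sum(run[i] for run in wins) / len(wins) for i in range(len(MEAN_TIMES))
]
print("lucky runs:", len(wins))
print("average time per lock among lucky runs:", averages)
```

Under these assumptions the two averages should come out close to each other even though one lock is nominally ten times harder, which is the effect the argument relies on: conditioning on success within a short deadline washes out the difference in difficulty.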
Ryan: I know you're very skeptical that there will be a good outcome when we produce an artificial general intelligence. And I said when, not if, because I believe that's your thesis as well, of course. But is there the possibility of a good outcome? I know you are working on AI alignment problems, which leads me to believe that you have greater than zero amount of hope for this project. Is there the possibility of a good outcome? What would that look like, and how do we go about achieving it?
Eliezer: It looks like me being wrong. I basically don't see on-model hopeful outcomes at this point. We have not done those things that it would take to earn a good outcome, and this is not a case where you get a good outcome by accident. If you have a bunch of people putting together a new operating system, and they've heard about computer security, but they're skeptical that it's really that hard, the chance of them producing a secure operating system is effectively zero. That's basically the situation I see ourselves in with respect to AI alignment. I have to be wrong about something, which I certainly am. I have to be wrong about something in a way that makes the problem easier rather than harder for those people who don't think that alignment's going to be all that hard. If you're building a rocket for the first time ever, and you're wrong about something, it's not surprising if you're wrong about something.
It's surprising if the thing that you're wrong about causes the rocket to go twice as high on half the fuel you thought was required and be much easier to steer than you were afraid of. The analogy I usually use for this is, very early on in the Manhattan Project, they were worried about what if the nuclear weapons can ignite fusion in the nitrogen in the atmosphere. They ran some calculations and decided that it was incredibly unlikely from multiple angles, so they went ahead and were correct. We're still here. I'm not going to say that it was luck because the calculations were actually pretty solid. AI is like that, but instead of needing to refine plutonium, you can make nuclear weapons out of a billion tons of laundry detergent. The stuff to make them is fairly widespread. It's not a tightly controlled substance. They spit out gold up until they get large enough, and then they ignite the atmosphere, and you can't calculate how large is large enough. A bunch of the CEOs running these projects are making fun of the idea that it'll ignite the atmosphere. It's not a very helpful situation.
David: So the economic incentive to produce this AI—one of the things why chatGPT has sparked the imaginations of so many people is that everyone can imagine products. Products are being imagined left and right about what you can do with something like chatGPT. There's this meme at this point of people leaving to go start their chatGPT startup. The metaphor is that what you're saying is that there's this generally available resource spread all around the world, which is chatGPT, and everyone's hammering it in order to make it spit out gold. But you're saying if we do that too much, all of a sudden the system will ignite the whole entire sky, and then we will all know.
Eliezer: You can run chatGPT any number of times without igniting the atmosphere. It's about what the research labs at Google and Microsoft (counting DeepMind as part of Google and counting OpenAI as part of Microsoft) are doing, bringing more metaphorical plutonium together than ever before. It's not about how many times you run the things that have been built and have not destroyed the world yet. You can do any amount of stuff with chatGPT and not destroy the world. It's not that smart. It doesn't get smarter every time you run it.
Ryan: Can I ask some questions that the 10-year-old in me wants to really ask about this? I'm asking these questions because I think a lot of listeners might be thinking them too. Knock off some of these easy answers for me. If we create some sort of unaligned, let's call it bad AI, why can't we just create a whole bunch of good AIs to go fight the bad AIs and solve the problem that way? Can there not be some sort of counterbalance in terms of aligned human AIs and evil AIs, and there be some sort of battle of the artificial minds here?
Eliezer: Nobody knows how to create any good AIs at all. The problem isn't that we have 20 good AIs and then somebody finally builds an evil AI. The problem is that the first very powerful AI is evil, nobody knows how to make it good, and then it kills everybody before anybody can make it good.
Ryan: So there is no known way to make a friendly, human-aligned AI whatsoever, and you don't know of a good way to go about thinking through that problem and designing one. Neither does anyone else, is what you're telling us.
Eliezer: I have some idea of what I would do if there were more time. Back in the day, we had more time. Humanity squandered it. I'm not sure there's enough time left now. I have some idea of what I would do if I were in a 25-year-old body and had $10 billion.
Ryan: That would be the island scenario of you're God for 10 years and you get all the researchers on an island and go really hammer for 10 years at this problem?
Eliezer: If I have buy-in from a major government that can run actual security precautions and more than just $10 billion, then you could run a whole Manhattan Project about it, sure.
Ryan: This is another question that the 10-year-old in me wants to know. Why is it that, at least, people listening to this episode or people listening to the concerns or reading the concerns that you've written down and published, why can't everyone get on board who's building an AI and just all agree to be very, very careful? Is that not a sustainable game-theoretic position to have? Is this a coordination problem, more of a social problem than anything else? Or why can't that happen? I mean, we have so far not destroyed the world with nuclear weapons, and we've had them since the 1940s. Yeah, this is harder than nuclear weapons. Why is this harder? And why can't we just coordinate to just all agree internationally that we're going to be very careful, put restrictions on this, put regulations on it, do something like that?
Eliezer: Current heads of major labs seem to me to be openly contemptuous of these issues. That's where we're starting from. The politicians do not understand it. There are distortions of these ideas that are going to sound more appealing to them than everybody suddenly falls over dead, which is a thing that I think actually happens. Everybody falls over dead just, like, doesn't inspire the monkey political parts of our brain somehow. Because it's not like, oh no, what if terrorists get the AI first? It's like, it doesn't matter who gets it first. Everybody falls over dead. And yeah, so you're describing a world coordinating on something that is relatively hard to coordinate. So could we, if we tried starting today, prevent anyone from getting a billion pounds of laundry detergent in one place worldwide, control the manufacturing of laundry detergent, only have it manufactured in particular places, not concentrate lots of it together, enforce it on every country? If it was legible, if it was clear that a billion pounds of laundry detergent in one place would end the world, if you could calculate that, if all the scientists calculated it and arrived at the same answer and told the politicians, then maybe, maybe humanity would survive, even though smaller amounts of laundry detergent spit out gold. The threshold can't be calculated. I don't know how you'd convince the politicians. We definitely don't seem to have had much luck convincing those CEOs whose job depends on them not caring to care. Caring is easy to fake.
It's easy to hire a bunch of people to be your AI safety team and redefine AI safety as having the AI not say naughty words. Or, you know, I'm speaking somewhat metaphorically here for reasons. But the basic problem that we have is like trying to build a secure OS before we run up against a really smart attacker. And there's all kinds of fake security. It's got a password file. This system is secure. It only lets you in if you type a password. And if you never go up against a really smart attacker, if you never go far out of distribution against a powerful optimization process looking for holes, then how does a bureaucracy come to know that what they're doing is not the level of computer security that they need? The way you're supposed to find this out, the way that scientific fields historically find this out, the way that fields of computer science historically find this out, the way that crypto found this out back in the early days, is by having the disaster happen.
And we're not even that good at learning from relatively minor disasters. You know, like COVID swept the world. Did the FDA or the CDC learn anything about don't tell hospitals that they're not allowed to use their own tests to detect the coming plague? Are we installing UVC lights in public spaces or in ventilation systems to prevent the next respiratory-borne pandemic? You know, we lost a million people and we sure did not learn very much as far as I can tell for next time. We could have an AI disaster that kills a hundred thousand people. How do you even do that? Robotic cars crashing into each other, have a bunch of robotic cars crashing into each other. It's not going to look like that was the fault of artificial general intelligence because they're not going to put AGIs in charge of cars. They're going to pass a bunch of regulations that's going to affect the entire AGI disaster or not at all. What does the winning world even look like here? How in real life did we get from where we are now to this worldwide ban, including against North Korea and, you know, like some one rogue nation whose dictator doesn't believe in all this nonsense and just wants the gold that these AIs spit out? How did we get there from here? How do we get to the point where the United States and China signed a treaty whereby they would both use nuclear weapons against Russia if Russia built a GPU cluster that was too large? How did we get there from here?
David: Correct me if I'm wrong, but this seems to be kind of just like a topic of despair. I'm talking to you now and hearing your thought process about like there is no known solution and the trajectory's not great. Do you think all hope is lost here?
Eliezer: I'll keep on fighting until the end, which I wouldn't do if I had literally zero hope. I could still be wrong about something in a way that makes this problem somehow much easier than it currently looks. I think that's how you go down fighting with dignity.
Ryan: Go down fighting with dignity. That's the stage you think we're at. I want to just double-click on what you were just saying. Part of the case that you're making is humanity won't even see this coming. It's not like a coordination problem like global warming where every couple of decades we see the world go up by a couple of degrees, things get hotter, and we start to see these effects over time. The characteristics or the advent of an AGI in your mind is going to happen incredibly quickly, and in such a way that we won't even see the disaster until it's imminent, until it's upon us.
Eliezer: I mean, if you want some kind of formal phrasing, then I think that AI, that superintelligence will kill everyone before non-superintelligent AIs have killed one million people. I don't know if that's the phrasing you're looking for there.
Ryan: I think that's a fairly precise definition, and why? What goes into that line of thought?
Eliezer: I think that the current systems are actually very weak. I don't know, maybe I could use the analogy of Go, where you had systems that were finally competitive with the pros, where "pro" is like the set of ranks in Go, and then a year later, they were challenging the world champion and winning. And then another year, they threw out all the complexities and the training from human databases of Go games and built a new system, AlphaGo Zero, that trained itself from scratch. No looking at the human playbooks, no special purpose code, just a general purpose game player being specialized to Go, more or less. Three days. There's a quote from Gwern about this, which I forget exactly, but it was something like, we know how long AlphaGo Zero, or AlphaZero, two different systems, was equivalent to a human Go player. And it was like 30 minutes on the following floor of this such and such DeepMind building. Maybe the first system doesn't improve that quickly, and they build another system that does. And all of that with AlphaGo over the course of years, going from it takes a long time to train to it trains very quickly and without looking at the human playbook. That's not with an artificial intelligence system that improves itself, or even one that sort of gets smarter as you run it, the way that human beings, not just as you evolve them, but as you run them over the course of their own lifetimes, improve. So if the first system doesn't improve fast enough to kill everyone very quickly, they will build one that's meant to spit out more gold than that. And there could be weird things that happen before the end. I did not see ChatGPT coming, I did not see Stable Diffusion coming, I did not expect that we would have AIs smoking humans in rap battles before the end of the world.
Ryan: It's kind of a nice...
Eliezer: They're clearly much dumber than us.
Ryan: Kind of a nice sendoff, I guess, in some ways. So you said that your hope is not zero, and you are planning to fight to the end. What does that look like for you? I know you're working at MIRI, which is the Machine Intelligence Research Institute. This is a non-profit that I believe that you've set up to work on these AI alignment and safety issues. What are you doing there? How do we actually fight until the end? If you do think that an end is coming, how do we try to resist?
Eliezer: I'm actually on something of a sabbatical right now, which is why I have time for podcasts. It's a sabbatical from, you know, I've been doing this 20 years. It became clear we were all going to die. I felt kind of burned out, taking some time to rest at the moment. When I dive back into the pool, I don't know, maybe I will go off to Conjecture or Anthropic or one of the smaller concerns like Redwood Research, being the only ones I really trust at this point, but they're tiny, and try to figure out if I can see anything clever to do with the giant inscrutable matrices of floating point numbers. Maybe I just write, continue to try to explain in advance to people why this problem is hard instead of as easy and cheerful as the current people who think they're pessimists think it will be. I might not be working all that hard compared to how I used to work. I'm older than I was. My body is not in the greatest of health these days. Going down fighting doesn't necessarily imply that I have the stamina to fight all that hard. I wish I had prettier things to say to you here, but I do not.
Ryan: No, this is... We intended to save probably the last part of this episode to talk about crypto, the metaverse, and AI and how this all intersects. But I gotta say, at this point in the episode, it all kind of feels pointless to go down that track. We were going to ask questions like, well, in crypto, should we be worried about building sort of a property rights system, an economic system, a programmable money system for the AIs to sort of use against us later on? But it sounds like the easy answer from you to those questions would be, yeah, absolutely. And by the way, none of that matters regardless. You could do whatever you'd like with crypto. This is going to be the inevitable outcome no matter what. Let me ask you, what would you say to somebody listening who maybe has been sobered up by this conversation? If a version of you in your 20s does have the stamina to continue this battle and to actually fight on behalf of humanity against this existential threat, where would you advise them to spend their time? Is this a technical issue? Is this a social issue? Is it a combination of both? Should they educate? Should they spend time in the lab? What should a person listening to this episode do with these types of dire straits?
Eliezer: I don't have really good answers. It depends on what your talents are. If you've got the very deep version of the security mindset, the part where you don't just put a password on your system so that nobody can walk in and directly misuse it, but the kind where you don't just encrypt the password file even though nobody's supposed to have access to the password file in the first place, and that's already an authorized user, but the part where you hash the passwords and salt the hashes. If you're the kind of person who can think of that from scratch, maybe try your hand at alignment. If you can think of an alternative to giant inscrutable matrices, then don't tell the world about that. I'm not quite sure where you go from there, but maybe you work with Redwood Research or something. A whole lot of this problem is that even if you do build an AI that's limited in some way, somebody else steals it, copies it, runs it themselves, and takes the bounds off the for loops and the world ends.
There's that. You think you can do something clever with the giant inscrutable matrices. You're probably wrong. If you have the talent to try to figure out why you're wrong in advance of being hit over the head with it, and not in a way where you just make random far-fetched stuff up as the reason why it won't work, but where you can actually keep looking for the reason why it won't work. We have people in crypto who are good at breaking things, and they're the reason why anything is not on fire. Some of them might go into breaking AI systems instead because that's where you learn anything. Any fool can build a crypto system that they think will work. Breaking existing crypto systems, cryptographical systems, is how we learn who the real experts are. Maybe the people finding weird stuff to do with AIs, maybe those people will come up with some truth about these systems that makes them easier to align than I suspect. The saner outfits do have uses for money. They don't really have scalable uses for money, but they do burn any money literally at all.
If you gave MIRI a billion dollars, I would not know how to... Well, at a billion dollars, I might try to bribe people to move out of AI development that gets broadcast to the whole world and move to the equivalent of an island somewhere, not even to make any kind of critical discovery, but just to remove them from the system if I had a billion dollars. If I just have another $50 million, I'm not quite sure what to do with that, but if you donate that to MIRI, then you at least have the assurance that we will not randomly spray money on looking like we're doing stuff and we'll reserve it as we are doing with the last giant crypto donation somebody gave us until we can figure out something to do with it that is actually helpful. MIRI has that property. I would say probably Redwood Research has that property. I realize I'm sounding sort of disorganized here, and that's because I don't really have a good organized answer to how in general somebody goes down fighting with dignity.
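[Editor's illustration] For readers unfamiliar with the "hash the passwords and salt the hashes" step Eliezer mentions above, here is a minimal sketch of what that can look like, using only Python's standard library. The function names, iteration count, and salt length are assumptions chosen for illustration. They are not taken from the conversation and are not a recommendation for production settings.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

# Hypothetical parameters, chosen only for illustration.
ITERATIONS = 100_000
SALT_BYTES = 16


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) using salted PBKDF2-HMAC-SHA256.

    A fresh random salt is generated when none is supplied, so two users
    with the same password end up with different stored digests.
    """
    if salt is None:
        salt = secrets.token_bytes(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

The point of the design is that the stored file holds only salts and digests, so even someone who reads the file (including an already authorized user) does not directly learn the passwords, and the per-user salt keeps identical passwords from producing identical entries.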
Ryan: I know a lot of people in crypto, they are not as in touch with artificial intelligence, obviously, as you are, and the AI safety issues and the existential threat that you've presented in this episode. They do care a lot and see coordination problems throughout society as an issue. Many have also generated wealth from crypto and care very much about humanity not ending. MIRI is the organization I was talking about earlier. What sort of things have you done with funds that you've received from crypto donors and elsewhere? What sort of things might an organization like that pursue to try to stave this off?
Eliezer: I think mostly we've pursued a lot of lines of research that haven't really panned out, which is a respectable thing to do. We did not know in advance that those lines of research would fail to pan out. If you're doing research that you know will work, you're probably not really doing any research. You're just doing a pretense of research that you can show off to a funding agency. We try to be real. We did things where we didn't know the answer in advance. They didn't work, but that was where the hope lay, I think. Having a research organization that keeps it real that way, that's not an easy thing to do. If you don't have this very deep form of the security mindset, you will end up producing fake research and doing more harm than good. I would not tell all the successful cryptocurrency people to run off and start their own research outfits. Redwood Research, I'm not sure if they can scale using more money, but you can give people more money and wait for them to figure out how to scale it later if they're the kind who won't just run off and spend it, which is what MIRI aspires to be.
Ryan: You don't think the education path is a useful path, just educating the world?
Eliezer: I would give myself and MIRI credit for why the world isn't just walking blindly into the whirling razor blades here, but it's not clear to me how far education scales apart from that. You can get more people aware that we're walking directly into the whirling razor blades because even if only 10% of the people can get it, that can still be a bunch of people. But then what do they do? I don't know. Maybe they'll be able to do something later. Can you get all the people? Can you get all the politicians? Can you get the people whose job incentives are against them admitting this to be a problem? I have various friends who report like, yes, if you talk to researchers at OpenAI in private, they're very worried and say that they cannot be that worried in public.
Ryan: This is all a giant Moloch trap, is what you're telling us. I feel like this is the part of the conversation where we've gotten to the end and the doctor has said that we have some sort of terminal illness. And at the end of the conversation, I think the patient, David and I, have to ask the question, okay, doc, how long do we have? Seriously, what are we talking about here if you turn out to be correct? Are we talking about years? Are we talking about decades? What's your idea here?
David: What are you preparing for?
Eliezer: Yeah. How the hell would I know? Enrico Fermi was saying that fission chain reactions were 50 years off if they could ever be done at all, two years before he built the first nuclear pile. The Wright brothers were saying heavier-than-air flight was 50 years off shortly before they built the first Wright flyer. How on earth would I know? It could be three years. It could be 15 years. We could get that AI winter I was hoping for, and it could be 16 years. I'm not really seeing 50 without some kind of giant civilizational catastrophe. And to be clear, whatever civilization arises after that would probably, I'm guessing, end up stuck in just the same trap we are.
Ryan: I think the other thing that the patient might do at the end of a conversation like this is to also consult with other doctors. I'm kind of curious who we should talk to on this quest. Who are some people that if people in crypto want to hear more about this or learn more about this, or even we ourselves as podcasters and educators want to pursue this topic, who are the other individuals in the AI alignment and safety space you might recommend for us to have a conversation with?
Eliezer: Well, the person who actually holds a coherent technical view, who disagrees with me, is named Paul Christiano. He does not write Harry Potter fan fiction, and I expect him to have a harder time explaining himself in concrete terms. But that is the main technical voice of opposition. If you talk to other people in the effective altruism or AI alignment communities who disagree with this view, they are probably to some extent repeating back their misunderstandings of Paul Christiano's views. You could try Ajeya Cotra, who's worked pretty directly with Paul Christiano and I think sometimes aspires to explain these things that Paul is not the best at explaining. I'll throw out Kelsey Piper as somebody who would be good at explaining, like would not claim to be a technical person on these issues, but is good at explaining the part that she does know. And who else disagrees with me? I'm sure Robin Hanson would be happy to come up. Well, I'm not sure he'd be happy to come on this podcast, but Robin Hanson disagrees with me, and I feel like the famous argument we had back in the early 2010s, late 2000s about how this would all play out. I basically feel like this was the Yudkowsky position, this is the Hanson position, and then reality was over here, well to the Yudkowsky side of the Yudkowsky position in the Yudkowsky-Hanson debate. But Robin Hanson does not feel that way. He would probably be happy to expound on that at length. I don't know. It's not hard to find opposing viewpoints. The ones that'll stand up to a few solid minutes of cross-examination from somebody who knows which parts to cross-examine, that's the hard part.
Ryan: I've read a lot of your writings and listened to you on previous podcasts. One was in 2018 on the Sam Harris podcast. This conversation feels to me like the most dire you've ever seemed on this topic. And maybe that's not true. Maybe you've sort of always been this way, but it seems like the direction of your hope that we solve this issue has declined. I'm wondering if you feel like that's the case, and if you could sort of summarize your take on all of this as we close out this episode and offer, I guess, any thoughts, concluding thoughts here.
Eliezer: I mean, I don't know if you've got a time limit on this episode, question mark, or is it just as long as it runs?
Ryan: It's as long as it needs to be, and I feel like this is a pretty important topic. So you answer this however you want.
Eliezer: Well, there was a conference one time on what are we going to do about looming risk of AI disaster, and Elon Musk attended that conference. And I was like, maybe this is it. Maybe this is when the powerful people notice, and it's one of the relatively more technical powerful people who could be noticing this. And maybe this is where humanity finally turns and starts not quite fighting back because there isn't an external enemy here, but conducting itself with, I don't know, acting like it cares maybe. And what came out of that conference, well, was OpenAI, which was fairly nearly the worst possible way of doing anything. This is not a problem of, oh no, what if secret elites get AI? It's that nobody knows how to build the thing. If we do have an alignment technique, it's going to involve running the AI with a bunch of careful bounds on it where you don't just throw all the cognitive power you have at something. You have limits on the for loops.
And whatever it is that could possibly save the world, go out and turn all the GPUs and the server clusters into Rubik's cubes or something else that prevents the world from ending when somebody else builds another AI a few weeks later. Anything that could do that is an artifact where somebody else could take it and take the bounds off the for loops and use it to destroy the world. So let's open up everything. Let's accelerate everything. It was like GPT-3's version, though GPT-3 didn't exist back then, but it was like ChatGPT's blind version of throwing the ideals at a place where they were the wrong ideals to solve the problem. And the problem is that demon summoning is easy and angel summoning is much harder. Open sourcing all the demon summoning circles is not the correct solution. And I'm using Elon Musk's own terminology here. They talk about AI as "summoning the demon," which is not accurate, but then the solution was to put a demon summoning circle in every household.
And why? Because his friends were calling him a Luddite once he'd expressed any concern about AI at all. So he picked a road that sounded like openness and like accelerating technology, so his friends would stop calling him a Luddite. It was very much the worst, you know, like maybe not the literal, actual worst possible strategy, but so very far pessimal. And that was it. That was me in 2015 going like, oh, so this is what humanity will elect to do. We will not rise above. We will not have more grace, not even here at the very end. So that is when I did my crying late at night and then picked myself up and fought and fought and fought until I had run out all the avenues that I seem to have the capabilities to do. There's like more things, but they require scaling my efforts in a way that I've never been able to make them scale, and all of it's pretty far-fetched at this point anyways. So what's changed over the years? Well, first of all, I ran out some remaining avenues of hope. And second, things got to be such a disaster, such a visible disaster. The AIs got powerful enough and it became clear enough that we do not know how to align these things, that I could actually say what I've been thinking for a while and not just have people go completely like, what are you saying about all this? Now the stuff that was obvious back in 2015 is starting to become visible in the distance to others and not just completely invisible. That's what changed over time.
Ryan: What do you hope people hear out of this episode and out of your comments? The Eliezer in 2023, who is sort of running on the last fumes of hope. Yeah, what do you want people to get out of this episode? What are you planning to do?
Eliezer: I don't have concrete hopes here. You know, when everything is in ruins, you might as well speak the truth, right? Maybe somebody hears it, somebody figures out something I didn't think of. I mostly expect that this does more harm than good in the modal universe because a bunch of people are like, oh, I have this brilliant, clever idea, which is something that I was arguing against in 2003 or whatever, but maybe somebody out there with the proper level of pessimism hears and thinks of something I didn't think of. I suspect that if there's hope at all, it comes from a technical solution, because the difference between technical problems and political problems is at least the technical problems have solutions in principle. At least the technical problems are solvable. We're not on course to solve this one, but I think anybody who's hoping for a political solution has frankly not understood the technical problem.
They do not understand what it looks like to try to solve the political problem to such a degree that the world is not controlled by AI because they don't understand how easy it is to destroy the world with AI, given that the clock keeps ticking forward. They're thinking that they just have to stop some bad actor. And that's why they think there's a political solution. But yeah, I don't have concrete hopes. I didn't come on this episode out of any concrete hope. I have no takeaways except like, don't make this thing worse. Don't go off and accelerate AI more. If you have a brilliant solution to alignment, don't be like, ah, yes, I have solved the whole problem. We just use the following clever trick. "Don't make things worse" isn't very much of a message, especially when you're pointing people at the field at all. But I have no winning strategy. Might as well go on this podcast as an experiment and say what I think and see what happens. And probably no good ever comes of it, but you might as well go down fighting, right? If there's a world that survives, maybe it's a world that survives because of a bright idea somebody had after listening to this podcast, one that was brighter, to be clear, than the usual run of bright ideas that don't work.
Ryan: I want to thank you for coming on and talking to us today. I don't know if, by the way, you've seen that movie that David was referencing earlier, the movie Don't Look Up, but I sort of feel like that news anchor who's talking to the scientist, who is Leonardo DiCaprio, David. And the scientist is talking about kind of dire straits for the world. And the news anchor just really doesn't know what to do. I'm almost at a loss for words at this point.
David: I've had nothing for a while.
Ryan: One thing I can say is, I appreciate your honesty. I appreciate that you've given this a lot of time and given this a lot of thought. Anyone who has heard you speak or read anything you've written knows that you care deeply about this issue and have given it a tremendous amount of your life force in trying to educate people about it. And thanks for taking the time to do that again today. I guess I'll just let the audience digest this episode in the best way they know how. But I want to reflect the thanks of everybody in crypto and everybody listening to Bankless for you coming on and explaining.
Eliezer: Thanks for having me. We'll see what comes of it.
Ryan: Action items for you, Bankless nation. We always end with some action items. Not really sure where to refer folks to today, but one thing I know we can refer folks to is MIRI, which is the Machine Intelligence Research Institute that Eliezer has been talking about through the episode. That is at intelligence.org, I believe. And some people in crypto have donated funds to this in the past. Vitalik Buterin is one of them. You can take a look at what they're doing as well. That might be an action item for the end of this episode. Got to end with risks and disclaimers. Man, this seems very trite, but our legal experts have asked us to say these at the end of every episode. Crypto is risky. You could lose everything.
David: Probably not as risky as AI, though.
Ryan: But we're headed west. This is the frontier. It's not for everyone, but we're glad you're with us on the Bankless journey. Thanks a lot.
Eliezer: And we are grateful for the crypto community's support. Like it was possible to end with even less grace than this. Wow. You made a difference.
Ryan: We appreciate you.
Eliezer: You really made a difference.
Ryan: Thank you.
I still don't follow why EY assigns seemingly <1% chance of non-earth-destroying outcomes in 10-15 years (not sure if this is actually 1%, but EY didn't argue with the 0% comments mentioned in the "Death with dignity" post last year). This seems to place fast takeoff as being the inevitable path forward, implying unrestricted fast recursive designing of AIs by AIs. There are compute bottlenecks which seem slowish, and there may be other bottlenecks we can't think of yet. This is just one obstacle. Why isn't there more probability mass for this one obstacle? Surely there are more obstacles that aren't obvious (that we shouldn't talk about).
It feels like we have a communication failure between different cultures. Even if EY thinks the top industry brass is incentivized to ignore the problem, there are a lot of (non-alignment oriented) researchers that are able to grasp the 'security mindset' that could be won over. Both in this interview, and in the Chollet response referenced, the arguments presented by EY aren't always helping the other party bridge from their view over to his, but go on 'nerdy/rationalist-y' tangents and idioms that end...
The strongest argument I hear from EY is that he can't imagine a (or enough) coherent likely future paths that lead to not-doom, and I don't think it's a failure of imagination. There is decoherence in a lot of hopeful ideas that imply contradictions (whence the post of failure modes), and there is low probability on the remaining successful paths because we're likely to try a failing one that results in doom. Stepping off any of the possible successful paths has the risk of ending all paths with doom before they could reach fruition. There is no global strategy for selecting which paths to explore. EY expects the successful alignment path to take decades.
It seems to me that the communication failure is EY trying to explain his world model that leads to his predictions in sufficient detail that others can model it with as much detail as necessary to reach the same conclusions or find the actual crux of their disagreements. From my complete outsider's perspective this is because EY has a very strong but complex model of why and how intelligence/optimization manifests in the world, but it overlaps everyone else's model in significant ways that disagreements are hard to tease out...
Not really. The MIRI conversations and the AI Foom debate are probably the best we've got.
EY, and the MIRI crowd, have long been more doomy along various axes than the rest of the alignment community. Nate and Paul and others have tried bridging this gap before, spending several hundred hours (based on Nate's rough, subjective estimates) over the years. It hasn't really worked. Paul and EY had some conversations recently about this discrepancy which were somewhat illuminating, but ultimately didn't get anywhere. They tried to come up with some bets, concerning future info or past info they don't know yet, and both seem to think that their perspective mostly predicts "go with what the superforecasters say" for the next few years. Though EY's position seems to suggest a few more "discontinuities" in trend lines than Paul's, IIRC.
As an aside on EY's forecasts, he and Nate claim they don't expect much change in the likelihood ratio for their position over Paul's until shortly before Doom. Most of the evidence in favour of their position, we've already gotten, according to them. Which is very frustrating for people who don't share their position and disagree that the evidence favours it!
EDIT: I was assuming you already thought P(Doom) was > ~10%. If not, then the framing of this comment will seem bizarre.
They also recorded this follow-up with Yudkowsky if anyone's interested:
>Enrico Fermi was saying that fission chan reactions were 50 years off if they could ever be done at all, two years before he built the first nuclear pile. The Wright brothers were saying heavier-than-air flight was 50 years off shortly before they built the first Wright flyer.
The one hope we may be able to cling to is that this logic works in the other direction too - that AGI may be a lot closer than estimated, but so might alignment.
A few typos:
What does Yudkowsky mean by 'technical' here? I respect the enormous contribution Yudkowsky has made to these discussions over the years, but I find his ideas about who counts as a legitimate dissenter from his opinions utterly ludicrous. Are we really supposed to think that Francois Chollet, who created Keras, is the main contributor to TensorFlow, and designed the ARC dataset (demonstrating actual, operationalizable knowledge about the kind of simple tasks deep learning systems would not be able to master), lacks a coherent technical view? And on what should we base this? The word of Yudkowsky who mostly makes verbal, often analogical, arguments and has essentially no significant technical contributions to the field?
To be clear, I think Yudkowsky does what he does well, and I see value in making arguments as he does, but they do not strike me as particularly 'technical'. The fact that Yudkowsky doesn't even know enough about Chollet to pronounce his name displays a troubling lack of effort to engage seriously with opposing views. This isn't just about coming across poorly to outsiders, it's about dramatic miscalibration with respect to the value of other people's opinions as well as the rigour of his own.
He wrote a whole essay responding specifically to Chollet! https://intelligence.org/2017/12/06/chollet/
I'd consider myself to have easily struck down Chollet's wack ideas about the informal meaning of no-free-lunch theorems, which Scott Aaronson also singled out as wacky. As such, citing him as my technical opposition doesn't seem good-faith; it's putting up a straw opponent without much in the way of argument and what there is I've already stricken down. If you want to cite him as my leading technical opposition, I'm happy enough to point to our exchange and let any sensible reader decide who held the ball there; but I would consider it intellectually dishonest to promote him as my leading opposition.
I upvoted, because these are important concerns overall, but this sentence stuck out to me:
I'm not claiming that Yudkowsky does display a troubling lack of effort to engage seriously with opposing views or he does not display such, but surely this can be decided more accurately by looking at his written output online than at his ability to correctly pronounce names in languages he is not native in. I, personally, skip names while reading after noticing it is a name and I wouldn't say that I never engaged seriously with someone's arguments.
I have a bunch of questions.
And the AI there goes over a critical threshold, which most obviously could be like, can write the next AI.
Yes but it won't blow up forever. It's going to self amplify until the next bottleneck. Bottlenecks like : (1) amount of compute available (2) amount of money or robotics to affect the world (3) The difficulty of the tasks in the "AGI gym" it is benchmarking future versions of itself in.
Once the tasks are solved as far as the particular task allows, reward gradients go to zero or sinusoidally oscillate, and there is no signal to cause development of more intelligence.
This is just like the self-feedback from an op amp - voltage rises until it's VCC.
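A minimal sketch of the saturation argument above, under loudly stated assumptions: the function name `amplify`, the fixed `gain`, and the hard `ceiling` are all hypothetical stand-ins chosen for illustration, not a model of any real AI system or circuit. It only shows the shape of the claim: positive feedback grows the value geometrically until it hits a limit (the bottleneck, or VCC in the op-amp analogy), after which further iterations change nothing.

```python
def amplify(value, gain, ceiling, steps):
    """Toy positive-feedback loop with a hard ceiling.

    Each step multiplies the current value by `gain`, then clips it at
    `ceiling`. All parameters are illustrative assumptions.
    """
    history = [value]
    for _ in range(steps):
        # Grow by the feedback gain, but never past the bottleneck.
        value = min(value * gain, ceiling)
        history.append(value)
    return history


# Start at 1.0, double each step, saturate at 10.0.
trace = amplify(value=1.0, gain=2.0, ceiling=10.0, steps=6)
print(trace)  # [1.0, 2.0, 4.0, 8.0, 10.0, 10.0, 10.0]
```

Once the trace reaches the ceiling, the step-to-step difference is zero, which is the toy analogue of the reward gradient vanishing once the benchmark tasks are solved.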
I'd say that it's difficult to align an AI on a task like build two identical strawberries. Or no, let me take this strawberry and make me another strawberry that's identical to this strawberry down to the cellular level, but not necessarily the atomic level.
Can you solve this with separated tool AIs? It sounds rather solvable that way and not particularly difficult to do from a software system perspective (the biology part is extremely hard). It's f...
I agree that it wouldn't start blowing up uniformly forever, but rather, hit some bottleneck. However, "can write the next AI" still seems like a reasonable guess for something that happens shortly before the end. After all, Eliezer's argument isn't dependent on the AGI acquiring infinite intelligence. If the AGI can already write its own better successor, then it's a good guess that it's already better than top humans at a wide array of tasks. The successor it writes will be even better. Let's say for the sake of a concrete number that the self-improvement tops out at 5 iterations of writing-a-bette...
A few errors: The sentence "We're all crypto investors here." was said by Ryan, not Eliezer, and the "How the heck would I know?" and the "Wow" (following "you get a different thing on the inside") were said by Eliezer, not Ryan. Also, typos:
Possibly also relevant: https://www.youtube.com/watch?v=yo_-EnsOqN0 is a "debrief" where, after the interview, the podcast hosts chat between themselves about it. (There's no EY in the debrief, it's just David Hoffman and Ryan Adams.)
Thanks for posting this, Andrea_Miotti and remember! I noticed a lot of substantive errors in the transcript (and even more errors in vonk's Q&A transcript), so I've posted an edited version of both transcripts. I vote that you edit your own post to include the revisions I made.
Here's a small sample of the edits I made, focusing on ones where someone may have come away from your transcript with a wrong interpretation or important missing information (as opposed to, e.g., the sentences that are just very hard to parse in the original transcript because ...
Yudkowsky argues his points well in longer formats, but he could make much better use of his Twitter account if he cares about popularizing his views. Despite having Musk responding to his tweets, his posts are very insider-like with no chance of becoming widely impactful. I am unsure if he is present on other social media, and I understand that there are some health issues involved, but a YouTube channel would also be helpful if he hasn't completely given up.
I do think it is a fact that many people involved in AI research and engineering, such as his example of Chollet, have simply not thought deeply about AGI and its consequences.
This bit got me to laugh out loud. Who's ever heard a man complain about having to use a condom?
On the one hand, sperm banks aren't very popular, and they "should" be, according to the "humans are fitness maximizers" model. People do eat more ice cream than is good for them, and "Shallowly following drives and not getting to the original goal that put them there" is de...
Current behavior screens off cognitive architecture, all the alien things on the inside. If it has the appropriate tools, it can preserve an equilibrium of value that is patently unnatural for the cognitive architecture to otherwise settle into.
And we do have a way to get goals into a system, at the level of current behavior and no further, LLM human imitations. Which might express values well enough for mutual moral patienthood, if only they settled into the unnatural equilibrium of value referenced by their current surface behavior and not underlying cog...
I've never commented here, and I've only ever read things here tangentially. But a while ago I suffered immense burnout from devoting all my resources to a thankless task that had zero payoff, and I might be projecting, but I see that burnout in EY's responses here.
Unsolicited advice rarely has any value, especially given the limited window I'm perceiving things through, but... there's that line from the opening sentence of The Haunting of Hill House: "No live organism can continue for long to exist sanely under conditions of absolute reality". ...
YES! While I am, shall we say, somewhat mystified by EY’s interest in AI Doom, he’s right about that. We do not know how to 'inject' goals into an autonomous system. That’s a deep truth about minds, not just artificial minds – though it’s not yet clear to me that we have managed to produce any, we may very well do so in the future – but any ‘cogitator’ worthy of being called a mind, whether in a chimpanzee, a bird, an octopus, a bee, or .... But I suspect that, ...
Evolution: taste buds and ice cream, sex and condoms... In my experience, this analogy has always been difficult to use. A year ago I came up with a less technical one: KPIs (key performance indicators) as the inevitable way to communicate goals (to an AI) to an ultra-high-IQ psychopath-genius who's into malicious compliance (kind of can't help himself, being a clone of Nikola Tesla, Einstein, and a bunch of other people, some of them probably CEOs, because he can).
I have used it only twice, and it was way easier than talking about different optimisation processes. And it took me only something like 8 years to come up with!
I don't understand one thing about alignment troubles. I'm sure this was answered a long time ago, but could you explain:
Why are we worrying about AGI destroying humanity, when we ourselves are long past the point of no return towards self-destruction? Isn't it obvious that we have 10, maximum 20, years left until the water rises, crises hit the economy, and the overgrown beast (that is, humanity) collapses? Looking at how governments and entities of power are epically failing even to try to make it seem that they are doing something about it, I am sure that either AGI takes power or we are all dead in 20 years.