YouTube description:

We wanted to do an episode on AI… and we went deep down the rabbit hole. As we went down, we discussed ChatGPT and the new generation of AI, digital superintelligence, the end of humanity, and if there’s anything we can do to survive. 

This conversation with Eliezer Yudkowsky sent us into an existential crisis, with the primary claim that we are on the cusp of developing AI that will destroy humanity. 

Be warned before diving into this episode, dear listener. Once you dive in, there’s no going back.


I liked the sincerity of the podcast hosts and how they adjusted their plan for the podcast by not going into "importance of AI for crypto" questions after realizing that Eliezer's view is basically "we all seem doomed, so obviously stuff like crypto is a bit pointless to talk about." 

Pretty sobering, and it's pretty clear that it's past time we got serious about this. I might put together a post of my own calling for any creative ideas that ordinary people can implement to help the cause, but the most obvious thing is to raise awareness. I hope Yudkowsky gets the chance to do a lot more interviews like this. 

Yudkowsky, if you ever see this, please don't give up hope. Crazy breakthroughs do happen, and more people are getting into alignment as time goes on. 

I can't even get a good answer to "What's the GiveWell of AI Safety?" so that I can quickly donate to a very reputable, agreed-upon option with little thinking. At best I get old lists of a ton of random small orgs, and I give up. I'm not very optimistic that ordinary, less convinced people who want to help are having an easier time.

Long-Term Future Fund. They give grants to AI safety orgs and researchers, especially the kind that Yudkowsky would be less sad about.

I’d encourage you to do that.

I've thought for a while that the primary problem in alarmism (I'm an alarmist myself) is painting a concise, believable picture. It takes willful effort and an open mind to build a "realistic" picture of this heretofore unknown mechanism/organism for oneself, and it is nearly impossible to do for someone who is skeptical of or opposed to the potential conclusion.

Being a web dev I've brainstormed on occasion ways to build short, crowd-ranked chains of causal steps which people could evaluate for themselves, with various doubts and supporting evidence given to each.   It's still a vague vision which is why I raise it here, to see if anyone else wants to get involved on that abstract design level.  ( hmu )

I think the general problem of painting a believable picture could be solved by a number of different mediums though.  Unfortunately we're drowning in inaccurate or indulgent dystopias which end up creating a "boy who cried wolf" effect for earnest ones.


Any ideas you have for overcoming the problems of alarmism would be good. 

Listening to Eliezer walk through a hypothetical fast takeoff scenario left me with the following question:  Why is it assumed that humans will almost surely fail the first time at their scheme for aligning a superintelligent AI, but the superintelligent AI will almost surely succeed the first time at its scheme for achieving its nonaligned outcome? 

Speaking from experience, it's hard to manufacture things in the real world. Doubly so for anything significantly advanced. What is the rationale for assuming that a nonaligned superintelligence won't trip up at some stage in the hypothetical "manufacture nanobots" stage of its plan? 

If I apply the same assumption of initial competence extended to humanity's attempt to align an AGI to that AGI's competence in successfully manufacturing some agency-increasing tool, then the most likely scenario I get is that we'll see the first unaligned AGI's attempt at takeoff long before it actually succeeds in destroying humanity.

A superintelligent AI could fail to achieve its goals and still get us all killed. For example, it could design a virus that kills all humans, and then find out that there were some parts of economy it did not understand, so it runs out of resources and dies, without converting the galaxy to paperclips. Nonetheless, humans remain dead.

See, this is the perfect encapsulation of what I'm saying - it could design a virus, sure. But when it didn't understand parts of the economy, that's all it would be - a design. Taking something from the design stage to the "physical, working product with validated processes that operate with sufficient consistency to achieve the desired outcome" is a vast, vast undertaking, one that requires intimate involvement with the physical world. Until that point is reached, it's not a "kill all humans but fail to paperclip everyone" virus, it's just a design concept. Nothing more. More and more I see those difficulties being glossed over by hypothetical scenarios that skip straight from the design stage and presuppose that the implementation difficulties aren't worth consideration, or that if they are they won't serve as a valid impediment. 

It's hard to manufacture things, but it's not that hard to do so in a way that pretty much can't kill you. Just keep the computation that is you at somewhat of a distance from the physical experiment, and don't make stuff that might consume the whole earth. Making a general intelligence is an extraordinary special case: if you're actually doing it, it might self-improve and then kill you.

It is the very same rationale that stands behind assumptions like "why Stockfish won't execute a losing set of moves" - it is just that good at chess. Or better - it is just that smart when it comes down to chess.

In this thought experiment the way to go is not "I see that the AGI could likely fail at this step, therefore it will fail" but to keep thinking and inventing better moves for the AGI to execute, which won't be countered as easily. It is an important part of "security mindset" and probably a major reason why Eliezer speaks about a lack of pessimism in the field.

There are diminishing returns to thinking about moves versus performing the moves and seeing the results that the physics of the universe imposes on the moves as a consequence. 

Think of it like AlphaGo - if it only ever could train itself by playing Go against actual humans, it would never have become superintelligent at Go. Manufacturing is like that - you have to play with the actual world to understand bottlenecks and challenges, not a hypothetical artificially created simulation of the world. That imposes rate-of-scaling limits that are currently being discounted. 

Think of it like AlphaGo - if it only ever could train itself by playing Go against actual humans, it would never have become superintelligent at Go.

This is obviously untrue in both the model-free and model-based RL senses. There are something like 30 million human Go players who can play a game in two hours. AlphaGo was trained on policy gradients from, as it happens, on the order of 30m games; so it could accumulate a similar order of games in under a day; the subset of pro games can be upweighted to provide most of the signal - and when they stop providing signal, well then, it must have reached superhuman... (For perspective, a good 0.5m or whatever professional games used to imitation-train AG came from a single Go server, which was not the most popular, and that's why AlphaGo Master ran its pro matches on a different larger server.) Do this for a few days or weeks, and you will likely have exactly that, in a good deal less time than 'never', which is a rather long time. More relevantly, because you're not making a claim about the AG architecture specifically but about all learning agents in general: with no exploration, MuZero can bootstrap its model-based self-play from somewhere in the neighborhood of hundreds/thousands of 'real' games (as should not be a surprise as Go rules are simple), and achieves superhuman gameplay easily by self-play inside the learned model, with little need for any good human opponents at all; even if that is 3 orders of magnitude off, it's still within a day of human gameplay sample-size. Or consider meta-learning sim2real like Dactyl which are trained exclusively in silico on unrealistic simulations, and adapt within seconds to reality. So either way. The sample-inefficiency of DL robotics, DL, or R&D, is more of a fact about our compute-poverty than it is about the inherent necessity of interacting with the real world (which is both highly parallelizable, learnable offline, and far smaller than existing methods).
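As a rough check on the arithmetic above, here is a minimal sketch. The player count, game length, and target number of games are the figures quoted in the comment; the assumption that every player is paired up and plays continuously is mine and is deliberately generous, and the function name is hypothetical.

```python
def hours_to_collect(target_games, players=30_000_000, hours_per_game=2):
    """Hours needed to collect target_games human games.

    Assumption (not from the comment): all players are paired up
    and play back-to-back games with no breaks.
    """
    games_in_parallel = players // 2          # two players per game
    rounds_needed = target_games / games_in_parallel
    return rounds_needed * hours_per_game


# Games per day under the same assumption: 12 two-hour rounds of 15M games.
games_per_day = (30_000_000 // 2) * (24 // 2)

print(hours_to_collect(30_000_000))  # hours to reach ~30m games
print(games_per_day)
```

Even if real participation were one or two orders of magnitude lower than this idealized pairing, the total stays in the range of days to weeks, which is the point the comment is making.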

A lack of clarity when I think about these limits makes it hard for me to see how the end result would change if we could somehow "stop discounting" them. 
It seems to me that we would have to be much more elaborate in describing the parameters of this thought experiment. In particular, we would have to agree on the deeds and real-world achievements that the hypothetical AI has, so that we would both agree to call it AGI (like writing an interesting story and making illustrations so that this particular research team now has a new revenue stream from selling it online - would this make the AI an AGI?). And on the security conditions (air-gapped server room?). This would get us closer to understanding "the rationale".
But then your question is not about AGI but about "superintelligent AI", so we would have to do the elaborate describing again with new parameters. And that is what I expect Eliezer (alone and with other people) has done a lot. And look what it did to him (this is a joke, but at the same time not). So I will not be an active participant further. 
It is not even about a single SAI in some box: competing teams, people running copies (legal and not) and changing code, corporate espionage, dirty code...

Since there was no full transcript of the podcast, I just made one.  You can find it here.

I tweeted my notes of Eliezer's points with abridged clips. Interesting that it seems like he's leaving MIRI

That part was a bit unclear. I guess he could work with Redwood/Conjecture without necessarily quitting his MIRI position?

Why is Eliezer such a downer? We simply don't know how things are going to turn out - I believe he's right about how we should approach AI. And regarding any technical points I'd guess he is more right than anyone else. That doesn't justify, in my opinion, going around and instilling a defeatist attitude in anyone who wants to take him seriously. Seriously, people are looking up to you. That's not how you treat them.

EDIT: I rewatched part of the podcast. Previously, I had only seen the final snippets, in particular where he talks about the conference involving Elon Musk. The problem I have doesn't pertain to the rest of the podcast, which makes up a big proportion, so I would weaken the indignant tone above quite a bit.

Still, I have an issue with that snippet, and I believe it is no coincidence that it was isolated for emotional effect. Concretely, Eliezer strikes this hurt tone and starts talking about how we couldn't even react in a dignified way and so on. This does strike me as unnecessary and even bordering toward offending the viewer/society at large (whether or not that is intended or might be justified aside).

We simply don't know how things are going to turn out

His model does say how things are going to turn out: with everyone dying.

Seriously, people are looking up to you. That's not how you treat them.

What's the preferred alternative? Lying or withholding relevant arguments, out of general principle, without even an expected benefit to anyone?


I’m sorry if I’m misunderstanding, but is your claim that Yudkowsky’s model actually does tell us for certain, or some extremely close approximation of ‘certain’, about what’s going to happen?

(This is of course just my understanding of his model, but) yes. The analogy he uses is that while you cannot predict Stockfish's next move in chess, you can predict for 'certain' that it will win the game. I think the components of the model are roughly:

  • it is 'certain' that, given the fierce competition and the number of players and the incentives involved, somebody will build an AGI before we've solved alignment.
  • it is 'certain' that if one builds an AGI without solving alignment first, one gets basically a random draw from mindspace.
  • it is 'certain' that a random draw from mindspace doesn't care about humans
  • it is 'certain' that, like Stockfish, this random draw AGI will 'win the game', and that since it doesn't care about humans, a won gameboard does not have any humans on it (because those humans were made of atoms which could be used for whatever it does care about)

Why is it a necessary condition that human atoms must be allocated elsewhere? There are plenty of other atoms to work with. We humans dominate the globe but we don’t disassemble literally everything (although we do a lot) to use the atoms for other purposes. Isn’t it arguable that ASI or even AGI will have a better appreciation for systems ecology than we do…


There are a bunch of baked assumptions here from EY.  Remember he came up with many of these ideas years ago, before deep learning existed.

(1) The AGIs have to be agentic with a global score.  This is not true, but very early AI agents often did work this way.  Take one of the simplest possible RL agents, the q-learner.  All it does is pick the action that has the maximum discounted reward.  Thus the q-learner learns its environment, filling out an array in memory called the q-table, and then just does whatever its source code told it has the max reward.  (Some of the first successful deep learning papers just replaced that array with a neural network.)
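To make the q-table idea concrete, here is a minimal sketch of tabular Q-learning. The corridor environment, the hyperparameters, and the function names are all hypothetical choices for illustration; nothing here comes from the podcast or from any specific paper.

```python
import random


def train_q_table(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                  epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy 1-D corridor (an assumed environment).

    States are 0..n_states-1. Action 0 moves left, action 1 moves right.
    Entering the last state pays reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # the "q-table"
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy: usually take the action with the highest
            # estimated discounted reward; explore on ties or by chance.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == goal else 0.0
            # Q-learning update. The goal row is never updated, so it
            # stays at zero and contributes no future value.
            target = r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q


def greedy_policy(q):
    """For each state, the action with the maximum table entry."""
    return [max(range(2), key=lambda a: row[a]) for row in q]


q_table = train_q_table()
print(greedy_policy(q_table)[:-1])  # actions chosen in the non-terminal states
```

The agent's entire "goal" lives in the reward signal and the max over the table; swapping the table for a neural network changes the function approximator, not that basic structure.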

You could imagine building an "embodied" robot that from the moment you switch it on, it always tries to make that "reward" number go ever higher, the same way.  

This kind of AGI is likely lethally dangerous.

(2) Intelligence scales very high.  In a simple game, intelligence has diminishing returns that collapse to zero.  (once you have enough intelligence to solve a task, you have 0 error gradient or reason to develop any more)

 In more complex games (including reality), intelligence goes further, but there always is a limit.  For example, if you think about a task like "primate locates and picks apples", ever more intelligence can make the primate more efficient at searching for the apple, or to take a more efficient path towards reaching and grasping the apple.  But it's logarithmically diminishing returns, and no amount of intelligence will let the primate find an apple if it's paralyzed or unable to explore at least some of the forest.  Nor can it instruct another primate to find the apple for it if the paralyzed one has never seen the forest at all.  

Note also that in reality, an agent's reward equals (resource gain - resource cost).  One term in 'resource cost' is the cost of compute.  Hence, for example you would not want to make a robot that mines copper too smart as adding more and more cognitive capacity adds less and less incremental efficiency gain in how much copper it collects, but costs more and more compute to realize.  Similarly there is no reason to train the agent in simulation past a certain point, for the same cost reason.  Intelligence stops adding marginal net utility.   
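A toy version of this cost argument, as a sketch only: assume the resource gain grows logarithmically with compute (the "logarithmically diminishing returns" mentioned above) and the compute cost grows linearly. The functional forms, the constants, and the function names are my assumptions, chosen purely for illustration.

```python
import math


def net_utility(compute, gain_scale=10.0, cost_per_unit=0.5):
    """Assumed model: reward = resource gain - resource cost.

    Gain is logarithmic in compute; cost is linear in compute.
    """
    gain = gain_scale * math.log(1.0 + compute)
    cost = cost_per_unit * compute
    return gain - cost


def best_compute(max_units=200, **kwargs):
    """Brute-force search over integer compute budgets."""
    return max(range(max_units + 1), key=lambda c: net_utility(c, **kwargs))


c_star = best_compute()
print(c_star, round(net_utility(c_star), 3))
print(round(net_utility(100), 3))  # far past the optimum, net utility is negative
```

Under these assumptions the optimum sits where the marginal gain, gain_scale / (1 + compute), equals the marginal cost, so adding compute beyond that point lowers net reward. Whether real tasks follow a curve like this is exactly what is in dispute.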

EY posits that technologies that we think will probably take methodical improvements and careful experiments on a very large scale to develop could be "leapfrogged" by just skipping directly to advanced capabilities.  For example, diamondoid nanotechnology would come not from carefully studying small assemblies of diamond and methodically working up the tool chain at a large scale using many billions of dollars of equipment, but instead from just hacking it directly by hijacking biology.

And this from an agent that has no direct experimental data with biology - EY gives examples where the AGI has done everything in sim.  Note EY has never been to high school or college per Wikipedia.  He may be an extreme edge of the bell curve genius, but there may be small flaws in his knowledge base that are leading to these faulty assumptions.  Which is exactly the problem an AGI with infinite compute but no empirical data not regurgitated from humans would have.  It would model biology and the nanoscale using all human papers, but small errors would cause the simulation to diverge from reality, causing the AGI to make plans based on nonsense.  (See how RL agents exploit environments by exploiting flaws in the physics sim for an example of this.)

We humans dominate the globe but we don’t disassemble literally everything (although we do a lot) to use the atoms for other purposes.

There are two reasons why we don't:

  1. We don't have the resources or technology to. For example there are tons of metals in the ground and up in space that we'd love to get our hands on but don't yet have the tech or the time to do so, and there are viruses we'd love to destroy but we don't know how. The AGI is presumably much more capable than us, and it hardly even needs to be more capable than us to destroy us (the tech and resources for that already exist), so this reason will not stop it.

  2. We don't want to. For example there are some forests we could turn into useful wood and farmland, and yet we protect them for reasons such as "beauty", "caring for the environment", etc. Thing is, these are all very human-specific reasons, and:

Isn’t it arguable that ASI or even AGI will have a better appreciation for systems ecology than we do…

No. Sure it is possible, as in it doesn't have literally zero chance if you draw a mind at random. (Similarly a rocket launched in a random direction could potentially land on the moon, or at least crash into it.) But there are so many possible things an AGI could be optimizing for, and there is no reason that human-centric things like "systems ecology" should be likely, as opposed to "number of paperclips", "number of alternating 1s and 0s in its memory banks", or an enormous host of things we can't even comprehend because we haven't discovered the relevant physics yet.

(My personal hope for humanity lies in the first bullet point above being wrong: given surprising innovations in the past, it seems plausible that someone will solve alignment before it's too late, and also given some semi-successful global coordination things in the past (avoiding nuclear war, banning CFCs), it seems plausible that a few scary pre-critical AIs might successfully galvanize the world into successful delaying action for long enough that alignment could be solved)

If an AI appreciates ecology more than we do, among its goals is to prevent human harm to ecosystems, and so among its early actions will be to kill most or all humans. You didn’t think of this, because it’s such an inhuman course of action.  
Almost every goal that is easy to specify leads to human disempowerment or extinction, if a superhuman entity tries hard enough to accomplish it.  This regrettable fact takes a while to convince yourself of, because it is so strange and terrible. In my case, it was roughly 1997-2003.  Hopefully humanity learns a bit faster.

Evolution favours organisms that grow as fast as possible.  AGIs that expand aggressively are the ones that will become ubiquitous.

Computronium needs power and cooling.  The only dense, reliable and highly scalable form of power available on Earth is nuclear, so why would ASI care about ensuring no release of radioactivity into the environment?

Similarly, mineral extraction, which at the huge scales needed for Vinge's "aggressively hegemonizing" AI will inevitably be using low-grade ores, becomes extremely energy intensive and highly polluting.  Why would ASI care about the pollution?

If/when ASI power consumption rises to petawatt levels, the extra heat is going to start having a major impact on climate.  Icecaps gone, etc.  Oceans are probably the most attractive locations for high-power-intensity ASI due to their vast cooling potential.

Imagine fiancéespace (or fiancéspace) - as in the space of romantic partners that would marry you (assuming you're not married and you want to be). You can imagine "drawing" from that space, but once you draw, nearly all of the work is still ahead of you. Someone that was initially "friendly" wouldn't necessarily stay that way, and someone that was unfriendly wouldn't necessarily stay that way. It's like asking "how do you make sure a human mind stays friendly to you forever?" We can't solve that with our lowly ape minds, and I'm not sure that we'd want to. The closest solution to that I know of with humans is Williams syndrome, and we probably wouldn't want an AGI with an analogous handicap. The relationship cultivated over time with other minds is more important in many respects than the initial conditions of the other minds. 

Maybe dogs are the better metaphor.  We want AGIs to be like very smart Labradors. Random, "feral," AGIs may be more like wolves. So if we made them so they could be "selectively bred" using something like a genetic algorithm? Select for more Lab-y and less Wolf-y traits.

If a Labrador was like 10 or 100 times smarter than its owner, would it still be mostly nice most of the time? I would hope so. Maybe the first AGI works like Garm->Fenrir in God of War (spoiler, sorry). 

Just thinking out loud a bit...

You can't selectively breed labradors if the first wolf kills you and everyone else.

Of course you can, you just have to make the first set of wolves very small. 

Well, he might be right - and I align with his views more than with many others - but you still have to realize that you can't literally predict the future.

I think there's a difference between not wanting to elicit false hope and taking out your negative emotions on others (even if it's about a reasonable expectation of the world). Of course, he has a right to experience these emotions - but I believe it would be more considerate to do that in private.

I should say that I have great respect to him and his efforts and insights. This is not a critique of the person, just of a concrete behavior.

Maybe making people realize the reality of the situation and telling the truth is more important than sparing their feelings.

There is a misalignment hazard to this framing: the person who decides to withhold the truth is not the audience who'd care to have their feelings spared. So the question of whether it's "more important" might be ill-posed.

Thanks for bringing this up. Yes I think that is very important and is not what I'm trying to criticize. I will update the previous comment to clarify.

Well, he might be right

That doesn't matter for the points I was responding to, a matter of policy for what things to claim, given what your own understanding of the world happens to be.

you still have to realize that you can't literally predict the future

You can't know the future with certainty, but you can predict it. The sun will rise tomorrow. I'm much less certain that it will rise in 20 years, and not because there is nobody to observe it.

a right to experience these emotions

There are claims about the facts of the world being made, apart from any emotions. Presence of emotional correlates doesn't make corresponding events in the concrete physical world irrelevant.

I think there's a kind of division of labor going on, and I'm going to use a software industry metaphor. If you're redteaming, auditing, or QAing at a large org, you should really be maxing out on paranoia, being glass half empty, etc., because you believe that elsewhere in the institution, other people's jobs are to consider your advice and weigh it against the risk tolerance implied by the budget or by regulation or whatever. Whereas I think when redteaming, auditing, or QAing at a small org, you kind of take on some of the responsibility of measuring threat models against given cost constraints. It's not exactly obvious that someone else in the org will rationally integrate the information you provide into the organization's strategy and implementation, so you want them to follow your recommendations in a way that makes business sense.

My guess is that being a downer comes from this large org register of a redteam's job description being literally just redteam, whereas it might make sense for other researchers or communicators to take a more small org approach where the redteam is probably multitasking in some way.

Intuition pump: I don't really know a citation, but I once saw a remark that the commercial airline crash rate in the late Soviet Union was plausibly more rational than the commercial airline crash rate in the US. Airplane risk intolerance in the US is great for QA jobs, but that doesn't mean it's based on an optimal tradeoff between price and safety with respect to stakeholder preferences (if you could elicit them in some way). Economists make related remarks re nuclear energy.

Is there a transcript anywhere?

Click on the three dots (if on a browser) and there is a transcript option in the pop-up menu.

Unfortunately without speaker labels the YouTube transcript is less useful unless you're listening while reading.

Double speed makes it usable


The secure OS metaphor is actually very helpful, and I'm surprised I've never heard it before.

I don't think there is any chance of malign ASI killing everyone off in less than a few years, because it would take a long time to reliably automate the mineral extraction and manufacturing processes and power supplies required to guarantee an ASI's survival and growth objectives (assuming it is not suicidal).  Building precise stuff reliably is really really hard, robotics and many other elements of the infrastructure needed are high maintenance and demand high-dexterity maintenance agents, and the tech base required to support current leading edge chip manufacturing probably couldn't be supported by fewer than a few tens to a hundred million humans - that's a lot of high-performance meat-actuators and squishy compute to supplant.  Datacenters and their power supplies and cooling systems, plus myriad other essential elements, will be militarily vulnerable for a long time.

I think we'll have many years to contemplate our impending doom after ASI is created.  Though I wouldn't be surprised if it quickly created a pathogenic or nuclear gun to hold to our collective heads and prevent our interfering or interrupting its goals.

I also think it won't be that hard to get a large proportion of the human population clamoring to halt AI development, with sufficient political and financial strength to stop even rogue nations.  A strong innate tendency towards Millennialism exists in a large subset of humans (as does a likely linked general tendency to anxiousness).  We see it in the Green movement, and redirecting it towards AI is almost certainly achievable with the sorts of budgets that existential alignment danger believers (some billionaires in their ranks) could muster.  Social media is a great tool for doing this these days if you have the budget.


I believe you are correct; however, EY argues that a sufficiently intelligent AGI might be able to hack biology and use ribosomes and protein machinery to skip straight to diamondoid self-replicating nanotechnology.

I suspect this is impossible - that is, there may exist a sequence of steps that would work and achieve this, but it does not mean that the information exists within the pool of [all scientific data collected by humans] to calculate what those steps are.  

Instead you would have to do this iteratively, similar to how humans would do it.  Secure a large number of STM microscopes and vacuum chambers.  Build, using electron beam lathes or other methods, small parts to test nanoscale bonding strategies.  Test many variants and develop a simulation sufficient to design nanoscale machine parts.  Iteratively use the prior data to design and test ever larger and more sophisticated assemblies.  Then, once you have a simulated path to success and high enough confidence, bootstrap a nanoforge.  (bootstrap means you try to choose a path where prior steps on the path make future steps easier)

An ASI holding everyone hostage isn't a winning scenario for the ASI.  Humans are just going to pull the trigger on their own nuclear guns in such a scenario.  

The EY scenarios where the ASI wins generally involve deception.  Everything is fine, until everyone dies all at the same time from some type of immediately lethal but delayed bioweapon or nanotechnology based weapon.  

Botulinum toxin is one way this is theoretically achievable: it takes a very small quantity to kill a human.  So the weapon could be a 'time bomb' capsule of it, injected painlessly into most humans using nanotechnology, or a virus that edits our genome to insert the botulinum toxin gene along with some mechanism to prevent expression for a few months, or a similar method.  For one thing, botulinum toxin is a protein and is probably much larger than it needs to be...

If all of EY's scenarios require deception, then detection of deception from rogue AI systems seems like a great place to focus on. Is there anyone working on that problem?
