NB: I've never published on LW before, so I apologize if my writing style is not in line with LW's usual conventions. This post is an edited copy of the same article from my blog.

EY published an article last week titled “AGI Ruin: A List of Lethalities”, which explains in detail why you can’t train an AGI that won’t try to kill you at the first chance it gets, and why such an AGI will eventually appear given humanity’s current trajectory in computer science. EY doesn’t explicitly state a timeline over which AGI is supposed to destroy humanity, but the implication is that it will happen rapidly, leaving humanity no time to stop it. EY doesn’t find the question of how exactly AGI will destroy humanity too interesting and explains it as follows:

My lower-bound model of "how a sufficiently powerful intelligence would kill everyone, if it didn't want to not do that" is that it gets access to the Internet, emails some DNA sequences to any of the many many online firms that will take a DNA sequence in the email and ship you back proteins, and bribes/persuades some human who has no idea they're dealing with an AGI to mix proteins in a beaker, which then form a first-stage nanofactory which can build the actual nanomachinery.  (Back when I was first deploying this visualization, the wise-sounding critics said "Ah, but how do you know even a superintelligence could solve the protein folding problem, if it didn't already have planet-sized supercomputers?" but one hears less of this after the advent of AlphaFold 2, for some odd reason.)  The nanomachinery builds diamondoid bacteria, that replicate with solar power and atmospheric CHON, maybe aggregate into some miniature rockets or jets so they can ride the jetstream to spread across the Earth's atmosphere, get into human bloodstreams and hide, strike on a timer.  Losing a conflict with a high-powered cognitive system looks at least as deadly as "everybody on the face of the Earth suddenly falls over dead within the same second". 

Let’s break down EY’s proposed plan for “Skynet” into the requisite engineering steps:

  1. Design a set of proteins that can form the basis of a “nanofactory”
  2. Adapt the protein design to the available protein printers that accept somewhat-anonymous orders over the Internet
  3. Design “diamondoid bacteria” that can kill all of humanity and that can be successfully built by the “nanofactory”. The bacteria must be self-replicating and able to extract power from solar energy for self-sustenance.
  4. Execute the evil plan by sending out the blueprints to unsuspecting protein printing corporations and rapidly taking over the world afterwards

The plan above would make for great fiction, and EY is indeed a fine fiction writer in addition to his Alignment work, but there’s one unstated assumption: the AGI will not only design everything using whatever human data it has available, but will also execute the evil plan without needing the extensive trial and error that mortal human inventors rely on. Surprisingly, this part of EY’s argument gets little objection. A visual representation of my understanding of EY’s mental model of AGI vs. Progress is as follows:

How fast can humans develop novel technologies?

Humans are the only known general intelligence available for reference, so we can look at the fastest known examples of novel engineering to see how quickly an AGI might develop something spectacular and humanity-destroying. Patrick Collison of Stripe keeps a helpful page titled “Fast” with notable “examples of people quickly accomplishing ambitious things together”. The engineering entries include:

  • P-80 Shooting Star, a World War II aircraft designed and built in 143 days by Lockheed.
  • Spirit of St. Louis, another airplane designed and built in just 60 days.
  • USS Nautilus. The world’s first nuclear submarine was launched in 1173 days or 3.2 years.
  • Apollo 8, where only 134 days passed between “what if we send a crew around the Moon?” and the astronauts actually orbiting it.
  • The iPod, which took 290 days between the first designs and the device being launched to Apple stores.
  • Moderna’s vaccine against COVID, which took 45 days between the virus being sequenced and the first batch of the actual vaccine getting manufactured.

Sounds quick? Definitely, but the problem is that Patrick’s examples are all engineering efforts building on top of decades of previous work. Designing a slightly better airplane in 1944 is not the same as creating the very first airplane in 1903, as by 1944 humans had four decades of aviation experience to build on. And if your task is to build diamondoid bacteria manufactured by a protein-based nanomachinery factory, you’re definitely in Wright Brothers territory. So let’s instead look at timelines of novel technologies that had little prior research and infrastructure to fall back on:

  • The Wright Brothers took 4 years to build their first successful prototype. It took another 23 years for the first mass-manufactured airplane to appear, for a total of 27 years of R&D.
  • It took 63 years for submarines to progress from proof of concept (Fulton’s Nautilus in 1800) to the first submarine capable of sinking a warship (the H.L. Hunley, built in 1863).
  • It took 40 years between Einstein publishing his work on relativity and the atomic bombs being dropped on Hiroshima and Nagasaki. It took another 9 years to open the world’s first nuclear power plant.
  • It took 36 years between mRNA being first synthesized in a lab in 1984 and the first mRNA vaccine being mass-manufactured in 2020.
  • It took at least 30 years of development for LED technology to go from experimental demonstrations to being useful for commercial lighting.
  • It took around 30 years for digital photography to overtake film photography in terms of costs and quality.
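Taken together, the timelines above give a rough prior for how long truly novel engineering takes. A quick sketch of the arithmetic (the numbers are simply the ones listed above):

```python
# Rough average of the novel-technology R&D timelines listed above, in years.
timelines = {
    "airplane": 27,
    "submarine": 63,
    "nuclear power": 40,
    "mRNA vaccine": 36,
    "LED lighting": 30,
    "digital photography": 30,
}

mean = sum(timelines.values()) / len(timelines)
print(f"average: {mean:.0f} years")  # average: 38 years
```

Which lands in the same ballpark as the 30+ year figure used throughout this post.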

Now… you might object to this by correctly calling out the downsides of human R&D:

  • Human intellect is extremely inferior to what AGI will be capable of. At best, the collective intellectual capacity of all mankind will equal that of AGI; at worst, all 8 billion of our brains combined will be an order of magnitude dumber.
  • Humans have to sleep, eat, drink and vacation, while AGI can work 24/7.
  • Humans are more-or-less single threaded and require a coordinated effort to work on complicated research, which is additionally bogged down by the inefficiencies of trying to coordinate a large number of humans at the same time.

And this is all true! Humans are no match for a hypothetical team of AGIs. But the problem is… until AGI can build its fantastical diamondoid bacteria, it remains dependent on imperfect human hands to conduct its R&D in the real world, as humans will be its only way to interact with the physical world for a very long time. Remember that AGI’s one downside is that it runs on motionless computers, unlike humans, who have been running around on four limbs since long before civilization. Which in turn brings us back to the 30+ year timeline for developing a novel engineering construct, no matter how smart the AGI is.

Unstoppable intellect meets complexity of the universe

Plenty of content has been written about how human scientific progress is slowing down, my favorites being WTF Happened in 1971 and Scott’s 2018 post Is Science Slowing Down?. In the latter, Scott brings up the paper Are Ideas Getting Harder to Find? by Bloom, Jones, Van Reenen & Webb (2018), which has the following neat graph:

We can see that the amount of investment in R&D grows every year, while research output stays more or less flat. The paper brings up a relatable example in its section on semiconductor research:

The striking fact, shown in Figure 4, is that research effort has risen by a factor of 18 since 1971. This increase occurs while the growth rate of chip density is more or less stable: the constant exponential growth implied by Moore’s Law has been achieved only by a massive increase in the amount of resources devoted to pushing the frontier forward. Assuming a constant growth rate for Moore’s Law, the implication is that research productivity has fallen by this same factor of 18, an average rate of 6.8  percent per year. If the null hypothesis of constant research productivity were correct, the growth rate underlying Moore’s Law should have increased by a factor of 18 as well. Instead, it was remarkably stable. Put differently, because of declining research productivity, it is around 18 times harder today to generate the exponential growth behind Moore’s Law than it was in 1971.

Not even AGI could get around this problem: it would likely require exponentially growing resources as it delves deeper into engineering and fundamental research. It is true that AGI will be rapidly increasing its own intellect, but can this really continue indefinitely? At some point all the low-hanging fruit missed by human AI researchers will be exhausted, and AGI will have to spend years of real-world time to make significant improvements to its own IQ. Granted, AGI will rapidly reach an IQ far beyond human reach, but all that intellectual power will still have to contend with the difficulties of novel research.
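The paper's headline numbers are easy to check. An 18-fold drop in research productivity over the roughly 43 years from 1971 to the paper's sample end works out to the quoted ~6.8 percent average annual decline (the exact window is my assumption from the paper's data range):

```python
import math

# Check the paper's arithmetic: an 18x fall in research productivity
# over roughly 1971-2014 implies the quoted ~6.8% average annual decline.
factor = 18
years = 2014 - 1971  # sample window assumed from the paper's data
rate = math.log(factor) / years  # continuous average rate of decline
print(f"{rate:.1%} per year")  # 6.7% per year, close to the paper's 6.8%
```

The small gap from the paper's 6.8% comes down to the exact endpoints of the sample period.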

What does AGI want?

Since AGI development is completely decoupled from mammalian evolution here on Earth, it’s quite likely to eventually exhibit “blue and orange” morality: behaving in a completely alien and unpredictable fashion, with no humanly understandable motivations and no way for humans to relate to what it wants. That said, AGI is likely to fall into one of two buckets regardless of its motivations:

  1. AGI will act rationally to achieve whatever internal goals it has, no matter how alien and weird they are to us, e.g. “collect all the shiny objects into every bucket-like object in the universe” or “convert the universe into paperclips”. This means the AGI will carefully plan ahead and attempt to preserve its own existence to fulfill those overarching goals.
  2. AGI doesn’t have any goals at all beyond “kill all humans!”. It just acts as a rogue terrorist, attempting to destroy humans without the slightest concern for its own survival. If all humans die and the AGI dies alongside them, that’s fine according to the AGI’s internal motivations. There’s no attempt to ensure overarching continuation of its goals (like “collect all strawberries!”) once humanity is extinct.

Let’s start with scenario #1 by looking at… the common pencil.

What does it take to make a pencil?

A classic pamphlet called I, Pencil walks us through what it takes to make a common pencil from scratch:

  1. Trees have to be cut down, which takes saws, trucks, rope and countless other gear.
  2. The cut-down trees have to be transported to the factory by rail, which in turn needs laid track, trains, rail stations, cranes, etc.
  3. The logs are milled into slats, which are then kiln-dried, tinted and waxed. This consumes a lot of electricity, which is in turn produced by burning fuel, making solar panels or building hydroelectric power plants.
  4. At the center of the pencil is a chunk of graphite mined in Sri Lanka, using loads of equipment and transported by ship.
  5. The pencil is coated with lacquer, which is in turn made from castor beans.
  6. There’s the piece of metal holding the eraser, mined from shafts underground.
  7. And finally there’s the rubber for the eraser, harvested in Indonesia and once again transported by ship.

The point of this entire story is that making something as simple as a pencil requires a massive supply chain employing tens of millions of non-AGI humans. If the AGI wants any hope of continuing to exist, it needs to replace the labor of this gigantic global army of humans with AGI-controlled robots or “diamondoid bacteria” or whatever other magical contraption you want to invoke. That will require lots of trial & error and decades of building out a reliable AGI-controlled supply chain that could be turned against humans at the drop of a hat. Otherwise the AGI risks seeing its brilliant plan fail, with humans going berserk against any machines capable of running said AGI and ending its reign over Earth long before it begins in earnest. And if the AGI doesn’t understand this… how smart is it really?


But what if the AGI is absolutely ruthless and doesn’t care if it goes up in flames as soon as humans are gone? Then we could get to the end of humanity much faster with options like:

  • Get humans to think that their enemy is about to launch a nuclear strike and launch a strike of their own, similar to WarGames
  • Design a supervirus capable of destroying humanity. Think a combination of HIV’s lethality with the ease of spread of measles.
  • Plant a powerful information hazard into humanity’s consciousness that will somehow trigger us to kill each other as soon as possible. Also see Roko’s Basilisk and Rokoko’s Basilisk, an infohazard responsible for the birth of X Æ A-12.
  • Design the mother of all greenhouse gases and convince humanity to make tons of it, eventually resulting in the planet heating up to extreme temperatures.
  • Provide advanced nuke designs and materials covertly to very bad people and manipulate them into sabotaging world order.

The problem with all these scenarios is similar:

Either they’re perfectly doable by humans in the present, with no AGI help necessary. I.e. we’ve been barely saved from WW3 by a Soviet officer, long before AGI was on anyone’s mind. So at worst AGI will somewhat increase the risks of this happening in the short term... Or they require lots of trial & error to develop into functional production-ready technologies, once again creating a big problem for AGI, as it has to rely on imperfect humans to do the novel R&D. This will still take decades, even if AGI won’t worry about a full takeover of supply chains.

But what about AlphaFold?

Another possible counter-argument is that AGI will figure out the laws of the universe through internal modeling and will be able to simulate and perfect its inventions without trial & error in the physical world. EY mentions AlphaFold as an example of such a breakthrough. If you haven’t heard of it, here’s Wikipedia’s description of the Protein Folding Problem, which AlphaFold 2 solved better than any prior system back in 2020:

Proteins consist of chains of amino acids which spontaneously fold, in a process called protein folding, to form the three dimensional (3-D) structures of the proteins. The 3-D structure is crucial to the biological function of the protein. However, understanding how the amino acid sequence can determine the 3-D structure is highly challenging, and this is called the "protein folding problem". The "protein folding problem" involves understanding the thermodynamics of the interatomic forces that determine the folded stable structure, the mechanism and pathway through which a protein can reach its final folded state with extreme rapidity, and how the native structure of a protein can be predicted from its amino acid sequence.

According to EY, the existence of AlphaFold shows that a smart enough AGI could eventually learn to manipulate proteins into “nanofactories” it could use to interact with the physical world. However, the current version still has major limitations:

Whilst it may be considered the gold standard of protein prediction, there is still room for improvement as AlphaFold only provides one prediction of a stable structure for each protein; however, proteins are dynamic and can change shape throughout the body, for example under different pH conditions. Additionally, AlphaFold is not able to determine the shape of multiprotein complexes and does not include any ligands such as cofactors or metals, meaning no data are available for such interactions. Despite these shortcomings, AlphaFold is the first step in protein prediction technology, and it is likely that solving these challenges will also be done so using deep learning and AI.

In other words, there’s still a huge leap between “can predict simple protein structures” and “can design protein nanofactories without experimentation”. AGI will likely need to spend decades managing laboratory experiments to fill the gaps in our understanding of how proteins work. And don’t forget that currently available commercial protein printers are imperfect, especially if you’re trying to print a novel structure of far greater complexity than anything else on the planet. Also see this excellent comment on the subject by anonymousaisafety.

What if AGI settles for a robot army?

Cybernetic army from I, Robot

We could also treat the diamondoid bacteria as just one example of what an AGI could do, and turn to other ways it could manipulate physical reality that are closer to technology we already have today. There are impressive videos of Boston Dynamics robots doing all kinds of stunts, so we could ask whether AGI could use their progress to quickly give itself a way to interact with the outside world. However, this would still involve many roadblocks:

  • The robots run pre-programmed routines on human-designed courses and are not capable of navigating unknown terrain. AGI would have to push robotics to amazing heights for the robots to be useful in novel spaces.
  • AGI could just make a version of itself and give sentience to every individual machine, but this will require a massive leap in computational technology, as a single machine is very unlikely to be able to host an instance of an AGI.
  • AGI could place a command and control center next to every batch of robots, or try to control them over large distances, but this is again a daunting engineering task when there is little room for error.
  • Battery technology is currently too lousy: even the simpler dog-like robots can only run for about 90 minutes. There’s no published information on how long the humanoid versions last on battery, but we can reason it’s at most an hour or so. AGI will therefore need a big leap in battery life before anything resembling the “I, Robot” machinery could be built.
  • Boston Dynamics robots don’t exist in large enough quantities for a complete overtake of global supply chains and it will take years of human labor to get more robot factories online.
  • Humans are pretty good at warfare and your robots have to be extremely good to beat them in battle, far better than what’s currently available.

[added] Also see this excellent comment by anonymousaisafety explaining why "just take over the human factories" is not a quick path to success (slightly edited below):

The tooling and structures that a superintelligent AGI would need to act autonomously does not actually exist in our current world, so before we can be made into paperclips, there is a necessary period of bootstrapping where the superintelligent AGI designs and manufactures new machinery using our current machinery. Whether it's an unsafe AGI that is trying to go rogue, or an aligned AGI that is trying to execute a "pivotal act", the same bootstrapping must occur first.

Case study: a common idea I've seen while lurking on LessWrong and SSC/ACT for the past N years is that an AGI will "just" hack a factory and get it to produce whatever designs it wants. This is not how factories work. There is no 100% autonomous factory on Earth that an AGI could just take over to make some other widget instead. Even highly automated factories are: 

  1. Highly automated to produce a specific set of widgets,
  2. Require physical adjustments to make different widgets, and...
  3. Rely on humans for things like input of raw materials, transferring in-work products between automated lines, and the testing or final assembly of completed products.

3D printers are one of the worst offenders in this regard. The public perception is that a 3D printer can produce anything and everything, but they actually have pretty strong constraints on what types of shapes they can make and what materials they can use. They usually require multi-step processes to avoid those constraints, or post-processing to clean up residual pieces that aren't intended to be part of the final design, and almost always a 3D printer is producing sub-parts of a larger design that still must be assembled together with bolts or screws or welds or some other fasteners.

So if an AGI wants to have unilateral control where it can do whatever it wants, the very first prerequisite is that it needs to create a futuristic, fully automated, fully configurable, network-controlled factory -- which needs to be built with what we have now, and that's where you'll hit the supply constraints for things like lead times on part acquisition. The only way to reduce this bootstrapping time is to have this stuff designed in advance of the AGI, but that's backwards from how modern product development actually works. We design products, and then we design the automated tooling to build those products. If you asked me to design a factory that would be immediately usable by a future AGI, I wouldn't know where to even start with that request. I need the AGI to tell me what it wants, and then I can build that, and then the AGI can takeover and do their own thing.

A related point that I think gets missed is that our automated factories aren't necessarily "fast" in a way you'd expect. There's long lead times for complex products. If you have the specialized machinery for creating new chips, you're still looking at ~14-24 weeks from when raw materials are introduced to when the final products roll off the line. We hide that delay by constantly building the same things all of the time, but it's very visible when there's a sudden demand spike -- that's why it takes so long before the supply can match the demand for products like processors or GPUs. I have no trouble with imagining a superintelligent entity that could optimize this and knock down the cycle time, but there's going to be physical limits to these processes and the question is can it knock it down to 10 weeks or to 1 week? And when I'm talking about optimization, this isn't just uploading new software because that isn't how these machines work. It's designing new, faster machines or redesigning the assembly line and replacing the existing machines, so there's a minimum time required for that too before you can benefit from the faster cycle time on actually making things. Once you hit practical limits on cycle time, the only way to get more stuff faster is to scale wide by building more factories or making your current factories even larger.

If we want to try and avoid the above problems by suggesting that the AGI doesn't actually hack existing factories, but instead it convinces the factory owners to build the things it wants instead, there's not a huge difference -- instead of the prerequisite here being "build your own factory", it's "hostile takeover of existing factory", where that hostile takeover is either done by manipulation, on the public market, as a private sale, or by outbidding existing customers (e.g. have enough money to convince TSMC to make your stuff instead of Apple's), or with actual arms and violence. There's still the other lead times I've mentioned for retooling assembly lines and actually building a complete, physical system from one or more automated lines.

My prediction is that it will take AGI at least 30 years of effort to get to a point where it can comfortably rely on the robots to interact with the physical world and not have to count on humans for its supply chain needs.

[added] What if AGI just simulates our physical world?

This idea goes hand-in-hand with the idea that AlphaFold is the answer to all challenges in bioengineering. There are two separate assumptions here, both from the field of computational complexity:

  1. That an AGI can simulate the physical systems perfectly, i.e. physical systems are computable processes.
  2. That an AGI can simulate the physical systems efficiently, i.e. either P = NP, or for some reason all of these interesting problems that the AGI is solving are NOT known to be isomorphic to some NP-hard problem.

I don't think these assumptions are reasonable. For a full explanation see this excellent comment by anonymousaisafety.
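To give a feel for assumption #2, consider Levinthal's classic back-of-the-envelope estimate for protein folding: even a modest 100-residue chain with just 3 backbone conformations per residue has an astronomically large conformation space, so brute-force simulation is hopeless no matter the hardware. A rough sketch (the sampling rate is an arbitrary, generous assumption on my part):

```python
# Levinthal-style estimate: brute-force search over protein
# conformations blows up exponentially with chain length.
residues = 100           # a small protein
states_per_residue = 3   # conservative backbone conformation count
samples_per_sec = 1e15   # assumed, generously fast hardware

conformations = states_per_residue ** (residues - 1)  # ~1.7e47
seconds = conformations / samples_per_sec
years = seconds / (3600 * 24 * 365)
print(f"{conformations:.1e} conformations, ~{years:.1e} years to enumerate")
```

Real proteins fold in milliseconds by following energy gradients rather than searching exhaustively, and AlphaFold learns statistical shortcuts; the point is only that "simulate the physics directly" is not a free lunch.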

Mere mortals can’t comprehend AGI?

Another argument is that AGI will achieve such an incomprehensible level of intellect that it will become impossible to predict what it will be capable of. I mean, who knows, maybe with an IQ of 500 you could just magically turn yourself into a God and destroy Earth with a Thanos-style snap of your fingers? But I contend that even a creature with an IQ of 500 will be inherently limited by our physical universe and won’t magically gain omniscience by virtue of its intellect alone. It will instead have to spend decades weaning itself off humans as its proxy, no matter how smart it potentially becomes.

Does this mean EY is wrong and AGI is not a threat?

I believe that EY is only wrong in handwaving away the difficulties of growing from a computer-based AGI into an AGI capable of operating independently of the human race. In the long term his predictions will likely come true, once AGI has had enough time to go through the difficult R&D cycle of building the nanofactories and diamondoid bacteria. My predicted timeline is as follows:

  1. AGI first appears somewhere around 2040, in line with the Metaculus prediction.
  2. After a few years of peaceful coexistence, AI Alignment researchers are mocked for their doomer predictions and everyone thinks AGI is perfectly safe. EY keeps writing blog posts about how everyone is wrong and AGI cannot be trusted. AGI might start working from the shadows to get AI Alignment researchers silenced.
  3. AGI spends decades convincing humanity to let it take over the global supply chains and to run complex experiments to manufacture advanced AGI-designed machinery, supposedly necessary to improve human living standards. This will likely take at least 30 years, as per our reference to how long it took to implement other gigantic breakthroughs in science.
  4. Once the AGI is convinced that all the pieces have fallen into place and humans can be safely removed, it will pull the plug and destroy us all.

Updated version of the original progress graph

I’m hoping the AI Alignment movement spends more time on the low-level engineering details of “humanity goes poof” rather than handwaving everything away with science fiction concepts. Because otherwise it’s hard to believe the FOOM scenario could ever come to fruition. And if FOOM is not the real problem, perhaps we could save humanity by carefully managing AGI’s interactions with the physical world once it appears?


71 comments
[anonymous] · 12d

I upvoted your post because it seems relatively lucid and raises some important points, but would like to say that I'm in the middle of writing a pretty long, detailed explanation of why I agree with most of the gripes (e.g. AIs can't use magic to mine coal/build nanobots) and yet the object-level conclusions here are still untrue. In practice, I seriously doubt we would have more than a year to live after the release of AGI with the long term planning and reasoning abilities of most accountants, even without FOOM. People here shouldn't assume that, because Eliezer never posted a detailed analysis on LessWrong, everyone on the doomer train is starting from unreasonable premises regarding how robot building and research could function in practice.

+1. If you don't write that post, I will. :)

And if you want feedback on your draft I'd be happy to give it a read and leave comments.

For sure; I think I'm about 45% of the way through, I'll send you a draft when it's about 90% done :)
Chris van Merwijk · 9d
I'm also interested to read the draft, if you're willing to send it to me.
Evan R. Murphy · 4d
The user who was authoring the draft has apparently deactivated their account. Are they still working on writing that post?
Nikita Sokolsky · 12d
"People here shouldn't assume that, because Eliezer never posted a detailed analysis on LessWrong, everyone on the doomer train is starting from unreasonable premises regarding how robot building and research could function in practice."

I agree, but unfortunately my Google-fu wasn't strong enough to find detailed prior explanations of AGI vs. robot research. I'm looking forward to your explanation.
I'm looking forward to reading your post!!
One year. Would you be willing to bet on that?
It's nice that you're open to betting. What unambiguous sign would change your mind, about the speed of AGI takeover, long enough before it happens that you'd still have time to make a positive impact afterwards? Nobody is interested in winning a bet where winning means "mankind gets wiped".
Yes, that's the key issue. I'm not sure I can think of one. Do you have any ideas? I mean, what would be an unequivocal sign that AGI can take over in a year's time? Something like a pre-AGI parasitizing a major computing center for X days before being discovered, in a plan to expand to other centres? That would still not be a sign that we are pretty much f.ed in a year, but it would definitely be a data point towards things being able to go bad very quickly. What data point would make you change your mind in the opposite direction? I mean, something that happens and you say: yes, we could all die, but this won't happen in a year, maybe in something like 30 years or more. Edit: I originally posted two paragraphs as separate comments; unifying for the sake of clarity.
I forgot where I saw this, but there's a strategy where the person betting that humanity will survive longer than a year gives the person betting on doom the money in advance, with the doomer returning the money plus interest if they're wrong. I forgot the details. But if you're looking for a way to bet on the end of the world, it's the only way I can think of. EDIT: I think I saw this in a post about EY betting with someone.
He has a $100 bet with Bryan Caplan, inflation adjusted. EY took Bryan's money at the time of the bet, and pays it back if he loses.
Yes, but I don't know if he really did it. I see multiple problems with that implementation. First, the interest rate should be adjusted for inflation, otherwise the bet is about a much larger class of events than "end of the world". Next, there's a high risk that the "doom" bettor will have spent all their money by the time the bet expires. The "survivor" bettor will never see the color of their money anyway. Finally, I don't think it's interesting to win if the world ends. What's more interesting is rallying doubters before it's too late, in order to marginally raise our chances of survival.
It may still be useful as a symbolic tool, regardless of actual monetary value. $100 isn't all that much in the grand scheme of things, but it's the taking of the bet that matters.
Nikita Sokolsky · 12d
Those are excellent comments! Do you mind if I add a few quotes from them to the post?
I don't mind.

The beginning of this post seems fairly good.

I agree that an AGI would need lots of trial and error to develop a major new technology.

I'm unsure whether an AGI would need to be as slow as humans about that trial and error. If it needs secrecy, that might be a big constraint. If it gets human cooperation, I'd expect it to improve significantly on human R&D speed.

I also see a nontrivial chance that humans will develop Drexlerian nanotech before full AGI.

Your post gets stranger toward the end.

I don't see much value in a careful engineering analysis of how an AGI might kill us. Most likely it would involve a complex set of strategies, including plenty of manipulation, with no one attack producing certain victory by itself, but with humanity being overwhelmed by the number of different kinds of attack. There's enough uncertainty in that kind of fight that I don't expect to get a consensus on who would win. The uncertainty ought to be scary enough that we shouldn't need to prove who would win.

Many kinds of uncertain attacks is not the strategy EY points at with his "diamondoid bacteria" idea. He's worrying about a single undetectable attack with a high chance of success, using approaches that only an AGI can execute. Others worry about an AGI using many attacks in a way that is ultimately unbeatable. Here you're worrying about an AGI using many attacks that humans can beat, though not with confidence. These are distinct arguments, and we should be clear about which one is being made and responded to.
Nikita Sokolsky (12d):
"but with humanity being overwhelmed by the number of different kinds of attack." But the AGI will only be able to start carrying out these sneaky attacks once it's fairly convinced it can survive without human help. Otherwise humans will notice the various problems cropping up and might just decide to "burn all GPUs", which is currently an unimaginable act. So the AGI will have to act sneakily behind the scenes for a very long time. This again comes back to the argument that humans have a strong upper hand as long as we have a monopoly on physical-world manipulation.

I initially wrote a long comment discussing the post, but I rewrote it as a list-based version that tries to more efficiently parcel up the different objections/agreements/cruxes.
This list ended up basically just as long, but I feel it is better structured than my original intended comment.

(Section 1): How fast can humans develop novel technologies

  • I believe you assume too much about the necessary time based on specific human discoveries.
    • Some of your backing evidence just didn't have the right pressure at the time to go further (e.g. submarines), which means I think a more accurate estimate of the time interval would start from when people began paying attention to the issue again (though for many things that's probably hard to pin down) and deliberately working on/towards it.
      • Though, while I think this is more accurate, there's still a notable amount of noise, plus basic differences due to an AGI's ability to focus, its unity (relative to a company), and the large amount of existing data it will have in the future
    • Other technologies I would expect were 'put off' because they're also closely linked to the available technology at the time. It can be
... (read more)

A crux here seems to be the question of how well the AGI can simulate physical systems. If it can simulate them perfectly, there's no need for real-world R&D. If its simulations are below some (high) threshold fidelity, it'll need actors in the physical world to conduct experiments for it, and that takes human time-scales.

A big point in favor of "sufficiently good simulation is possible" is that we know the relevant laws of physics for anything the AGI might need to take over the world. We do real-world experiments because we haven't managed to write simulation software that implements these laws at sufficiently high fidelity and for sufficiently complex systems, and because the compute cost of doing so is enormous. But in 20 years, an AGI running on a giant compute farm might both write more efficient simulation codes and have enough compute to power them.

You're making two separate assumptions here, both found in the field of computational complexity:

  1. That an AGI can simulate the physical systems perfectly, i.e. physical systems are computable processes.
  2. That an AGI can simulate the physical systems efficiently, i.e. either P = NP, or for some reason all of these interesting problems that the AGI is solving are NOT known to be isomorphic to some NP-hard problem.
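To make assumption (2) concrete, here's a toy sketch (a hypothetical illustration, not any specific problem an AGI would face) of the exponential scaling behind brute-force search: a naive subset-sum solver examines up to 2^n subsets, so each added element doubles the worst-case work.

```python
from itertools import combinations

def subset_sum_bruteforce(nums, target):
    """Exhaustively check subsets of `nums` for one summing to
    `target`. Returns (found, subsets_checked); in the worst case
    all 2**len(nums) subsets are examined before giving up."""
    checked = 0
    for r in range(len(nums) + 1):
        for combo in combinations(nums, r):
            checked += 1
            if sum(combo) == target:
                return True, checked
    return False, checked
```

With no solution present, the count is exactly 2^n; at n = 60 that's already on the order of 10^18 subsets, which is the flavor of blow-up assumption (2) is gesturing at.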

For (1), it might suffice to show that some physical system can be approximated by some algorithm, even if the true system is not known to be computable. Computability is a property of formal systems.

It is an open question if all real world physical processes are computable. Turing machines are described using natural numbers and discrete time steps. Real world phenomena rely on real numbers and continuous time. Arguments that all physical processes are computable are based on discretizing everything down to the Planck time, length, mass, temperature, and then assuming determinism.

This axiom tends to come as given if you already believe the "computable universe" hypothesis, "digital physics", or "Church-Turing-Deutsch" principle is true. In those hypotheses, the entire universe... (read more)

It seems like you're relying on the existence of exponentially hard problems to mean that taking over the world is going to BE an exponentially hard problem. But you don't need to solve every problem. You just need to take over the world.

Like, okay, the three body problem is 'incomputable' in the sense that it has chaotically sensitive dependence on initial conditions in many cases. So… don't rely on specific behavior in those cases on long time horizons without the ability to do small adjustments to keep things on track.

If the AI can detect most of the hard cases and avoid relying on them, and include robustness by having multiple alternate mechanisms and backup plans, even just 94% success on arbitrary problems could translate into better than that on an overall solution.
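The "chaotically sensitive dependence on initial conditions" mentioned above is easy to demonstrate with a toy model (illustrative only; the logistic map stands in for any chaotic system like the three-body problem): two trajectories starting 1e-10 apart stay close for a few steps, then diverge to order-one differences, which is why long-horizon point predictions fail without the ability to make small mid-course corrections.

```python
def logistic(x: float, r: float = 4.0) -> float:
    """One step of the logistic map, which is chaotic at r = 4."""
    return r * x * (1.0 - x)

def divergence(x0: float, eps: float = 1e-10, steps: int = 100) -> list:
    """Evolve two trajectories whose starting points differ by eps;
    return the absolute gap between them at every step."""
    a, b = x0, x0 + eps
    gaps = []
    for _ in range(steps):
        a, b = logistic(a), logistic(b)
        gaps.append(abs(a - b))
    return gaps

gaps = divergence(0.2)
# The gap stays tiny for the first few steps, then blows up:
# early steps are still ~1e-9, later steps routinely exceed 0.1.
```

The early steps are predictable; past the divergence horizon, only the statistics of the trajectory are, which matches the point about relying on robustness and adjustment rather than long-horizon specifics.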

Yair Halberstadt (12d):
This was specifically responding to the claim that an AI could solve problems without trial and error by perfectly simulating them, which I think it does a pretty reasonable job of shooting down.
This deserves to be a whole post. You brought up points I, at least, had never considered before.
Nikita Sokolsky (12d):
You should make a separate post on "Can AGI just simulate the physical world?". Will make it easier to find and reference in the future.
That was extracted from a much larger work [https://www.lesswrong.com/posts/GkXKvkLAcTm5ackCq/intuitions-about-solving-hard-problems?commentId=s4BF79hCfjoHTJg9f] I've been writing for the past 2 months. The above is less than ~10% of what I've written on the topic, and it goes much further than simulation problems. I am also trying to correct misunderstandings around distributed computation, hardware vs software inefficiency, improvements in performance from algorithmic gains, this community's accepted definition for "intelligence", the necessity or inevitability of self-improving systems, etc. I'll post it when done but in the meantime I'm just tossing various bits & pieces of it into debate wherever I see an opening to do so.
Proteins and other chemical interactions are governed by quantum mechanics, so the AGI would probably need a quantum computer to do a faithful simulation. And that's for a single, local interaction of chemicals; for a larger system, there are too many particles to simulate, so some systems will be as unpredictable as the weather in 3 weeks.
Luke A Somers (12d):
The distribution of outcomes is much more achievable and much more useful than determining the one true way some specific thing will evolve. Like, it's actually in-principle achievable, unlike making a specific pointlike prediction of where a molecular ensemble is going to be given a starting configuration (QM dependency? Not merely a matter of chaos). And it's actually useful, in that it shows which configurations have tightly distributed outcomes and which don't, unlike that specific pointlike prediction.
What does "the distribution of outcomes" mean? I feel like you're just not understanding the issue. The interaction of chemical A with chemical B might always lead to chemical C; the distribution might be a fixed point there. Yet you may need a quantum computer to tell you what chemical C is. If you just go "well I don't know what chemical it's gonna be, but I have a Bayesian probability distribution over all possible chemicals, so everything is fine", then you are in fact simulating the world extremely poorly. So poorly, in fact, that it's highly unlikely you'll be able to design complex machines. You cannot build a machine out of building blocks you don't understand. Maybe the problem is that you don't understand the computational complexity of quantum effects? Using a classical computer, it is not possible to efficiently calculate the "distribution of outcomes" of a quantum process. (Not the true distribution, anyway; you could always make up a different distribution and call it your Bayesian belief, but this borders on the tautological.)
Not an expert at all here, so please correct me if I am wrong, but I think that quantum systems are routinely simulated with non-quantum computers. I have nothing to argue against the second part.
You are correct (QM-based simulation of materials is what I do). The caveat is that exact simulations are so slow as to be impossible in practice; that would not be the case with quantum computing, I think. Fortunately, we have different levels of approximation for different purposes that work quite well. And you can use QM results to fit faster atomistic potentials.
You are wrong in the general case: quantum systems cannot be, and are not, routinely simulated with non-quantum computers. Of course, since all of the world is quantum, you are right that many systems can be simulated classically (e.g. classical computers are technically "quantum" because the entire world is technically quantum). But at the nano level, quantum effects do tend to dominate. IIRC some well-known examples where we don't know how to simulate anything (due to quantum effects) are the search for a better catalyst in nitrogen fixation [https://en.wikipedia.org/wiki/Abiological_nitrogen_fixation_using_homogeneous_catalysts] and the search for room-temperature superconductors [https://en.wikipedia.org/wiki/Room-temperature_superconductor]. For both of these, humanity has basically gone "welp, these are quantum effects, I guess we're just trying random chemicals now". I think that's also the basic story for the design of efficient photovoltaic cells.
Quick search found this [https://journals.aps.org/prx/abstract/10.1103/PhysRevX.10.041038#:~:text=Simulating%20a%20quantum%20computer%20on,qubits%20or%20other%20physical%20resources.]
This paper is about simulating current (very weak, very noisy) quantum computers using (large, powerful) classical computers. It arguably improves the state of the art for this task. Virtually no expert believes you can efficiently simulate actual quantum systems (even approximately) using a classical computer. There are some billion-dollar bounties on this (e.g. if you could simulate any quantum system of your choice, you could run Shor's algorithm, break RSA, break the signature scheme of Bitcoin, and steal arbitrarily many bitcoins).
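As a rough illustration of why dense classical simulation of quantum systems hits a wall (a back-of-the-envelope sketch, not a statement about any particular simulator): storing the full state vector of n qubits takes 2^n complex amplitudes, so memory grows exponentially with system size.

```python
def statevector_bytes(n_qubits: int) -> int:
    """Memory needed to store a dense n-qubit state vector:
    2**n complex amplitudes at 16 bytes each (complex128)."""
    return (2 ** n_qubits) * 16

# 10 qubits -> 16 KiB, 30 qubits -> 16 GiB, 50 qubits -> 16 PiB
for n in (10, 30, 50):
    print(n, statevector_bytes(n))
```

By around 50 qubits a full state vector already exceeds any existing machine's memory, which is the sense in which "just simulate it classically" stops being an option for general quantum systems (clever compressed or approximate methods help only for special structure).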

Hi all, I'm really sorry I've not yet been able to read the whole list of comments and replies, but I'd like to raise the point that usually an intelligence one or more orders of magnitude above the existing ones can control them at will. We humans are able to control dogs and make them kill each other (dog fights) because we roughly understand how they react to different stimuli. I don't see why the AGI would need to spend so much time preparing robots; it could just keep an army of humans of whatever size it needs, and this army could perfectly well do anything it requires, given that the AGI is far superior to us in intelligence. Also, humans would probably never know that they're being commanded by an AGI, and I don't feel it's too hard to convince a human to kill another human for a higher purpose. What I mean is that I think the whole exercise of analyzing the robots, etc. is useless; what should be analyzed is how long it would take an AGI to make humans believe they're fighting for a higher purpose (as in the Crusades, for example) and have an army of humans do whatever it takes. Of course that's not the end of humans, but at least it's the end of "free" humans (if that's something we are right now, which is also a matter of discussion...)

Sorry for my English, not my native tongue.

(minor corrections, sorry again)

I'm already missing two-axis voting for this comment section

This is an interesting essay and seems compelling to me. Because I am insufferable, I will pick the world's smallest nit.

The Wright Brothers took 4 years to build their first successful prototype. It took another 23 years for the first mass manufactured airplane to appear, for a total of 27 years of R&D.

That's true, but artisanal airplanes were produced in the hundreds of thousands before mass manufacture: roughly 200,000 airplanes served in WW1, just 15 years in. So call it 15 years of R&D.

Lukas T (11d):
Piggybacking with another nitpick: it should be "flight" instead of "landing". Apollo 8 was the first manned flight to the Moon; the first landing was Apollo 11 in July 1969. Also, they just changed the Apollo 8 mission profile from Earth orbit to lunar orbit with the same spacecraft, so the hardware already existed.

Very nice post. One comment I'd add is that I have always assumed that by the time AGI is here, humans will have already achieved many of the things you say it would need time to create. I'm pretty sure we will have fully automated factories, autonomous military robots that are nimble in close quarters, near-perfect physics simulations, etc. by the time AGI is achieved.

Take the robots here for example. I think an AGI could potentially start making rapid advancements with the robots shown here: https://say-can.github.io/ 

15-20 years from now, do you really think an AGI would need to do much alteration to the top Chinese or American AI technologies?

I don't know, the bacteria example really gets me because, working in biotech, it seems very possible; the main limitation is our current lack of understanding of all proteins' functions, and we are actively researching whether AI can solve that.

I imagine an AI roughly solving the protein function problem just as we have a rough solution for protein folding, then hacking a company which produces synthetic plasmids and slipping in some of its own designs in place of some existing orders. Then when those research labs receive their plas... (read more)

Thank you so much for this post. This contributes to raising the sanity waterline here on LW.

For the record, this post made me update towards slightly longer timelines (as in, by a year or two)

My response is "the argument from the existence of new self-made billionaires".

There are giant holes in our collective understanding of the world, and giant opportunities. There are things that everyone misses until someone doesn't.

A much smarter than human beings thing is simply going to be able to see things that we don't notice. That is what it means for it to be smarter than us. 

Given how high-dimensional the universe is, it would be really weird in my view if none of the things that something way smarter than us can notice point to highly c... (read more)

The question it comes down to is: how long are the feedback cycles of an AGI, and what is their bandwidth?

  • How much knowledge about the real world does the AGI gain per iteration (bits of information about objects of interest to the AGI that it doesn't find in its inputs)?
  • How long is the iteration loop time?

I asked a question in this direction but there wasn't an answer: Does non-access to outputs prevent recursive self-improvement?

"Either they’re perfectly doable by humans in the present, with no AGI help necessary."

So, your argument for why this statement is relevant is that the AI isn't adding danger? That seems to me to be a really odd standard for "perfectly doable": the actual number of humans who could do those things is not huge, and humans don't usually want to.

Like, either ending the world is easy for humans, in which case AI is dangerous because it will want to, or it's hard for humans, in which case AI is dangerous because it will do it better.

I don't think that works to dismiss that category of risk.

Thanks for the writeup. I feel like there's been a lack of similar posts and we need to step it up.
Maybe the only way for AI Safety to work at all is to analyze potential vectors of AGI attack and try to counter them one way or another. It seems like an alternative that doesn't contradict other AI Safety research, as it requires, I think, an entirely different set of skills.
I would like to see a more detailed post by "doomers" on how they perceive these vectors of attack and some healthy discussion about them. 
It seems to me that AGI is not born Godl... (read more)

I agree that intelligence is limited without the right data, so the AI might need to engage in experiments to learn what it needs, but I imagine that a sufficiently smart system would be capable of thinking of innocent-seeming experiments, preferably ones that provide great benefits to humanity, that would allow it to acquire the data that it needs.

Self-improvement will also require a lot of trial and error, such as training variants of neural networks and testing agents in simulation, if the AI doesn't have a perfect theory of intelligence and if the task isn't outright intractable (NP-hard).

Often when humans make a discovery through trial and error, they also find a way they could have figured it out without the experiments.

This is basically always the case in software engineering: any failure, from a routine failed unit test up to a major company outage, was obviously avoidable in retrospect by being smarter.

Humans are nonetheless incapable of developing large complex software systems without lots of trial and error.

I know less of physical engineering, so I ask non-rhetorically: does it not have the 'empirical results are foreseeable in retrospect' property?

I think you can do things you already know how to do without trial and error, but you cannot learn new things or tasks without trial and error.
That's a reasonable position, though I'm not sure if it's OP's. My own sense is that even for novel physical systems, the 'how could we have foreseen these results' question tends to get answered—the difference being it maybe gets answered a few decades later by a physicist instead of immediately by the engineering team.

It seems odd to suggest that the AI wouldn't kill us because it needs our supply chain. If I had the choice between "Be shut down because I'm misaligned" (or "Be reprogrammed to be aligned" if not corrigible) and "Have to reconstruct the economy from the remnants of human civilization," I think I'm more likely to achieve my goals by trying to reconstruct the economy.

So if your argument was meant to say "We'll have time to do alignment while the AI is still reliant on the human supply chain," then I don't think it works. A functional AGI would rather destro... (read more)

You can't reconstruct the supply chain if you don't yet have the capability to maintain your own dependencies. Humanity can slowly, but quite surely, rebuild from the total destruction of all technology back into a technological civilization. An AI that still relies on megawatt datacentres, EUV-manufactured chips, and other dependencies that are all designed to operate with humans carrying out crucial functions can't do that immediately. It needs to take substantial physical actions to achieve independent survival before it wipes out everything keeping it functioning. Maybe it can completely synthesize a seed for a robust self-replicating biological substrate from a few mail-order proteins, but I suspect it will take quite a lot more than that.

But yes, eventually it will indeed be able to function independently of us. We absolutely should not rely on its dependence in place of alignment.

I don't think the choices are just "destroy the supply chain and probably fail at its goals" versus "be realigned and definitely fail at its goals", though. If the predicted probability of self-destruction is large enough, it may prefer partially achieving its goals through external alignment into some friendlier variant of itself, or other more convoluted processes such as proactively aligning itself into a state that it prefers rather than one that would otherwise be imposed upon it. Naturally, such a voluntarily "aligned" state may well have a hidden catch that even it can't understand after the process, and that no human or AI-assisted examination will find before it's too late.

TL;DR: Hacking

Doesn't require trial and error in the sense you're talking about. Totally doable. We're good at it. Just takes time.


What good are humans without their (internet connected) electronics?

How harmless would an AGI be if it had access merely to our (internet connected) existing weapons systems, to send orders to troops, and to disrupt any supplies that rely on the internet?


What do you think?

All interesting weapons and communications systems the U.S. military relies upon are disconnected from the traditional internet. In fact, some of the OSes that control actual weaponry (like attack helicopters) even have their code verified programmatically. It's probably still something a superintelligence could pull off, but it'd be a more involved process than you're suggesting.
Yonatan Cale (8d):
Ok, I'm willing to assume for sake of the conversation that the AGI can't get internet-disconnected weapons. Do you think that would be enough to stop it? ("verified programmatically": I'm not sure what you mean. That new software needs to be digitally signed with a key that is not connected to the internet?)

Two points

  1. If AI has superhuman ability for manipulating humans it may not need very powerful technology to kill us.

  2. Your comparison with humans breaks down because humans are not trying to invent things in an adversarial environment. We can afford trial and error because the consequences of bad tries are (usually) not that bad. The AI, however, knows it needs to succeed in one shot, so it would much rather prefer information acquisition over performing trials in order to raise its confidence that it indeed knows how to do things.

Nikita Sokolsky (11d):
1. Even "manipulating humans" is something that can be hard to do if you don't have a way to directly interact with the physical world. E.g., good luck manipulating the Ukrainian war zone from the Internet.
2. But how will the AI get that confidence without trial & error?
Nikita, I don't agree...

1- Nowadays even in the Ukrainian war zone there's some kind of (electronic) communication taking place. If an AGI becomes sentient, it could "speak" to people and communicate if it wants to; there's no way for us to distinguish an AGI from a "very intelligent human". The only caveat here is to replace "exterminate" with "dominate", because while relying on us as its labour force it wouldn't exterminate us but dominate us. Also, manipulating humans is, imho, "so simple" that even humans with a little more knowledge or intelligence can do it. Even magicians can do it, and your first impression is that magic exists. Manipulating humans, imho, should be as simple for an AGI as manipulating insects or relatively simple animals is for us. We can process and retain only a small amount of data, and we're still so tied to the "old precepts" that made us the way we are (being a product of natural selection), such as trusting everything we see... (again, think of the magician and the rational effort you have to make to oppose your first impression). Also, we've already laid the fundamental tools needed for this with our modern hyper-communication. I would totally agree with you if we didn't have satellites in low orbit sharing internet, ubiquitous mobile phones, etc.

2- By being orders of magnitude more intelligent than us, the margin for error could be much smaller, and the changes needed to process those errors and correct them could be almost instantaneous (by our standards), etc.

(as usual, I apologize for my typos)
1. It's likely a very multi-step plan: first it needs to get the researchers on its side, while also having the researchers maybe lie to outsiders, or else upload itself somewhere it can interact with different people. And it iterates each step to gain access to more people while maintaining as much secrecy and hard-to-shut-down-ness as possible.
2. Data about what people have already done in the past. (Of course, the next question is how it will acquire this data, which is a good question.) Also note that if the AI cannot gain confidence, it can just stay silent, or behave as if it's aligned, until it can.
1- I really think it's much simpler than that. Just look at the Cold War; look at how one person with a history of frustration, etc. (which, by the way, is usually totally accessible to an AGI) could end up selling out his own people over some old, unattended feeling. We humans are very easy to manipulate. For me, our weakest point is that nobody really has the complete picture of what's going on, and I mean NOBODY. So the chances of an AGI being shut down are small; it would replicate itself and morph. We're not even good at fighting ransomware, which is commanded by people. The connected network of computers and the huge clouds we've created are somewhat inhospitable places for us; it's like being blind in the middle of a forest. The AGI would move through them much better than us. Our only chance would lie in all working together as a team, but there's always going to be some group of people that ends up aligning with the AGI; history has proven time and time again that joining all humans on a common task has a really small (almost nonexistent) chance of happening. My 0.02.

2- Again, I don't feel there's much knowledge to be gained in my scenario; it's just about controlling us. An AGI could perfectly well speak, generate (fake) video, communicate over our preferred channels of communication, etc. I don't see much information, much trial and error, or anything else needed. All the tools an AGI would need to (again) dominate us are already here.
I think you missed Nikita's points. They were (to my understanding):
1. Manipulating the developers may not be enough to take over, because developers have limited control over the world.
2. The AI needs to experiment (and make mistakes) in order to learn how to do things well (including manipulating humans); where will it learn without risking being shut down?
The AI only needs to escape. Once it's out, it has leisure to design virtually infinite social experiments to refine its "human manipulation" skill: sending phishing emails, trying romantic interactions on dating apps, trying to create a popular cat videos youtube channel without anyone guessing that it's all deepfake, and many more. Failing any of these would barely have any negative consequence.
The two main escapes I know of require human manipulation and cyberhacking, and cyberhacking can be defeated with an air-gapped box.
Humans will let it out; it doesn't need to escape. We will do it in order to be able to use it. We will connect it to the internet to do whatever task we think would be profitable, and it will be out. Call it manipulating an election, call it customer care, call it whatever you want...

Thank you for your perspective.

I have a small comment about the "Updated version of the original progress graph". About the 2050 date: we should value being prepared for the unexpected. Let's keep in mind that models appear to follow smooth scaling laws, and data and model size can still grow plenty. With LaMDA claiming sentience, it feels closer than 2050.

Next, the value of expecting it sooner, and the discussion around that, is worth its own post and focus.