NB: I've never published on LW before, so I apologize if my writing style is not in line with LW's usual conventions. This post is an edited copy of the same article from my blog.

EY published an article last week titled “AGI Ruin: A List of Lethalities”, which explains in detail why you can’t train an AGI that won’t try to kill you at the first chance it gets, and why such an AGI will eventually appear given humanity’s current trajectory in computer science. EY doesn’t explicitly state a timeline over which AGI is supposed to destroy humanity, but the implication is that it will happen rapidly, leaving humanity no time to stop it. EY doesn’t find the question of how exactly AGI will destroy humanity too interesting and explains it as follows:

My lower-bound model of "how a sufficiently powerful intelligence would kill everyone, if it didn't want to not do that" is that it gets access to the Internet, emails some DNA sequences to any of the many many online firms that will take a DNA sequence in the email and ship you back proteins, and bribes/persuades some human who has no idea they're dealing with an AGI to mix proteins in a beaker, which then form a first-stage nanofactory which can build the actual nanomachinery.  (Back when I was first deploying this visualization, the wise-sounding critics said "Ah, but how do you know even a superintelligence could solve the protein folding problem, if it didn't already have planet-sized supercomputers?" but one hears less of this after the advent of AlphaFold 2, for some odd reason.)  The nanomachinery builds diamondoid bacteria, that replicate with solar power and atmospheric CHON, maybe aggregate into some miniature rockets or jets so they can ride the jetstream to spread across the Earth's atmosphere, get into human bloodstreams and hide, strike on a timer.  Losing a conflict with a high-powered cognitive system looks at least as deadly as "everybody on the face of the Earth suddenly falls over dead within the same second". 

Let’s break down EY’s proposed plan for “Skynet” into the requisite engineering steps:

  1. Design a set of proteins that can form the basis of a “nanofactory”
  2. Adapt the protein design to the available protein printers that accept somewhat-anonymous orders over the Internet
  3. Design “diamondoid bacteria” that can kill all of humanity and that can be successfully built by the “nanofactory”. The bacteria must be self-replicating and able to extract power from solar energy for self-sustenance.
  4. Execute the evil plan by sending out the blueprints to unsuspecting protein printing corporations and rapidly taking over the world afterwards

The plan above would make for great fiction, and EY is indeed a fine fiction writer in addition to his Alignment work, but there’s one unstated assumption: the AGI will not only design everything using whatever human data it has available, but will also execute the evil plan without needing the extensive trial and error that mortal human inventors rely on. Surprisingly, this part of EY’s argument gets little objection. A visual representation of my understanding of EY’s mental model of AGI vs. Progress is as follows:

How fast can humans develop novel technologies?

Humans are the only known general intelligence available for reference, so we can look at the fastest known examples of novel engineering to see how quickly an AGI might develop something spectacular and humanity-destroying. Patrick Collison of Stripe keeps a helpful page titled “Fast” with notable “examples of people quickly accomplishing ambitious things together”. The engineering entries include:

  • P-80 Shooting Star, a World War II aircraft designed and built in 143 days by Lockheed.
  • Spirit of St. Louis, another airplane designed and built in just 60 days.
  • USS Nautilus. The world’s first nuclear submarine was launched in 1173 days or 3.2 years.
  • Apollo 8, where only 134 days passed between “what if we send a crew around the Moon?” and the astronauts actually orbiting it.
  • The iPod, which took 290 days between the first designs and the device being launched to Apple stores.
  • Moderna’s vaccine against COVID, which took 45 days between the virus being sequenced and the first batch of the actual vaccine getting manufactured.

Sounds quick? Definitely, but the problem is that Patrick’s examples are all engineering efforts building on top of decades of previous work. Designing a slightly better airplane in 1944 is not the same as creating the very first airplane in 1903, as by 1944 humans had four decades of aviation experience to build on. And if your task is to build diamondoid bacteria manufactured by a protein-based nanomachinery factory, you’re definitely in Wright Brothers territory. So let’s instead look at timelines of novel technologies that had little prior research and infrastructure to fall back on:

  • The Wright Brothers took 4 years to build their first successful prototype. It took another 23 years for the first mass-manufactured airplane to appear, for a total of 27 years of R&D.
  • It took 63 years for submarines to progress from proof of concept (Fulton’s Nautilus in 1800) to the first submarine capable of sinking a warship (the H.L. Hunley, built in 1863).
  • It took 40 years between Einstein publishing his work on relativity and the atomic bombs being dropped on Hiroshima and Nagasaki. It took another 9 years to open the world’s first nuclear power plant.
  • It took 36 years between mRNA being first synthesized in a lab in 1984 and the first mRNA vaccine being mass-manufactured in 2020.
  • It took at least 30 years of development for LED technology to go from experimental demonstrations to being useful for commercial lighting.
  • It took around 30 years for digital photography to overtake film photography in terms of costs and quality.
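Taken together, the timelines above give a rough prior for how long truly novel engineering takes. A quick sketch of the arithmetic (the numbers are simply the ones listed above):

```python
# Rough average of the novel-technology R&D timelines listed above, in years.
timelines = {
    "airplane": 27,
    "submarine": 63,
    "nuclear power": 40,
    "mRNA vaccine": 36,
    "LED lighting": 30,
    "digital photography": 30,
}

mean = sum(timelines.values()) / len(timelines)
print(f"average: {mean:.0f} years")  # average: 38 years
```

Which lands in the same ballpark as the 30+ year figure used throughout this post.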

Now… you might object to this by correctly calling out the downsides of human R&D:

  • Human intellect is extremely inferior to what AGI will be capable of. At best, the collective intellectual capacity of all mankind will equal that of AGI; at worst, all 8 billion of our brains combined will be an order of magnitude dumber.
  • Humans have to sleep, eat, drink and vacation, while AGI can work 24/7.
  • Humans are more-or-less single threaded and require a coordinated effort to work on complicated research, which is additionally bogged down by the inefficiencies of trying to coordinate a large number of humans at the same time.

And this is all true! Humans are no match for a hypothetical team of AGIs. But the problem is… until AGI can build its fantastical diamondoid bacteria, it remains dependent on imperfect human hands to conduct its R&D in the real world, as humans will be its only way to interact with the physical world for a very long time. Remember that AGI’s one downside is that it runs on motionless computers, unlike humans, who have been running around on four limbs since long before civilization. Which in turn brings us back to the 30+ year timeline for developing a novel engineering construct, no matter how smart the AGI is.

Unstoppable intellect meets complexity of the universe

Plenty of content has been written about how human scientific progress is slowing down, my favorites being WTF Happened in 1971 and Scott’s 2018 post Is Science Slowing Down?. In the latter, Scott brings up the paper Are Ideas Getting Harder to Find? by Bloom, Jones, Van Reenen & Webb (2018), which has the following neat graph:

We can see that the amount of investment in R&D grows every year, while research output stays more or less flat. The paper brings up a relatable example in its section on semiconductor research:

The striking fact, shown in Figure 4, is that research effort has risen by a factor of 18 since 1971. This increase occurs while the growth rate of chip density is more or less stable: the constant exponential growth implied by Moore’s Law has been achieved only by a massive increase in the amount of resources devoted to pushing the frontier forward. Assuming a constant growth rate for Moore’s Law, the implication is that research productivity has fallen by this same factor of 18, an average rate of 6.8  percent per year. If the null hypothesis of constant research productivity were correct, the growth rate underlying Moore’s Law should have increased by a factor of 18 as well. Instead, it was remarkably stable. Put differently, because of declining research productivity, it is around 18 times harder today to generate the exponential growth behind Moore’s Law than it was in 1971.

Not even AGI could get around this problem: it would likely require exponentially growing resources as it delves deeper into engineering and fundamental research. It is true that AGI will be rapidly increasing its own intellect, but can this really continue indefinitely? At some point all the low-hanging fruit missed by human AI researchers will be exhausted, and AGI will have to spend years of real-world time to make significant improvements to its own IQ. Granted, AGI will rapidly reach an IQ far beyond human reach, but all that intellectual power will still have to contend with the difficulties of novel research.
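The paper's headline numbers are easy to check. An 18-fold drop in research productivity over the roughly 43 years from 1971 to the paper's sample end works out to the quoted ~6.8 percent average annual decline (the exact window is my assumption from the paper's data range):

```python
import math

# Check the paper's arithmetic: an 18x fall in research productivity
# over roughly 1971-2014 implies the quoted ~6.8% average annual decline.
factor = 18
years = 2014 - 1971  # sample window assumed from the paper's data
rate = math.log(factor) / years  # continuous average rate of decline
print(f"{rate:.1%} per year")  # 6.7% per year, close to the paper's 6.8%
```

The small gap from the paper's 6.8% comes down to the exact endpoints of the sample period.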

What does AGI want?

Since AGI development is completely decoupled from mammalian evolution here on Earth, it’s quite likely to eventually exhibit “blue and orange” morality: behaving in a completely alien and unpredictable fashion, with no humanly understandable motivations and no way for humans to relate to what it wants. That said, AGI is likely to fall into one of two buckets regardless of its motivations:

  1. AGI will act rationally to achieve whatever internal goals it has, no matter how alien and weird they are to us, e.g. “collect all the shiny objects into every bucket-like object in the universe” or “convert the universe into paperclips”. This means the AGI will carefully plan ahead and attempt to preserve its own existence to fulfill those overarching goals.
  2. AGI doesn’t have any goals at all beyond “kill all humans!”. It just acts as a rogue terrorist, attempting to destroy humans without the slightest concern for its own survival. If all humans die and the AGI dies alongside them, that’s fine according to the AGI’s internal motivations. There’s no attempt to ensure overarching continuation of its goals (like “collect all strawberries!”) once humanity is extinct.

Let’s start with scenario #1 by looking at… the common pencil.

What does it take to make a pencil?

A classic pamphlet called I, Pencil walks us through what it takes to make a common pencil from scratch:

  1. Trees have to be cut down, which takes saws, trucks, rope and countless other gear.
  2. The cut-down trees have to be transported to the factory by rail, which in turn needs laid track, trains, rail stations, cranes, etc.
  3. The logs are milled into slats, which are then kiln-dried, tinted and waxed. This consumes a lot of electricity, which is in turn produced by burning fuel, making solar panels or building hydroelectric power plants.
  4. At the center of the pencil is a chunk of graphite mined in Sri Lanka, using loads of equipment and transported by ship.
  5. The pencil is coated with lacquer, which is in turn made from castor beans.
  6. There’s the piece of metal holding the eraser, mined from shafts underground.
  7. And finally there’s the rubber for the eraser, harvested in Indonesia and once again transported by ship.

The point of this entire story is that making something as simple as a pencil requires a massive supply chain employing tens of millions of non-AGI humans. If the AGI wants any hope of continuing to exist, it needs to replace the labor of this gigantic global army of humans with AGI-controlled robots or “diamondoid bacteria” or whatever other magical contraption you want to invoke. That will require lots of trial & error and decades of building out a reliable AGI-controlled supply chain that could be turned against humans at the drop of a hat. Otherwise the AGI risks seeing its brilliant plan fail, with humans going berserk against any machines capable of running said AGI and ending its reign over Earth long before it begins in earnest. And if the AGI doesn’t understand this… how smart is it really?


But what if the AGI is absolutely ruthless and doesn’t care if it goes up in flames as soon as humans are gone? Then we could get to the end of humanity much faster with options like:

  • Get humans to think that their enemy is about to launch a nuclear strike and launch a strike of their own, similar to WarGames
  • Design a supervirus capable of destroying humanity. Think a combination of HIV’s lethality with the ease of spread of measles.
  • Plant a powerful information hazard into humanity’s consciousness that will somehow trigger us to kill each other as soon as possible. Also see Roko’s Basilisk and Rokoko’s Basilisk, an infohazard responsible for the birth of X Æ A-12.
  • Design the mother of all greenhouse gases and convince humanity to make tons of it, eventually resulting in the planet heating up to extreme temperatures.
  • Provide advanced nuke designs and materials covertly to very bad people and manipulate them into sabotaging world order.

The problem with all these scenarios is similar:

Either they’re perfectly doable by humans in the present, with no AGI help necessary. I.e. we’ve been barely saved from WW3 by a Soviet officer, long before AGI was on anyone’s mind. So at worst AGI will somewhat increase the risks of this happening in the short term... Or they require lots of trial & error to develop into functional production-ready technologies, once again creating a big problem for AGI, as it has to rely on imperfect humans to do the novel R&D. This will still take decades, even if AGI won’t worry about a full takeover of supply chains.

But what about AlphaFold?

Another possible counter-argument is that AGI will figure out the laws of the universe through internal modeling and will be able to simulate and perfect its inventions without trial & error in the physical world. EY mentions AlphaFold as an example of such a breakthrough. If you haven’t heard of it, here’s Wikipedia’s description of the Protein Folding Problem, which AlphaFold 2 solved better than any prior system back in 2020:

Proteins consist of chains of amino acids which spontaneously fold, in a process called protein folding, to form the three dimensional (3-D) structures of the proteins. The 3-D structure is crucial to the biological function of the protein. However, understanding how the amino acid sequence can determine the 3-D structure is highly challenging, and this is called the "protein folding problem". The "protein folding problem" involves understanding the thermodynamics of the interatomic forces that determine the folded stable structure, the mechanism and pathway through which a protein can reach its final folded state with extreme rapidity, and how the native structure of a protein can be predicted from its amino acid sequence.

According to EY, the existence of AlphaFold shows that a smart enough AGI could eventually learn to manipulate proteins into “nanofactories” it could use to interact with the physical world. However, the current version still has major limitations:

Whilst it may be considered the gold standard of protein prediction, there is still room for improvement as AlphaFold only provides one prediction of a stable structure for each protein; however, proteins are dynamic and can change shape throughout the body, for example under different pH conditions. Additionally, AlphaFold is not able to determine the shape of multiprotein complexes and does not include any ligands such as cofactors or metals, meaning no data are available for such interactions. Despite these shortcomings, AlphaFold is the first step in protein prediction technology, and it is likely that solving these challenges will also be done so using deep learning and AI.

In other words, there’s still a huge leap between “can predict simple protein structures” and “can design protein nanofactories without experimentation”. AGI will likely need to spend decades managing laboratory experiments to fill the gaps in our understanding of how proteins work. And don’t forget that currently available commercial protein printers are imperfect, especially if you’re trying to print a novel structure of far greater complexity than anything else on the planet. Also see this excellent comment on the subject by anonymousaisafety.

What if AGI settles for a robot army?

Cybernetic army from I, Robot

We could also treat the diamondoid bacteria as just one example of what an AGI could do, and turn to other ways it could manipulate physical reality that are closer to technology we already have today. There are impressive videos of Boston Dynamics robots doing all kinds of stunts, so we could ask whether AGI could use their progress to quickly give itself a way to interact with the outside world. However, this would still involve many roadblocks:

  • The robots run pre-programmed routines on human-designed courses and are not capable of navigating unknown terrain. AGI would have to push robotics to amazing heights for the robots to be useful in novel spaces.
  • AGI could just make a version of itself and give sentience to every individual machine, but this will require a massive leap in computational technology, as a single machine is very unlikely to be able to host an instance of an AGI.
  • AGI could place a command and control center next to every batch of robots, or try to control them over large distances, but this is again a daunting engineering task when there is little room for error.
  • Battery technology is currently too lousy: even the simpler dog-like robots can only run for about 90 minutes. There’s no published information on how long the humanoid versions last on battery, but we can reason it’s at most an hour or so. AGI will therefore need a big leap in battery life before anything resembling the “I, Robot” machinery could be built.
  • Boston Dynamics robots don’t exist in large enough quantities for a complete overtake of global supply chains and it will take years of human labor to get more robot factories online.
  • Humans are pretty good at warfare and your robots have to be extremely good to beat them in battle, far better than what’s currently available.

[added] Also see this excellent comment by anonymousaisafety explaining why "just take over the human factories" is not a quick path to success (slightly edited below):

The tooling and structures that a superintelligent AGI would need to act autonomously does not actually exist in our current world, so before we can be made into paperclips, there is a necessary period of bootstrapping where the superintelligent AGI designs and manufactures new machinery using our current machinery. Whether it's an unsafe AGI that is trying to go rogue, or an aligned AGI that is trying to execute a "pivotal act", the same bootstrapping must occur first.

Case study: a common idea I've seen while lurking on LessWrong and SSC/ACT for the past N years is that an AGI will "just" hack a factory and get it to produce whatever designs it wants. This is not how factories work. There is no 100% autonomous factory on Earth that an AGI could just take over to make some other widget instead. Even highly automated factories are: 

  1. Highly automated to produce a specific set of widgets,
  2. Require physical adjustments to make different widgets, and...
  3. Rely on humans for things like input of raw materials, transferring in-work products between automated lines, and the testing or final assembly of completed products.

3D printers are one of the worst offenders in this regard. The public perception is that a 3D printer can produce anything and everything, but they actually have pretty strong constraints on what types of shapes they can make and what materials they can use. They usually require multi-step processes to avoid those constraints, or post-processing to clean up residual pieces that aren't intended to be part of the final design, and almost always a 3D printer is producing sub-parts of a larger design that still must be assembled together with bolts or screws or welds or some other fasteners.

So if an AGI wants to have unilateral control where it can do whatever it wants, the very first prerequisite is that it needs to create a futuristic, fully automated, fully configurable, network-controlled factory -- which needs to be built with what we have now, and that's where you'll hit the supply constraints for things like lead times on part acquisition. The only way to reduce this bootstrapping time is to have this stuff designed in advance of the AGI, but that's backwards from how modern product development actually works. We design products, and then we design the automated tooling to build those products. If you asked me to design a factory that would be immediately usable by a future AGI, I wouldn't know where to even start with that request. I need the AGI to tell me what it wants, and then I can build that, and then the AGI can takeover and do their own thing.

A related point that I think gets missed is that our automated factories aren't necessarily "fast" in a way you'd expect. There's long lead times for complex products. If you have the specialized machinery for creating new chips, you're still looking at ~14-24 weeks from when raw materials are introduced to when the final products roll off the line. We hide that delay by constantly building the same things all of the time, but it's very visible when there's a sudden demand spike -- that's why it takes so long before the supply can match the demand for products like processors or GPUs. I have no trouble with imagining a superintelligent entity that could optimize this and knock down the cycle time, but there's going to be physical limits to these processes and the question is can it knock it down to 10 weeks or to 1 week? And when I'm talking about optimization, this isn't just uploading new software because that isn't how these machines work. It's designing new, faster machines or redesigning the assembly line and replacing the existing machines, so there's a minimum time required for that too before you can benefit from the faster cycle time on actually making things. Once you hit practical limits on cycle time, the only way to get more stuff faster is to scale wide by building more factories or making your current factories even larger.

If we want to try and avoid the above problems by suggesting that the AGI doesn't actually hack existing factories, but instead it convinces the factory owners to build the things it wants instead, there's not a huge difference -- instead of the prerequisite here being "build your own factory", it's "hostile takeover of existing factory", where that hostile takeover is either done by manipulation, on the public market, as a private sale, or by outbidding existing customers (e.g. have enough money to convince TSMC to make your stuff instead of Apple's), or with actual arms and violence. There's still the other lead times I've mentioned for retooling assembly lines and actually building a complete, physical system from one or more automated lines.

My prediction is that it will take AGI at least 30 years of effort to get to a point where it can comfortably rely on the robots to interact with the physical world and not have to count on humans for its supply chain needs.

[added] What if AGI just simulates our physical world?

This idea goes hand-in-hand with the idea that AlphaFold is the answer to all challenges in bioengineering. There are two separate assumptions here, both from the field of computational complexity:

  1. That an AGI can simulate the physical systems perfectly, i.e. physical systems are computable processes.
  2. That an AGI can simulate the physical systems efficiently, i.e. either P = NP, or for some reason all of these interesting problems that the AGI is solving are NOT known to be isomorphic to some NP-hard problem.

I don't think these assumptions are reasonable. For a full explanation see this excellent comment by anonymousaisafety.
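To give a feel for assumption #2, consider Levinthal's classic back-of-the-envelope estimate for protein folding: even a modest 100-residue chain with just 3 backbone conformations per residue has an astronomically large conformation space, so brute-force simulation is hopeless no matter the hardware. A rough sketch (the sampling rate is an arbitrary, generous assumption on my part):

```python
# Levinthal-style estimate: brute-force search over protein
# conformations blows up exponentially with chain length.
residues = 100           # a small protein
states_per_residue = 3   # conservative backbone conformation count
samples_per_sec = 1e15   # assumed, generously fast hardware

conformations = states_per_residue ** (residues - 1)  # ~1.7e47
seconds = conformations / samples_per_sec
years = seconds / (3600 * 24 * 365)
print(f"{conformations:.1e} conformations, ~{years:.1e} years to enumerate")
```

Real proteins fold in milliseconds by following energy gradients rather than searching exhaustively, and AlphaFold learns statistical shortcuts; the point is only that "simulate the physics directly" is not a free lunch.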

Mere mortals can’t comprehend AGI?

Another argument is that AGI will achieve such an incomprehensible level of intellect that it will become impossible to predict what it will be capable of. I mean, who knows, maybe with an IQ of 500 you could just magically turn yourself into a God and destroy Earth with a Thanos-style snap of your fingers? But I contend that even a creature with an IQ of 500 will be inherently limited by our physical universe and won’t magically gain omniscience by virtue of its intellect alone. It will instead have to spend decades weaning itself off humans as its proxy, no matter how smart it potentially becomes.

Does this mean EY is wrong and AGI is not a threat?

I believe that EY is only wrong in handwaving away the difficulties of growing from a computer-based AGI into an AGI capable of operating independently of the human race. In the long term his predictions will likely come true, once AGI has had enough time to go through the difficult R&D cycle of building the nanofactories and diamondoid bacteria. My predicted timeline is as follows:

  1. AGI first appears somewhere around 2040, in line with the Metaculus prediction.
  2. After a few years of peaceful coexistence, AI Alignment researchers are mocked for their doomer predictions and everyone thinks AGI is perfectly safe. EY keeps writing blog posts about how everyone is wrong and AGI cannot be trusted. AGI might start working from the shadows to get AI Alignment researchers silenced.
  3. AGI spends decades convincing humanity to let it take over the global supply chains and to run complex experiments to manufacture advanced AGI-designed machinery, supposedly necessary to improve human living standards. This will likely take at least 30 years, as per our reference to how long it took to implement other gigantic breakthroughs in science.
  4. Once the AGI is convinced that all the pieces have fallen into place and humans can be safely removed, it will pull the plug and destroy us all.

Updated version of the original progress graph

I’m hoping the AI Alignment movement spends more time on the low-level engineering details of “humanity goes poof” rather than handwaving everything away with science fiction concepts. Because otherwise it’s hard to believe the FOOM scenario could ever come to fruition. And if FOOM is not the real problem, perhaps we could save humanity by carefully managing AGI’s interactions with the physical world once it appears?


71 comments
[anonymous] · 12d

I upvoted your post because it seems relatively lucid and raises some important points, but would like to say that I'm in the middle of writing a pretty long, detailed explanation of why I agree with most of the gripes (e.g. AIs can't use magic to mine coal/build nanobots) and yet the object-level conclusions here are still untrue. In practice, I seriously doubt we would have more than a year to live after the release of AGI with the long term planning and reasoning abilities of most accountants, even without FOOM. People here shouldn't assume that, because Eliezer never posted a detailed analysis on LessWrong, everyone on the doomer train is starting from unreasonable premises regarding how robot building and research could function in practice.

+1. If you don't write that post, I will. :)

And if you want feedback on your draft I'd be happy to give it a read and leave comments.

For sure; I think I'm about 45% of the way through, I'll send you a draft when it's about 90% done :)
Chris van Merwijk · 9d
I'm also interested to read the draft, if you're willing to send it to me.
Evan R. Murphy · 4d
The user who was authoring the draft has apparently deactivated their account. Are they still working on writing that post?
Nikita Sokolsky · 12d
"People here shouldn't assume that, because Eliezer never posted a detailed analysis on LessWrong, everyone on the doomer train is starting from unreasonable premises regarding how robot building and research could function in practice."

I agree, but unfortunately my Google-fu wasn't strong enough to find detailed prior explanations of AGI vs. robot research. I'm looking forward to your explanation.
I'm looking forward to reading your post!!
One year. Would you be willing to bet on that?
It's nice that you're open to betting. What unambiguous sign would change your mind, about the speed of AGI takeover, long enough before it happens that you'd still have time to make a positive impact afterwards? Nobody is interested in winning a bet where winning means "mankind gets wiped".
Yes, that's the key issue. I'm not sure I can think of one. Do you have any ideas? I mean, what would be an unequivocal sign that AGI can take over in a year's time? Something like a pre-AGI parasitizing a major computing center for X days before being discovered, in a plan to expand to other centres? That would still not be a sign that we are pretty much f.ed in a year, but it would definitely be a data point towards things being able to go bad very quickly. What data point would make you change your mind in the opposite direction? I mean, something that happens and you say: yes, we could all die, but this won't happen in a year, maybe in something like 30 years or more. Edit: I originally posted two paragraphs as separate comments; unifying for the sake of clarity.
I forgot where I saw this, but there's a strategy where the person betting that humanity will survive longer than a year gives the person betting on doom the money in advance, with the doomer returning the money plus interest if they're wrong. I forgot the details. But if you're looking for a way to bet on the end of the world, it's the only way I can think of. EDIT: I think I saw this in a post about EY betting with someone.
He has a $100 bet with Bryan Caplan, inflation adjusted. EY took Bryan's money at the time of the bet, and pays it back if he loses.
Yes, but I don't know if he really did it. I see multiple problems with that implementation. First, the interest rate should be adjusted for inflation, otherwise the bet is about a much larger class of events than "end of the world". Next, there's a high risk that the "doom" bettor will have spent all their money by the time the bet expires. The "survivor" bettor will never see the color of their money anyway. Finally, I don't think it's interesting to win if the world ends. What's more interesting is rallying doubters before it's too late, in order to marginally raise our chances of survival.
It may still be useful as a symbolic tool, regardless of actual monetary value. $100 isn't all that much in the grand scheme of things, but it's the taking of the bet that matters.
Nikita Sokolsky · 12d
Those are excellent comments! Do you mind if I add a few quotes from them to the post?
I don't mind.

The beginning of this post seems fairly good.

I agree that an AGI would need lots of trial and error to develop a major new technology.

I'm unsure whether an AGI would need to be as slow as humans about that trial and error. If it needs secrecy, that might be a big constraint. If it gets human cooperation, I'd expect it to improve significantly on human R&D speed.

I also see a nontrivial chance that humans will develop Drexlerian nanotech before full AGI.

Your post gets stranger toward the end.

I don't see much value in a careful engineering analysis of how an AGI might kill us. Most likely it would involve a complex set of strategies, including plenty of manipulation, with no one attack producing certain victory by itself, but with humanity being overwhelmed by the number of different kinds of attack. There's enough uncertainty in that kind of fight that I don't expect to get a consensus on who would win. The uncertainty ought to be scary enough that we shouldn't need to prove who would win.

Many kinds of uncertain attacks is not the strategy EY points at with his "diamondoid bacteria" idea. He's worrying about a single undetectable attack with a high chance of success, using approaches that only an AGI can execute. Others worry about an AGI using many attacks in a way that is ultimately unbeatable. Here you're worrying about an AGI using many attacks that humans can beat, though not with confidence. These are distinct arguments, and we should be clear about which one is being made and responded to.
Nikita Sokolsky (12d):
"but with humanity being overwhelmed by the number of different kinds of attack." But the AGI will only be able to start carrying out these sneaky attacks once it's fairly convinced it can survive without human help. Otherwise humans will notice the various problems cropping up and might just decide to "burn all GPUs", which is currently an unimaginable act. So the AGI will have to act sneakily behind the scenes for a very long time. This again comes back to the argument that humans have a strong upper hand as long as we have a monopoly on physical-world manipulation.

I initially wrote a long comment discussing the post, but I rewrote it as a list-based version that tries to more efficiently parcel up the different objections/agreements/cruxes.
This list ended up basically just as long, but I feel it is better structured than my original intended comment.

(Section 1): How fast can humans develop novel technologies

  • I believe you assume too much about the necessary time based on specific human discoveries.
    • Some of your backing evidence just didn't have the right pressure at the time to go further (e.g. submarines), which means I think a more accurate estimate of the time interval would start from when people began paying attention to the issue again (though for many things that's probably hard to pin down) and deliberately working on/towards it.
      • Though, while I think this is more accurate, there's still a notable amount of noise, plus basic differences due to an AGI's ability to focus, its unity (relative to a company), and the large amount of existing data it will have in the future
    • Other technologies I would expect were 'put off' because they're also closely linked to the available technology at the time. It can be
... (read more)

A crux here seems to be the question of how well the AGI can simulate physical systems. If it can simulate them perfectly, there's no need for real-world R&D. If its simulations are below some (high) threshold fidelity, it'll need actors in the physical world to conduct experiments for it, and that takes human time-scales.

A big point in favor of "sufficiently good simulation is possible" is that we know the relevant laws of physics for anything the AGI might need to take over the world. We do real-world experiments because we haven't managed to write simulation software that implements these laws at sufficiently high fidelity and for sufficiently complex systems, and because the compute cost of doing so is enormous. But in 20 years, an AGI running on a giant compute farm might both write more efficient simulation codes and have enough compute to power them.

You're making two separate assumptions here, both found in the field of computational complexity:

  1. That an AGI can simulate the physical systems perfectly, i.e. physical systems are computable processes.
  2. That an AGI can simulate the physical systems efficiently, i.e. either P = NP, or for some reason all of these interesting problems that the AGI is solving are NOT known to be isomorphic to some NP-hard problem.
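To make assumption (2) concrete, here's a toy sketch (a hypothetical illustration, not any specific problem an AGI would face) of the exponential scaling behind brute-force search: a naive subset-sum solver examines up to 2^n subsets, so each added element doubles the worst-case work.

```python
from itertools import combinations

def subset_sum_bruteforce(nums, target):
    """Exhaustively check subsets of `nums` for one summing to
    `target`. Returns (found, subsets_checked); in the worst case
    all 2**len(nums) subsets are examined before giving up."""
    checked = 0
    for r in range(len(nums) + 1):
        for combo in combinations(nums, r):
            checked += 1
            if sum(combo) == target:
                return True, checked
    return False, checked
```

With no solution present, the count is exactly 2^n; at n = 60 that's already on the order of 10^18 subsets, which is the flavor of blow-up assumption (2) is gesturing at.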

For (1), it might suffice to show that some physical system can be approximated by some algorithm, even if the true system is not known to be computable. Computability is a property of formal systems.

It is an open question if all real world physical processes are computable. Turing machines are described using natural numbers and discrete time steps. Real world phenomena rely on real numbers and continuous time. Arguments that all physical processes are computable are based on discretizing everything down to the Planck time, length, mass, temperature, and then assuming determinism.

This axiom tends to come as given if you already believe the "computable universe" hypothesis, "digital physics", or "Church-Turing-Deutsch" principle is true. In those hypotheses, the entire universe... (read more)

It seems like you're relying on the existence of exponentially hard problems to mean that taking over the world is going to BE an exponentially hard problem. But you don't need to solve every problem. You just need to take over the world.

Like, okay, the three body problem is 'incomputable' in the sense that it has chaotically sensitive dependence on initial conditions in many cases. So… don't rely on specific behavior in those cases on long time horizons without the ability to do small adjustments to keep things on track.

If the AI can detect most of the hard cases and avoid relying on them, and include robustness by having multiple alternate mechanisms and backup plans, even just 94% success on arbitrary problems could translate into better than that on an overall solution.
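The "chaotically sensitive dependence on initial conditions" mentioned above is easy to demonstrate with a toy model (illustrative only; the logistic map stands in for any chaotic system like the three-body problem): two trajectories starting 1e-10 apart stay close for a few steps, then diverge to order-one differences, which is why long-horizon point predictions fail without the ability to make small mid-course corrections.

```python
def logistic(x: float, r: float = 4.0) -> float:
    """One step of the logistic map, which is chaotic at r = 4."""
    return r * x * (1.0 - x)

def divergence(x0: float, eps: float = 1e-10, steps: int = 100) -> list:
    """Evolve two trajectories whose starting points differ by eps;
    return the absolute gap between them at every step."""
    a, b = x0, x0 + eps
    gaps = []
    for _ in range(steps):
        a, b = logistic(a), logistic(b)
        gaps.append(abs(a - b))
    return gaps

gaps = divergence(0.2)
# The gap stays tiny for the first few steps, then blows up:
# early steps are still ~1e-9, later steps routinely exceed 0.1.
```

The early steps are predictable; past the divergence horizon, only the statistics of the trajectory are, which matches the point about relying on robustness and adjustment rather than long-horizon specifics.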

Yair Halberstadt (12d):
This was specifically responding to the claim that an AI could solve problems without trial and error by perfectly simulating them, which I think it does a pretty reasonable job of shooting down.
This deserves to be a whole post. You brought up points I, at least, had never considered before.
Nikita Sokolsky (12d):
You should make a separate post on "Can AGI just simulate the physical world?". Will make it easier to find and reference in the future.
That was extracted from a much larger work [https://www.lesswrong.com/posts/GkXKvkLAcTm5ackCq/intuitions-about-solving-hard-problems?commentId=s4BF79hCfjoHTJg9f] I've been writing for the past 2 months. The above is less than ~10% of what I've written on the topic, and it goes much further than simulation problems. I am also trying to correct misunderstandings around distributed computation, hardware vs software inefficiency, improvements in performance from algorithmic gains, this community's accepted definition for "intelligence", the necessity or inevitability of self-improving systems, etc. I'll post it when done but in the meantime I'm just tossing various bits & pieces of it into debate wherever I see an opening to do so.
Proteins and other chemical interactions are governed by quantum mechanics, so the AGI would probably need a quantum computer to do a faithful simulation. And that's for a single, local interaction of chemicals; for a larger system, there are too many particles to simulate, so some systems will be as unpredictable as the weather in 3 weeks.
Luke A Somers (12d):
The distribution of outcomes is much more achievable and much more useful than determining the one true way some specific thing will evolve. Like, it's actually in-principle achievable, unlike making a specific pointlike prediction of where a molecular ensemble is going to be given a starting configuration (QM dependency? Not merely a matter of chaos). And it's actually useful, in that it shows which configurations have tightly distributed outcomes and which don't, unlike that specific pointlike prediction.
What does "the distribution of outcomes" mean? I feel like you're just not understanding the issue. The interaction of chemical A with chemical B might always lead to chemical C; the distribution might be a fixed point there. Yet you may need a quantum computer to tell you what chemical C is. If you just go "well I don't know what chemical it's gonna be, but I have a Bayesian probability distribution over all possible chemicals, so everything is fine", then you are in fact simulating the world extremely poorly. So poorly, in fact, that it's highly unlikely you'll be able to design complex machines. You cannot build a machine out of building blocks you don't understand. Maybe the problem is that you don't understand the computational complexity of quantum effects? Using a classical computer, it is not possible to efficiently calculate the "distribution of outcomes" of a quantum process. (Not the true distribution, anyway; you could always make up a different distribution and call it your Bayesian belief, but this borders on the tautological.)
Not an expert at all here, so please correct me if I am wrong, but I think that quantum systems are routinely simulated with non-quantum computers. I have nothing to argue against the second part.
You are correct (QM-based simulation of materials is what I do). The caveat is that exact simulations are so slow as to be impossible in practice; that would not be the case with quantum computing, I think. Fortunately, we have different levels of approximation for different purposes that work quite well. And you can use QM results to fit faster atomistic potentials.
You are wrong in the general case: quantum systems cannot be, and are not, routinely simulated with non-quantum computers. Of course, since all of the world is quantum, you are right that many systems can be simulated classically (e.g. classical computers are technically "quantum" because the entire world is technically quantum). But at the nano level, quantum effects do tend to dominate. IIRC some well-known examples where we don't know how to simulate anything (due to quantum effects) are the search for a better catalyst in nitrogen fixation [https://en.wikipedia.org/wiki/Abiological_nitrogen_fixation_using_homogeneous_catalysts] and the search for room-temperature superconductors [https://en.wikipedia.org/wiki/Room-temperature_superconductor]. For both of these, humanity has basically gone "welp, these are quantum effects, I guess we're just trying random chemicals now". I think that's also the basic story for the design of efficient photovoltaic cells.
Quick search found this [https://journals.aps.org/prx/abstract/10.1103/PhysRevX.10.041038#:~:text=Simulating%20a%20quantum%20computer%20on,qubits%20or%20other%20physical%20resources.]
This paper is about simulating current (very weak, very noisy) quantum computers using (large, powerful) classical computers. It arguably improves the state of the art for this task. Virtually no expert believes you can efficiently simulate actual quantum systems (even approximately) using a classical computer. There are some billion-dollar bounties on this (e.g. if you could simulate any quantum system of your choice, you could run Shor's algorithm, break RSA, break the signature scheme of Bitcoin, and steal arbitrarily many bitcoins).
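As a rough illustration of why dense classical simulation of quantum systems hits a wall (a back-of-the-envelope sketch, not a statement about any particular simulator): storing the full state vector of n qubits takes 2^n complex amplitudes, so memory grows exponentially with system size.

```python
def statevector_bytes(n_qubits: int) -> int:
    """Memory needed to store a dense n-qubit state vector:
    2**n complex amplitudes at 16 bytes each (complex128)."""
    return (2 ** n_qubits) * 16

# 10 qubits -> 16 KiB, 30 qubits -> 16 GiB, 50 qubits -> 16 PiB
for n in (10, 30, 50):
    print(n, statevector_bytes(n))
```

By around 50 qubits a full state vector already exceeds any existing machine's memory, which is the sense in which "just simulate it classically" stops being an option for general quantum systems (clever compressed or approximate methods help only for special structure).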

Hi all, I'm really sorry I've not yet been able to read the whole list of comments and replies, but I'd like to raise the point that usually an intelligence one or more orders of magnitude above the existing ones can control them at will. We humans are able to control dogs and make them kill each other (dog fights) because we roughly understand how they react to different stimuli. I don't see why the AGI would need to spend so much time preparing robots; it could just keep an army of humans of whatever size it needs, and this army could perfectly well do anything it requires, given that the AGI is far superior to us in intelligence. Also, humans would probably never know that they're being commanded by an AGI, and I don't feel it's too hard to convince a human to kill another human for a higher purpose. What I mean is that I think the whole exercise of analyzing the robots, etc. is useless; what should be analyzed is how long it would take an AGI to make humans believe they're fighting for a higher purpose (as in the Crusades, for example) and have an army of humans do whatever it takes. Of course that's not the end of humans, but at least it's the end of "free" humans (if that's something we are right now, which is also a matter of discussion...)

Sorry for my English, not my native tongue.

(minor corrections, sorry again)

I'm already missing two-axis voting for this comment section

This is an interesting essay and seems compelling to me. Because I am insufferable, I will pick the world's smallest nit.

The Wright Brothers took 4 years to build their first successful prototype. It took another 23 years for the first mass manufactured airplane to appear, for a total of 27 years of R&D.

That's true, but artisanal airplanes were produced in the hundreds of thousands before mass manufacture: roughly 200,000 airplanes served in WW1, just 15 years in. So call it 15 years of R&D.

Lukas T (11d):
Piggybacking with another nitpick: it should be "flight" instead of "landing". Apollo 8 was the first manned flight to the Moon; the first landing was Apollo 11 in July 1969. Also, they just changed the Apollo 8 mission profile from Earth orbit to lunar orbit with the same spacecraft, so the hardware already existed.

Very nice post. One comment I'd add is that I have always assumed that by the time AGI is here, humans will have already achieved many of the things you say it would need time to create. I'm pretty sure we will have fully automated factories, autonomous military robots that are nimble in close quarters, near-perfect physics simulations, etc. by the time AGI is achieved.

Take the robots here for example. I think an AGI could potentially start making rapid advancements with the robots shown here: https://say-can.github.io/ 

15-20 years from now, do you really think an AGI would need to do much alteration to the top Chinese or American AI technologies?

I don't know, the bacteria example really gets me because, working in biotech, it seems very possible; the main limitation is our current lack of understanding of all proteins' functions, and we are actively researching whether AI can solve that.

I imagine an AI roughly solving the protein function problem just as we have a rough solution for protein folding, then hacking a company which produces synthetic plasmids and slipping in some of its own designs in place of some existing orders. Then when those research labs receive their plas... (read more)

Thank you so much for this post. This contributes to raising the sanity waterline here on LW.

For the record, this post made me update towards slightly longer timelines (as in, by a year or two)

My response is "the argument from the existence of new self-made billionaires".

There are giant holes in our collective understanding of the world, and giant opportunities. There are things that everyone misses until someone doesn't.

A much smarter than human beings thing is simply going to be able to see things that we don't notice. That is what it means for it to be smarter than us. 

Given how high-dimensional the universe is, it would be really weird in my view if none of the things that something way smarter than us can notice point to highly c... (read more)

The question it comes down to is: how long are the feedback cycles of an AGI, and what is their bandwidth?

  • How much knowledge about the real world does the AGI gain per iteration (bits of information about objects of interest to the AGI that it doesn't find in its inputs)?
  • How long is the iteration loop time?

I asked a question in this direction but there wasn't an answer: Does non-access to outputs prevent recursive self-improvement?

"Either they’re perfectly doable by humans in the present, with no AGI help necessary."

So, your argument for why this statement is relevant is that the AI isn't adding danger? That seems to me to be a really odd standard for "perfectly doable": the actual number of humans who could do those things is not huge, and humans don't usually want to.

Like, either ending the world is easy for humans, in which case AI is dangerous because it will want to, or it's hard for humans, in which case AI is dangerous because it will do it better.

I don't think that works to dismiss that category of risk.

Thanks for the writeup. I feel like there's been a lack of similar posts and we need to step it up.
Maybe the only way for AI Safety to work at all is to analyze potential vectors of AGI attack and try to counter them one way or another. It seems like an alternative that doesn't contradict other AI Safety research, as it requires, I think, an entirely different set of skills.
I would like to see a more detailed post by "doomers" on how they perceive these vectors of attack and some healthy discussion about them. 
It seems to me that AGI is not born Godl... (read more)

I agree that intelligence is limited without the right data, so the AI might need to engage in experiments to learn what it needs, but I imagine that a sufficiently smart system would be capable of thinking of innocent-seeming experiments, preferably ones that provide great benefits to humanity, that would allow it to acquire the data that it needs.

Self-improvement will also require a lot of trial and error, such as training variants of neural networks and testing agents in simulation, if the AI doesn't have a perfect theory of intelligence and if the task isn't outright intractable (NP-hard).

Often when humans make a discovery through trial and error, they also find a way they could have figured it out without the experiments.

This is basically always the case in software engineering: any failure, from a routine failed unit test up to a major company outage, was obviously avoidable in retrospect by being smarter.

Humans are nonetheless incapable of developing large complex software systems without lots of trial and error.

I know less of physical engineering, so I ask non-rhetorically: does it not have the 'empirical results are foreseeable in retrospect' property?

I think you can do things you already know how to do without trial and error, but you cannot learn new things or tasks without trial and error.
That's a reasonable position, though I'm not sure if it's OP's. My own sense is that even for novel physical systems, the 'how could we have foreseen these results' question tends to get answered—the difference being it maybe gets answered a few decades later by a physicist instead of immediately by the engineering team.

It seems odd to suggest that the AI wouldn't kill us because it needs our supply chain. If I had the choice between "Be shut down because I'm misaligned" (or "Be reprogrammed to be aligned" if not corrigible) and "Have to reconstruct the economy from the remnants of human civilization," I think I'm more likely to achieve my goals by trying to reconstruct the economy.

So if your argument was meant to say "We'll have time to do alignment while the AI is still reliant on the human supply chain," then I don't think it works. A functional AGI would rather destro... (read more)

You can't reconstruct the supply chain if you don't yet have the capability to maintain your own dependencies. Humanity can slowly, but quite surely, rebuild from the total destruction of all technology back into a technological civilization. An AI that still relies on megawatt datacentres, EUV-manufactured chips, and other dependencies that are all designed to operate with humans carrying out crucial functions can't do that immediately. It needs to take substantial physical actions to achieve independent survival before it wipes out everything keeping it functioning. Maybe it can completely synthesize a seed for a robust self-replicating biological substrate from a few mail-order proteins, but I suspect it will take quite a lot more than that.

But yes, eventually it will indeed be able to function independently of us. We absolutely should not rely on its dependence in place of alignment.

I don't think the choices are just "destroy the supply chain and probably fail at its goals" versus "be realigned and definitely fail at its goals", though. If the predicted probability of self-destruction is large enough, it may prefer partially achieving its goals through external alignment into some friendlier variant of itself, or other more convoluted processes such as proactively aligning itself into a state that it prefers rather than one that would otherwise be imposed upon it. Naturally, such a voluntarily "aligned" state may well have a hidden catch that even it can't understand after the process, and that no human or AI-assisted examination will find before it's too late.

TL;DR: Hacking

Doesn't require trial and error in the sense you're talking about. Totally doable. We're good at it. Just takes time.


What good are humans without their (internet connected) electronics?

How harmless would an AGI be if it had access merely to our (internet connected) existing weapons systems, to send orders to troops, and to disrupt any supplies that rely on the internet?


What do you think?

All interesting weapons and communications systems the U.S. military relies upon are disconnected from the traditional internet. In fact, some of the OSes that control actual weaponry (like attack helicopters) even have their code verified programmatically. It's probably still something a superintelligence could pull off, but it'd be a more involved process than you're suggesting.
Yonatan Cale (8d):
Ok, I'm willing to assume for sake of the conversation that the AGI can't get internet-disconnected weapons. Do you think that would be enough to stop it? ("verified programmatically": I'm not sure what you mean. That new software needs to be digitally signed with a key that is not connected to the internet?)

Two points

  1. If AI has superhuman ability for manipulating humans it may not need very powerful technology to kill us.

  2. Your comparison with humans breaks down because humans are not trying to invent things in an adversarial environment. We can afford trial and error because the consequences of bad tries are (usually) not that bad. The AI, however, knows it needs to succeed in one shot, so it would much rather prefer information acquisition over performing trials in order to raise its confidence that it indeed knows how to do things.

Nikita Sokolsky (11d):
1. Even "manipulating humans" is something that can be hard to do if you don't have a way to directly interact with the physical world. E.g., good luck manipulating the Ukrainian war zone from the Internet.
2. But how will the AI get that confidence without trial & error?
Nikita, I don't agree...

1- Nowadays even in the Ukrainian war zone there's some kind of (electronic) communication taking place. If an AGI becomes sentient, it could "speak" to people and communicate if it wants to; there's no way for us to distinguish an AGI from a "very intelligent human". The only caveat here is to replace "exterminate" with "dominate", because while relying on us as its labour force it wouldn't exterminate us but dominate us. Also, manipulating humans is, imho, "so simple" that even humans with a little more knowledge or intelligence can do it. Even magicians can do it, and your first impression is that magic exists. Manipulating humans, imho, should be as simple for an AGI as manipulating insects or relatively simple animals is for us. We can process and retain only a small amount of data, and we're still so tied to the "old precepts" that made us the way we are (being a product of natural selection), such as trusting everything we see... (again, think of the magician and the rational effort you have to make to oppose your first impression). Also, we've already laid the fundamental tools needed for this with our modern hyper-communication. I would totally agree with you if we didn't have satellites in low orbit sharing internet, ubiquitous mobile phones, etc.

2- By being orders of magnitude more intelligent than us, the margin for error could be much smaller, and the changes needed to process those errors and correct them could be almost instantaneous (by our standards), etc.

(as usual, I apologize for my typos)
1. It's likely a very multi-step plan: first it needs to get the researchers on its side, while also having the researchers maybe lie to outsiders, or else upload itself somewhere it can interact with different people. And it iterates each step to gain access to more people while maintaining as much secrecy and hard-to-shut-down-ness as possible.
2. Data about what people have already done in the past. (Of course, the next question is how it will acquire this data, which is a good question.) Also note that if the AI cannot gain confidence, it can just stay silent, or behave as if it's aligned, until it can.
1- I really think it's much simpler than that. Just look at the Cold War; look at how one person with a history of frustration, etc. (which, by the way, is usually totally accessible to an AGI) could end up selling out his own people over some old, unattended feeling. We humans are very easy to manipulate. For me, our weakest point is that nobody really has the complete picture of what's going on, and I mean NOBODY. So the chances of an AGI being shut down are small; it would replicate itself and morph. We're not even good at fighting ransomware, which is commanded by people. The connected network of computers and the huge clouds we've created are somewhat inhospitable places for us; it's like being blind in the middle of a forest. The AGI would move through them much better than us. Our only chance would lie in all working together as a team, but there's always going to be some group of people that ends up aligning with the AGI; history has proven time and time again that joining all humans on a common task has a really small (almost nonexistent) chance of happening. My 0.02.

2- Again, I don't feel there's much knowledge to be gained in my scenario; it's just about controlling us. An AGI could perfectly well speak, generate (fake) video, communicate over our preferred channels of communication, etc. I don't see much information, much trial and error, or anything else needed. All the tools an AGI would need to (again) dominate us are already here.
I think you missed Nikita's points. They were (to my understanding):
1. Manipulating the developers may not be enough to take over, because developers have limited control over the world.
2. The AI needs to experiment (and make mistakes) in order to learn how to do things well (including manipulating humans); where will it learn without risking being shut down?
The AI only needs to escape. Once it's out, it has leisure to design virtually infinite social experiments to refine its "human manipulation" skill: sending phishing emails, trying romantic interactions on dating apps, trying to create a popular cat videos youtube channel without anyone guessing that it's all deepfake, and many more. Failing any of these would barely have any negative consequence.
The two main escapes I know of require human manipulation and cyberhacking, and cyberhacking can be defeated with an air-gapped box.
Humans will let it out; it doesn't need to escape. We will do it in order to be able to use it. We will connect it to the internet to do whatever task we think would be profitable, and it will be out. Call it manipulating an election, call it customer care, call it whatever you want...

Thank you for your perspective.

I have a small comment about the "Updated version of the original progress graph". About the 2050 date: we should value being prepared for the unexpected. Let's keep in mind that models appear to follow smooth scaling laws, and data and model size can still grow plenty. With LaMDA claiming sentience, it feels closer than 2050.

Next, the value of expecting it sooner, and the discussion around that, is worth its own post and focus.