Summary: Yudkowsky argues that an unaligned AI will figure out a way to create self-replicating nanobots, and that merely having internet access is enough to bring them into existence. Because of this, it can very quickly replace all human dependencies for its existence and expansion, and thus pursue an unaligned goal, e.g. making paperclips, which will most likely end in the extinction of humanity.
Below, however, I will explain why I think this description massively underestimates the difficulty of creating self-replicating nanobots (even assuming that they are physically possible): doing so requires focused research in the physical domain and is not possible today without the involvement of top-tier human-run labs.
Why does this matter? Some of the assumptions of pessimistic AI alignment researchers, especially Yudkowsky, rest fundamentally on the claim that the AI will quickly find ways to replace the humans it requires to exist and expand.
- We have to get AI alignment right the first time we build a Super-AI, and there are no ways to make any corrections after we've built it
- As long as the AI does not have a way to replace humans outright, even if its ultimate goal is non-aligned, it can pursue proximate goals that are aligned and safe for it to do. Alignment research can continue and can attempt to make the AI fully aligned, or shut it down, before it can create nanobots.
- The first time we build a Super-AI, we don't just have to make sure it's aligned; we also need it to perform a pivotal act, such as creating nanobots to destroy all GPUs
- I argue below that this framing may be bad because it means performing one of the most dangerous steps first — creating nanobots — which may be best performed by an AI that is much more aligned than a first attempt
What this post is not about: I make no argument about the feasibility of (non-biological) self-replicating nanobots. There may be fundamental reasons why they are impossible, or very difficult even for superintelligent AIs, or unable to outcompete biological life; an interesting question that is explored more by bhaut. I also don't claim that AI alignment doesn't matter. I think it's extremely important, but I think it's unlikely that (1) a one-shot process will achieve it and (2) that a one-shot is even necessary; in fact, I think that this kind of thinking increases risk.
Finally, I don't claim that there aren't easier ways to kill all, or almost all, humans, for example pandemics or nuclear wars. However, most of these scenarios leave no good path for an AI to expand, because there would be no way to get more of its substrate (e.g. GPUs) or power supplies.
Core argument
Building something like a completely new type of nanobot is a massive research undertaking. Even under the assumption that an AI is much more intelligent and can learn and infer from much less data, it cannot do so from no data.
Building a new type of nanobot (not based on biological life) requires not just the ability to design from existing capabilities, but actually running completely new experiments on how the nanomachinery involved interacts with itself and the external world. It isn't possible to cut all experiments out of the design process, because at least some of them are about how the physical world works. If you don't know anything about physics, you clearly can't design any kind of machine, and I am fairly certain that right now we do not know enough about nanomachines to design a new kind of non-biological self-replicating nanobot that works out of the box on the first try.
To build one, an AI would need high-quality labs to run very precisely specified experiments, build prototypes in later stages, and report detailed information on how they failed, until a first sample of a self-replicating nanobot is produced, at which point the AGI might be in a position to replace all humans.
Counterargument 1: We can build some complex machines from blueprints, and they work the first time. As an example, we can certainly design a complex electronics product, manufacture the PCB, and add all the chips and other parts. If an experienced engineer does this, there is a good chance it will work the first time. However, new nanomachines would be different, because they cannot be assembled from parts that are already extremely well studied in isolation. We make chips such that, when used in their specified way, their behaviour is extremely predictable; no such components currently exist in the world of nanomachines.
Counterargument 2: The AI can simply simulate everything instead of performing any physical experiments. All of the required laws of physics are known: the standard model describes the microscopic world extremely well, and (microscopic) gravity is irrelevant for constructing nanobots, so no (currently unknown) physical theory unifying all laws would be required. While it is indeed possible, or even likely, that the standard model theoretically describes all details of a working nanobot with the required precision, the problem is that in practice it is impossible to simulate large physical systems with it. Many complex physical systems are still largely modelled empirically (ad-hoc models validated by experiments) rather than derived from first principles. While physicists sometimes claim to derive things from first principles, in practice these derivations often ignore many details, which still have to be justified using experiments. An AI can also make progress on better simulation methods, but simulating complex nanomachines outright is exceedingly unlikely.
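To give a sense of the scale involved (my own back-of-the-envelope illustration, not part of the original argument): the cost of exact quantum simulation grows exponentially with system size, because storing the state vector of n interacting two-level particles requires 2^n complex amplitudes.

```python
# Memory needed to store the exact quantum state of n two-level
# systems (qubits/spins), assuming 16 bytes per complex amplitude.
def state_vector_bytes(n: int) -> int:
    return 16 * 2**n

# Any nanobot-sized object contains far more than 50 strongly
# interacting particles, yet even n = 50 is already out of reach:
for n in (20, 50, 300):
    print(n, state_vector_bytes(n))
# n = 50 already needs roughly 18 petabytes; n = 300 exceeds the
# number of atoms in the observable universe (~10^80).
```

This is why practical simulation relies on approximations (force fields, density functional theory, coarse-graining), and those approximations are exactly the part that has historically needed experimental validation.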
Counterargument 3: Nanobots already exist; the AI will just use existing biology. Existing biology is indeed good at making self-replicating nanobots, but at least two difficult problems remain: to make any effective use of a network of nanobots, the AI would need to (1) build a communication network among the cells and (2) get them to cooperate in executing more complex software, at minimum to connect to the internet (and thus back to the AI). That is still a monumental task to achieve using biological systems and would require a lot of research.
Counterargument 4: The AI can do the experiments in secret, or hide the true nature of the experiments in things that seem aligned. This could certainly be relevant in the long run, especially if we want the AI to solve complex problems. But on shorter timescales, most of what the AI would need to learn is going to be extremely specific to nanomachines. You do not get data about this by making completely unrelated experiments that do not involve nanotechnology.
Significance for AI alignment
I don't claim that this means we don't need alignment, or that an AI won't eventually be able to build nanobots (if it is feasible at all, of course) — just that it seems highly possible to delay this step by years, if the operator intends to do so (and has some minimal cooperation on this from the rest of the world).
This means that it is possible to study AIs with capabilities potentially far exceeding human capabilities. Alignment is likely an iterative process with no one-shot solution, but that's probably ok, because well-enough-aligned AIs can coexist with humans, be studied, and be improved for the next iteration, without immediately seeing human bodies as mere bags of atoms to be harvested for other purposes.
Pivotal acts
I think pivotal acts may be a bad idea in general. The arguments for this have been spelled out before, for example by Andrew_Critch. However, even if one believes (a) that nanobots are feasible and (b) that a pivotal act is necessary, using nanobots to carry out that pivotal act might still be a really bad idea.
If someone decides that a pivotal act should be carried out using nanobots (either on their own initiative or because an AI suggests it as the best option), they might be inclined to do whatever physical acts are necessary for the AI to achieve this, making the AI much more dangerous if it is not perfectly aligned (which may itself be an impossible problem). Pivotal acts that do not require giving an AI full human-equivalent or better physical capabilities would be much safer (though probably still a bad idea).
How could this argument change in the future
I think my argument that nanobots cannot be built without massive help from first-tier human labs holds now and for at least several more years. However, over several decades, some things might change substantially, for example:
1. Production processes could become much more automated than they are now. If factories existed that could make new, complex machines without major retooling, it would be much simpler for an AI to perform completely new tasks in the physical world with minimal human interaction.
2. Robotics could advance. Humanoid robots that can perform many physical human tasks may make it possible for the AI to build completely human-independent labs.
3. More research into building nanomachines could eliminate more of the unknowns.
4. More biotech research could also allow more control of the physical world, for example if cell networks can be built to perform some tasks.
5. Quantum computers may turn out to be powerful enough to simulate much more complex physical processes than is possible on classical computers, in which case an AI with access to a quantum computer might massively reduce the number of experiments necessary to construct nanobots. (This feels unlikely to me but cannot be excluded a priori.)
So whether an AI can achieve nanobots just via internet access will potentially have to be re-evaluated in the future when one or more of these are developed.
What this shouldn't be taken as
I am not arguing alignment is not important, in fact I think it is very important.
1. Regardless of the feasibility of nanobots, I think there are probably vastly easier ways to kill all humans; however, they would leave an AI without a practical way to continue existing or expanding.
2. It is also possible that many scenarios exist where an AI does (1) by accident.
3. AIs don't need nanobots to take control of humans and human institutions. There are many other ways that involve using humans against each other and are probably exploitable by much less powerful AIs. (Crucially, however, they do depend on some humans and might require different tools to control AGI risk.)
4. I don't think this makes AI alignment trivial to solve, or claim that it gives a recipe for solving it. I just think it may be fruitful to look into research that starts from moderately aligned AIs and figures out how to make them more aligned, rather than having to perform a very risky one-shot experiment.
I suspect my own intuitions regarding this kind of thing are similar to Eliezer's. It's possible that my intuitions are wrong, but I'll try to share some thoughts.
It seems that we think quite differently when it comes to this, and probably it's not easy for us to achieve mutual understanding. But even if all we do here is to scratch the surface, that may still be worthwhile.
As mentioned, maybe my intuitions are wrong. But maybe your intuitions are wrong (or maybe both). I think a desirable property of plans/strategies for alignment would be robustness to either of us being wrong about this 🙂
Among people who would suspect me of underestimating the difficulty of developing advanced nanotech, I would suspect most of them of underestimating the difference made by superintelligence + the space of options/techniques/etc that a superintelligent mind could leverage.
In Drexler's writings about how to develop nanotech, one thing that was central to his thinking was protein folding. I remember that in my earlier thinking, it felt likely to me that a superintelligence would be able to "solve" protein folding (to a sufficient extent to do what it wanted to do). My thinking was "some people describe this as infeasible, but I would guess that a superintelligence would be able to do it".
This was before AlphaFold. The way I remember it, the idea of "solving" protein folding was more controversial back in the day (although I tried to google this now, and it was harder to find good examples than I thought it would be).
As humans we are "pathetic" in terms of our mental abilities. We have a high error-rate in our reasoning / the work we do, and this makes us radically more dependent on tight feedback-loops with the external world.
This point about the error-rate in one's thinking is really important. With a lower error-rate, plus being able to do much more thinking / mental work, it becomes possible to learn and do much, much more without physical experiments.
The world, and guesses regarding how the world works (including detail-oriented stuff relating to chemistry/biology), are highly interconnected. A mind that is able to do vast amounts of high-quality, low error-rate thinking may be able to combine subtle and noisy Bayesian evidence into overwhelming evidence. And for the approaches it explores regarding this kind of thinking, it can test how well it does at predicting existing info/data that it already has access to.
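A toy illustration of this point (my own sketch, with made-up numbers): in log-odds form, independent pieces of evidence add up linearly, so many individually weak clues, each with a likelihood ratio barely above 1, can still shift a hypothesis from near-impossible to near-certain.

```python
import math

def posterior_odds(prior_odds: float, likelihood_ratios: list[float]) -> float:
    """Combine independent evidence by multiplying likelihood ratios
    (equivalently, summing their log-odds contributions)."""
    log_odds = math.log(prior_odds) + sum(math.log(lr) for lr in likelihood_ratios)
    return math.exp(log_odds)

# 200 weak clues, each only 10% more likely under hypothesis H than under not-H:
odds = posterior_odds(prior_odds=1e-6, likelihood_ratios=[1.1] * 200)
print(odds)  # ~190: a millionth-odds hypothesis becomes ~99.5% probable
```

The catch, of course, is the independence assumption and the error-rate: a reasoner who misjudges even a small fraction of these likelihood ratios can be led badly astray, which is why low error-rates matter so much here.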
The images below are simple/small examples of the kind of thinking I'm thinking of. But I suspect superintelligences can take this kind of thinking much much further.
The post Einstein's Arrogance also feels relevant here.
It is infeasible to simulate in "full detail", but it's not clear what we should conclude from that. Designs that work are often robust to exactly the kinds of details that would require precise simulation to get right.
The specifics of the level of detail that is needed depend on the design/plan in question. A superintelligence may be able to work with simulations in a much less crude way than we do (with much more fine-grained and precise thinking regarding what can be abstracted away in various parts of the "simulation").
The construction-process/design the AI comes up with may:
Here are some relevant quotes from Radical Abundance by Eric Drexler:
"Coping with limited knowledge is a necessary part of design and can often be managed. Indeed, engineers designed bridges long before anyone could calculate stresses and strains, which is to say, they learned to succeed without knowledge that seems essential today. In this light, it’s worth considering not only the extent and precision of scientific knowledge, but also how far engineering can reach with knowledge that remains incomplete and imperfect.
For example, at the level of molecules and materials—the literal substance of technological systems—empirical studies still dominate knowledge. The range of reliable calculation grows year by year, yet no one calculates the tensile strength of a particular grade of medium-carbon steel. Engineers either read the data from tables or they clamp a sample in the jaws of a strength-testing machine and pull until it breaks. In other words, rather than calculating on the basis of physical law, they ask the physical world directly.
Experience shows that this kind of knowledge supports physical calculations with endless applications. Building on empirical knowledge of the mechanical properties of steel, engineers apply physics-based calculations to design both bridges and cars. Knowing the empirical electronic properties of silicon, engineers apply physics-based calculations to design transistors, circuits, and computers.
Empirical data and calculation likewise join forces in molecular science and engineering. Knowing the structural properties of particular configurations of atoms and bonds enables quantitative predictions of limited scope, yet applicable in endless circumstances. The same is true of chemical processes that break or make particular configurations of bonds to yield an endless variety of molecular structures.
Limited scientific knowledge may suffice for one purpose but not for another, and the difference depends on what questions it answers. In particular, when scientific knowledge is to be used in engineering design, what counts as enough scientific knowledge is itself an engineering question, one that by nature can be addressed only in the context of design and analysis.
Empirical knowledge embodies physical law as surely as any calculation in physics. If applied with caution—respecting its limits—empirical knowledge can join forces with calculation, not just in contemporary engineering, but in exploring the landscape of potential technologies.
To understand this exploratory endeavor and what it can tell us about human prospects, it will be crucial to understand more deeply why the questions asked by science and engineering are fundamentally different. One central reason is this: Scientists focus on what’s not yet discovered and look toward an endless frontier of unknowns, while engineers focus on what has been well established and look toward textbooks, tabulated data, product specifications, and established engineering practice. In short, scientists seek the unknown, while engineers avoid it.
Further, when unknowns can’t be avoided, engineers can often render them harmless by wrapping them in a cushion. In designing devices, engineers accommodate imprecise knowledge in the same way that they accommodate imprecise calculations, flawed fabrication, and the likelihood of unexpected events when a product is used. They pad their designs with a margin of safety.
The reason that aircraft seldom fall from the sky with a broken wing isn’t that anyone has perfect knowledge of dislocation dynamics and high-cycle fatigue in dispersion-hardened aluminum, nor because of perfect design calculations, nor because of perfection of any other kind. Instead, the reason that wings remain intact is that engineers apply conservative design, specifying structures that will survive even unlikely events, taking account of expected flaws in high-quality components, crack growth in aluminum under high-cycle fatigue, and known inaccuracies in the design calculations themselves. This design discipline provides safety margins, and safety margins explain why disasters are rare."
"Engineers can solve many problems and simplify others by designing systems shielded by barriers that hold an unpredictable world at bay. In effect, boxes make physics more predictive and, by the same token, thinking in terms of devices sheltered in boxes can open longer sightlines across the landscape of technological potential. In my work, for example, an early step in analyzing APM systems was to explore ways of keeping interior working spaces clean, and hence simple.
Note that designed-in complexity poses a different and more tractable kind of problem than problems of the sort that scientists study. Nature confronts us with complexity of wildly differing kinds and cares nothing for our ability to understand any of it. Technology, by contrast, embodies understanding from its very inception, and the complexity of human-made artifacts can be carefully structured for human comprehension, sometimes with substantial success.
Nonetheless, simple systems can behave in ways beyond the reach of predictive calculation. This is true even in classical physics.
Shooting a pool ball straight into a pocket poses no challenge at all to someone with just slightly more skill than mine and a simple bank shot isn’t too difficult. With luck, a cue ball could drive a ball to strike another ball that drives yet another into a distant pocket, but at every step impacts between curved surfaces amplify the effect of small offsets, and in a chain of impacts like this the outcome soon becomes no more than a matter of chance—offsets grow exponentially with each collision. Even with perfect spheres, perfectly elastic, on a frictionless surface, mere thermal energy would soon randomize paths (after 10 impacts or so), just as it does when atoms collide.
Many systems amplify small differences this way, and chaotic, turbulent flow provides a good example. Downstream turbulence is sensitive to the smallest upstream changes, which is why the flap of a butterfly’s wing, or the wave of your hand, will change the number and track of the storms in every future hurricane season.
Engineers, however, can constrain and master this sort of unpredictability. A pipe carrying turbulent water is unpredictable inside (despite being like a shielded box), yet can deliver water reliably through a faucet downstream. The details of this turbulent flow are beyond prediction, yet everything about the flow is bounded in magnitude, and in a robust engineering design the unpredictable details won’t matter."
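Drexler's pool-ball example can be put into rough numbers (my own sketch, with assumed values for ball spacing and perturbation size, not from the book): for hard-sphere collisions, the transverse offset is amplified at each impact by roughly the ratio of ball spacing to ball radius, so even an atomic-scale initial perturbation reaches macroscopic size within about ten collisions.

```python
# Rough model: each collision multiplies the offset by ~(spacing / radius).
BALL_RADIUS = 0.0286    # m, standard pool ball
BALL_SPACING = 0.3      # m, assumed typical distance between impacts
AMPLIFICATION = BALL_SPACING / BALL_RADIUS   # ~10x per collision

offset = 1e-10          # m, roughly atomic/thermal-scale perturbation
collisions = 0
while offset < BALL_RADIUS:   # stop once uncertainty ~ ball size
    offset *= AMPLIFICATION
    collisions += 1
print(collisions)  # 9: consistent with "after 10 impacts or so"
```

This is the exponential divergence Drexler describes; his engineering point is that a robust design bounds such unpredictability (the turbulent pipe) rather than trying to out-predict it.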
Eliezer's scenario does assume the involvement of human labs (he describes a scenario where DNA is ordered online).
I agree with you here (although I would hope that much of this iteration can be done in quick succession, and hopefully in a low-risk way) 🙂
Btw, I very much enjoyed this talk by Ralph Merkle. It's from 2009, but it's still my favorite talk from every talk I've seen on the topic. Maybe you would enjoy it as well. He briefly touches upon the topic of simulations at 28:50, but the entire talk is quite interesting IMO:
This tweet from Eliezer seems relevant btw. I would give similar answers to all of the questions he lists that relate to nanotechnology (but I'd be somewhat more hedged/guarded - e.g. replacing "YES" with "PROBABLY" for some of them).