Drexler on AI Risk

by PeterMcCluskey, 1st Feb 2019


Eric Drexler has published a book-length paper on AI risk, describing an approach that he calls Comprehensive AI Services (CAIS).

His primary goal seems to be reframing AI risk discussions to use a rather different paradigm than the one that Nick Bostrom and Eliezer Yudkowsky have been promoting. (There isn't yet any paradigm that's widely accepted, so this isn't a Kuhnian paradigm shift; it's better characterized as an amorphous field that is struggling to establish its first paradigm.) Dueling paradigms seem to be the best that the AI safety field can manage to achieve for now.

I'll start by mentioning some important claims that Drexler doesn't dispute:

  • an intelligence explosion might happen somewhat suddenly, in the fairly near future;
  • it's hard to reliably align an AI's values with human values;
  • recursive self-improvement, as imagined by Bostrom / Yudkowsky, would pose significant dangers.

Drexler likely disagrees about some of the claims made by Bostrom / Yudkowsky on those points, but he shares enough of their concerns about them that those disagreements don't explain why Drexler approaches AI safety differently. (Drexler is more cautious than most writers about making any predictions concerning these three claims).

CAIS isn't a full solution to AI risks. Instead, it's better thought of as an attempt to reduce the risk of world conquest by the first AGI that reaches some threshold, preserve existing corrigibility somewhat past human-level AI, and postpone the need for a permanent solution until we have more intelligence.

Stop Anthropomorphising Intelligence!

What I see as the most important distinction between the CAIS paradigm and the Bostrom / Yudkowsky paradigm is Drexler's objection to having advanced AI be a unified, general-purpose agent.

Intelligence doesn't require a broad mind-like utility function. Mindspace is a small subset of the space of intelligence.

Instead, Drexler suggests composing broad AI systems out of many, diverse, narrower-purpose components. Normal software engineering produces components with goals that are limited to a specific output. Drexler claims there's no need to add world-oriented goals that would cause a system to care about large parts of spacetime.

Systems built out of components with narrow goals don't need to develop much broader goals. Existing trends in AI research suggest that better-than-human intelligence can be achieved via tools that have narrow goals.

The AI-services model invites a functional analysis of service development and delivery, and that analysis suggests that practical tasks in the CAIS model are readily or naturally bounded in scope and duration. For example, the task of providing a service is distinct from the task of developing a system to provide that service, and tasks of both kinds must be completed without undue cost or delay.

Drexler's main example of narrow goals is Google's machine translation, which has no goals beyond translating the next unit of text. That doesn't imply any obvious constraint on how sophisticated its world-model can be. It would be quite natural for AI progress to continue with components whose "utility function" remains bounded like this.

It looks like this difference between narrow and broad goals can be turned into a fairly rigorous distinction, but I'm dissatisfied with available descriptions of the distinction. (I'd also like better names for them.)

There are lots of clear-cut cases: narrow-task software that just waits for commands, and on getting a command, it produces a result, then returns to its prior state; versus a general-purpose agent which is designed to maximize the price of a company's stock.

But we need some narrow-task software to remember some information, and once we allow memory, it gets complicated to analyze whether the software's goal is "narrow".
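As a rough illustration of the two cases just described (my own sketch, not something from Drexler's paper), the Python below contrasts a stateless service with one that keeps memory. The names `translate_stateless` and `RememberingTranslator`, and the toy glossary lookup standing in for translation, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def translate_stateless(text: str, glossary: Dict[str, str]) -> str:
    """Clear-cut narrow task: take a command, produce a result, keep nothing.

    The 'translation' is a toy word-for-word glossary lookup (an assumption
    made purely for illustration).
    """
    return " ".join(glossary.get(word, word) for word in text.split())


@dataclass
class RememberingTranslator:
    """Same task, but the service now remembers every request it has seen."""

    glossary: Dict[str, str]
    history: List[str] = field(default_factory=list)

    def translate(self, text: str) -> str:
        # State now survives the call, so the service no longer
        # "returns to its prior state" after answering.
        self.history.append(text)
        return translate_stateless(text, self.glossary)
```

The first function is easy to classify as narrow. Whether the second one is still narrow depends on what later code does with `history`, and that is the part that gets complicated to analyze.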

Drexler seems less optimistic than I am about clarifying this distinction:

There is no bright line between safe CAI services and unsafe AGI agents, and AGI is perhaps best regarded as a potential branch from an R&D-automation/CAIS path.

Because there is no bright line between agents and non-agents, or between rational utility maximization and reactive behaviors shaped by blind evolution, avoiding risky behaviors calls for at least two complementary perspectives: both (1) design-oriented studies that can guide implementation of systems that will provide requisite degrees of e.g., stability, reliability, and transparency, and (2) agent-oriented studies support design by exploring the characteristics of systems that could display emergent, unintended, and potentially risky agent-like behaviors.

It may be true that a bright line can't be explained clearly to laymen, but I have a strong intuition that machine learning (ML) developers will be able to explain it to each other well enough to agree on how to classify the cases that matter.

6.7 Systems composed of rational agents need not maximize a utility function

There is no canonical way to aggregate utilities over agents, and game theory shows that interacting sets of rational agents need not achieve even Pareto optimality. Agents can compete to perform a task, or can perform adversarial tasks such as proposing and criticizing actions; from an external client's perspective, these uncooperative interactions are features, not bugs (consider the growing utility of generative adversarial networks). Further, adaptive collusion can be cleanly avoided: Fixed functions, for example, cannot negotiate or adapt their behavior to align with another agent's purpose. ... There is, of course, an even more fundamental objection to drawing a boundary around a set of agents and treating them as a single entity: In interacting with a set of agents, one can choose to communicate with one or another (e.g. with an agent or its competitor); if we assume that the agents are in effect a single entity, we are assuming a constraint on communication that does not exist in the multi-agent model. The models are fundamentally, structurally inequivalent.

A Nanotech Analogy

Drexler originally described nanotechnology in terms of self-replicating machines.

Later, concerns about grey goo caused him to shift his recommendations toward a safer strategy, where no single machine would be able to replicate itself, but where the benefits of nanotechnology could be used recursively to improve nanofactories.

Similarly, some of the more science-fiction style analyses suggest that an AI with recursive self-improvement could quickly conquer the world.

Drexler's CAIS proposal removes the "self-" from recursive self-improvement, in much the same way that nanofactories removed the "self-" from nanobot self-replication, replacing it with a more decentralized process that involves preserving more features of existing factories / AI implementations. The AI equivalent of nanofactories consists of a set of AI services, each with a narrow goal, which coordinate in ways that don't qualify as a unified agent.

It sort of looks like Drexler's nanotech background has had an important influence on his views. Eliezer's somewhat conflicting view seems to follow a more science-fiction-like pattern of expecting one man to save (or destroy?) the world. And I could generate similar stories for mainstream AI researchers.

That doesn't suggest much about who's right, but it does suggest that people are being influenced by considerations that are only marginally relevant.

How Powerful is CAIS?

Will CAIS be slower to develop than recursive self-improvement? Maybe. It depends somewhat on how fast recursive self-improvement is.

I'm uncertain whether to believe that human oversight is compatible with rapid development. Some of that uncertainty comes from confusion about what to compare it to (an agent AGI that needs no human feedback? or one that often asks humans for approval?).

Some people expect unified agents to be more powerful than CAIS. How plausible are their concerns?

Some of it is disagreement over the extent to which human-level AI will be built with currently understood techniques. (See Victoria Krakovna's chart of what various people believe about this).

Could some of it be due to analogies to people? We have experience with some very agenty businessmen (e.g. Elon Musk or Bill Gates), and some bureaucracies made up of not-so-agenty employees (the post office, or Comcast). I'm tempted to use the intuitions I get from those examples to conclude that a unified agent AI will be more visionary and eager to improve. But I worry that doing so anthropomorphises intelligence in a way that misleads, since I can't say anything more rigorous than "these patterns look relevant".

But if that analogy doesn't help, then the novelty of the situation hints we should distrust Drexler's extrapolation from standard software practices (without placing much confidence in any alternative).

Cure Cancer Example

Drexler wants some limits on what gets automated. E.g. he wants to avoid a situation where an AI is told to cure cancer, and does so without further human interaction. That would risk generating a solution for which the system misjudges human approval (e.g. mind uploading or cryonic suspension).

Instead, he wants humans to decompose that into narrower goals (with substantial AI assistance), such that humans could verify that the goals are compatible with human welfare (or reject those that are too hard to evaluate).
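To make that workflow concrete, here is a minimal sketch of how I picture it; it is my own guess at the shape, not a design from the paper. The function name `run_with_oversight`, the three-valued verdict, and the callbacks are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical verdict convention: True = approved, False = rejected,
# None = too hard for the human reviewer to evaluate.
Verdict = Optional[bool]


def run_with_oversight(
    subtasks: List[str],
    review: Callable[[str], Verdict],
    execute: Callable[[str], str],
) -> Tuple[List[str], List[str]]:
    """Run only the subtasks a human reviewer explicitly approved.

    Returns (results of approved subtasks, subtasks that were held back).
    """
    results: List[str] = []
    held_back: List[str] = []
    for task in subtasks:
        if review(task) is True:
            results.append(execute(task))
        else:
            # Rejected and too-hard-to-evaluate tasks are treated the same:
            # neither one runs.
            held_back.append(task)
    return results, held_back
```

The design choice that matters is the default: a subtask the reviewer cannot evaluate is held back rather than executed, which is where the extra delay comes from.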

This seems likely to delay cancer cures compared to what an agent AGI would do, maybe by hours, maybe by months, as the humans check the subtasks. I expect most people would accept such a delay as a reasonable price for reducing AI risks. I haven't thought of a realistic example where I expect the delay would generate a strong incentive for using an agent AGI, but the cancer example is close enough to be unsettling.

This analysis is reassuring compared to Superintelligence, but not as reassuring as I'd like.

As I was writing the last few paragraphs, and thinking about Wei Dai's objections, I found it hard to clearly model how CAIS would handle the cancer example.

Some of Wei Dai's objections result from a disagreement about whether agent AGI has benefits. But his objections suggest other questions, for which I needed to think carefully in order to guess how Drexler would answer them: How much does CAIS depend on human judgment about what tasks to give to a service? Probably quite heavily, in some cases. How much does CAIS depend on the system having good estimates of human approval? Probably not too much, as long as experts are aware of how good those estimates are, and are willing and able to restrict access to some relatively risky high-level services.

I expect ML researchers can identify a safe way to use CAIS, but it doesn't look very close to an idiot-proof framework, at least not without significant trial and error. I presume there will in the long run be a need for an idiot-proof interface to most such services, but I expect those to be developed later.

What Incentives will influence AI Developers?

With grey goo, it was pretty clear that most nanotech developers would prefer the nanofactory approach, due to it being safer, and having few downsides.

With CAIS, the incentives are less clear, because it's harder to tell whether there will be benefits to agent AGIs.

Much depends on the controversial assumption that relatively responsible organizations will develop CAIS well before other entities are able to develop any form of equally powerful AI. I consider that plausible, but it seems to be one of the weakest parts of Drexler's analysis.

If I knew that AI required expensive hardware, I might be confident that the first human-level AIs would be developed at large, relatively risk-averse institutions.

But Drexler has a novel(?) approach (section 40) which suggests that existing supercomputers have about human-level raw computing power. That provides a reason for worrying that a wider variety of entities could develop powerful AI.

Drexler seems to extrapolate current trends, implying that the first entity to generate human-level AI will look like Google or OpenAI. Developers there seem likely to be sufficiently satisfied with the kind of intelligence explosion that CAIS seems likely to produce that it will only take moderate concern about risks to deter them from pursuing something more dangerous.

Whereas a poorly funded startup, or the stereotypical lone hacker in a basement, might be more tempted to gamble on an agent AGI. I have some hope that human-level AI will require a wide variety of service-like components, maybe too much for a small organization to handle. But I don't like relying on that.

Presumably the publicly available AI services won't be sufficiently general and powerful to enable random people to assemble them into an agent AGI? Combining a robocar + Google translate + an aircraft designer + a theorem prover doesn't sound dangerous. Section 27.7 predicts that "senior human decision makers" would have access to a service with some strategic planning ability (which would have enough power to generate plans with dangerously broad goals), and they would likely restrict access to those high-level services. See also section 39.10 for why any one service doesn't need to have a very broad purpose.

I'm unsure where Siri and Alexa fit in this framework. Their designers have some incentive to incorporate goals that extend well into the future, in order to better adapt to individual customers, by improving their models of each customer's desires. I can imagine that being fully compatible with a CAIS approach, but I can also imagine them being given utility functions that would cause them to act quite agenty.

How Valuable is Modularity?

CAIS may be easier to develop, since modularity normally makes software development easier. On the other hand, modularity seems less important for ML. On the gripping hand, AI developers will likely be combining ML with other techniques, and modularity seems likely to be valuable for those systems, even if the ML parts are not modular. Section 37 lists examples of systems composed of both ML and traditional software.

And as noted in a recent paper from Google, "Only a small fraction of real-world ML systems is composed of the ML code [...] The required surrounding infrastructure is vast and complex." [Sculley et al. 2015]

Neural networks and symbolic/algorithmic AI technologies are complements, not alternatives; they are being integrated in multiple ways at levels that range from components and algorithms to system architectures.

How much less important is modularity for ML? A typical ML system seems to do plenty of re-learning from scratch, when we could imagine it delegating tasks to other components. On the other hand, ML developers seem to be fairly strongly sticking to the pattern of assigning only narrow goals to any instance of an ML service, typically using high-level human judgment to integrate that with other parts.

I expect robocars to provide a good test of how much ML is pushing software development away from modularity. If CAIS is generally correct, I'd expect a robocar to have more than 10 independently trained ML modules integrated into the main software that does the driving, whereas I'd expect fewer than 10 if Drexler were wrong about modularity. My cursory search did not find any clear answer - can anyone resolve this?

I suspect that most ML literature tends to emphasize monolithic software because that's easier to understand, and because those papers focus on specific new ML features, to which modularity is not very relevant.

Maybe there's a useful analogy to markets - maybe people underestimate CAIS because very decentralized systems are harder for people to model. People often imagine that decentralized markets are less efficient than centralized command and control, and only seem to tolerate markets after seeing lots of evidence (e.g. the collapse of communism). On the other hand, Eliezer and Bostrom don't seem especially prone to underestimate markets, so I have low confidence that this guess explains much.

Alas, skepticism of decentralized systems might mean that we're doomed to learn the hard way that the same principles apply to AI development (or fail to learn, because we don't survive the first mistake).


MIRI has been worrying about the opaqueness of neural nets and similar approaches to AI, because it's hard to evaluate the safety of a large, opaque system. I suspect that complex world-models are inherently hard to analyze. So I'd be rather pessimistic if I thought we needed the kind of transparency that MIRI hopes for.

Drexler points out that opaqueness causes fewer problems under the CAIS paradigm. Individual components may often be pretty opaque, but interactions between components seem more likely to follow a transparent protocol (assuming designers value that). And as long as the opaque components have sufficiently limited goals, the risks that might hide under that opaqueness are constrained.
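A toy sketch of what I have in mind by opaque components behind a transparent protocol follows; it is my illustration, not Drexler's. The `Message` type and `run_pipeline` function are hypothetical, and ordinary string functions stand in for opaque ML components.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Message:
    """The transparent part: every hand-off between components is a plain,
    inspectable record."""

    sender: str
    payload: str


def run_pipeline(
    stages: List[Tuple[str, Callable[[str], str]]],
    text: str,
) -> Tuple[str, List[Message]]:
    """Pass text through each named component, logging every hand-off.

    Each component is treated as a black box; only its output is recorded.
    """
    log: List[Message] = []
    for name, component in stages:
        text = component(text)
        log.append(Message(sender=name, payload=text))
    return text, log
```

A reviewer can audit the `log` without understanding how any individual component computed its output, which is the sense in which opaqueness is less of a problem here.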

Transparent protocols enable faster development by humans, but I'm concerned that it will be even faster to have AIs generating systems with less transparent protocols.


The differences between CAIS and agent AGI ought to define a threshold, which could function as a fire alarm for AI experts. If AI developers need to switch to broad utility functions in order to compete, that will provide a clear sign that AI risks are high, and that something's wrong with the CAIS paradigm.

CAIS indicates that it's important to have a consortium of AI companies to promote safety guidelines, and to propagate a consensus view on how to stay on the safe side of the narrow versus broad task threshold.

CAIS helps reduce the pressure to classify typical AI research as dangerous, and therefore reduces AI researchers' motivation to resist AI safety research.

Some implications for AI safety researchers in general: don't imply that anyone knows whether recursive self-improvement will beat other forms of recursive improvement. We don't want to tempt AI researchers to try recursive self-improvement (by telling people it's much more powerful). And we don't want to err much in the other direction, because we don't want people to be complacent about the risks of recursive self-improvement.


CAIS seems somewhat more grounded in existing software practices than, say, the paradigm used in Superintelligence, and provides more reasons for hope. Yet it provides little reason for complacency:

The R&D-automation/AI-services model suggests that conventional AI risks (e.g., failures, abuse, and economic disruption) are apt to arrive more swiftly than expected, and perhaps in more acute forms. While this model suggests that extreme AI risks may be relatively avoidable, it also emphasizes that such risks could arise more quickly than expected.

I see important uncertainty in whether CAIS will be as fast and efficient as agent AGI, and I don't expect any easy resolution to that uncertainty.

This paper is a good starting point, but we need someone to transform it into something more rigorous.

CAIS is sufficiently similar to standard practices that it doesn't require much work to attempt it, and creates few risks.

I'm around 50% confident that CAIS plus a normal degree of vigilance by AI developers will be sufficient to avoid global catastrophe from AI.