Reframing Superintelligence: Comprehensive AI Services as General Intelligence

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Since the CAIS technical report is a gargantuan 210-page document, I figured I'd write a post to summarize it. I have focused on the earlier chapters, because I found those to be more important for understanding the core model. Later chapters speculate about more concrete details of how AI might develop, as well as the implications of the CAIS model for strategy. ETA: This comment provides updates based on more discussion with Eric.

The Model

The core idea is to look at the pathway by which we will develop general intelligence, rather than assuming that at some point we will get a superintelligent AGI agent. To predict how AI will progress in the future, we can look at how AI progresses currently -- through research and development (R&D) processes. AI researchers consider a problem, define a search space, formulate an objective, and use an optimization technique in order to obtain an AI system, called a service, that performs the task.
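As a rough illustration of that R&D loop (search space, objective, optimization technique, resulting service), here is a minimal sketch. Everything in it is an assumption made for illustration: the task (tripling a number), the names `develop_service`, `sample_slope` and `negative_squared_error`, and the use of plain random search as the optimization technique. None of it comes from the report.

```python
import random

def develop_service(sample_candidate, objective, budget, seed=0):
    """Toy R&D loop: search a space of candidates and keep the best one found."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(budget):
        candidate = sample_candidate(rng)   # draw from the search space
        score = objective(candidate)        # evaluate against the objective
        if score > best_score:
            best, best_score = candidate, score

    def service(x):
        # The resulting "service" just applies the best slope found.
        return best * x

    return service, best_score

# Hypothetical task: learn to triple a number from a few examples.
examples = [(x, 3.0 * x) for x in range(1, 6)]

def sample_slope(rng):
    # Search space: slopes between 0 and 10.
    return rng.uniform(0.0, 10.0)

def negative_squared_error(slope):
    # Objective: higher is better, so negate the squared error.
    return -sum((slope * x - y) ** 2 for x, y in examples)

tripler, score = develop_service(sample_slope, negative_squared_error, budget=2000)
```

The point of the sketch is only that each of the four steps is itself a well-defined task, which is what makes it plausible to automate them one by one.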

A service is an AI system that delivers bounded results for some task using bounded resources in bounded time. Superintelligent language translation would count as a service, even though it requires a very detailed understanding of the world, including engineering, history, science, etc. Episodic RL agents also count as services.
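One way to make "bounded resources in bounded time" concrete is a wrapper that refuses to run a task past a fixed step budget. This is a toy sketch under my own assumptions: the names `run_bounded` and `BudgetExceeded` are hypothetical, and a step counter stands in for whatever resource accounting a real system would use.

```python
class BudgetExceeded(Exception):
    """Raised when a task does not finish within its step budget."""

def run_bounded(step, state, is_done, max_steps):
    """Advance `state` with `step` until `is_done`, using at most `max_steps` steps."""
    for _ in range(max_steps):
        if is_done(state):
            return state
        state = step(state)
    if is_done(state):
        return state
    raise BudgetExceeded(f"task did not finish within {max_steps} steps")

# Hypothetical task: count down to zero.
def count_down(n):
    return n - 1

def at_zero(n):
    return n == 0

finished = run_bounded(count_down, 5, at_zero, max_steps=10)

try:
    run_bounded(count_down, 50, at_zero, max_steps=10)
    over_budget = False
except BudgetExceeded:
    over_budget = True
```

Note that the bound says nothing about how capable the task inside it is, which matches the point that superintelligent translation still counts as a service.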

While each of the AI R&D subtasks is currently performed by a human, as AI progresses we should expect that we will automate these tasks as well. At that point, we will have automated R&D, leading to recursive technological improvement. This is not recursive self-improvement, because the improvement comes from R&D services creating improvements in basic AI building blocks, and those improvements feed back into the R&D services. All of this should happen before we get any powerful AGI agents that can do arbitrary general reasoning.

Why Comprehensive?

Since services are focused on particular tasks, you might think that they aren't general intelligence, since there would be some tasks for which there is no service. However, pretty much everything we do can be thought of as a task -- including the task of creating a new service. When we have a new task that we would like automated, our service-creating-service can create a new service for that task, perhaps by training a new AI system, or by taking a bunch of existing services and putting them together, etc. In this way, the collection of services can perform any task, and so as an aggregate is generally intelligent. As a result, we can call this Comprehensive AI Services, or CAIS. The "Comprehensive" in CAIS is the analog of the "General" in AGI. So, we'll have the capabilities of an AGI agent, before we can actually make a monolithic AGI agent.
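The "service-creating-service" idea can be sketched as a registry that builds a missing service on demand. This is only an illustration: `ServiceRegistry` and `compose_creator` are hypothetical names, and the creator here can only chain existing services, which is the simpler of the two options mentioned above (the other being to train a new system).

```python
class ServiceRegistry:
    """Holds services by task name and asks a creator for any that are missing."""

    def __init__(self, creator, services=None):
        self._services = dict(services or {})
        self._creator = creator
        self.created = []          # record of tasks the creator had to build

    def perform(self, task, payload):
        if task not in self._services:
            self._services[task] = self._creator(task, self._services)
            self.created.append(task)
        return self._services[task](payload)

def compose_creator(task, existing):
    """Hypothetical creator: builds a service for 'a+b' by chaining services a and b."""
    parts = task.split("+")
    missing = [part for part in parts if part not in existing]
    if len(parts) < 2 or missing:
        raise LookupError(f"cannot build a service for {task!r}")

    def pipeline(payload):
        for part in parts:
            payload = existing[part](payload)
        return payload

    return pipeline

registry = ServiceRegistry(compose_creator, {"strip": str.strip, "upper": str.upper})
first = registry.perform("strip+upper", "  hello ")
second = registry.perform("strip+upper", " again ")   # reuses the created service
```

In this toy version generality lives in the creator, not in any individual service, which is the sense in which the aggregate rather than any component is "comprehensive".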

Isn't this just as dangerous as AGI?

You might argue that each individual service must be dangerous, since it is superintelligent at its particular task. However, since the service is optimizing for some bounded task, it is not going to run a long-term planning process, and so it will not have any of the standard convergent instrumental subgoals (unless the subgoals are helpful for the task before reaching the bound).

In addition, all of the optimization pressure on the service is pushing it towards a particular narrow task. This sort of strong optimization tends to focus behavior. Any long-term planning processes that consider weird plans for achieving goals (similar to "break out of the box") will typically not find any such plan and will be eliminated in favor of cognition that will actually help achieve the task. Think of how a racecar is optimized for speed and a bus for carrying passengers; we build those rather than a single "generally capable vehicle".

It's also worth noting what we mean by superintelligent here. In this case, we mean that the service is extremely competent at its assigned task. It need not be learning at all. We see this distinction with RL agents -- when they are trained using something like PPO, they are learning, but at test time you can simply execute them without any PPO and they will perform the behavior they previously learned and won't change that behavior at all.
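The train-time versus test-time distinction can be shown with something much simpler than PPO. The sketch below swaps in an epsilon-greedy multi-armed bandit learner (my substitution, not anything from the report): learning happens inside `train`, and what comes out is a frozen policy that carries no learning machinery at all. The reward means are made up.

```python
import random

def train(reward_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy value learning on a toy bandit; returns a frozen policy."""
    rng = random.Random(seed)
    n_arms = len(reward_means)
    values = [0.0] * n_arms
    counts = [0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=values.__getitem__)  # exploit
        reward = rng.gauss(reward_means[arm], 1.0)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]   # running mean
    best_arm = max(range(n_arms), key=values.__getitem__)

    def policy():
        # No exploration and no updates: the learned behavior is fixed.
        return best_arm

    return policy

policy = train([0.0, 1.0, 5.0])
actions = [policy() for _ in range(100)]
```

This sketch deliberately has no memory or state at execution time, so it does not address the learn-to-learn and external-memory cases raised in the opinion paragraph that follows.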

(My opinion: I think this isn't engaging with the worry with RL agents -- typically, we're worried about the setting where the RL agent is learning or planning at test time, which can happen in learn-to-learn and online learning settings, or even with vanilla RL if the learned policy has access to external memory and can implement a planning process separately from the training procedure.)

On a different note, you might argue that if we analyze the system of services as a whole, then it certainly looks generally intelligent, and so should be regarded as an AGI agent. However, "AGI agent" usually carries the anthropomorphic connotation of VNM rationality / expected utility maximization / goal-directedness. While it seems possible and even likely that each individual service can be well-modeled as VNM rational (albeit with a bounded utility function), it is not the case that a system of VNM rational agents will itself look VNM rational -- in fact, game theory is all about how systems of rational agents have weird behavior.
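A small worked example of how an aggregate of individually rational agents can fail to look rational is the Condorcet cycle from social choice theory. The assumption here is that the system decides by pairwise majority vote, which is just one illustrative aggregation rule and not something the report specifies. Each agent has a consistent ranking, yet the group preference is cyclic, so no single utility function can represent it.

```python
from itertools import permutations

# Each agent has a transitive ranking over three options (best first).
rankings = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def prefers(ranking, x, y):
    """True if this agent ranks x above y."""
    return ranking.index(x) < ranking.index(y)

def majority_prefers(x, y):
    """True if a strict majority of agents rank x above y."""
    votes = sum(prefers(ranking, x, y) for ranking in rankings)
    return votes > len(rankings) / 2

group = {(x, y): majority_prefers(x, y) for x, y in permutations("ABC", 2)}

# The group prefers A to B, B to C, and C to A: a cycle.
has_cycle = group[("A", "B")] and group[("B", "C")] and group[("C", "A")]
```

A cyclic preference violates transitivity, one of the VNM axioms, even though every individual ranking satisfies it.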

In addition, there are several aspects of CAIS that make it safer than a classic monolithic AGI agent. Under CAIS, each service interacts with other services via clearly defined channels of communication, so that the system is interpretable and transparent, even though each service may be opaque. We can reason about what information is present in the inputs to infer what the service could possibly know. We could also provide access to some capability through an external resource during training, so that the service doesn't develop that capability itself.

This interpretability allows us to monitor the service -- for example, we could look at which subservices it accesses in order to make sure it isn't doing anything crazy. But what if having a human in the loop leads to unacceptable delays? Well, this would only happen for deployed applications, where having a human in the loop seems expected, and should also be economically incentivized because it leads to better behavior. Basic AI R&D can continue to be improved autonomously without a human in the loop, so you could still see an intelligence explosion. Note that tactical tasks requiring quick reaction times probably would be delegated to AI services, but the important strategic decisions could still be left in human hands (assisted by AI services, of course).
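The monitoring idea (watching which subservices get accessed over defined channels) can be sketched with a broker that every call has to pass through. This is a toy under my own assumptions: `Broker`, the allowlist format, and the `translator`, `dictionary` and `network` services are all hypothetical.

```python
class Broker:
    """Routes every service-to-service call, logging it and checking an allowlist."""

    def __init__(self, services, allowed):
        self.services = services   # name -> callable(broker, payload)
        self.allowed = allowed     # caller name -> set of permitted callees
        self.log = []              # (caller, callee, permitted) for every attempt

    def call(self, caller, callee, payload):
        permitted = callee in self.allowed.get(caller, set())
        self.log.append((caller, callee, permitted))
        if not permitted:
            raise PermissionError(f"{caller} may not call {callee}")
        return self.services[callee](self, payload)

def dictionary(broker, word):
    return {"hola": "hello"}.get(word, word)

def translator(broker, text):
    # The translator can only reach the dictionary through the broker.
    words = (broker.call("translator", "dictionary", w) for w in text.split())
    return " ".join(words)

broker = Broker(
    services={"dictionary": dictionary, "translator": translator},
    allowed={"user": {"translator"}, "translator": {"dictionary"}},
)
result = broker.call("user", "translator", "hola mundo")

try:
    broker.call("translator", "network", "unexpected request")
    blocked = False
except PermissionError:
    blocked = True
```

Each service stays a black box here; what is inspectable is the call graph in `broker.log`, which is the kind of transparency the argument above relies on.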

What happens when we create AGI?

Well, it might not be valuable to create an AGI. We want to perform many different tasks, and it makes sense for these to be done by diverse services. It would not be competitive to include all capabilities in a single monolithic agent. This is analogous to how specialization of labor is a good idea for us humans.

(My opinion: It seems like the lesson of deep learning is that if you can do something end-to-end, that will work better than a structured approach. This has happened with computer vision, natural language processing, and seems to be in the process of happening with robotics. So I don't buy this -- while it seems true that we will get CAIS before AGI since structured approaches tend to be available sooner and to work with less compute, I expect that a monolithic AGI agent would outperform CAIS at most tasks once we can make one.)

That said, if we ever do build AGI, we can leverage the services from our CAIS-world in order to make it safe. We could use superintelligent security services to constrain any AGI agent that we build. For example, we could have services trained to identify long-term planning processes and to perform adversarial testing and red teaming.

Safety in the CAIS world

While CAIS suggests that we will not have AGI agents, this does not mean that we automatically get safety. We will still have AI systems that take high impact actions, and if they take even one wrong action of this sort it could be catastrophic. One way this could happen is if the system of services starts to show agentic behavior -- our standard AI safety work could apply to this scenario.

In order to ensure safety, we should have AI safety researchers figure out and codify the best development practices that need to be followed. For example, we could try to always use predictive models of human (dis)approval as a sanity check on any plan that is being enacted. We could also train AI services that can adversarially check new services to make sure they are safe.
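The "predictive model of human (dis)approval as a sanity check" practice amounts to putting a gate in front of plan execution. The sketch below only shows the control flow. The approval model is a crude stand-in (a hypothetical blocklist of step names), not a learned model, and the threshold is arbitrary.

```python
def run_with_approval_check(plan, predict_approval, execute, threshold=0.9):
    """Enact a plan only if the approval model scores it at or above the threshold."""
    score = predict_approval(plan)
    if score < threshold:
        return {"status": "rejected", "approval": score}
    return {"status": "executed", "approval": score, "result": execute(plan)}

# Stand-in for a learned predictive model: penalizes steps on a hypothetical blocklist.
FLAGGED_STEPS = {"disable oversight", "acquire extra resources"}

def toy_approval_model(plan):
    if not plan:
        return 0.0
    flagged = sum(step in FLAGGED_STEPS for step in plan)
    return 1.0 - flagged / len(plan)

def count_steps(plan):
    # Placeholder executor for the example.
    return len(plan)

ok = run_with_approval_check(["collect samples", "run assay"], toy_approval_model, count_steps)
bad = run_with_approval_check(["collect samples", "disable oversight"], toy_approval_model, count_steps)
```

How good such a gate is depends entirely on the approval model, which is where the actual safety research would have to go.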


The CAIS model suggests that before we get to a world with monolithic AGI agents, we will already have seen an intelligence explosion due to automated R&D. This reframes the problems of AI safety and has implications for what technical safety researchers should be doing.



This is one of the documents I was responding to when I wrote A general model of safety-oriented AI development, Three AI Safety Related Ideas, and Two Neglected Problems in Human-AI Safety. (I didn't cite it because it was circulating semi-privately in draft form, and Eric apparently didn't want its existence to be publicly known.) I'm disappointed that although Eric wrote to me "I think that your two neglected problems are critically important", the perspectives in those posts didn't get incorporated more into the final document, which spends only 3 short paragraphs out of hundreds of pages on what I think of as "human safety problems". (I think those paragraphs were in the draft even before I wrote my posts.)

I worry about the framing adopted in this document that the main problem in human-AI safety is "questions of what humans might choose to do with their capabilities", as opposed to my preferred framing of "how can we design human-AI systems to minimize total risk". (To be fair to Eric, a lot of other AI safety people also only talk about "misuse risk" and not about how AI is by default likely to exacerbate human safety problems, e.g., by causing rapid distributiona

...
5 · Rohin Shah · 5y
I actually think the CAIS model gives me optimism for these sorts of problems. As long as we acknowledge that the problems exist and can be an issue, we could develop services that help us mitigate them. Safety in the CAIS world already depends on having services that are in charge of good engineering, testing, red teaming, monitoring, etc., as well as services that evaluate objectives and make sure humans would approve of them. It seems fairly easy to expand this to include services that consider how disruptive new technologies will be, how underdetermined human values are, whether a proposed plan reduces option value, what risk aversion implies about a particular plan of action, what blind spots people have, etc. I'd be interested in a list of services that you think would be helpful for addressing human safety problems. You might think of this as "our best current guess at metaphilosophy and metaphilosophy research". (I know you were mainly talking about the document's framing, I don't have much to say about that.)

It seems fairly easy to expand this to include services that consider how disruptive new technologies will be, how underdetermined human values are, whether a proposed plan reduces option value, what risk aversion implies about a particular plan of action, what blind spots people have, etc.

Can you explain how you'd implement these services? Take "how disruptive new technologies will be" for example. I imagine you can't just apply ML given the paucity of training data and how difficult it would be to generalize from historical data to new technologies and new social situations. And it seems to me that if you base it on any kind of narrow AI technology, it would be easy to miss some of the novel implications/consequences of the new technologies and social situations and end up with a wrong answer. Maybe you could instead base it on a general purpose reasoner or question-answerer, but if something like that exists, AI would already have created a lot of new technologies that are risky for humans to face. Plus, the general purpose AI could replace a lot of discrete/narrow AI services, so I feel like we would already have moved past the CAIS world at that point. BTW, if the service i

...
Can you explain how you'd implement these services?

Not really. I think of CAIS as suggesting that we take an outside view that says "looking at how AI has been progressing, and how humans generally do things, we'll probably be able to do more and more complex tasks as time goes on". But the emphasis that CAIS places is that the things we'll be able to do will be domain-specific tasks, rather than getting a general-purpose reasoner. I don't have a detailed enough inside view to say how complex tasks might be implemented in practice.

I agree with the rest of what you said, which feels to me like considering a few possible inside-view scenarios and showing that they don't work.

One way to think about this is through the lens of iterated amplification. With iterated amplification, we also get the property that our AI systems will be able to do more and more complex tasks as time goes on. The key piece that enables this is the ability to decompose problems, so that iterated amplification always bottoms out into a tree of questions and subquestions down to leaves which the base agent can answer. You could think of (my conception of) CAIS as a claim that a...

8 · Wei Dai · 5y
This seems like a sensible way of looking at things, and in this framing I'd say that my worry is that crucial safety-enhancing services may only appear fairly high in the overall tree of services, or outside the tree altogether (see also #3 in Three AI Safety Related Ideas which makes a similar point), and in the CAIS world it would be hard to limit access to the lower-level services (as a risk-reduction measure).
2 · Rohin Shah · 5y
Yeah, that seems right, I don't think anyone is arguing against that claim.

I have a problem with section 32, "Unaligned superintelligent agents need not threaten world stability". Here's the summary of that section from the paper:

  • Powerful SI-level capabilities can precede AGI agents.
  • SI-level capabilities could be applied to strengthen defensive stability.
  • Unopposed preparation enables strong defensive capabilities.
  • Strong defensive capabilities can constrain problematic agents.

So the key idea here seems to be that good actors will have a period of time to use superintelligent AI services to prepare some sort of ubiquitous defense that will constrain any subsequent AGI agents. But I don't understand where this period of "unopposed preparation" comes from. Why wouldn't someone create an AGI by cobbling together a bunch of AI services, or hire a bunch of AI services to help them design an AGI, as soon as they could? If they did that, then superintelligent AGI agents would arise nearly simultaneously with SI-level capabilities, and there would be no such period of unopposed preparation. In section 32.2, Eric only argues that SI-level capabilities can precede AGI agents. Since I think they wouldn't, at least not by a significant margin, the whole argumen

...
Why wouldn't someone create an AGI by cobbling together a bunch of AI services, or hire a bunch of AI services to help them design an AGI, as soon as they could?

Because any task that an AGI could do, CAIS could do as well. (Though I don't agree with this -- unified agents seem to work better.)

But if quickly building an AGI can potentially allow someone to take over the world before "unopposed preparation" can take place, isn't that a compelling motivation by itself for many people?

I suspect he would claim that quickly building an AGI would not allow you to take over the world, because the AGI would not be that much more capable than the CAIS service cluster.

It may be the case that people try to take over the world just with CAIS, and maybe that could succeed. I think he's arguing only against AGI accident risk here, not against malicious uses of AI. (I think you already knew that, but it wasn't fully clear on reading your comment.)

I suspect he would claim that quickly building an AGI would not allow you to take over the world, because the AGI would not be that much more capable than the CAIS service cluster.

That does not seem to be his position though, because if AGI is not much more capable than CAIS, then there would be no need to talk specifically about how to defend the world against AGI, as he does at length in section 32. If that was his position, he could just talk about how ordinary policing and military defense would work in a CAIS world (i.e., against human adversaries wielding CAIS) and say that the same policing/defense would also work against AGI because AGI is not much more capable than CAIS.

Instead it seems clear that he thinks AGI requires special effort to defend against, which is made possible by a delay between SI-level CAIS and AGI, which he proposes that we use to do a very extensive "unopposed preparation". I've been trying to figure out why he thinks there will be such a delay and my current best guess is "Implementation of the AGI model is widely regarded as requiring conceptual breakthroughs." (page 75) which he repeats on page 77, "AGI (but not CAIS) calls for conceptual breakthr

...
Do you get it?

I doubt I will ever be able to confidently answer yes to that question.

That does not seem to be his position though, because if AGI is not much more capable than CAIS, then there would be no need to talk specifically about how to defend the world against AGI, as he does at length in section 32.

My model is that he does think AGI won't be much more capable than CAIS (see sections 12 and 13 in particular, and 10, 11 and 16 also touch on the topic), but lots of people (including me) kept making the argument that end-to-end training tends to improve performance and so AGI would outperform CAIS, and so he decided to write a response to that.

In general, my impression from talking to him and reading earlier drafts is that the earlier chapters are representative of his core models, while the later chapters are more like responses to particular arguments, or specific implications of those models.

I can give one positive argument for AGI being harder to make than SI-level CAIS. All of our current techniques for building AI systems create things that are bounded in the time horizon they are optimizing over. It's actually quite unclear how we would use current techniques ...

6 · Wei Dai · 5y
Thanks, I think this is helpful for me to understand Eric's model better, but I'm still pretty confused. But it's quite unclear how to use current techniques to do a lot of things. Why should we expect that this conceptual breakthrough would come later than other conceptual breakthroughs needed to achieve CAIS? (Given your disagreement with Eric on this, I guess this is more a question for him than for you.) I was assuming that long term strategic planners (as described in section 27) are available as an AIS, and would be one of the components of the hypothetical AGI. I don't see why it wouldn't, unless these services are specifically designed to be corrigible (in which case the "corrigible" part seems much more important than the "service" part). For example, suppose you asked the plan maker to create a plan to cure cancer. Why would the mere fact that it's a bounded service prevent it from coming up with a plan that involves causing human extinction (and a bunch of convergent instrumental subgoals like deceiving humans who might stop it)? (If there was a human in the loop, then you could look at the plan and reject it, but I'm imagining that someone, in order to build an AGI as quickly and efficiently as possible, stripped off the "optimize for human consumption" part of the strategic planner and instead optimized it to produce plans for direct machine consumption.)
7 · Rohin Shah · 5y
I think I share Eric's intuition that this problem is hard in a more fundamental way than other things, but I don't really know why I have this intuition. Some potential generators:

  • ML systems seem to be really good at learning tasks, but really bad at learning explicit reasoning. I think of CAIS as being on the side of "we never figure out explicit reasoning at the level that humans do it", and making up for this deficit by having good simulators that allow us to learn from experience, or by collecting much more data across multiple instances of AI systems, or by trying out many different AI designs and choosing the one which performs best.
  • It seems like humans tend to build systems by making individual parts that we can understand and predict well, and putting those together in a way where we can make some guarantees/predictions about what will happen. CAIS plays to this strength, whereas "figure out how to do very-long-term-planning" doesn't.

Yeah, you're right, I definitely said the wrong thing there. I guess the difference is that the convergent instrumental subgoals are now "one level up" -- they aren't subgoals of the AI service itself, they're subgoals of the plan that was created by the AI service. It feels like this is qualitatively different and easier to address, but I can't really say why. More generators:

  • In this setting, convergent instrumental subgoals happen only if the plan-making service is told to maximize outcomes. However, since it's one level up, it should be easier to ask for something that says something more like "do X, interpreted pragmatically and not literally".
  • Things that happen one level up in the CAIS world are easier to point at and more interpretable, so it should be easier to find and fix issues of this sort.

(You could of course say "just because it's easier that doesn't mean people will do it", but I could imagine that if it's easy enough this becomes best practice and people do it by default, and you don't actual
5 · Wei Dai · 5y
Unfortunately, I only vaguely understand the points that you're trying to make in this comment... Would it be fair to just say at this point that this is an important crux that Eric failed to convincingly argue for?
7 · Rohin Shah · 5y
I agree that it's an important crux, and that the arguments are not sufficiently strong that everyone should believe Eric's position. I do think that he has provided arguments that support his position, though they are in a different language/ontology than is usually used here.
4 · Wei Dai · 5y
Ah, ok, what sections would you suggest that I (re)read to understand his arguments better? (You mentioned 12, 13, 10, 11 and 16 earlier in this thread but back then we were talking about "AGI won’t be much more capable than CAIS" and here the topic is whether we should expect AGI to come later than CAIS or require harder conceptual breakthroughs.)
4 · Rohin Shah · 5y
I quickly skimmed the table of contents to generate this list, so it might have both false positives and false negatives.

  • Section 1: We typically make progress using R&D processes; this can get us to superintelligence. Implicitly also makes the claim that this is qualitatively different from AGI, though doesn't really argue for that.
  • Section 8: Optimization pressure points away from generality, not towards it, which suggests that strong optimization pressure doesn't give you AGI.
  • Section 12.6: AGI and CAIS solve problems in different ways. (Combined with the claim, argued elsewhere: CAIS will happen first.)
  • Section 13: AGI agents are more complex. (Implicit claim: and so harder to build.)
  • Section 17: Most complex tasks involve several different subtasks that don't interact much; so you get efficiency and generality gains by splitting the subtasks up into separate services.
  • Section 38: Division of labor + specialization are useful for good performance.
4 · Wei Dai · 5y
Most of these sections seem to only contain arguments that AGI won't come earlier than CAIS, but not that it would come later than CAIS. In other words, they don't argue against the likelihood that under CAIS someone can easily build an AGI by connecting existing AI services together in a straightforward way. The only section I can find among the ones you listed that tries to argue in this direction is Section 13, but even it mostly just argues that AGI isn't simpler than CAIS, and not that it's more complex, except for this paragraph in the summary, Section 13.5: So putting alignment aside (I'm assuming that someone would be willing to build an unaligned AGI if it's easy enough), the only argument Eric gives for greater complexity of AGI vs CAIS is "must be integrated into a single, autonomous, self-modifying agent", but why should this integration add a non-negligible amount of complexity? Why can't someone just take a plan maker, connect it to a plan executer, and connect that to the Internet to access other services as needed? (I think your argument that strategic planning may be one of the last AIS to arrive is plausible, but it doesn't seem to be an argument that Eric himself makes.) Where is the additional complexity coming from?
3 · Rohin Shah · 5y
I think Eric would not call that an AGI agent. Setting aside what Eric thinks and talking about what I think: There is one conception of "AGI risk" where the problem is that you have an integrated system that has optimization pressure applied to the system as a whole (similar to end-to-end training) such that the entire system is "pointed at" a particular goal and uses all of its intelligence towards that. The goal is a long-term goal over universe-histories. The agent can be modeled as literally actually maximizing the goal. These are all properties of the AGI itself. With the system you described, there is no end-to-end training, and it doesn't seem right to say that the overall system is aimed at a long-term goal, since it depends on what you ask the plan maker to do. I agree this does not clearly solve any major problem, but it does seem markedly different to me. I think that Eric's conception of "AGI agent" is like the first thing I described. I agree that this is not what everyone means by "AGI", and it is particularly not the thing you mean by "AGI". You might argue that there seems to be no effective safety difference between an Eric-AGI-agent and the plan maker + plan executor. The main differences seem to be about what safety mechanisms you can add -- such as looking at the generated plan, or using human models of approval to check that you have the right goal. (Whereas an Eric-AGI-agent is so opaque that you can't look at things like "generated plans", and you can't check that you have the right goal because the Eric-AGI-agent will not let you change its goal.) With an Eric-AGI-agent, if you try to create a human model of approval, that would need to be an Eric-AGI-agent itself in order to effectively supervise the first Eric-AGI-agent, but in that case the model of approval will be literally actually maximizing some goal like "be as accurate as possible", which will lead to perverse behavior like manipulating humans so that what they approve is easi
That's not consistent with my understanding of section 27. My understanding is that Drexler would describe that as too dangerous. I suspect that a problem here is that "plan maker" is ambiguous as to whether it falls within Drexler's notion of something with a bounded goal. CAIS isn't just a way to structure software. It also requires some not-yet-common sense about what goals to give the software. "Cure cancer" seems too broad to qualify as a goal that Drexler would consider safe to give to software. Sections 27 and 28 suggest that Drexler wants humans to break that down into narrower subtasks. E.g. he says:
After further rereading, I now think that what Drexler imagines is a bit more complex: (section 27.7) "senior human decision makers" would have access to a service with some strategic planning ability (which would have enough power to generate plans with dangerously broad goals), and they would likely restrict access to those high-level services. I suspect Drexler is deliberately vague about the extent to which the strategic planning services will contain safeguards. This, of course, depends on the controversial assumption that relatively responsible organizations will develop CAIS well before other entities are able to develop any form of equally powerful AI. I consider that plausible, but it seems to be one of the weakest parts of his analysis. And presumably the publicly available AI services won't be sufficiently general and powerful to enable random people to assemble them into an agent AGI? Combining a robocar + Google translate + an aircraft designer + a theorem prover doesn't sound dangerous. But I'd prefer to have something more convincing than just "I spent a few minutes looking for risks, and didn't find any".
4 · Rohin Shah · 5y
Fwiw, by my understanding of CAIS and my definition of a service here as "A service is an AI system that delivers bounded results for some task using bounded resources in bounded time", a plan maker would qualify as a service. So every time I make claims about "services" I intend for those claims to apply to plan makers as well. I have tried to use words the same way that Drexler does, but obviously I can't know exactly what he meant.

Eric and I have exchanged a few emails since I posted this summary. I'm posting parts of that exchange here (with his permission), edited by me for conciseness and clarity. The paragraphs in the quotes are Eric's, but I have rearranged his paragraphs and omitted some of them for better flow in this comment.

There is a widespread intuition that AGI agents would by nature be more integrated, flexible, or efficient than comparable AI services. I am persuaded that this is wrong, and stems from an illusion of simplicity that results from hiding mechanism in a conceptually opaque box, a point that is argued at some length in Section 13.
Overall, I think that many of us have been in the habit of seeing flexible optimization itself as a problem, when optimization is instead (in the typical case) a strong constraint on a system’s behavior (see Section 8). Flexibility of computation in pursuit of optimization for bounded tasks seems simply useful, regardless of planning horizon, scope of considerations, or scope of required knowledge.

I agree that AGI agents hide mechanism in an opaque box. I also agree that the sort of optimization that current ML does, which is very task-focused, is a strong cons...

8 · Wei Dai · 5y
Can you summarize this exchange, especially what updates you made as a result of it, if any?

That was the summary :P The full thing was quite a bit longer. I also didn't want to misquote Eric.

Maybe the shorter summary is: there are two axes which we can talk about. First, will systems be transparent, modular and structured (call this CAIS-like), or will they be opaque and well-integrated? Second, assuming that they are opaque and well-integrated, will they have the classic long-term goal-directed AGI-agent risks or not?

Eric and I disagree on the first one: my position is that for any particular task, while CAIS-like systems will be developed first, they will gradually be replaced by well-integrated ones, once we have enough compute, data, and model capacity.

I'm not sure how much Eric and I disagree on the second one: I think it's reasonable to predict that the resulting systems are specialized for particular bounded tasks and so won't be running broad searches for long-term plans. I would still worry about inner optimizers; I don't know what Eric thinks about that worry.

This summary is more focused on my beliefs than Eric's, and is probably not a good summary of the intent behind the original comment, which was "what does Eric think Rohin got wrong in his summary + opinion of CAIS", along with some commentary from me trying to clarify my beliefs.

Updates were mainly about actually carving up the space in the way above. Probably others, but I often find it hard to introspect on how my beliefs are updating.

I don't understand why this crux needs to be dichotomous. Setting aside the opacity question for the moment, why can't services in a CAIS be differentiable w.r.t. each other?

Example: Consider a language modeling service (L) that is consumed by several downstream tasks, including various text classifiers, an auto-correction service for keyboards, and a machine translation service. In the end-to-end view, it would be wise for these downstream services to use a language representation from L and to propagate their own error information back to L so that it can improve its shared representation. Since the downstream services ultimately make up L's raison d'etre, it will be obliged to do so.

For situations that are not so neatly differentiable, we can describe the services network as a stochastic computation graph if there is a benefit for end-to-end learning the entire system. This should lead to a slightly more precise conjecture about the relationship between the CAIS agent and utility-maximizing agent: A CAIS agent that can be described as a stochastic computation graph is equivalent to some utility-maximizing agent when trained end-to-end via approximate backpropagation.

It's likely that CAIS agents aren't usefully described as stochastic computation graphs, or that we may need to extend the usage of "stochastic computation graph" here to deal with services that create other services as offspring and attach them to the graph. But the possibility itself suggests a spectrum between the archetypal modular CAIS and an end-to-end CAIS, in which subgraphs of the services network are trained end-to-end. It's not obvious to me that the CAIS as defined in the text discounts this scenario, despite Eric's comments here.
3Rohin Shah5y
I broadly agree, especially if you set aside opacity; I very rarely mean to imply a strict dichotomy. I do think in the scenario you outlined the main issue would be opacity: the learned language representation would become more and more specialized between the various services, becoming less interpretable to humans and more "integrated" across services.
One way to test the "tasks don't overlap" idea is to have two nets do two different tasks, but connect their internal layers. Then see how high the weights on those layers get. Like, is the internal processing done by Mario AI useful for Greek translation at all? If it is then backprop etc should discover that.
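The cross-connection test proposed above can be sketched on a toy problem. This is only an illustration under loud assumptions: "net A" is reduced to a frozen layer of random `tanh` features, "net B" is a linear model trained by plain gradient descent, and the function name `cross_weight_norm` and all constants are made up for the example. It shows the qualitative claim only: if B's task really uses A's internal features, gradient descent puts large weights on the cross-connections, and if not, those weights go to roughly zero.

```python
import numpy as np


def cross_weight_norm(tasks_overlap: bool, steps: int = 1000, lr: float = 0.1) -> float:
    """Train net B with access to net A's hidden layer; return the size of the cross weights."""
    rng = np.random.default_rng(0)
    n = 256
    x_b = rng.normal(size=(n, 4))            # net B's own input features
    h_a = np.tanh(rng.normal(size=(n, 4)))   # hypothetical stand-in for net A's frozen hidden layer

    # B's target always depends on its own inputs; it depends on A's
    # features only when the two tasks are assumed to overlap.
    y_b = x_b @ np.array([1.0, -2.0, 0.5, 3.0])
    if tasks_overlap:
        y_b = y_b + h_a @ np.ones(4)

    features = np.hstack([x_b, h_a])         # columns 0-3: own inputs, 4-7: cross-connections
    w = np.zeros(8)
    for _ in range(steps):
        # Gradient of the halved mean squared error with respect to w.
        grad = features.T @ (features @ w - y_b) / n
        w -= lr * grad
    return float(np.linalg.norm(w[4:]))      # magnitude of the learned cross-connection weights


overlap = cross_weight_norm(tasks_overlap=True)
no_overlap = cross_weight_norm(tasks_overlap=False)
print(f"cross-weight norm, overlapping tasks: {overlap:.3f}")
print(f"cross-weight norm, unrelated tasks:   {no_overlap:.6f}")
```

Whether anything like this carries over to a real pair such as a Mario agent and a Greek translator is exactly the open empirical question; the toy only shows what the measurement would look like.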

Promoted to curated: I think the linked document is one of the most interesting things to be written in AI Alignment in the last year, and this is the best summary and commentary of it that currently exists. Quality-wise, I think everything that I have to say has already been covered by the other commenters, but I overall found reading the linked document, as well as this summary, to be quite helpful in my thinking about AI Alignment, though I also disagree with large parts of it. (However, I am not at the research level, and so have a harder time judging how useful it would be for the people who are spending even more time thinking about AI Alignment.)

Thanks a lot for writing this summary, and thanks a lot to Eric for all the work he is doing.

I want to draw separate attention to chapter 40 of Drexler's paper, which uses what looks like a novel approach to argue that current supercomputers likely have more raw processing power than a human brain. I find that scary.

From the conclusion of that section:

Many modern AI tasks, although narrow, are comparable to narrow capacities of neural systems in the human brain. Given an empirical value for the fraction of computational resources required to perform that task with humanlike throughput on a 1 PFLOP/s machine, and an inherently uncertain and ambiguous—yet bounded—estimate of the fraction of brain resources required to perform “the equivalent” of that machine task, we can estimate the ratio of PFLOP/s machine capacity to brain capacity. What are in the author’s judgment plausible estimates for each task are consistent in suggesting that this ratio is ~10 or more. Machine learning and human learning differ in their relationship to costs, but even large machine learning costs can be amortized over an indefinitely large number of task-performing systems and application events.
In light of these considerations, we should expect that substantially superhuman computational capacity will accompany the eventual emergence of a software with broad functional competencies. On present evidence, scenarios that assume otherwise seem unlikely.

I'm not completely sure I'm understanding the first paragrap... (read more)

Late to the party, but I'm pretty confident he's saying the opposite: that a 1 PFLOP/s system is likely to have 10 or more times the computational capacity of the human brain, which is rather terrifying. He gives the example of Baidu's Deep Speech 2, which requires around 1 GFLOP/s to run and produces human-comparable results. That is 10^6 times less than the capacity of the 1 PFLOP/s machine. He estimates that the equivalent process in humans takes around 10^-3 of the human brain, which gives an estimate of a 1 PFLOP/s system having 10^3 times the capacity of the brain. His other examples give similar results.
5Adrià Garriga-alonso5y
Yes, though I'm fairly sure he's talking about using trained neural networks to e.g. classify an image, which is known to be fairly cheap, rather than training them. In other words, he's talking about using an AI service rather than creating one. He also says that "Machine learning and human learning differ in their relationship to costs" which is also evidence for my interpretation: training is expensive, testing on one example is very cheap.
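The arithmetic behind the Deep Speech 2 example above can be written out explicitly. This is a minimal sketch that simply reuses the figures quoted in the comment (1 GFLOP/s for the task, 1 PFLOP/s for the machine, 10^-3 of the brain); the figures are not checked against the report here, and the function name is invented for the illustration.

```python
import math

PFLOP_PER_S = 1e15
GFLOP_PER_S = 1e9


def machine_to_brain_ratio(machine_flops: float, task_flops: float, brain_fraction: float) -> float:
    """Estimate machine capacity in units of 'brains' from one narrow task.

    machine_flops:  capacity of the machine (FLOP/s)
    task_flops:     compute needed to run the task at humanlike throughput (FLOP/s)
    brain_fraction: assumed fraction of the brain used for the equivalent task
    """
    machine_fraction = task_flops / machine_flops   # share of the machine the task occupies
    # If the task uses 10^-3 of a brain but only 10^-6 of the machine,
    # the machine has room for 10^3 brains' worth of that task.
    return brain_fraction / machine_fraction


ratio = machine_to_brain_ratio(PFLOP_PER_S, GFLOP_PER_S, brain_fraction=1e-3)
print(f"machine / brain capacity ratio: about {ratio:.0f}")
```

Note that this covers only the cost of running a trained system, which matches the point above that the estimate is about using an AI service rather than training one.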

I trust past-me to have summarized CAIS much better than current-me; back when this post was written I had just finished reading CAIS for the third or fourth time, and I haven't read it since. (This isn't a compliment -- I read it multiple times because I had a lot of trouble understanding it.)

I've put in two points of my own in the post. First:

(My opinion: I think this isn't engaging with the worry with RL agents -- typically, we're worried about the setting where the RL agent is learning or planning at test time, which can happen in learn-to-learn and on ...

I disagree outright with

Any long term planning processes that consider weird plans for achieving goals (similar to "break out of the box") will typically not find any such plan and will be eliminated in favor of cognition that will actually help achieve the task.

Part of the reason that AI alignment is hard is that The Box is FULL of Holes! Breaking Out is EASY!

And the deeper reason for that is that we have no idea how to tell what's a hole.

Suppose you want to set the service generator to make a robot that cleans cars. If you give a blow b... (read more)

It seems like the important thing is how bounded the task is. For example, in the case of Go, if you just kept training AlphaZero, would you expect it to eventually decide that it needs to break out into the physical world to get more computing power? It seems to me that it could get to be ultra-super-human at Go without that happening. (Even if there is some theoretical threshold where, with enough computation, it couldn't help but stumble upon a sequence of moves that causes the program to crash. It seems to me that you're likely to get crashing behavior long before you get hack-out-of-the-vm behavior, and the threshold for either may be too high to matter.) If that's true for Go, then the questions are: 1. How much less bounded of a task can you train a system to do while maintaining the focused-on-the-task property? and 2. How general of a system can you make by composing such focused systems together?
5Rohin Shah5y
Note that under the CAIS worldview, in order to be competent in some domain you need to have some experience in that domain (i.e. competence requires learning). Or at least, that's the worldview under which I find CAIS most compelling. In that case, the AI would have had to try breaking out of the box a few times in order to get good at it, and why would it do that? Even if it ever hit upon this plan, whenever it tried it for the first time it would get a gradient pushing that behavior away, since it didn't help with achieving the goal. Only after significant learning would it be able to execute these weird plans in a way that they actually succeed and help achieve the goal, and that significant learning will not happen. CAIS would definitely use human preference information, see eg. section 22. It's not really an approach to AI safety, it's mostly meant to be a different prediction about how we achieve superintelligence. (There are definitely some prescriptive aspects of CAIS, and some arguments that it is safer than AGI agents, but mostly it is meant to be descriptive, I believe.)
3Donald Hobson5y
Any algorithm that gets stuck in a local optimum so easily will not be very intelligent or very useful. Humans have, at least somewhat, the ability to notice that there should be a good plan in this region, and to find and execute that plan successfully. We don't get stuck in local optima as much as current RL algorithms. AIXI would be very good at making complex plans and doing well first time. You could tell it the rules of chess and it would play PERFECT chess first time. It does not need lots of examples to work from. Give it any data that you happen to have available, and it will become very competent, and able to carry out complex novel tasks first time. Current reinforcement learning algorithms aren't very good at breaking out of boxes because they follow the local incentive gradient. (I say not very good at, because a few algorithms have exploited glitches in a way that's a bit "break out the boxish".) In some simple domains, it's possible to follow the incentive gradient all the way to the bottom. In other environments, human actions already form a good starting point, and following the incentive gradient from there can make the solution a bit better. I agree that most of the really dangerous break out the boxes probably can't be reached by local gradient descent from a non-adversarial starting point. (I do not want to have to rely on this.) I agree that you can attach loads of sensors to, say, postmen, and train a big neural net to control a humanoid robot to deliver letters, given millions of training examples. You can probably automate many of the training weight fiddling tasks currently done by grad student descent to make big neural nets work. I agree that this could be somewhat useful economically, as a significant proportion of economic productivity could be automated. What I am saying is that this form of AI is sufficiently limited that there are still large incentives to make AGI and the CAIS can't protect us from making an unfriendly AGI. I'm also not sure
4Rohin Shah5y
Agreed, I claim we have no clue how to make anything remotely like AIXI in the real world. Agreed, in a CAIS world, the system of interacting services would probably notice the plan but not execute it because of some service that is meant to prevent it from doing crazy things that humans would not want. This definitely seems like the crux for many people. I'm quite unsure about this point; it seems plausible to me that CAIS could in fact do most things such that there aren't very large incentives, especially if the Factored Cognition hypothesis is true. I don't see why it would have to be little tweaks to existing algorithms, it seems plausible to have the R&D services consider entirely new algorithms as well.
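The "gradient pushing that behavior away" argument in this exchange can be made concrete with a two-action toy. The sketch below is an illustration under strong assumptions, not a model of any real system: the policy is a softmax over two actions, the update is the exact gradient of expected reward, and the reward vector (1 for doing the task, 0 for an unpractised breakout attempt) is the premise of the argument rather than something the code establishes. All names are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = np.exp(logits - logits.max())
    return shifted / shifted.sum()


def breakout_probability_over_training(rewards: np.ndarray, steps: int = 200, lr: float = 0.5) -> list:
    """Follow the exact policy gradient and record P(action 1) after each step."""
    logits = np.zeros(len(rewards))          # start indifferent between the actions
    history = []
    for _ in range(steps):
        probs = softmax(logits)
        expected_reward = probs @ rewards
        # d/d(logit_j) of expected reward is p_j * (r_j - expected reward):
        # actions that do worse than average have their logit pushed down.
        logits += lr * probs * (rewards - expected_reward)
        history.append(float(softmax(logits)[1]))
    return history


# Action 0: do the bounded task (reward 1). Action 1: attempt a "weird plan"
# that, by assumption, fails and earns nothing the first times it is tried.
history = breakout_probability_over_training(np.array([1.0, 0.0]))
print(f"P(weird plan) after 1 step:    {history[0]:.3f}")
print(f"P(weird plan) after 200 steps: {history[-1]:.4f}")
```

This only captures a purely local learner, which is the case Donald grants; his objection is that a more capable planner would not need to learn the weird plan by trial and error, and the toy says nothing about that case.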

As a note, I believe that FHI is planning to publish a(n edited?) version of this document as an actual book à la Superintelligence: Paths, Dangers, Strategies.

After reading the post and some of these comments (including this one) it was unclear to me whether FHI had actually intended to make this public yet. It seems that in fact they have:
4Rohin Shah5y
It's linked in the first sentence of the post. Though I guess I link to the pdf instead of the web page. I tried to make this a link post, but I got an error message saying that it has already been linked before.
Yeah, saw the link, but since it was direct to the pdf, wasn't sure if there'd been an announcement or anything like that. (Perhaps I should have enough trust in FHI that if a link is accessible then that's intentional. Not something you can count on in general though. :P)
The restriction on having multiple linkposts to the same URL is something we inherited from our framework (Vulcan), which doesn't particularly make sense for LW. We've taken it out, so you'll be able to make the linkpost after the next time we deploy an update (which will be later this week).
I also just went in and appended some random URL parameters to the URL to avoid the duplication filter for now.

Upvoted. I've long thought that Drexler's work is a valuable contribution to the debate that hasn't received enough attention so far, so it's great to see that this has now been published.

I am very sympathetic to the main thrust of the argument – questioning the implicit assumption that powerful AI will come in the shape of one or more unified agents that optimise the outside world according to their goals. However, given our cluelessness and the vast range of possible scenarios (e.g. ems, strong forms of biological enhancement, mergin... (read more)

8Rohin Shah5y
That seems right. I would argue that CAIS is more likely than any particular one of the other scenarios that you listed, because it is primarily taking trends from the past and projecting them into the future, whereas most other scenarios require something qualitatively new -- eg. an AGI agent (before CAIS) would happen if we find the one true learning algorithm, ems require us to completely map out the brain in a way that we don't have any results for currently, even in simple cases like C. elegans. But CAIS is probably not more likely than a disjunction over all of those possible scenarios.
I think generality and goal-directedness are likely orthogonal attributes. A "one true learning algorithm" sounds very general, but a priori I don't expect it to be any more goal-directed than the comprehensive AI services idea outlined in this post. I suspect you can take each of your comprehensive AI services and swap out the specific algorithm you were using for a one true learning algorithm without making the result any more of an agent. I'm thinking about it something like this:

* Traditional view of superintelligent AI ("top-down"): A superintelligent AI is something that's really good at achieving arbitrary goals. We abstract away the details of its implementation and view it as a generic hyper-competent goal achievement process, with a wide array of actions & strategies at its disposal. This view potentially lets us do FAI research without having to contribute to AI progress or depend overmuch on any particular direction that AI capabilities development proceeds in.
* CAIS ("bottom-up"): We have a collection of AI services. We can use these services to accomplish specific tasks, including maybe eventually generating additional services. Each service represents a specific algorithm that achieves superior performance along one or more dimensions in a narrow or broad range of circumstances. If we abstract away the details of how tasks are being accomplished, that may lead to an inaccurate view of the system's behavior. For example, our machine learning algorithms may get better and better at performing classification tasks... but we have to look into the details of how the algorithm works in order to figure out whether it will consider strategies for improving its classification ability such as "pwn all other servers in the cluster and order them to search the space of hyperparameters in parallel". Our classification systems have been getting better and better, and arguably also more general, without them considering strategies like the pwnage strategy, an
6Rohin Shah5y
Mostly agreed, but if we find the one true learning algorithm, then CAIS is no longer on the development path towards AGI agents, and I would predict that someone builds an AGI agent in that world because it could have lots of economic benefits that have not already been captured by CAIS services. I actually see CAIS as an argument against this. I think we could get superintelligent services by having lots of specialization (unlike humans, who are mostly general and a little bit specialized for their jobs), by aggregating learning across many actors (whereas humans can't learn from other humans' experience), by making models much larger and with much more compute (whereas humans are limited by brain size). Humans could still outperform AI services on things like power usage, sample efficiency, compute requirements, etc. while still having lots of AI services that can perform nearly any task at a superhuman level.

There is a discussion at OvercomingBias of this work now.

The CAIS model suggests that before we get to a world with monolithic AGI agents, we will already have seen an intelligence explosion due to automated R&D.

This conclusion seems similar to the one Paul arrives at here:

In the slow takeoff scenario, pre-AGI systems have a transformative impact that’s only slightly smaller than AGI.

(See also this post from AI Impacts.)

CAIS is a very different take on what transformative AI might look like than the ones I find most intuitive. I think it's really useful to experience a range of different perspectives to break me out of my cached thoughts.

And I'm grateful to Rohin for writing up this summary! I think this kind of thing is a valuable service for spreading these ideas to more people, who don't want to read a 200 page document.

I think the CAIS framing that Eric Drexler proposed gave concrete shape to a set of intuitions that many people have been relying on for their thinking about AGI. I also tend to think that those intuitions and models aren't actually very good at modeling AGI, but I nevertheless think it productively moved the discourse forward a good bit. 

In particular I am very grateful about the comment thread between Wei Dai and Rohin, which really helped me engage with the CAIS ideas, and I think were necessary to get me to my current understanding of CAIS and to ... (read more)

I see a few criticisms about how this doesn't really solve the problem, it only delays it because we expect a unified agent to outperform the combined services.

It seems to me on the basis of that criticism that this is worth driving as a commercial template anyway. Every R&D dollar that goes into a bounded service is one that doesn't drive specifically for an unbounded agent; every PhD doing development on an individual service is not doing development on a unified agent.

We're currently still in the regime where first mover advantage is ov... (read more)

4Rohin Shah5y
Not sure if you're talking about me, but I suspect that my criticism could be read that way. Just want to clarify that I do think "we expect a unified agent to outperform the combined services" but I don't think this means we shouldn't pursue CAIS. That strategic question seems hard and I don't have a strong opinion on it.
You were one of them, but not the only one. I thought it was worth pointing the strategic question out specifically, because we have only recently had enough plausible alternatives for there to even be such a question. Granted, the lack of options makes me feel a bit like anime-guy-looks-at-butterfly for alternatives. I agree the strategic question is hard.

What excites me most about Eric's position since I first learned of it is that it provides a framework for safer AI systems than we might otherwise build if we were trying to target AGI. From this perspective it's valuable for setting policy and missions for AI-focused endeavors in such a way that we potentially delay the creation of AGI.

Although it might be argued that this is inevitable (last time I talked to Eric this was the impression that I got; he felt he was laying out some ideas that would happen anyway and was taking the time to explain... (read more)

My main objection to this idea is that it is a local solution, and doesn't have built-in mechanisms to become a global AI safety solution, that is, to prevent the creation of other AIs, which could be agential superintelligences. One can try to make "AI police" as a service, but it could be less effective than agential police.

Another objection is probably Gwern's idea that any Tool AI "wants" to become agential AI.

This idea also excludes the robotic direction in AI development, which will anyway produce agential AIs.

6Rohin Shah5y
If by agent we mean "system that takes actions in the real world", then services can be agents. As I understand it, Eric is only arguing against monolithic AGI agents that are optimizing a long-term utility function and that can learn/perform any task. Current factory robots definitely look like a service, and even the soon-to-come robots-trained-with-deep-RL will be services. They execute particular learned behaviors. If I remember correctly, Gwern's argument is basically that Agent AI will outcompete Tool AI because Agent AI can optimize things that Tool AI cannot, such as its own cognition. In the CAIS world, there are separate services that improve cognition, and so the CAIS services do get the benefit of ever-improving cognition, without being classical AGI agents. But overall I agree with this point (and disagree with Eric) because I expect there to be lots of gains to be had by removing the boundaries between services, at least where possible.
3Adrià Garriga-alonso5y
Recursive self-improvement that makes the intelligence "super" quickly is what makes the misaligned utility actually dangerous, as opposed to dangerous like a, say, current day automatized assembly line. A robot that self-improves would need to have the capacity to control its actuators and also to self-improve. Since none of these capabilities directly depends on the other, each time one of them improves, the improvement is much more likely to be first demonstrated independently of an improvement in the other one. Thus we're likely to already have some experience with self-improving AI, or the recursively improved AI to help us, when we get to dealing with people wanting to build self-improving robots. Even though with advanced AI in hand to help we should maybe still start early on that, it seems more important to get the not-necessarily-and-also-probably-not-robotic AI right.
I meant not that the "robot will self-improve", but that research in robotics will create AIs which are agential and adapted to act in the real world. Such AIs may start to self-improve later and without a robotic body.
3Wei Dai5y
This seems likely to me as well, especially since "service" is by definition bounded and agent is not.
2Rohin Shah5y
Monitoring surveillance in order to see if anyone is breaking rules seems to be quite a bounded task, and in fact is one that we are already in the process of automating (using our current AI systems, which are basically all bounded). Of course, there are lots of other tasks that are not as clear. But to the extent that you believe the Factored Cognition hypothesis, you should believe that we can make bounded services that nevertheless do a very good job.
2Wei Dai5y
That seems true, but if this surveillance monitoring isn't 100% effective, won't you still need an agential police to deal with any threats that manage to evade the surveillance? Or do you buy Eric's argument that we can use a period of "unopposed preparation" to make sure that the defense, even though it's bounded, is still much more capable than any agential threat it might face?
4Rohin Shah5y
Sorry, when I said "there are lots of other tasks that are not as clear", I meant that there are a lot of other tasks relevant to policing and security that are not as clear, such as police to deal with threats that evade surveillance. I think the optimism here comes from our ability to decompose tasks, such that we can take a task that seems to require goal-directed agency (like "be the police") and turn it into a bunch of subtasks that no longer look agential.
I agree that in the long term, agent AI could probably improve faster than CAIS, but I think CAIS could still be a solution. Regardless of how it is aligned, aligned AI will tend to improve slower than unaligned AI, because it is trying to achieve a more complicated goal, human oversight takes time, etc. To prevent unaligned AI, aligned AI will need a head start, so it can stop any unaligned AI while it's still much weaker. I don't think CAIS is fundamentally different in that respect. If the reasoning in the post that CAIS will develop before AGI holds up, then CAIS would actually have an advantage, because it would be easier to get a head start.

So what is he saying? We never need to solve the problem of designing a human-friendly superintelligent agent?

2Rohin Shah5y
I don't think he'd make a strong claim about that, but I wouldn't be surprised if he assigned that possibility significant credence. I assign that possibility relatively low credence. I assign much more credence to the position that we'll never need to solve the problem of designing a human-friendly superintelligent goal-directed agent.

Definition please.


3Yaakov T5mo 

Thanks for the summary! I agree that this is missing some extra consideration for programs that are planning / searching at test time. We normally think of Google Maps as non-agenty, "tool-like," "task-directed," etc, but it's performing a search for the best route from A to B, and capable of planning to overcome obstacles - as long as those obstacles are within the ontology of its map of ways from A to B.

A thermostat is dumber than Google Maps, but its data is more closely connected to the real world (local temperature rather than... (read more)
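The Google Maps point above, that a route planner searches at test time but only within the ontology of its map, can be sketched with a tiny breadth-first search. Everything here is illustrative: the road network `ROADS`, the node names, and the `closed` parameter are invented for the example and say nothing about how any real mapping service works.

```python
from collections import deque


def shortest_route(roads, start, goal, closed=frozenset()):
    """Breadth-first search for a fewest-hops route; returns None if the map offers no route."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in roads.get(node, ()):
            # An "obstacle" is just a closed road segment: the planner can route
            # around it, but only by using other roads that exist on its map.
            if neighbour not in seen and frozenset((node, neighbour)) not in closed:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Hypothetical four-junction map: two ways from A to D.
ROADS = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}

print(shortest_route(ROADS, "A", "D"))
print(shortest_route(ROADS, "A", "D", closed={frozenset(("A", "B"))}))
print(shortest_route(ROADS, "A", "D", closed={frozenset(("A", "B")), frozenset(("A", "C"))}))
```

The last call returns `None`: once every road on the map is closed, the planner has no way to represent, let alone consider, options outside the graph, which is the sense in which its planning is bounded.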

I consider it important to further clarify the notion of a bounded utility function.

A deployed neural network has a utility function that can be described as outputting a description of the patterns it sees in its most recent input, according to whatever algorithm it's been trained to apply. It's pretty clear to any expert that the neural network doesn't care about anything beyond a specific set of numbers that it outputs.

A neural network that is in the process of being trained is slightly harder to analyze, but essentially the same. It cares about generat ...
To clarify, when you say "bounded utility function" you mean that it's only defined over a fixed set of inputs, right? (As opposed to meaning that the output of the function is never infinite, as in this post, which is what I first think of when I hear "bounded utility function". In other words, I expected bounded utility to refer to the range of the function, but you seem to be referring to the domain. Not sure which is more standard, but thought it worth calling out for other readers who may be confused.)
4Rohin Shah5y
It sounds like he's talking about services. From the post:
I'm not talking about the range. Domain seems possibly right, but not as informative as I'd like. I'm talking about what parts of spacetime it cares about, and saying that it only cares about specific outputs of a specific process. Drexler refers to this as "bounded scope and duration". Note that this will normally be an implicit utility function, that we infer from our understanding of the system. "bounded utility function" is definitely not an ideal way of referring to this.
You might argue that each individual service must be dangerous, since it is superintelligent at its particular task. However, since the service is optimizing for some bounded task, it is not going to run a long-term planning process [...]

Does this assume that we'll be able to build generally intelligent systems (e.g. the service-creating-service) that optimize for a bounded task?

2Rohin Shah5y
Depends what you mean by "generally intelligent". Any individual service could certainly have deep and broad knowledge about the world (as with eg. a language translation service), but no service will be able to do all tasks (eg. the service-creating-service is not going to be able to edit genomes, except by creating a new service that learns how to edit genomes). With that caveat, yes, this assumes that we'll be able to build services that optimize for bounded tasks. But this is meant more as a description of how existing AI systems already work. Current RL agents are best modeled as optimizing for maximizing reward obtained for the current episode. (This isn't exactly right, because the value function is trying to capture the reward that can be obtained in the future, but in practice this doesn't make much of a difference.)
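The claim that an episodic RL agent optimizes reward for the current episode can be illustrated with the usual return computation, in which the discounted sum is reset at every episode boundary. This is a generic sketch rather than the code of any particular RL library; the function name and the `dones` convention (True on the last step of an episode) are assumptions made for the example.

```python
def episodic_returns(rewards, dones, gamma=0.99):
    """Discounted return at each step, counting only rewards from the same episode.

    rewards[t] is the reward at step t; dones[t] is True if step t ends an episode.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0                 # nothing after the episode's end is credited to it
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Two episodes of two steps each; the large reward belongs to the second episode.
rewards = [1.0, 1.0, 1.0, 10.0]
dones = [False, True, False, True]
print(episodic_returns(rewards, dones, gamma=1.0))
```

With this objective, the reward of 10 in the second episode never shows up in the training signal for actions taken in the first episode, which is the sense in which the optimization target is bounded in time.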