AGIs as populations

by Richard_Ngo, 22nd May 2020


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I think there’s a reasonably high probability that we will end up training AGI in a multi-agent setting. But in that case, we shouldn’t just be interested in how intelligent each agent produced by this training process is, but also in the combined intellectual capabilities of a population of agents. If those agents cooperate, they will exceed the capabilities of any one of them - and then it might be useful to think of the whole population as one AGI. Arguably, on a large-scale view, this is how we should think of humans. Each individual human is generally intelligent in their own right. Yet from the perspective of chimpanzees, the problem was not that any single human was intelligent enough to take over the world, but rather that millions of humans underwent cultural evolution to make the human population as a whole much more intelligent.

This idea isn’t just relevant to multi-agent training though: even if we train a single AGI, we will have strong incentives to copy it many times to get it to do more useful work. If that work involves generating new knowledge, then putting copies in contact with each other to share that knowledge would also increase efficiency. And so, one way or another, I expect that we’ll eventually end up dealing with a “population” of AIs. Let’s call the resulting system, composed of many AIs working together, a population AGI.

We should be clear about the differences between three possibilities which each involve multiple entities working together:

  1. A single AGI composed of multiple modules, trained in an end-to-end way.
  2. The Comprehensive AI Services (CAIS) model of a system of interlinked AIs which work together to complete tasks.
  3. A population AGI as described above, consisting of many individual AIs working together in comparable ways to how a population of humans might collaborate.

This essay will only discuss the third possibility, which differs from the other two in several ways:

  • Unlike the modules of a single AGI, the members of a population AGI are not trained in a centralised way, on a single objective function. Rather, optimisation takes place with respect to the policies of individual members, with cooperation between them emerging (either during training or deployment) because it fits the incentives of individuals.
  • Unlike CAIS services and single AGI modules, the members of a population AGI are fairly homogeneous; they weren’t all trained on totally different tasks (and in fact may start off identical to each other).
  • Unlike CAIS services and single AGI modules, the members of a population AGI are each generally intelligent by themselves - and therefore capable of playing multiple roles in the population AGI, and interacting in flexible ways.
  • Unlike CAIS services and single AGI modules, the members of a population AGI might be individually motivated by arbitrarily large-scale goals.

What are the relevant differences from a safety perspective between this population-based view and the standard view? Specifically, let’s compare a “population AGI” to a single AGI which can do just as much intellectual work as the whole population combined. Here I’m thinking particularly of the most high-level work (such as doing scientific research, or making good strategic decisions), since that seems like a fairer comparison.


Interpretability

We might hope that a population AGI will be more interpretable than a single AGI, since its members will need to pass information to each other in a standardised “language”. By contrast, the different modules in a single AGI may have developed specialised ways of communicating with each other. In humans, language is much lower-bandwidth than thought. This isn’t a necessary feature of communication, though - members of a population AGI could be allowed to send data to each other at an arbitrarily high rate. Decreasing this communication bandwidth might be a useful way to increase the interpretability of a population AGI.
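To make the idea of limiting communication bandwidth concrete, here is a minimal sketch. The `CappedChannel` class, its per-message token cap, and the agent names are all hypothetical, chosen only for illustration; nothing here is a proposal for how a real system would be built. The point is just that a low-bandwidth channel leaves a short transcript that an overseer can read in full.

```python
from dataclasses import dataclass, field


@dataclass
class CappedChannel:
    """Hypothetical message channel between members of a population AGI."""

    max_tokens: int  # assumed bandwidth cap, in whitespace-separated tokens
    log: list = field(default_factory=list)

    def send(self, sender, receiver, message):
        # Truncate the message to the bandwidth cap before delivery.
        tokens = message.split()
        delivered = " ".join(tokens[: self.max_tokens])
        # Record exactly what was delivered so an overseer can inspect it.
        self.log.append((sender, receiver, delivered))
        return delivered


channel = CappedChannel(max_tokens=5)
received = channel.send(
    "agent_a", "agent_b", "the experiment failed because the sensor drifted"
)
print(received)
print(len(channel.log))
```

Lowering `max_tokens` forces members to compress what they say, which is the sense in which reduced bandwidth might trade capability for legibility.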


Flexibility

Regardless of the specific details of how they collaborate and share information, members of a population AGI will need structures and norms for doing so. There’s a sense in which some of the “work” of solving problems is done by those norms - for example, the structure of a debate can be more or less helpful in adjudicating the claims made. The analogous aspect of a single AGI is the structure of its cognitive modules and how they interact with each other. However, the structure of a population AGI would be much more flexible - and in particular, it could be redesigned by the population AGI itself in order to improve the flow of information. By contrast, the modules of a single AGI will have been designed by an optimiser, and so fit together much more rigidly. This likely makes them work together more efficiently; the efficiency of end-to-end optimisation is why a human with a brain twice as large would be much more intelligent than two normal humans collaborating. But the concomitant lack of flexibility is why it’s much easier to improve our coordination protocols than our brain functionality.


Fine-tunability

Suppose we want to retrain an AGI to have a new set of goals. How easy is this in each case? Well, for a single AGI we can just train it on a new objective function, in the same way we trained it on the old one. For a population AGI where each of the members was trained individually, however, we may not have good methods for assigning credit when the whole population is trying to work together on a single task. For example, a difficulty discussed in Sunehag et al. (2017) is that one agent starting to learn a new skill might interfere with the performance of other agents - and the resulting decrease in reward teaches the first agent to stop attempting the new skill. This would be particularly relevant if the original population AGI was produced by copying a single agent trained by itself - if so, it’s plausible that multi-agent reinforcement learning techniques have lagged behind.
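The credit assignment difficulty can be illustrated with a toy example. This is a rough sketch under made-up assumptions, not the setup or method from Sunehag et al.: the reward numbers, the `team_reward` function, and the simple running-average value update are all invented for illustration. Two agents share one scalar team reward; when agent 0 attempts a new skill it contributes less and also gets in its teammate's way, so the shared reward drops and agent 0 learns to prefer its old skill.

```python
# Toy illustration (all numbers are made up): two agents share one team reward.
OLD_SKILL_VALUE = 1.0    # contribution from a skill an agent already has
NEW_SKILL_VALUE = 0.2    # contribution while still learning the new skill
INTERFERENCE_COST = 0.3  # cost imposed on each teammate by a clumsy attempt


def team_reward(actions):
    """Return the single scalar reward that every agent receives."""
    total = 0.0
    for action in actions:
        total += OLD_SKILL_VALUE if action == "old" else NEW_SKILL_VALUE
    attempts = sum(action == "new" for action in actions)
    total -= INTERFERENCE_COST * attempts * (len(actions) - 1)
    return total


def learn_action_values(episodes=20, learning_rate=0.5):
    """Agent 0 alternates between skills; agent 1 always uses its old skill."""
    values = {"old": 0.0, "new": 0.0}
    for episode in range(episodes):
        action = "new" if episode % 2 else "old"
        reward = team_reward([action, "old"])
        # Running-average update driven only by the shared team reward.
        values[action] += learning_rate * (reward - values[action])
    return values


values = learn_action_values()
preferred = max(values, key=values.get)
print(values, preferred)
```

Because agent 0 only ever sees the team total, it cannot tell that the drop is a temporary cost of learning, which is why it settles on the old skill.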


Agency

This is a tricky one. I think that a population AGI is likely to be less agentic and goal-directed than a single AGI of equivalent intelligence, because different members of the population may have different goals which push in different directions. However, it’s also possible that population-level phenomena amplify goal-directed behaviour. For example, competition between different members in a population AGI could push the group as a whole towards dangerous behaviour (in a similar way to how competition between companies makes humans less safe from the perspective of chimpanzees). And our lessened ability to fine-tune them, as discussed in the previous paragraph, might make it difficult to know how to intervene to prevent that.

Overall evaluation of population AGIs

I think that the extent to which a population AGI is more dangerous than an equivalently intelligent single AGI will mainly depend on how the individual members are trained (in ways which I’ve discussed previously). If we condition on a given training regime being used for both approaches, though, it’s much less clear which type of AGI we should prefer. It’d be useful to see more arguments either way - in particular because a better understanding of the pros and cons of each approach might influence our training decisions. For example, during multi-agent training there may be a tradeoff between training individual AIs to be more intelligent, versus running more copies of them to teach them to cooperate at larger scales. In such environments we could also try to encourage or discourage them from in-depth communication with each other.

In my next post, I’ll discuss one argument for why population AGIs might be safer: because they can be deployed in more constrained ways.


