What success story (or stories) did you have in mind when writing this?

ricraz (5d, 9 points):

My thoughts on each of these. The common thread is that it seems to me you're using abstractions at way too high a level to be confident that they will actually apply, or that they even make sense in those contexts.

AGIs and economies of scale

* Do we expect AGIs to be so competitive that reducing coordination costs is a big deal? I expect that the dominant factor will be AGI intelligence, which will vary enough that changes in coordination costs aren't a big deal. Variations in human intelligence have a huge effect, and presumably variations in AGI intelligence will be much bigger.
* There's an obvious objection to giving one AGI all of your resources, which is "how do you know it's aligned"? And this seems like an issue where there'd be unified dissent from people worried about both short-term and long-term safety.
* Oh, another concern: if they're all intent aligned to the same person, then this amounts to declaring that person dictator. Which is often quite a difficult thing to convince people to do.
* Consider also that we'll be in an age of unprecedented plenty, once we have aligned AGIs that can do things for us. So I don't see why economic competition will be very strong. Perhaps military competition will be strong, but will countries really be converting so much of their economy to military spending that they need this edge to keep up?

So this seems possible, but very far from a coherent picture in my mind.

Some thoughts on metaphilosophy

* These are a bunch of fun analogies here. But it is very unclear to me what you mean by "philosophy" here, since most, or perhaps all, of your descriptions would be equally applicable to "thinking" or "reasoning". The model you give of philosophy is also a model of choosing the next move in the game of chess, and countless other things.
* Similarly, what is metaphilosophy, and what would it mean to solve it? Reach a dead end? Be able to answer any question?
dxu (5d, 2 points):

I confess to being uncertain of what you find confusing/unclear here. Think of any subject you currently have conflicting moral intuitions about (do you have none?), and now imagine being given unlimited power without being given the corresponding time to sort out which intuitions you endorse. It seems quite plausible to me that you might choose to do the wrong thing in such a situation, which could be catastrophic if said decision is irreversible.

But I can't do the wrong thing, by my standards of value, if my "value system no longer applies". So that's part of what I'm trying to tease out.

Another part is: I'm not sure if Wei thinks this is just a governance problem (i.e. we're going to put people in charge who do the wrong thing, despite some people advocating caution) or a more fundamental problem that nobody would do the right thing.

If the former, then I'd characterise this more as "more power magnifies leadership problems". But maybe it won't, because there's also a much larger space of morally

... (read more)

AGIs as populations

by ricraz, 22nd May 2020 (4 min read, 23 comments)


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I think there’s a reasonably high probability that we will end up training AGI in a multi-agent setting. But in that case, we shouldn’t just be interested in how intelligent each agent produced by this training process is, but also in the combined intellectual capabilities of a population of agents. If those agents cooperate, they will exceed the capabilities of any one of them - and then it might be useful to think of the whole population as one AGI. Arguably, on a large-scale view, this is how we should think of humans. Each individual human is generally intelligent in their own right. Yet from the perspective of chimpanzees, the problem was not that any single human was intelligent enough to take over the world, but rather that millions of humans underwent cultural evolution to make the human population as a whole much more intelligent.

This idea isn’t just relevant to multi-agent training though: even if we train a single AGI, we will have strong incentives to copy it many times to get it to do more useful work. If that work involves generating new knowledge, then putting copies in contact with each other to share that knowledge would also increase efficiency. And so, one way or another, I expect that we’ll eventually end up dealing with a “population” of AIs. Let’s call the resulting system, composed of many AIs working together, a population AGI.

We should be clear about the differences between three possibilities which each involve multiple entities working together:

  1. A single AGI composed of multiple modules, trained in an end-to-end way.
  2. The Comprehensive AI Services (CAIS) model of a system of interlinked AIs which work together to complete tasks.
  3. A population AGI as described above, consisting of many individual AIs working together in comparable ways to how a population of humans might collaborate.

This essay will only discuss the third possibility, which differs from the other two in several ways:

  • Unlike the modules of a single AGI, the members of a population AGI are not trained in a centralised way, on a single objective function. Rather, optimisation takes place with respect to the policies of individual members, with cooperation between them emerging (either during training or deployment) because it fits the incentives of individuals.
  • Unlike CAIS services and single AGI modules, the members of a population AGI are fairly homogeneous; they weren’t all trained on totally different tasks (and in fact may start off identical to each other).
  • Unlike CAIS services and single AGI modules, the members of a population AGI are each generally intelligent by themselves - and therefore capable of playing multiple roles in the population AGI, and interacting in flexible ways.
  • Unlike CAIS services and single AGI modules, the members of a population AGI might be individually motivated by arbitrarily large-scale goals.

What are the relevant differences from a safety perspective between this population-based view and the standard view? Specifically, let’s compare a “population AGI” to a single AGI which can do just as much intellectual work as the whole population combined. Here I’m thinking particularly of the most high-level work (such as doing scientific research, or making good strategic decisions), since that seems like a fairer comparison.


Interpretability

We might hope that a population AGI will be more interpretable than a single AGI, since its members will need to pass information to each other in a standardised “language”. By contrast, the different modules in a single AGI may have developed specialised ways of communicating with each other. In humans, language is much lower-bandwidth than thought. This isn’t a necessary feature of communication, though - members of a population AGI could be allowed to send data to each other at an arbitrarily high rate. Decreasing this communication bandwidth might be a useful way to increase the interpretability of a population AGI.
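As a rough illustration of what restricting communication bandwidth could look like in practice, here is a minimal sketch in Python. Everything in it is a hypothetical assumption made for illustration and is not from the post: the `LoggedChannel` class, the fixed shared vocabulary, the token cap, and the agent names. The idea is only that messages between members are forced through a narrow, standardised, fully logged channel that a human could inspect.

```python
from dataclasses import dataclass, field


@dataclass
class LoggedChannel:
    """Hypothetical message channel between members of a population.

    Messages are lists of tokens drawn from a fixed shared vocabulary,
    truncated to a maximum length, and recorded for later review.
    """
    vocabulary: frozenset
    max_tokens: int
    log: list = field(default_factory=list)

    def send(self, sender: str, receiver: str, tokens: list) -> list:
        # Reject anything outside the standardised "language".
        unknown = [t for t in tokens if t not in self.vocabulary]
        if unknown:
            raise ValueError(f"tokens outside shared vocabulary: {unknown}")
        # Enforce the bandwidth cap by truncating the message.
        delivered = list(tokens[: self.max_tokens])
        # Keep a record that an overseer could read afterwards.
        self.log.append((sender, receiver, tuple(delivered)))
        return delivered


if __name__ == "__main__":
    channel = LoggedChannel(
        vocabulary=frozenset({"test", "hypothesis", "A", "B"}),
        max_tokens=3,
    )
    print(channel.send("agent_1", "agent_2", ["test", "hypothesis", "A", "B"]))
    print(channel.log)
```

Lowering `max_tokens` or shrinking `vocabulary` makes each message easier to audit, at the cost of how much the members can tell each other per exchange.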


Flexibility

Regardless of the specific details of how they collaborate and share information, members of a population AGI will need structures and norms for doing so. There’s a sense in which some of the “work” of solving problems is done by those norms - for example, the structure of a debate can be more or less helpful in adjudicating the claims made. The analogous aspect of a single AGI is the structure of its cognitive modules and how they interact with each other. However, the structure of a population AGI would be much more flexible - and in particular, it could be redesigned by the population AGI itself in order to improve the flow of information. By contrast, the modules of a single AGI will have been designed by an optimiser, and so fit together much more rigidly. This likely makes them work together more efficiently; the efficiency of end-to-end optimisation is why a human with a brain twice as large would be much more intelligent than two normal humans collaborating. But the concomitant lack of flexibility is why it’s much easier to improve our coordination protocols than our brain functionality.


Fine-tunability

Suppose we want to retrain an AGI to have a new set of goals. How easy is this in each case? Well, for a single AGI we can just train it on a new objective function, in the same way we trained it on the old one. For a population AGI where each of the members was trained individually, however, we may not have good methods for assigning credit when the whole population is trying to work together on a single task. For example, a difficulty discussed in Sunehag et al. (2017) is that one agent starting to learn a new skill might interfere with the performance of other agents - and the resulting decrease in reward teaches the first agent to stop attempting the new skill. This would be particularly relevant if the original population AGI was produced by copying a single agent trained by itself - if so, it’s plausible that multi-agent reinforcement learning techniques have lagged behind.
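The interference problem described above can be made concrete with a toy sketch in Python. This is not the method from Sunehag et al. (2017); it only illustrates the failure mode, and every name and number in it (`Team`, the reward values, the five attempts needed for mastery) is an invented assumption. A learner that sees only the shared team reward tries the new skill once, observes the team doing worse because its clumsy attempt disrupts its teammate, and never tries again, even though the skill would pay off after some practice.

```python
from dataclasses import dataclass

OLD, NEW = "old_skill", "new_skill"


@dataclass
class Team:
    """Toy two-agent team; all reward values are illustrative assumptions."""
    practice: int = 0  # how many times the learner has attempted the new skill

    def step(self, action: str) -> float:
        """Return the single shared team reward for the learner's action."""
        if action == OLD:
            return 2.0  # learner contributes 1.0, undisturbed teammate 1.0
        proficiency = min(1.0, self.practice / 5)  # mastered after 5 attempts
        self.practice += 1
        learner = 3.0 * proficiency   # the new skill only pays off once learned
        teammate = 1.0 * proficiency  # clumsy attempts disrupt the teammate
        return learner + teammate


def run_greedy(steps: int) -> dict:
    """Learner tries each action once, then greedily follows the shared reward."""
    team = Team()
    totals = {OLD: 0.0, NEW: 0.0}
    counts = {OLD: 0, NEW: 0}
    for _ in range(steps):
        untried = [a for a in (OLD, NEW) if counts[a] == 0]
        if untried:
            action = untried[0]
        else:
            action = max((OLD, NEW), key=lambda a: totals[a] / counts[a])
        totals[action] += team.step(action)
        counts[action] += 1
    return counts


def forced_practice(attempts: int) -> list:
    """Shared rewards if the learner keeps attempting the new skill anyway."""
    team = Team()
    return [team.step(NEW) for _ in range(attempts)]


if __name__ == "__main__":
    print(run_greedy(20))      # how often each action was chosen
    print(forced_practice(6))  # reward trajectory under continued practice
```

In this toy setup the greedy learner abandons the new skill after a single attempt, while continued practice would eventually yield a higher team reward than the old skill. A training method for a population would need some way to assign credit that survives this initial dip.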


Agency

This is a tricky one. I think that a population AGI is likely to be less agentic and goal-directed than a single AGI of equivalent intelligence, because different members of the population may have different goals which push in different directions. However, it’s also possible that population-level phenomena amplify goal-directed behaviour. For example, competition between different members in a population AGI could push the group as a whole towards dangerous behaviour (in a similar way to how competition between companies makes humans less safe from the perspective of chimpanzees). And our lessened ability to fine-tune them, as discussed in the previous paragraph, might make it difficult to know how to intervene to prevent that.

Overall evaluation of population AGIs

I think that the extent to which a population AGI is more dangerous than an equivalently intelligent single AGI will mainly depend on how the individual members are trained (in ways which I’ve discussed previously). If we condition on a given training regime being used for both approaches, though, it’s much less clear which type of AGI we should prefer. It’d be useful to see more arguments either way - in particular because a better understanding of the pros and cons of each approach might influence our training decisions. For example, during multi-agent training there may be a tradeoff between training individual AIs to be more intelligent, versus running more copies of them to teach them to cooperate at larger scales. In such environments we could also try to encourage or discourage them from in-depth communication with each other.

In my next post, I’ll discuss one argument for why population AGIs might be safer: because they can be deployed in more constrained ways.

