Current arguments for AGI can be distilled to arguments for specific capabilities, not for generality in itself. We need to examine whether there exists a genuine and sound argument for generality as an independent property.
In Plato's Republic, Glaucon's challenge to Socrates is to show why justice is good in and of itself, rather than merely arguing for its instrumental value. In other words, Socrates must show Glaucon that we value justice itself, not merely its after-effects:
"For I want to hear what justice and injustice are, and what power each has when it is just by itself in the soul. I want to leave out of account the rewards and the consequences of each of them." (Plato, Republic, 358b-c)
Following Glaucon's spirit, I dare ask: is generality in AI valuable in itself, or do we pursue it merely for its expected instrumental effects?
When leading labs say "we're building towards AGI," what do they really mean? If we enumerate all the capabilities they desire (mathematical reasoning, long-horizon tasks, automated R&D and real-world economic tasks, ...), does anything remain in the term AGI after we subtract this list? Or is AGI simply a short name for "all of these capabilities together"?
Most, if not all, pro-generality arguments seem to be reducible to:
It seems fair, then, to ask whether "generality" is just the name we give to a sufficiently large conjunction of specific capabilities, or whether it names something qualitatively distinct: generality itself.
The subtraction test: If we could have all the specific capabilities that AGI promises, but without 'generality' (whatever that means; perhaps all the capabilities exist, but in separate, narrow models), would we lose any value?
No one seems to argue that generality has value in itself (as we might argue about consciousness or wellbeing). Why not? Perhaps because AI is, seemingly, instrumental by nature. So why do we want generality? And is generality really what we want?
Maintaining a general system may be more efficient than maintaining a comprehensive set of narrow systems because:
But there seems to be an implicit assumption here: that the cost of building and maintaining a general system will be lower than the summed costs of the narrow systems (ANIs) it replaces, counting development, inference, and maintenance. Is this empirically true? Could we build accurate mathematical cost models?
Currently, foundation models are very expensive to train and operate, and pushing the frontier is not getting any cheaper, while specialized models are far more efficient. So far, if we think in cost/benefit terms, the empirical evidence may favor specialized models.
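To make the question concrete, here is a minimal toy sketch of what such a cost model might look like, comparing one general system against a set of narrow ones. Every figure in it (training cost, query volume, number of narrow systems, time horizon) is a hypothetical placeholder, not an empirical estimate; the point is only that the conclusion hinges entirely on which parameters one plugs in.

```python
# Toy cost model: one general system vs. N narrow systems.
# Every figure below is a hypothetical placeholder (arbitrary cost units),
# not an empirical estimate.

def lifetime_cost(train_cost, cost_per_query, maintenance_per_year,
                  queries_per_year, years):
    """Total cost of one system over a fixed horizon."""
    return (train_cost
            + cost_per_query * queries_per_year * years
            + maintenance_per_year * years)

YEARS = 5
TOTAL_QUERIES_PER_YEAR = 1_000_000
N_NARROW = 20  # number of narrow systems needed to cover the same tasks

# One big general model: expensive to train and serve, single maintenance burden.
general = lifetime_cost(train_cost=100.0, cost_per_query=2e-4,
                        maintenance_per_year=10.0,
                        queries_per_year=TOTAL_QUERIES_PER_YEAR, years=YEARS)

# N small specialized models splitting the same query volume.
narrow = N_NARROW * lifetime_cost(train_cost=2.0, cost_per_query=2e-5,
                                  maintenance_per_year=0.5,
                                  queries_per_year=TOTAL_QUERIES_PER_YEAR / N_NARROW,
                                  years=YEARS)

print(f"general system:    {general:8.1f}")
print(f"{N_NARROW} narrow systems: {narrow:8.1f}")
```

With these placeholder numbers the narrow systems come out far cheaper, but different assumptions (many more tasks, near-zero marginal cost of adding a capability to the general model, shared infrastructure) could flip the result, which is precisely why an empirical cost model would be valuable.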
Moreover, this argument also seems to assume that shared representations are necessarily beneficial. Yet in ML it is well known that there are trade-offs: a model aimed at doing everything may suffer from catastrophic forgetting or negative transfer.
Arguments for AGI often conflate two distinct claims about inevitability:
The distinction matters. Socioeconomic inevitability is a governance problem, which suggests we need coordination mechanisms. Technical inevitability, on the other hand, is a scientific claim, which suggests generality will emerge whether or not we coordinate.
Let's focus on the technical claim. If this view is correct, then asking "should we build generality?" becomes moot. Generality would be an inevitable byproduct of scaling up systems initially designed for narrow tasks (such as next-token prediction). We wouldn't necessarily be aiming for generality; rather, we'd simply observe its emergence.
But this argument smuggles in a few assumptions:
This argument states that it will be easier to build AGI and have it solve all other specific problems than to solve every problem independently. It tends to come with the easily repeated slogan "it'll be our last invention".
Some possible issues with this argument:
One could argue that we cannot know in advance what problems we will need to solve, and that generality gives us the flexibility to respond to unknown unknowns.
Yet this again seems to be an instrumental argument for, say, flexibility or adaptability, not for generality in itself. Moreover, what warrants the assumption that generality equals adaptability?[4] The most adaptable biological systems we know (bacteria) are not the most general.
Perhaps we conflate breadth of capabilities with generality. Consider two systems:
Which is more valuable? The answer seems to hinge on whether System B can sustain chains of generalization: using domain X to solve slightly-OOD domain Y, then using that to tackle even-further-OOD domain Z. If yes, then generality represents something genuinely powerful. If not, then System A's breadth may be superior, and this latter case would suggest we actually value sufficient breadth, not generality per se.[5]
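One way to make the idea of chained generalization concrete is as an evaluation loop: the system may adapt lightly at each step, but must keep clearing domains that drift further and further out of distribution. The sketch below is purely illustrative, not a proposed benchmark; `adapt`, `evaluate`, the domain sequence, and the 0.7 threshold are all hypothetical stand-ins.

```python
# Illustrative sketch of a "chained generalization" test, not a proposed benchmark.
# `adapt`, `evaluate`, the domain sequence, and the threshold are hypothetical
# stand-ins; only the structure of the loop matters here.

from typing import Callable, Sequence

def chained_generalization(system,
                           domains: Sequence[str],
                           adapt: Callable,
                           evaluate: Callable,
                           threshold: float = 0.7) -> int:
    """Count how many successively further-OOD domains the system clears.

    At each step the system may adapt lightly (e.g. few-shot), building only
    on what it carried over from earlier domains; the chain ends as soon as
    performance drops below the threshold.
    """
    cleared = 0
    for domain in domains:
        system = adapt(system, domain)    # lightweight adaptation only
        if evaluate(system, domain) < threshold:
            break
        cleared += 1
    return cleared

# Dummy usage with trivial stand-ins: the "system" clears X and Y but not Z.
if __name__ == "__main__":
    adapt = lambda s, d: s
    evaluate = lambda s, d: 0.9 if d in ("X", "Y") else 0.5
    print(chained_generalization(object(), ["X", "Y", "Z"], adapt, evaluate))  # -> 2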
Paradoxically, the lack of a solid argument for generality in and of itself does not necessarily mean we should stop trying to build AGI. Rather, it means we should be honest about why we are building it. Maybe we are building it not because we see value in generality itself, but because:
This clarity isn't merely for philosophical amusement; it matters for setting research priorities and governance efforts. If we're building towards AGI for instrumental reasons, we should:
I think the fundamental question remains: are we building toward the right target, and do we even know what that target is?
I welcome counterarguments. If there exists a sound intrinsic argument for generality that I've missed, I'd genuinely like to hear it.
I want to thank BlueDot Impact for accepting me into their inaugural "AGI Strategy" cohort, where this discussion arose. This post would not exist without their great efforts to build the much-needed safety workforce.
Wei et al. (2022), "Emergent Abilities of Large Language Models"; Ganguli et al. (2022), "Predictability and Surprise in Large Generative Models"; Hoffmann et al. (2022), "Training Compute-Optimal Large Language Models" (the Chinchilla paper).
There may be a good argument to be developed here, if one can successfully argue that adaptability is an intrinsic component of generality, and not a mere after-effect.
This formulation of generality as chainable out-of-distribution transfer draws on work in meta-learning and few-shot transfer learning. See Jiang et al. (2023), Tripuraneni et al. (2022), Sun et al. (CVPR 2019), and Ada et al. (2019) for theoretical foundations on OOD generalization and transfer bounds.