Is There a Sound Argument for Generality in AI?

by Juan Cadile
13th Oct 2025

Thesis Statement[1]

Current arguments for AGI can be distilled to arguments for specific capabilities, not for generality in itself. We need to examine whether there exists a genuine and sound argument for generality as an independent property.

Introduction

In Plato's Republic, Glaucon challenges Socrates to show why justice is good in and of itself, rather than arguing for its instrumentality. In other words, Socrates has to show Glaucon that we value justice itself, not merely for its after-effects:

"For I want to hear what justice and injustice are, and what power each has when it is just by itself in the soul. I want to leave out of account the rewards and the consequences of each of them." (Plato, Republic, 358b-c)

Following Glaucon's spirit, I dare ask: is generality in AI valuable in itself, or do we pursue it merely for its expected instrumental effects?

Dialectic

The problem of reduction

When leading labs say "we're building towards AGI," what do they really mean? If we enumerate all the capabilities they desire (mathematical reasoning, long-horizon tasks, automated R&D and real-world economic tasks, ...), does anything remain in the term AGI after we subtract this list? Or is AGI simply a short name for "all of these capabilities together"?

Most, if not all, pro-generality arguments seem to be reducible to:

  • "We want adaptability" – which is a specific capability
  • "We want transfer learning" – again, a specific capability
  • "We want to solve multiple issues" – this seems to be a set of specific capabilities

It seems fair to ask, then, whether "generality" is simply the name we give to a sufficiently large conjunction of specific capabilities, or whether it picks out something qualitatively distinct: generality itself.

The subtraction test: if we could have all the specific capabilities that AGI promises, but without 'generality' (whatever that means; perhaps all the capabilities exist but live in separate, narrow models), would we lose any value?

The missing argument: intrinsic value

No one seems to argue that generality has value in itself (as we might argue for consciousness or wellbeing). Why not? Maybe because AI is (seemingly) instrumental by nature. So, why do we want generality? And is that really what we want?

The argument from cognitive economy

A general system may be more efficient than maintaining a comprehensive set of narrow systems because it:

  • Shares representations across domains, reducing redundant learning and enabling knowledge transfer.
  • Reduces computational cost and redundancy.
  • Allows currently unknown capabilities to emerge through unexpected transfer.

But there is an implicit assumption here: that the cost of maintaining a general system will be lower than the summed costs of the narrow systems (ANIs) it would replace, across development, inference, and maintenance. Is this empirically true? Could we build accurate mathematical cost models?
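
To make the question concrete, here is a minimal toy cost model (my own sketch, not anything the labs publish); every number in it is a hypothetical placeholder, and the additive cost structure is itself an assumption:

```python
# Toy cost comparison: one general model vs. a fleet of narrow models.
# Every number here is a hypothetical placeholder, not an empirical estimate.

def total_cost(train, inference_per_query, queries, maintenance_per_year, years):
    """Lifetime cost of one system under a naive additive model."""
    return train + inference_per_query * queries + maintenance_per_year * years

# One general model: expensive to train and serve, but there is only one of it.
general = total_cost(train=100e6, inference_per_query=0.01,
                     queries=1e9, maintenance_per_year=5e6, years=3)

# A fleet of narrow models: each far cheaper, but the costs add up.
n_domains = 50
narrow_each = total_cost(train=0.5e6, inference_per_query=0.001,
                         queries=1e9 / n_domains, maintenance_per_year=0.2e6, years=3)
narrow_fleet = n_domains * narrow_each

print(f"general: ${general:,.0f}   narrow fleet: ${narrow_fleet:,.0f}")
# Which side wins depends entirely on the assumed parameters; the point is
# that the cognitive-economy argument rests on an empirical claim we could,
# in principle, model and measure.
```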

Currently, foundation models are very expensive to train and operate, and pushing the frontier is not getting any cheaper. Specialized models, meanwhile, are much more efficient. So far, if we think in cost/benefit terms, the empirical evidence may favor specialized models.

Moreover, this argument seems to assume that shared representations are necessarily beneficial. Yet ML is full of well-known trade-offs: a model aimed at doing everything may suffer from catastrophic forgetting or negative transfer.

The scaling hypothesis and two types of inevitability

Arguments for AGI often conflate two distinct claims about inevitability:

  • Socioeconomic inevitability: "competition forces us," "someone will build it anyway," "it's the next natural step." These are claims about coordination problems and race dynamics, Molochian pressures that make AGI development feel unstoppable regardless of whether it's wise.
  • Technical inevitability: the scaling hypothesis (that model capabilities improve predictably with increased compute, data, and parameters) suggests generality may not be something we choose to pursue, but something that emerges automatically from scaling.

The distinction matters. Socioeconomic inevitability is a governance problem, suggesting we need coordination mechanisms. Technical inevitability, on the other hand, is a scientific claim, suggesting generality will emerge whether we coordinate or not.

Let's focus on the technical claim. If this view is correct, then asking "should we build generality?" becomes moot. Generality would be an inevitable byproduct of scaling up systems initially designed for narrow tasks (such as next-token prediction). We wouldn't necessarily be aiming for generality; rather, we'd simply observe its emergence.
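
To be concrete about what "improves predictably" means, here is a minimal sketch (mine, with invented data points) of the power-law form the scaling literature typically reports, plus a hint of why a pass/fail downstream metric can look like a sudden jump even when the underlying loss curve is smooth:

```python
# A minimal sketch of what "improves predictably with scale" means.
# The data points are invented, but the functional form L(C) = a * C**(-b)
# mirrors the power laws reported in the scaling-law literature.
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])   # training FLOPs (hypothetical)
loss = 10.0 * compute ** -0.05                        # smooth power-law loss curve

# Fit the power law in log space: log L = log a - b * log C
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
print(f"fitted exponent: {slope:.3f}")                # recovers -0.050

# A downstream "capability" graded pass/fail against a threshold can look like
# a discontinuous jump even though the loss improves smoothly:
passes = (loss < 0.95).astype(int)
print(passes.tolist())                                # [0, 0, 0, 1, 1]
```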

But this argument smuggles in a few assumptions:

  • First, the hypothesis doesn't distinguish between different types of generality. Perhaps scaling gives us generality in the functional sense of "can do many things" (breadth) but not generality in the sense of "can chain out-of-distribution (OOD) generalizations indefinitely" (the kind that might lead to recursive self-improvement). These are very different properties with very different implications.
  • Second, it assumes functional generality is what scales. Recent work[2] suggests that different capabilities scale at different rates, and some don't scale predictably at all. What if what actually scales is breadth of capabilities rather than genuine generality? In that case, we'd end up with systems that are impressively capable across many domains without exhibiting the kind of transfer learning that seems to capture the functional essence of generality.
  • Lastly, even if generality emerges, this is a descriptive claim about what will happen, not a normative claim about what should happen. In other words, the hypothesis tells us generality might be inevitable, but says nothing about whether or why we should keep building towards it.

The meta-solver argument

This argument states that it will be easier to build AGI and have it solve all other specific problems than to solve each problem independently. It tends to come with the easily repeated slogan "it'll be our last invention".

Some possible issues with this argument:

  • Firstly, it seems to assume a solution to (or the falsity of) the no-free-lunch theorem, which states that no single model can perform optimally across all possible problem domains.[3] If the theorem holds, then a general model will inevitably have drawbacks in some areas. Does this not undermine the idea that AGI will solve all narrow problems? (A toy illustration of this point follows the list.)
  • Maybe the road to AGI actually requires that we solve many ANI problems first.
  • This argument focuses on the efficiency of building AGI over ANIs, but it does not seem to state why generality in itself is valuable. In other words, this seems to be another argument for instrumental benefits.
  • Even if a general intelligence could technically solve all specific problems, this argument conflates capability with alignment. A misaligned meta-solver that brilliantly solves technical problems while pursuing goals orthogonal to human values could leave us worse off than multiple well-aligned narrow systems. The meta-solver argument treats alignment as either automatically solved or separable from capability development. Neither assumption is warranted.
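
To make the no-free-lunch point concrete, here is a toy illustration (not a proof; the "algorithms" are just two fixed search orders I invented): averaged over every possible objective function on a tiny domain, both do equally well within the same evaluation budget.

```python
# Toy no-free-lunch illustration: averaged over *every* possible objective
# function on a tiny domain, two different fixed search orders achieve exactly
# the same average best-value within the same evaluation budget.
from itertools import product

domain = [0, 1, 2, 3]
order_a = [0, 1, 2, 3]        # "algorithm" A: scan left to right
order_b = [3, 1, 0, 2]        # "algorithm" B: an arbitrary different order
budget = 2                    # number of evaluations allowed

def average_best(order):
    scores = []
    for values in product([0, 1], repeat=len(domain)):  # all 16 possible objectives
        f = dict(zip(domain, values))
        scores.append(max(f[x] for x in order[:budget]))
    return sum(scores) / len(scores)

print(average_best(order_a), average_best(order_b))     # both 0.75
```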

The argument from unknown unknowns

One could argue that we cannot know in advance what issues we may need to solve, and that generality gives us the flexibility to respond to unknown unknowns.

Yet this again seems to be an instrumental argument for, say, flexibility or adaptability, not for generality in itself. Moreover, what warrants the assumption that generality equals adaptability?[4] The most adaptable biological systems we know (bacteria) are not the most general.

Breadth or generality?

Perhaps we conflate breadth of capabilities with generality. Consider two systems:

  • System A: 1000 specific capabilities, without transfer between them
  • System B: 100 capabilities that generalize to new domains that are only slightly out of distribution

Which is more valuable? The answer seems to hinge on whether System B can sustain chains of generalization: using domain X to solve slightly-OOD domain Y, then using that to tackle even-further-OOD domain Z. If yes, then generality represents something genuinely powerful. If not, then System A's breadth may be superior, which would suggest we actually value sufficient breadth, not generality per se.[5]
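
A toy way to see what hinges on chainability (all parameters invented for illustration): if each OOD hop succeeds independently with probability p, the expected number of domains System B reaches per seed capability is the geometric sum 1/(1-p), so its effective coverage depends entirely on how reliable each hop is.

```python
# Toy model of breadth vs. chainable OOD transfer (numbers are invented).
# System A: 1000 fixed capabilities, no transfer between them.
# System B: 100 seed capabilities; each can be extended one OOD step at a time,
# and each step succeeds independently with probability p, so the expected
# number of domains reached per seed is the geometric sum 1 / (1 - p).

def expected_coverage(seeds: int, p: float) -> float:
    return seeds / (1.0 - p)

coverage_a = 1000
for p in (0.5, 0.8, 0.9, 0.95, 0.99):
    coverage_b = expected_coverage(100, p)
    print(f"p={p:.2f}  System B ~ {coverage_b:8,.0f} domains  vs. System A's {coverage_a}")
# If chained transfer is reliable (p near 1), generality dominates breadth;
# if each OOD hop is shaky, a large bag of narrow capabilities wins.
```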

Open questions 

  1. Do any benefits attributed to AGI actually require generality, or merely sufficient breadth of capabilities?
  2. Is generality a real property or a convenient abstraction?
  3. If no sound argument exists for generality in itself, should we pivot toward developing the right set of highly-capable narrow systems?
  4. Does this same issue apply to ASI?

Conclusion

Paradoxically, the lack of a solid argument for generality in and of itself does not mean we should stop trying to build AGI. Rather, it means we should be honest about why we are building it. Maybe we are building it not because we see value in generality itself, but because:

  1. It seems inevitable given current incentives
  2. We believe (maybe incorrectly) that it will be more efficient
  3. We want specific capabilities that we don't yet know how to build, and believe a general system would, by virtue of being general, deliver them
  4. The scaling hypothesis suggests generality may emerge whether we aim for it or not

This clarity isn't merely for philosophical amusement; it matters for determining research priorities and governance efforts. If we're building towards AGI for instrumental reasons, we should:

  • Measure progress by the capabilities that matter, not proximity to some abstract notion of generality.
  • Invest heavily in alignment for narrow systems, since "wait until AGI to solve alignment" is not a plan.
  • Question whether scaling toward emergent generality is safer than deliberately engineering the specific capabilities we want.
  • Distinguish between breadth (many capabilities we care about) and generality (chainable OOD transfer), since these have different safety profiles.

I think the fundamental question remains: are we building toward the right target, and do we even know what that target is?

I welcome counterarguments. If there exists a sound intrinsic argument for generality that I've missed, I'd genuinely like to hear it.

  1. ^

    I want to thank BlueDot Impact for accepting me into their inaugural cohort of "AGI Strategy", where this discussion arose. This post would not exist without their great efforts to build the much-needed AI safety workforce.

  2. ^

    Wei et al. (2022), "Emergent Abilities of Large Language Models"; Ganguli et al. (2022), "Predictability and Surprise in Large Generative Models"; Hoffmann et al. (2022), "Training Compute-Optimal Large Language Models" (the Chinchilla paper)

  3. ^

    https://en.wikipedia.org/wiki/No_free_lunch_theorem

  4. ^

    There may be a good argument to be developed here, if one can successfully argue that adaptability is an intrinsic component of generality, and not a mere after-effect.

  5. ^

    This formulation of generality as chainable out-of-distribution transfer draws on work in meta-learning and few-shot transfer learning. See Jiang et al. (2023), Tripuraneni et al. (2022), Sun et al. (CVPR 2019), and Ada et al. (2019) for theoretical foundations on OOD generalization and transfer bounds.

  6. ^

    https://www.lesswrong.com/posts/BqoE5vhPNCB7X6Say/superintelligence-12-malignant-failure-modes