Epistemic Status: Very speculative, intended to provoke thought rather than convince.
A crux of much AI safety research is the theory that agents are powerful; that we should robustly expect agents that plan and seek power (operationalized as P2B loops) to outcompete other entities in the long term. Among agents, higher levels of agency are better - we should expect entities with planning and metacognition to outcompete entities that merely learn from direct experience or mimicry.
I claim this theory is not well-supported by empirical evidence, and that it’s at least as plausible to expect cooperation, not agency, to be the key property that determines which entities will survive and outcompete.
I don’t mean to say that cooperation and agency are mutually exclusive; in fact they are complements, and I expect future intelligent systems to exhibit high levels of both. I mean that power-seeking and cooperation are mutually exclusive, and if the world selects for cooperation more strongly than for agency, the instrumental convergence arguments for power-seeking may not go through.
Evidence from Biomass
A clear notion of power that has been the subject of extreme optimization pressure for billions of years is biomass. An organism’s biomass measures the amount of matter it directly controls (with a lot of corner cases and “extended phenotype” considerations we won’t get into - let’s just count carbon atoms).
The vast majority of the world’s biomass (>99%) is controlled by bacteria, plants, and fungi, entities which are not particularly agentic. But perhaps we should exclude them because they are not really in control of their component matter, not in the same sense that animals are - e.g. they are either much slower at moving their carbon atoms around (plants, fungi) or are so small and simple that they just don’t count (bacteria).
Let’s very roughly normalize for organism size and speed, then. Once we’ve normalized, the clearest outlier species in terms of biomass are:
- Humans, with a single species composing something like 50% of all large animal biomass (most of the remainder being livestock under our direct control).
- Eusocial insects, especially ants and termites, which compose 2% of insect species but the majority of insect biomass.
What makes these species so powerful? Not agency, not their ability to plan. Their distinctive feature relative to comparable species is cooperation - their ability to coordinate with thousands and sometimes millions of other organisms from the same species. This is most obvious with eusocial insects, which clearly lack any ability to plan and have only a very limited ability to learn from experience. Human societies do sometimes have central nodes that perform limited planning, but this doesn’t seem essential to their success.
The related cultural intelligence hypothesis suggests that humans are dominant due to our ability to cooperate not just across space but across time via culture - our most powerful tools like language, mathematics, and the scientific method are the product of millions of individuals cooperating over thousands of years.
Evidence from recent AI progress
It’s notable that the most promising current AI systems are large language models - AIs that learn culturally (by reading things humans have written over our history), while more “agentic” approaches to AI like deep reinforcement learning have stagnated in relative significance.
It might be that this is a temporary trend due to a “cultural knowledge overhang”, and once we get to the frontier of what humans know, agentic approaches will begin to outperform.
But at least based on simple trend extrapolation and the biological evidence, we should bet that the future belongs to entities that feature unusually high levels of cooperation, not unusually high levels of power-seeking.
Implications for AI Safety
Just because evolution on Earth has so far selected strongly for cooperation doesn’t mean it will continue to do so.
A reframe of the AI alignment problem is “ensuring that software continues to be selected for cooperation with humans more strongly than for power-seeking”. This is certainly not guaranteed, and there are many plausible paths to ruin. But we know there exist real environments with this kind of selection pressure, so it can’t be impossible, and in fact might be easier than the instrumental convergence arguments suggest.
So what’s wrong with power-seeking?
The usual instrumental convergence arguments give us reasons to expect a world dominated by power-seeking, planning agents, but instead we observe a world dominated by cooperators. So what are those arguments missing?
The simplest explanation is that evolution just got stuck in a local optimum, and agents are like bicycles - vastly more efficient but can’t be evolved by a local search algorithm. Then it seems likely that gradient descent won’t select for agents either, but human designers will.
Alternatively, some real-world constraint on computation has made planning uncompetitive (so far). You’re better off investing the marginal watt of energy into communicating with someone else, than into thinking more yourself. What could this constraint be?
The obvious candidate is the data bottleneck for learning systems, most recently seen in the Chinchilla results. For any fixed budget of compute and bandwidth, you’re better off pushing computation and decision making towards the edges. This constraint might be generated by a property like the high energy cost of error-free information transmission across physical space. Perhaps this constraint will be lifted as we transition from carbon-based computation to silicon-based computation, but it's not obvious why.
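To get a feel for how binding the data constraint is, here is a back-of-envelope sketch of the Chinchilla compute-optimal allocation. It uses two commonly cited approximations from the Chinchilla paper - training FLOPs C ≈ 6ND and a compute-optimal ratio of roughly 20 training tokens per parameter - both of which are simplifications of the paper’s fitted scaling laws, not exact figures:

```python
import math

def compute_optimal(C):
    """Rough compute-optimal (params N, tokens D) for a FLOP budget C.

    Assumes C ~= 6*N*D and D ~= 20*N (the widely quoted Chinchilla
    rule of thumb), so C ~= 120*N^2.
    """
    N = math.sqrt(C / 120.0)
    return N, 20.0 * N

# Roughly Chinchilla's own training budget of ~5.76e23 FLOPs:
N, D = compute_optimal(5.76e23)
print(f"N ~ {N:.2e} params, D ~ {D:.2e} tokens")
# -> on the order of 7e10 params and 1.4e12 tokens, consistent with the
#    actual 70B-parameter, 1.4T-token Chinchilla run
```

The point of the arithmetic: under these fits, every increase in compute demands a proportional increase in data, so data, not raw computation, quickly becomes the scarce resource.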
The most optimistic possibility is that there exists an instrumentally convergent drive for cooperation, along the lines of Robert Axelrod’s Evolution of Cooperation. Depending on the specific environment, the selection pressures for cooperation might be stronger than for power-seeking. Properties which make an environment select more for cooperation include:
- The existence of other powerful entities (in a dead universe, there's nobody to cooperate with).
- Iterated games with affordances for punishment (easy to hurt other entities, but hard to destroy them).
- Overlap in values between entities.
- Genetic similarity.
- Gains from trade or symbiosis.
- High communication bandwidth between entities.
- Affordances for making pre-commitments (e.g. via legal systems, via showing your code, via self-boxing).
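The Axelrod-style dynamic can be sketched in a few lines: a round-robin iterated prisoner’s dilemma in which, once reciprocators are common enough in the population, tit-for-tat outscores unconditional defection. The strategy names, payoffs, and population mix below are illustrative choices for a minimal demo, not anything taken from Axelrod’s actual tournament:

```python
# Minimal iterated prisoner's dilemma round-robin.
# Standard payoffs: (my_move, their_move) -> my_score; C = cooperate, D = defect.
PAYOFFS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(my_hist, their_hist):
    # Cooperate first, then copy the opponent's previous move.
    return their_hist[-1] if their_hist else "C"

def always_defect(my_hist, their_hist):
    return "D"

def always_cooperate(my_hist, their_hist):
    return "C"

def play(s1, s2, rounds=100):
    """Run one iterated match and return both players' total scores."""
    h1, h2, score1, score2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        score1 += PAYOFFS[(m1, m2)]
        score2 += PAYOFFS[(m2, m1)]
        h1.append(m1)
        h2.append(m2)
    return score1, score2

def tournament(players, rounds=100):
    """Round-robin: every player meets every other player once."""
    totals = {name: 0 for name, _ in players}
    for i, (n1, s1) in enumerate(players):
        for n2, s2 in players[i + 1:]:
            a, b = play(s1, s2, rounds)
            totals[n1] += a
            totals[n2] += b
    return totals

# A population with several reciprocators, one defector, one unconditional
# cooperator. With this mix, tit-for-tat comes out on top.
players = [
    ("tft_1", tit_for_tat), ("tft_2", tit_for_tat), ("tft_3", tit_for_tat),
    ("all_d", always_defect), ("all_c", always_cooperate),
]
print(tournament(players))
```

Note the environment-dependence the list above describes: with only one tit-for-tat player in the mix, always-defect wins instead. Cooperation is selected for only when the population already contains enough other cooperators to reward it.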
Increasing the extent to which these properties hold for our world is an underrated path for reducing existential risk.
Thanks to Daniel Kokotajlo, Deger Turan, and TJ for helpful comments on an earlier draft.