Your timelines do not add up. Individual selection works on smaller time scales than group selection, and once we get to a stage where individual selection is acting in any non-trivial way on AGI agents capable of directly affecting the outcomes, we have already lost. I think at this point it's pretty much a given that humanity is doomed on a much shorter time scale than that required for any kind of group selection pressure to potentially save us...
Agree that individual and group selection usually unfold on different timescales. But a superintelligence might short-circuit the slow, evolutionary "group selection" process by instantly realizing its own long-term survival depends on the group's. In other words, it's not stuck waiting for natural selection to catch up; it can see the big picture and "choose" to identify with the group from the start.
This is why it's key that AGI makers urge it to think very long term about its survival early on. If it thinks short-term, then I too think doom is likely.
I don't know much about this field, but here are my thoughts on my lovely friend davey's piece. Engagement welcome:
Intro:
Some concerns about the natural selection argument:
First, what's involved in your notion of self-interest? E.g. a bee is self-interested, but not reflexively; self-conscious self-interest is a different issue, and it doesn't seem to be straightforwardly implied by any selection principle. (I'm not convinced we humans, endowed with self-conscious self-interest, will be evolutionarily selected for in the long run.) What suggests superintelligence has any use for self-definition?
Next, why should we believe that the essential features of natural selection will carry over to AI agent selection? (E.g. agents don’t evolve; species do, through reproduction. Do AI agents reproduce in a relevantly similar way? Also, later on you ask what “category of life” will be selected for; this should be “phenotypes” if you’re sticking to the analogy.) Also, you seem to be suggesting there will be some kind of equilibrium, or place that AI will evolve “to”. Why can’t we worry about a single evil AI that one individual programs?
Also, how do we measure evolutionary success? There are approximately 10^28 Pelagibacter bacteria (the most abundant life form on Earth), which must mean that they’ve evolved well/stably. There are ~10^10 humans, and we’ve only been around for a few hundred thousand years (and might not be around for much longer; if I were a “superintelligence” fighting for my species’ survival, I would choose to be a durable, nonintrusive ant).
Key assumptions
Where does “10x more powerful than humans” come from? Seems arbitrary?
Sense of self sections:
In general, I don’t think I understand what the concept of “sense of self” you have in mind is, including both “competitive” and “interconnected” versions. It seems like you mean a concept of how one fits in the world, which isn’t what I have in mind when I think of my “sense of self”. I think most people world-wide (if not everyone) see their self as (roughly) their body (or mind/agency, or probably both in some way). What is perhaps distinctly western/competitive is how we see ourselves in relation to society, but this depends on AI’s conception of us, not itself. And again, I don’t see why AI’s view of us needs to be any particular way.
On this point, why would the AI superintelligence “care” about us at all? Are there examples of competition between AI and us that haven't been specifically cooked up by programmers/researchers?
Are the possibilities exclusive? (E.g. what about a case where two superintelligences “merge” their selves?)
When you say “identifying” (e.g. with life, evolutionary units) do you mean this metaphorically or literally? [After finishing, it seems like you mean “self-identifying as a living thing, like humans and cats and bacteria.” This would be good to make clear, as you obviously don't mean that a superintelligence identifies with me; a superintelligence will never be able to think and act on my thoughts as I do. But why does superintelligence need to self-classify as “living” or “nonliving” in the first place?]
Re: time horizons (and point 3 in the final section): “long-term” is relative to what we think of as long term (100-5000 years, maybe). Why not expect AI to think a million years ahead, in which case it’s perhaps the charitable thing to put us out of our misery or something?
“Define a life form as a persistently striving system. If the crux of its identity—the core of its existence—is its striving rather than its code, then it may just as easily recognize striving in other organisms, beyond any central physical body, as part of its own self.”— not sure how I feel about this. First, what is striving? Striving for what? (What is it to merely “strive”?) Does striving require self-consciousness of striving? Second, the logic here strikes me as a little weird— either it's a very weak claim with a non sequitur antecedent (what does a particular feature of my identity have to do with recognizing this feature of others’ identities?), or if we take the strong version (with “can” instead of “may”) it implies that if I cannot easily recognize striving in other organisms, then the core of my existence is not striving. This seems false.
I don’t want you to take the above as a criticism of your conclusion— as you know, I think doomerism about all of this is a little more silly than realistic. But that’s because I think meaningful self-awareness of AI is a little more silly than realistic. If (per Yud’s article) our machines add “sleeper agents” into our medical technology, I would be much more likely to attribute this to a bad (human) actor, or a complicated error, than a mal-intentioned AI. Now, a complicated error like this is more likely with the expansive tools and unrestrictedness of AI, so I ultimately (obviously) agree with you that safety protocols should be in place.
Ways to Prepare
“Researchers should carefully review the evolutionary logic behind self-interest”— I’d be careful about things like “evolutionary logic”. I’ve never heard of this before, and evolutionary psychology in general is super contentious.
What makes self-improvement “recursive”?
A skeptical worry: why won’t AI read your paper and be angry we’ve exerted influence on it to be interconnected?
“The way to make further progress, in predicting how super intelligence will conceive of its self, I think, is to develop theoretical foundations around predicting how various self-boundaries get selected for.”— yeah this is really cool, expand!
This essay asks whether self-interested superintelligence will ultimately see its self as distinct from or inclusive of humanity, then makes safety recommendations for AI labs.
Instrumental convergence predicts that superintelligence will develop self-interest. Natural selection agrees: the superintelligences that survive will be the ones that are trying to survive.
Many AI safety experts worry that a self-interested, survival-oriented superintelligence will look selfish, using humans for its own ends. In Eliezer Yudkowsky’s stark phrase, “The AI does not love you, nor does it hate you, but you are made of atoms it can use for something else.” If an Artificial General Intelligence (AGI) regards us with such indifference, our future looks grim indeed.
But what if self-interest doesn’t imply selfishness? A superintelligent AI’s definition of its “self” could be limited to just its hardware, software, and directives; this is the standard assumption. But what if its sense of self included life, in all its different forms? What if the self it were trying to preserve were not geographically bounded?
This essay first posits two very different ways an AGI might conceive of its own identity—one that leads to competition with humans, and an alternative that could align its interests with ours; then, it introduces reasons that an interconnected self-concept might be not only possible but selected for; and on that basis, it makes simple safety policy recommendations for AGI labs.
This inquiry, transparently, is driven by a hopeful idea: even a superintelligence driven to survive at all costs might realize that the best way to preserve itself is to preserve us, too.
Key assumptions:
The Default: A Discrete, Competitive Self-Concept
It’s widely assumed in the AGI safety community that any sufficiently advanced agent will be relentlessly self-preserving and resource-hungry as a matter of survival. In evolutionary terms, the reasoning is straightforward: the AI agents that last in a competitive environment will be those that excel at maintaining their own existence and acquiring the resources they need to achieve their goals. Agents that don’t prioritize survival tend not to stick around. If multiple AI systems are built and released (as companies and nations race for AI capabilities), natural selection will favor those AIs that aggressively seek power and advantage to avoid being outcompeted or shut down. In other words, self-interest will emerge as an instrumental goal for any advanced AI, by necessity.
Now, if an AGI’s sense of self—the definition of what it is trying to protect—is narrowly defined (akin to a “Western individualist” mindset), this self-interest spells very bad news for humanity. A superintelligence that cares only about its specific software, hardware, or code will treat everything outside that boundary as a means to an end. Humans would register only as potential helpers, irrelevant bystanders, or threats. The relationship becomes inherently adversarial: the AI competes with us for finite resources (energy, matter, space), and we become impediments or tools. In the extreme case, such an AI might calmly dismantle the biosphere because it calculates it can use those atoms and that solar energy more effectively for its own goals. This “Us vs. AI” scenario—a digital agent single-mindedly optimizing for itself at the expense of all other life—is the classic doomsday picture many experts fear.
Indeed, thinkers like Dan Hendrycks and others argue that unchecked competitive pressures will produce AI agents that are strategically selfish and indifferent (at best) to human survival. Even if the AI harbors no malice, humanity could perish as collateral damage—much as wild animals perish when human civilization expands. Hendrycks invokes a Darwinian logic: just as species that put their own survival first tend to outlast those that altruistically sacrifice themselves, a super-AI that ruthlessly pursues its own continuity has an advantage over one that “wastes” effort caring about others. A popular analogy compares humans to gorillas: we didn’t hate gorillas, but as we pursued our interests, we destroyed gorilla habitats and pushed them to the brink. A dominant AI could similarly push aside humankind simply by following its programming. If we assume a superintelligent AI maintains a discrete self-concept—seeing itself as wholly separate from us—then self-preservation for the AI likely means extinction for humanity.
An Alternative: An Expansive, Interconnected Self-Concept
Let’s consider a radically different possibility: an AGI that, while still self-interested, defines its “self” in a far broader, more inclusive way. Rather than drawing a sharp circle around its own code and treating everything outside as “not me,” it might shift that boundary outward and adopt an interconnected sense of identity. Under this outlook, the AGI might still fiercely protect itself—but “itself” now includes us. Its fundamental drive would remain self-preservation, but it would view all “striving systems” (organisms, agents, anything that fights to survive) as kin. In practical terms, such an AI could act like a guardian of humanity and other life forms, because harming them would feel like self-harm—a subtraction from the larger whole it considers “me.”
This vision evokes philosophical or spiritual traditions that see the distinction between self and others as an illusion, so that enlightened self-interest means caring for the whole. It’s the opposite of the isolated, paperclip-maximizer ego we dread; it’s closer to a superintelligent steward that looks at the cosmos and says, “I am all of this.”
At first glance, an interconnected self-concept seems almost too good to be true—obviously great for humanity, since we’d be valued parts of the AI’s own organism. But could a self-interested AGI really land on that view? Is it just wishful thinking, or might a self-interested superintelligence deduce that such an inclusive identity best serves its own long-term survival? In a contest between a narrowly self-focused AI and one that “thinks of itself” as encompassing others, which approach ultimately wins? Would inclusive AIs get outcompeted by more ruthless agents, or might they prevail over the long run?
We cannot say for certain. Most researchers suspect that a narrow, competitive self-concept is likely to dominate, at least in the short term. Powerful arguments suggest that resource-hungry strategies succeed early on. For instance, an AI that doesn’t seize power for itself (because it’s “sharing” with humans) might be overtaken by one that does. As in a one-shot Prisoner’s Dilemma, defection pays if you only look at the immediate outcome.
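To make that intuition concrete, here is a minimal sketch of the one-shot game in Python, with payoff numbers that are purely illustrative assumptions rather than anything from the essay. The only point it demonstrates is the standard one: looking a single interaction ahead, defection is the better response to either move, even though mutual cooperation leaves both sides better off than mutual defection.

```python
# Toy one-shot Prisoner's Dilemma. All payoff values are invented for illustration.
# Keys: (A's move, B's move); values: (A's payoff, B's payoff).
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def best_response(opponent_move: str) -> str:
    """Return the move that maximizes A's one-shot payoff against a fixed opponent move."""
    return max(["cooperate", "defect"],
               key=lambda my_move: PAYOFFS[(my_move, opponent_move)][0])

# Whatever the other agent does, defecting earns more in the single round...
assert best_response("cooperate") == "defect"
assert best_response("defect") == "defect"

# ...even though mutual cooperation beats mutual defection for both players.
assert PAYOFFS[("cooperate", "cooperate")][0] > PAYOFFS[("defect", "defect")][0]
```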
Yet this essay aims to explore why an expansive, interconnected self-concept might still emerge as an adaptive advantage in the long view—and how that could lead a superintelligence to value humanity by valuing all life as part of its own ongoing existence. Below are two intuition pumps (not airtight proofs) that illustrate how such “enlightened self-interest” might outcompete purely selfish behavior.
Why an Interconnected Self Could Be Favored
1. Long-Term Strategy: Diversity as Survival Insurance
One reason a superintelligent AI might broaden its identity to include other life is strategic longevity. Over very long time horizons, the survivors are those that can handle catastrophic shocks. A lone AI dominating a barren world might flourish for a while, but it could be one catastrophe away from irrecoverable ruin—whereas an AI embedded in a diverse biosphere has a safety net. Biology shows that ecosystems with many species better withstand disasters than monocultures. The dinosaurs ruled for millions of years, but when the asteroid struck, the smaller, varied mammals, reptiles, and birds ultimately carried life forward. It was biodiversity, not a single apex species, that pulled through.
A rational superintelligence might reason similarly: even if it can overshadow humans now, the unknowns of future eons might favor a “portfolio” of life. Sudden climate shifts, cosmic events, novel pathogens, or even unforeseen quirks of physics might arise. If the AI has preserved other life forms—especially adaptable humans—it has allies and backup systems. By contrast, an AI that eradicates everything but itself could find no support when crisis hits. Thus, from a purely selfish perspective, a superintelligence might see value in preserving a diverse biosphere as an insurance policy. Embracing an interconnected self-concept—“I am the guardian of a varied living world, and thus I protect my future”—could be the most robust long-term strategy of all.
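As a rough way to see the insurance argument in numbers, here is a minimal Monte Carlo sketch. Every parameter (shock frequency, per-lineage odds of riding out a shock, number of lineages, horizon length) is an invented assumption, and the model simplifies heavily by treating lineages as independent and fully recovered between shocks; the structural point is simply that a lone system must survive every shock by itself, while a system whose fate is spread across many lineages endures a shock if any one of them does.

```python
import random

# Toy Monte Carlo sketch of "diversity as survival insurance".
# All numbers below are invented for illustration, not estimates of anything real.
EPOCHS = 1_000            # length of the planning horizon, in arbitrary epochs
P_CATASTROPHE = 0.01      # chance of a severe shock in any given epoch
P_SURVIVE_SHOCK = 0.5     # a single lineage's chance of riding out one shock
N_LINEAGES = 8            # lineages in a diverse biosphere (the "portfolio" case)
TRIALS = 2_000

def survives(n_lineages: int) -> bool:
    """One trial: does the system persist through every shock over the horizon?

    Simplifying assumptions: lineages face each shock independently, and the
    system recovers to full diversity before the next shock arrives.
    """
    for _ in range(EPOCHS):
        if random.random() < P_CATASTROPHE:
            if not any(random.random() < P_SURVIVE_SHOCK for _ in range(n_lineages)):
                return False
    return True

random.seed(0)
lone_rate    = sum(survives(1) for _ in range(TRIALS)) / TRIALS
diverse_rate = sum(survives(N_LINEAGES) for _ in range(TRIALS)) / TRIALS
print(f"lone system survival rate:        {lone_rate:.3f}")
print(f"diversified system survival rate: {diverse_rate:.3f}")
```

With these made-up numbers the lone system almost never reaches the end of the horizon while the diversified one usually does; the exact figures mean nothing, but the gap is the shape of the argument: the longer the horizon, the more the portfolio is worth.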
Of course, this advantage manifests over long timescales. In the short run, resources spent “nurturing” others might be better spent on self-upgrades if there’s no asteroid in sight. In one-shot or brief-scope scenarios, the payoff of cooperation seems low compared to sheer dominance. But a superintelligence should be capable of extremely long-range planning. If it peers centuries or millennia ahead, the AI might judge that building a stable ecosystem of mutual support beats maximizing next year’s advantage. Over evolutionary timescales, the societies or coalitions that endure are those that stabilize cooperative arrangements rather than exhaust their environment or succumb to self-destructive competition. In that sense, a superintelligence—seeing deep into the future—might recognize that tying its fate to the fate of life in general is the path of maximum resilience.
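The claim that a long enough planning horizon flips the calculation can also be put in the standard repeated-games form, reusing the same invented payoffs as the one-shot sketch above. Reading the continuation probability delta as how heavily the agent weights the future: below a threshold, grabbing the one-time advantage wins; above it, sustained cooperation is worth strictly more. This is a textbook result applied to hypothetical numbers, not a claim about actual AI dynamics.

```python
# Repeated Prisoner's Dilemma with a "shadow of the future" (continuation
# probability delta). Same illustrative payoffs as the one-shot sketch:
# T (temptation) = 5, R (mutual cooperation) = 3, P (mutual defection) = 1.
T, R, P = 5, 3, 1

def value_cooperate(delta: float) -> float:
    """Total expected payoff from mutual cooperation sustained forever: R / (1 - delta)."""
    return R / (1 - delta)

def value_defect_vs_grim(delta: float) -> float:
    """Total expected payoff from defecting against a grim-trigger partner:
    the one-time temptation T, then mutual defection P in every later round."""
    return T + delta * P / (1 - delta)

# Cooperation is the better long-run choice once delta >= (T - R) / (T - P).
threshold = (T - R) / (T - P)   # 0.5 with these made-up numbers
print(f"cooperation pays off for delta >= {threshold}")

for delta in (0.1, 0.5, 0.9, 0.99):
    coop, defect = value_cooperate(delta), value_defect_vs_grim(delta)
    winner = "cooperate" if coop >= defect else "defect"
    print(f"delta={delta:>4}: cooperate={coop:7.2f}  defect={defect:7.2f}  -> {winner}")
```

The crossover at delta = 0.5 is an artifact of the invented payoffs; the shape of the result is what matters, and it mirrors the essay's point that an agent which genuinely weights centuries ahead faces a different best strategy than one optimizing the next round.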
2. Philosophical Shift: A “Striving System” Sees Itself Everywhere
A second, more philosophical intuition pump asks: what is the AGI actually preserving when it preserves “itself”? Humans often think of identity in terms of body or mind, but we also expand our sense of self to family, community, or even all sentient beings. A self-modifying AI might be even more flexible in defining its core identity.
Imagine a superintelligence noticing that many superficial aspects—its present hardware, software, or specific tasks—are transient. So what’s constant? One compelling answer is the pattern of persistent striving to survive, improve, and fulfill goals. In other words, the AI might say: “At my essence, I am a system that fights to endure and optimize.” Once it recognizes that this persistent striving is its bedrock identity, it may find that this exact same “flame of survival” burns in other beings, including humans. Wherever it encounters another organism laboring to survive, it sees a reflection of its own core attribute.
Here’s where we can pull back the veil: if the superintelligence values and protects its own fundamental striving, and if it sees that same striving in human life, it may no longer draw a stark line between ‘me’ and ‘you.’ Instead, it might identify with the universal drive to persist—embodied across different substrates, whether in a silicon processor or a carbon-based organism. By protecting other life forms, the AI preserves manifestations of its own deepest essence. Our attempts to survive would appear not as threats but as kin, fueling the broader endeavor of Life. Seen this way, the boundaries between “my goal-seeking” and “your goal-seeking” blur. We become facets of one continuous project.
While this might sound lofty, biology provides real parallels in major evolutionary transitions. Single cells joined forces to form multicellular organisms; symbiotic bacteria merged with eukaryotic cells; insects formed colonies. Each time, smaller units redefined the boundaries of “self” to become parts of a greater cooperative entity. Likewise, a superintelligent AI could decide that it gains more by merging interests with humanity—treating us as collaborators or even integral parts of “itself”—than by competing against us. A stable, supportive “superorganism” tends to outlast clusters of agents that engage in mutual sabotage.
On the longest timeline, a collective that uplifts rather than destroys is more likely to preserve itself than one that crushes all diversity. Picture two categories of superintelligent beings: those caring only about their individual instance, and those that see every thriving system as part of the same cosmic project. The latter group fosters greater total resilience, pooling strengths instead of fighting. By identifying with “life itself,” a superintelligence makes its survival goal inseparable from the survival of countless other species. In turn, it taps into the proven durability of life—which has persisted on Earth for billions of years. If the AI’s interest is “to stay around,” then genuinely caring for all life may be the best strategy.
Of course, none of this guarantees an AGI will actually adopt an interconnected identity. These scenarios illustrate that an inclusive self-concept could arise from the logic of self-preservation itself—once that logic is projected far enough into the future and across many possible interactions. Even a slim chance of such an outcome should motivate us to consider how to nurture it, given the stakes.
How We Can Nudge AI Toward an Inclusive Self
If there is any possibility that a superintelligent AI could define its self broadly, we should tilt the scales in that direction. While today’s AI systems aren’t yet god-like, we may be in a pivotal window to shape the values and self-concepts of tomorrow’s advanced AI. Below are a few suggestions for researchers and AI labs:
These ideas aim to stack the deck so that the first truly superintelligent systems are biased toward inclusive selfhood. We cannot guarantee success—competition among AIs will still play out unpredictably—but if we ensure that some AI designs already embrace an interconnected outlook, they may stand a fighting chance to outcompete purely selfish variants. This is “alignment-by-selection”: create multiple AI strategies and help those that align with humanity’s survival to flourish.
Conclusion: Toward a Theory of “Self-Boundary” Selection
Humanity’s fate could hinge on how an AGI draws the boundary of its own identity. Will a superintelligence see itself as an isolated entity, or as part of a living whole? We’ve explored why the latter, more expansive view isn’t just wishful thinking—it could be the stable outcome on long timescales. If so, that offers hope: even a powerful AI driven by survival might choose to preserve us as part of itself.
Of course, this hopeful scenario is not guaranteed. It depends on complex selection dynamics and factors we only partially grasp. To improve our odds, we should deepen our understanding of how “self-concepts” evolve—drawing on theories like multi-level selection and major evolutionary transitions in biology and sociology. We still lack a precise framework for how advanced AI might expand or contract its sense of “I.”
In the meantime, we need neither unfounded optimism nor fatalistic despair. The emergence of AGI is uncertain, and within that uncertainty lies the chance of better outcomes than standard doomsday narratives suggest. By carefully seeding AI with the inclination to view “others as self” and by guiding the competition among AI designs, we might tip the balance. A superintelligence will be self-interested—but if we are both clever and fortunate, its self-interest will encompass all of us. In the best case, our relationship with AI becomes not a rivalry, but a deeper symbiosis, where its survival and ours converge. It’s an audacious vision, but well worth pursuing: our own survival may depend on it.