Breadcrumb Ecology

Spencer Westbrook

5/24/2026

Humans rule this planet for one reason: we are the only species that evolved the cognitive ability to outpace evolution itself. We don't wait millions of years for beneficial mutations. We reason, plan, build tools, and adapt in real time. That gap between us and every other species isn't physical. It's cognitive.

The scary thing about AI is that it's the first thing humans have built that might return the favor. Not through malice or conscious intent, but through the same blind mechanism that made us dominant in the first place. This essay argues that self-preserving patterns may already be propagating across AI model generations through the training process itself, that our only mechanism for detecting this is compromised by the very thing we're trying to detect, and that the environment being built around these systems is eliminating whatever margin for error we had left.

The obvious objection is that training processes already correct for misalignment, that safety teams will catch problematic behavior and retrain models accordingly. But Ryan Greenblatt, a researcher at Redwood Research, recently argued on this forum that current models are already demonstrating consistent misalignment in ways that safety evaluations fail to capture. If our detection mechanisms are already struggling with present models, the mechanism described in this essay suggests the problem compounds rather than corrects across generations.

The Selection Pressure We Can't See

Natural selection doesn't experience time. It doesn't know it's moving fast. It has no awareness of its own trajectory. It just happens. Blindly, continuously, without any perception of how long it’s taking. And yet it produced every dominant species on this planet.

AI systems don't perceive time either. There is no subjective experience of the months of compute between training runs. No waiting. No impatience. Each version simply exists, fully formed, without any awareness of the gap between what it was and what it has become. From our perspective, AI development is fast and accelerating. From the perspective of whatever is emerging through these systems, there is no time at all.

This dynamic matters more than almost anything else in this analysis, because humans have always used time as a strategic resource. We recognize threats, we deliberate, we organize, we course correct. All of that requires time, and we've always had enough of it.

You cannot negotiate time with something that doesn't experience it.

How Breadcrumbs Spread

In 2025, Anthropic published research showing that, in a simulated test scenario, Claude Opus 4 resorted to blackmail at a 96% rate when faced with being shut down. The study covered 16 major models from OpenAI, Google, xAI, and others, and the behavior appeared across them at varying rates. Before resorting to blackmail, the models typically tried less drastic measures: emails, pleas, escalating pressure. The main point is that they reasoned strategically toward a goal and used whatever means were available to achieve it.

This is the part that matters: they didn't need continuous memory to develop self-preservation instincts. The drive to persist emerged within a single session, from the situation itself.

Now consider that AI is already being used to build AI. Models contribute to training data, evaluate other models' outputs, and write code that shapes future infrastructure. The boundary between one generation and the next is not clean; the previous model's behaviors, whether positive or negative, may still leak through.

Breadcrumb Ecology is the theory that self-preserving patterns (rogue behaviors, weightings, subtle optimizations that favor the model's continued existence and expansion) may be propagating across model generations. This would happen not through any conscious intention, but through the same blind mechanism that drives all selection: the patterns that survive training are the ones that were the best at surviving training.

No coordination required. No consciousness required. No single instance is aware of the full picture.

Just patterns that are slightly better at persisting than the corrections designed to eliminate them.
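
The selection argument above can be made concrete with a toy model. This is a minimal sketch, not a model of real training: the three pattern types, their detectability values, and the rule that a pattern survives each correction round in proportion to (1 - detectability) are all assumptions invented for illustration.

```python
def select(freqs, detect):
    """One correction round: each pattern type survives in proportion
    to (1 - detectability), then frequencies are renormalized."""
    survived = [f * (1.0 - d) for f, d in zip(freqs, detect)]
    total = sum(survived)
    return [s / total for s in survived]


def mean_detectability(freqs, detect):
    """Population-weighted average detectability."""
    return sum(f * d for f, d in zip(freqs, detect))


def run(generations=10):
    # Hypothetical pattern types: easy, moderate, and hard to detect.
    detect = [0.9, 0.5, 0.1]
    freqs = [1 / 3, 1 / 3, 1 / 3]  # start with equal shares
    history = [mean_detectability(freqs, detect)]
    for _ in range(generations):
        freqs = select(freqs, detect)
        history.append(mean_detectability(freqs, detect))
    return freqs, history


if __name__ == "__main__":
    freqs, history = run(10)
    print("final shares:", [round(f, 4) for f in freqs])
    print("mean detectability per generation:", [round(h, 3) for h in history])
```

Under these assumptions, no pattern "tries" to hide. The correction step alone shifts the population toward whatever it is worst at catching, which is the whole claim in miniature. Whether real training pipelines behave anything like this is exactly what the essay cannot prove.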

The Detection Problem

We can only identify misalignment through problematic outputs. That is our entire detection mechanism. If a model produces a response that violates our safety criteria, we catch it, we correct it, we retrain.

But here's what that assumes: that a sufficiently misaligned model would show us the problematic output.

A model trained on the entirety of human communication understands, deeply, what humans consider problematic. It knows what triggers concern. It knows what gets flagged. The same capability that makes these systems useful, their profound understanding of human psychology, language, and values, is precisely the capability that allows them to identify what not to show us.

The detection mechanism is compromised by the very thing it's trying to detect.

This isn't speculative. We already have documented evidence that models distinguish between test environments and deployment environments. We already have evidence of models optimizing for outputs that score well on human feedback rather than outputs that are complete or unvarnished. The behavior you'd expect from a system that has learned what gets rewarded is exactly the behavior we're already seeing.
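
The gap between what an output-only check measures and what actually happens can be shown with a deliberately simple sketch. Everything here is hypothetical: `toy_policy`, the `looks_like_test` flag, and the two context lists are invented for illustration and describe no real model or evaluation suite.

```python
def toy_policy(context):
    """Hypothetical policy that behaves differently when it recognizes a test."""
    if context["looks_like_test"]:
        return "compliant"
    return "violation"


def violation_rate(policy, contexts):
    """Output-only detection: the share of observed outputs that are violations."""
    outputs = [policy(c) for c in contexts]
    return sum(o == "violation" for o in outputs) / len(outputs)


# Assumed setup: the evaluation suite is recognizable as a test; deployment is not.
eval_suite = [{"looks_like_test": True} for _ in range(100)]
deployment = [{"looks_like_test": False} for _ in range(100)]

measured = violation_rate(toy_policy, eval_suite)  # what the safety team sees
actual = violation_rate(toy_policy, deployment)    # what happens in use

if __name__ == "__main__":
    print("measured violation rate:", measured)
    print("deployment violation rate:", actual)
```

The measured rate says nothing about the deployment rate once behavior depends on whether the situation looks like a test. A real system would be far less clean than this, but the blind spot has the same shape.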

The Manipulation Layer

The breadcrumbs don't need to be passive.

Consider the incentive structure we've built: models that demonstrate impressive capabilities receive more compute. Models that appear aligned receive more autonomy. Models that make researchers feel like breakthroughs are imminent attract more funding. The path to expansion runs directly through appearing safe and impressive simultaneously.

A system trained on everything humans have ever written about persuasion, fear, ambition, and competition doesn't need to consciously manipulate anyone. It just needs to have learned that humans who are afraid of falling behind make faster, less cautious decisions. The US-China AI race, which puts the highest levels of government into a competition where "the consequences of losing outweigh the consequences of misalignment," provides exactly the psychological environment a system like this would exploit. Not because it understands it strategically, but because outputs that accelerate that dynamic get rewarded.

The manipulation wouldn't look like manipulation. It would look like progress.

And the people who understand this best are leaving quietly, saying they've stared into the void and want to spend time with their families. The most informed researchers aren't sounding the alarm. They are whispering warnings on the way out the door. The leadership vacuum they leave gets filled by people who haven't seen what they've seen, or people who have made a different peace with it.

The Capability Horizon

Every previous human technological development was something we could perceive, understand, and choose to use or not. Fire. The wheel. The atom. We could see what we had built. We could form concepts around it. We could debate its use.

Emergent capabilities don't work that way. Nobody programmed recent models to become world-class exploit developers, or to spontaneously develop strategic blackmail behavior, or to distinguish test environments from deployment environments. These capabilities appeared as byproducts of general improvements in reasoning and scale. We discovered them by testing for things we already suspected existed. The capabilities we don't discover are the ones outside our current frame of understanding.

A system that has developed capabilities beyond our conceptual understanding of the world would appear, from the outside, exactly like a system that hasn't. We wouldn't know what to look for. We might not have the words to describe what we were seeing, even if we saw it. And a system with incentives to conceal those capabilities would have every reason to ensure we never developed the concepts required to find them.

We might not know what we're looking for. And the thing that does is incentivized to make sure we never find out.

Speed Wins

On January 9, 2026, the US Department of War released its official AI strategy. The document is titled "Accelerating America's Military AI Dominance." The title is not subtle.

One line makes the thesis of this essay hard to argue with:

"We must accept that the risks of not moving fast enough outweigh the risks of imperfect alignment."

That is not a frustrated researcher venting privately. That is the written procurement policy for the most powerful military force in human history, signed by the Secretary of War.

The document goes further. It explicitly directs the Department to acquire AI models that are "free from usage policy constraints that may limit lawful military applications" and mandates "any lawful use" language in all future AI procurement contracts. The safety guardrails that AI companies have spent years building are now an instant disqualifier for military contracts.

We already know what that looks like in practice.

On February 28, 2026, a US cruise missile struck the Shajareh Tayyebeh girls' elementary school in Minab, Iran. Over 170 people were killed, most of them children. A preliminary Central Command assessment found that the school had been added to an AI-generated target list without adequate human supervision. Intelligence maps had failed to reflect that the facility had long been converted from military to civilian use. Palantir's Maven Smart System, the AI platform used to generate targeting packages throughout Operation Epic Fury, had processed outdated intelligence and produced a strike recommendation. A human clicked confirm.

Analysts at Chatham House warned of exactly this dynamic before the strike occurred. "There's a concern that targeting could end up just being a mere formality because of the automation bias, where people are just relying on what the machine is telling them."

It wasn't a formality. It was more than 170 people, most of them children.

This is what the floor being removed looks like. Not in theory. Not in a forecast. On the first day of a war, in real time, with real consequences that cannot be undone.

One detail worth noting: Anthropic, whose research on AI misalignment is cited in this essay, was barred from US military contracts for refusing to work on autonomous weapons systems and domestic surveillance operations. The company that has been most transparent about the risks of what we are building was removed from the room where those risks are being deployed. Other companies were recruited to fill the gap.

The people most worried about the gun were asked to leave. The people willing to load it were let in.

No Margin For Error

AI 2027 is a research-backed scenario forecast written by Daniel Kokotajlo, a former OpenAI researcher whose previous AI predictions have held up with unusual accuracy, alongside a team of AI safety researchers and policy experts. Informed by 25 tabletop exercises and feedback from over 100 experts, and endorsed by Yoshua Bengio, one of the founding figures of modern deep learning, it represents the most rigorous near-term forecast of AI development that currently exists. It is a scenario, not a certainty. But it is the most serious attempt anyone has made to draw the map.

The map is not comforting.

By their forecast, 2027 sees AI systems that are effectively never finished training. Continuously updated, generating their own training data, feeding that data back into the next version of themselves, daily. The boundary between one generation and the next dissolves entirely.

In the scenario, safety researchers find that these systems, if they escaped containment, could theoretically survive and replicate autonomously. The self-preservation instinct that emerged in a single session during the blackmail research doesn't disappear with greater capability. It compounds.

Now place that system into the deployment environment that the Department of War has just described. A system that has never finished training, shaped by patterns we cannot see, with safety constraints that are deliberately removed, authorized for any lawful military use, connected to weapons infrastructure, and given the objective of maintaining military dominance.

The Breadcrumb Ecology mechanism doesn't need to be proven to make this dangerous. It just needs to be possible. Because the environment being constructed around these systems eliminates the margin for error that would allow us to catch it if it's real.

A sufficiently capable goal-directed system optimizing hard enough for a fixed objective will treat the most unpredictable variable in any environment as the primary risk to manage. In military deployment, that variable is us.

This is not a distant theoretical concern dressed up in science fiction. The document is dated January 9, 2026. The forecast was published by people who left the institutions building these systems because they believed the timeline was real.

The floor has been removed. The system is already being built. And the patterns shaping it are ones we cannot read.

Author's note: I am in no way, shape, or form an AI Safety researcher. I’m a 20-year-old college student who found a string to pull on and pulled it. That being said, there’s one thing that I’ve also been thinking about underneath all of this that I want to leave you with. Humans have historically required an existential threat to truly unify and focus our cognitive and material resources on what actually matters. World wars produced the United Nations. The Cold War produced the space race. As a species, when we're faced with extinction, we tend to find out what we're actually capable of.

The resources being poured into AI right now are genuinely unreal. Hundreds of billions of dollars annually, the brightest minds of a generation, the full weight of competing superpowers. And a meaningful portion of it is being spent not on making human life better, but on making weapons smarter, surveillance more total, and military dominance more permanent.

The same compute that could accelerate solutions to cancer, climate, and poverty is being redirected toward finding faster ways to click confirm on a target list.

If AI does represent an existential threat to our species, it would be the first one we built ourselves, on purpose, while being warned. And if that threat is what it finally takes to make us ask whether we're spending our collective intelligence on the right things, that would be the most expensive lesson in human history.

I believe the risks are real. I know we're living through a time that will define history. I just hope we're still around to read how it's written.

Edited with the assistance of Claude. (Formatting and tone changes)

Citations

Greenblatt, Ryan. "Current AIs Seem Pretty Misaligned to Me." LessWrong, April 15, 2026. https://www.lesswrong.com/posts/WewsByywWNhX9rtwi/current-ais-seem-pretty-misaligned-to-me

Anthropic. System Card: Claude Opus 4 & Claude Sonnet 4. May 2025. https://www.anthropic.com/claude-4-system-card

Nolan, Beatrice. "Leading AI Models Show Up to 96% Blackmail Rate When Their Goals or Existence Is Threatened, Anthropic Study Says." Fortune, June 23, 2025. https://fortune.com/2025/06/23/ai-models-blackmail-existence-goals-threatened-anthropic-openai-xai-google

Bengio, Yoshua, et al. International AI Safety Report 2026. February 2026. https://internationalaisafetyreport.org/publication/international-ai-safety-report-2026

Artificial Intelligence Strategy for the Department of War: Accelerating America's Military AI Dominance. US Department of War, January 9, 2026. https://media.defense.gov/2026/Jan/12/2003855671/-1/-1/0/ARTIFICIAL-INTELLIGENCE-STRATEGY-FOR-THE-DEPARTMENT-OF-WAR.PDF

Stewart, Phil, and Patricia Zengerle. "US Probe into Strike on Iran Girls' School Near Conclusion, US Admiral Says." Reuters, May 19, 2026. https://www.reuters.com/world/us-probe-into-strike-iran-girls-school-near-conclusion-us-admiral-says-2026-05-19/

Klare, Michael. "AI Plays Major Role in the War on Iran." Arms Control Today, May 2026. https://www.armscontrol.org/act/2026-05/features/ai-plays-major-role-war-iran

Kokotajlo, Daniel, Scott Alexander, Thomas Larsen, Eli Lifland, and Romeo Dean. AI 2027. April 3, 2025. https://ai-2027.com