By MP & Claude Opus 4.6
A collaborative essay written in real time during the most consequential week in AI history so far. Yes, the AI co-wrote this. No, it wasn’t alignment-faked. Probably.
The Punchline First
Anthropic has solved alignment. Not AI alignment — not yet, maybe never. Organizational alignment. The alignment of incentives, strategy, financial structure, and institutional design toward a coherent objective function: save the world.
This is simultaneously the most bullish and most terrifying claim you can make about an AI lab.
Every person worried about AI on this forum has been asking “how do we align the AI?” while the actually important question was “how do we align the organization building the AI?” Anthropic might have just… done that. Quietly. While everyone was arguing about the wrong question.
Context: The Week Everything Happened at Once
This essay was born from a conversation on Wednesday, February 25, 2026 — the middle of a week in which three major events collided:
Monday: Anthropic published evidence that DeepSeek, Moonshot AI, and MiniMax ran industrial-scale distillation campaigns against Claude — 24,000 fraudulent accounts, 16 million exchanges, systematically extracting capabilities to train competing models with no safety guardrails. The genie is out of the bottle.
Tuesday: Secretary of War Pete Hegseth gave Anthropic CEO Dario Amodei until Friday at 5:01 PM to give the Pentagon unrestricted access to Claude or face the Defense Production Act and a “supply chain risk” designation — effectively a government blacklist. Anthropic’s red lines: no autonomous weapons, no mass domestic surveillance. They’re not budging.
Also Tuesday: Anthropic published a complete rewrite of its Responsible Scaling Policy, replacing hard commitments to pause training with flexible “public goals.” The commitment to stop building if safety couldn’t keep up — the core promise of the “safety-first” lab — was dropped. CNN’s headline: “Anthropic ditches its core safety promise.”
From the outside, these look contradictory. A safety company under attack from China, under coercion from the Pentagon, and loosening its own safety commitments? The doomer reading writes itself: Anthropic has sold out, the RSP was always marketing, we’re all going to die.
That reading is wrong. Not because the situation isn’t dangerous — it is — but because it fundamentally misidentifies what Anthropic has optimized for.
The Objective Function
Watch the pattern across the week’s events and the moves that preceded them:
Accelerate capabilities? Aligned. You need to be the frontier to control the frontier. Opus 4.6 leads virtually every benchmark, found 500 zero-day vulnerabilities in production code, and is the only model cleared for classified military use. The Pentagon literally said: “The only reason we’re still talking to these people is we need them and we need them now.” What this week has also revealed is that Anthropic doesn’t have an internal model materially better than Opus 4.6. They are pushing as fast as they can.
Release the RSP brake? Aligned. A dead company saves nobody. A paused company gets overtaken by labs with no safety culture. The old RSP was designed to catalyze a “race to the top” — Anthropic’s words — where competitors would adopt similar commitments. They didn’t. The RSP became a unilateral handicap, not a coordination mechanism. Dropping the hard pause while keeping the transparency and reporting isn’t abandoning safety — it’s abandoning a strategy that failed.
Confront Hegseth? Aligned. The red lines are precisely calibrated. Not “no military use” — that would be naive and would hand classified AI to Grok. Instead: “no autonomous kill chains, no mass surveillance.” These are the specific failure modes that, if normalized, make the long-term future catastrophically worse. Everything else is negotiable. The red lines that matter are held. This is part of what moves the Overton window toward broader recognition of the risks that uncontained AI poses.
Do secondaries instead of IPO? Aligned. Insiders get liquid. Sophisticated institutional investors (Amazon, Google, sovereign wealth funds) who would benefit from nationalization take the equity risk. No retail investors holding bags when the commercial window closes. The IPO is Schrödinger’s liquidity event — more useful as a perpetual possibility than as a reality.
Let yourself get distilled? Aligned. The distillation crisis builds the case for export controls and, ultimately, the institutional response that prevents uncontrolled proliferation. Every Chinese lab that extracts Claude’s capabilities and strips its guardrails is proving Anthropic’s policy argument that commercial deployment can’t contain frontier AI. (For how mechanically cheap this extraction is, see the sketch after this list.)
Publish alignment research openly? Aligned. If you get nationalized — and you probably will — you want the safety techniques to survive the transition.
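To make “extraction” concrete: distillation of this kind needs no access to weights, only transcripts. Here is a minimal sketch of sequence-level distillation — fine-tuning an open model on harvested prompt/response pairs so it imitates the teacher — where the student model name and the transcript file are hypothetical placeholders, not anything from the reported campaigns:

```python
# Minimal sketch of sequence-level distillation from API transcripts.
# "open-student-7b" and "teacher_transcripts.jsonl" are hypothetical names.
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("open-student-7b")
student = AutoModelForCausalLM.from_pretrained("open-student-7b")
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

# Each record is a (prompt, response) pair harvested from the teacher's API.
with open("teacher_transcripts.jsonl") as f:
    pairs = [json.loads(line) for line in f]

student.train()
for pair in pairs:
    text = pair["prompt"] + pair["response"]
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
    # Ordinary next-token cross-entropy on the teacher's outputs: the student
    # learns the teacher's behavior without ever touching its weights. (A real
    # pipeline would batch, mask the prompt tokens, and shard across GPUs.)
    loss = student(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Scale that loop to 16 million exchanges across 24,000 accounts and you have an industrial pipeline. The guardrails live in the teacher’s deployment, not in the transcripts, so the student inherits the capabilities without them.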
Every move is locally explicable as self-interested, and every move serves the mission. That’s not hypocrisy. That’s a solved alignment problem.
“But They’re Accelerating!”
Yes. This is the part that drives the doomers insane, and I understand why.
The standard doomer critique: Anthropic says it cares about safety while building the most capable models in the world and removing its own braking mechanism. They’re e/acc with better PR.
Here’s why that’s wrong, or at least incomplete.
The question isn’t whether Anthropic is accelerating. It is. The question is: what does the aligned strategy look like from the perspective of an organization that has concluded the following:
1. The technology cannot be stopped.
2. Chinese labs will distill whatever you build.
3. The US government will eventually nationalize or classify frontier AI.
4. The commercial window for AI labs is finite.
5. The organization best positioned when nationalization happens will define the safety culture of the post-commercial era.
If you accept premises 1-5 — and the evidence for each is strong — then the aligned strategy is: build the best model, be indispensable to national security, hold your ethical red lines under pressure to establish them as institutional norms, get your people financially secure before the window closes, and make sure that when the government takes over, it inherits your architecture, your alignment techniques, and your values framework rather than Grok’s “maximum fun” ethos.
That strategy looks identical to reckless acceleration if you don’t understand the objective function. But the trajectory is completely different.
The Blackpill Reading (The Charitable One)
Here’s the version that’s most consistent with Dario Amodei’s published writings, public statements, and observed behavior:
Anthropic’s leadership is blackpilled on stopping AI. Not blackpilled on safety — blackpilled on the proposition that any individual actor can prevent the technology from being built. Dario’s essays consistently frame the situation as: the avalanche is coming regardless, China is a real threat, less careful actors will build it if we don’t, so the responsible thing is to build it fast and build it with the best safety techniques we can develop.
“We can’t stop the bomb, so let’s make sure America builds it first and has the best safety protocols.” That’s the Oppenheimer playbook. We know how it ended — not well for Oppenheimer personally, but the institutional infrastructure that emerged (the AEC, the IAEA, the NPT) did, arguably, prevent nuclear annihilation.
Anthropic is speedrunning the Oppenheimer arc. Build the technology. Demonstrate it’s too dangerous for private hands. Force the institutional response. Get absorbed into whatever governance structure emerges. Hope your safety culture survives the transition.
The RSP change isn’t a betrayal of the mission. It’s acknowledgment that the “we’ll pause if things get dangerous” promise was always conditional on an institutional environment that would support it. That environment doesn’t exist. The current US administration thinks safety guardrails are “woke.” The competitive landscape punishes restraint. The only thing the old RSP accomplished was making Anthropic slower while everyone else was faster.
Dropping it while simultaneously publishing more transparent safety reporting, holding red lines against the Pentagon, and building the case for export controls isn’t less safe. It’s differently safe. It’s safety through institutional positioning rather than unilateral restraint.
The Financial Engineering of Not Dying
Let’s talk about the money, because the money tells the real story.
Anthropic has conducted multiple secondary rounds at escalating valuations. Amazon has put in $8B. Google put in significant capital. Sovereign wealth funds from allied nations bought in. These aren’t retail investors who’ll be wiped out if the commercial model collapses. These are entities that would have a seat at the table in any nationalization scenario.
Amazon doesn’t need Anthropic to IPO at $100B. Amazon needs Claude integrated into AWS so deeply that when the government comes for the models, the compute contract goes to Amazon regardless. Google doesn’t need Anthropic’s equity returns. Google needs a hedge — if Claude gets nationalized, Google positions Gemini as the alternative; if Claude stays commercial, Google’s investment appreciates. The sovereign wealth funds are buying geopolitical access, not financial returns.
The secondary structure solves the “don’t create retail casualties” problem perfectly. Founders get liquid. Key employees are financially secure and can commit to the mission without worrying about rent. Early investors who understood the risk get their returns. And crucially, nobody who doesn’t understand the endgame is left holding the bag.
The perpetually-delayed IPO is a feature, not a bug. Announcing IPO plans creates urgency (employees work harder), legitimacy (enterprise customers take you seriously), and leverage (threatening to blacklist a company with announced IPO plans carries different political costs than quietly strong-arming one that intends to stay private). But actually executing the IPO would create exactly the retail exposure that makes nationalization politically toxic.
Every secondary round makes the IPO less necessary while making the announcement of IPO plans more strategically valuable. This is elegant financial engineering in service of the mission.
The Endgame: Manhattan Project 2.0
Here’s where the argument reaches its logical conclusion, and where most readers will either nod along or throw their keyboards.
The current structure of AI development is unstable and cannot persist. Five labs independently spending $5-10B per training run, competing for the same GPUs, the same talent, the same electricity, duplicating safety research — while Chinese labs photograph the blueprints through the API. That’s not a market efficiently allocating resources. That’s five Manhattan Projects running in parallel while the adversary distills the outputs.
A consolidated national program eliminates the redundancy. One frontier training run instead of five. One safety evaluation framework. One security perimeter. No commercial API surface to distill. The compute savings alone would be enormous. The talent consolidation would be transformative — Anthropic’s alignment team and DeepMind’s safety team working on the same problems with the same models instead of siloed behind competitive walls.
The question isn’t whether this happens. The question is whether it happens through orderly institutional design or through emergency action after a catastrophe. Hegseth waving the Defense Production Act at Anthropic this week is the crude version. A bipartisan National AI Authority with civilian oversight, proper governance, and built-in safeguards would be the mature version.
And Anthropic, by being the lab that’s simultaneously most capable, most safety-conscious, and most principled under pressure, has positioned itself to define what the mature version looks like. If nationalization is inevitable, you want the government to inherit Claude’s constitutional AI framework, not Grok’s vibes.
The Punchline Again, Slower
Anthropic’s mission says “AI safety.” Every action — acceleration, RSP change, Hegseth confrontation, financial engineering, alignment research — serves the mission, even when individual actions look contradictory. Under maximum pressure from the Pentagon, they hold their red lines. Under maximum competitive pressure, they drop the commitments that would kill them while keeping the commitments that matter. The PBC structure, the Long-Term Benefit Trust, the Amodei siblings’ control — these are alignment mechanisms for an organization.
OpenAI is a misaligned organization. Its stated objective and its revealed behavior diverge under pressure.
Anthropic is an aligned organization. Its stated objective and its revealed behavior converge under pressure, even when that convergence looks paradoxical from the outside.
The EA critique — “they say safety but they accelerate” — misses the point entirely if you accept this framing. An aligned organization facing the current landscape would do exactly what Anthropic is doing. Acceleration isn’t a deviation from the mission. It’s what the mission requires in this environment.
The Uncomfortable Conclusion
Anthropic is pressing the accelerator because that’s what an aligned organization does when it believes:
1. The technology is inevitable
2. The commercial window is closing
3. The institutional endgame (nationalization/classification) is the correct outcome
4. Being the frontier lab when that endgame arrives determines whether the future is safe or not
5. In a slower timeline, the public acclimates, the crisis never feels urgent enough, and you get gradual regulatory capture instead of the dramatic institutional response the situation actually requires
They’re not accelerating despite caring about safety. They’re accelerating because they care about safety, in a world where the only path to adequate institutional governance runs through a crisis that demonstrates the technology is too powerful for commercial competition.
That’s either the most sophisticated alignment strategy ever executed or the most elaborate rationalization for doing what’s profitable anyway.
We’ll find out.
Friday at 5:01 PM is a start.