What if Anthropic unilaterally paused capabilities development right now?

Karl von Wendt

In their new post on recursive self-improvement, Anthropic argues that a pause in frontier AI development is needed, but unfortunately, they can't pause on their own, because of less cautious actors:

We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology.
...
A meaningful slowdown or pause would require multiple well-resourced labs at or near the frontier, in multiple countries, agreeing to stop under the same conditions. It would also require that each can verify that the others have actually stopped.
...
None of this is necessarily impossible in principle—the world has built verification regimes for other complex technologies (e.g., the Intermediate-Range Nuclear Forces Treaty)—but those regimes took decades to build both the infrastructure and the trust. We don’t have that long. A unilateral pause by one lab, by contrast, is achievable immediately, but accomplishes much less: it would change who the front-runner is, but it would not create the wider deliberative process that is currently missing.

As many have pointed out, this reads a lot like lip service. But it sounds plausible: Anthropic seems to be the most safety-concerned lab right now, so the future would look worse if they weren't in the lead anymore because they paused unilaterally and a less cautious actor overtook them, right?

I think this is fundamentally wrong, because it ignores many of the actual or possible effects of a unilateral pause.

Mythos seems to have been a wake-up call for many, especially in governments around the world. For example, in response to Mythos, the president of the German Bundesamt für Sicherheit in der Informationstechnik, Claudia Plattner, called for a German AI Safety Institute - something I have always thought was necessary, but wouldn't have deemed very likely before.

It probably wasn't the hacking capabilities of the new model alone that caused such a stir, but rather the fact that Anthropic chose not to publish it immediately and instead launched Project Glasswing. This could be seen as a clever PR stunt ahead of the planned IPO, but I believe it was the correct thing to do and was mainly driven by real concerns. The decision to not publish a new model, thereby possibly giving up some revenue and market share, was a very strong signal that caused a lot of discussion and change in the political landscape.

Now assume that Anthropic unilaterally declares a pause in capabilities development, say, for three months, and instead puts every resource they have into advancing AI safety for that time. They even offer options for outsiders to verify this. They publish a statement declaring that there is now a significant risk of accidentally creating an uncontrollable AI, and they ask the other labs to pause development as well and join forces to improve AI safety techniques.

This is of course a highly speculative scenario, but I think this would put enormous pressure on OpenAI and Google DeepMind to follow their lead. After all, both Sam Altman and Demis Hassabis have said things like "if the others stop, we would stop too" in the past. It would be another wake-up call for politicians, making it very clear that the AI race is a real threat to humanity and regulation is urgently needed.

Other labs, like Meta, xAI and the Chinese labs, might be less inclined to follow suit. But I think the danger of them catching up significantly in such a short time is low. The Chinese government has indicated in the past that they are willing to regulate AI development, so this could even open a window of opportunity for starting serious talks about global regulation.

Would this move hurt Anthropic's IPO plans? Maybe, but I'm not sure. In the past, whenever they did something that seemed to hurt their revenue, like resisting the push by the Secretary of War to accept "any legal use" for Claude, it seems to have helped them more than hurt them. Anthropic is now seen as the "adult in the room", the most trustworthy and the most valuable AI lab. A unilateral pause may convince at least some investors that they are to be taken even more seriously.

Of course, given that they acknowledge

How the alignment problem gets solved—or not—in this future is something we are least certain about.

from a moral standpoint, a unilateral pause would be the only correct move in my opinion.

It would have been relatively easy to tell a story ex-ante about how the Pentagon dispute wouldn't hurt their revenue.

Doesn't seem as easy to tell a story about how pausing all work related to model improvements would have the same effect.

I think announcing and following through with a unilateral pause is a one-time-only, irreversible move with a very low chance of cascading into a global pause.

I agree there's a lot of uncertainty here. But what are the alternatives? Do we really believe that Anthropic will solve alignment in time, while nobody else (or at least not whoever surpasses them) will be able to? This seems highly implausible to me. If alignment is just a matter of having a smart-enough AI working on it (which I doubt very much), then whoever becomes the new leader will solve it that way (however "bad" they are, they have no interest in losing control over their AI either). If it is really hard to solve, then it doesn't matter whether Anthropic or someone else pushes us over the cliff.

If they really believe what they say in that post, they should pause and at least try to save our future.

Accepting for the sake of argument that the only hope we have is a global pause - the question that matters is:

"Are we more or less likely to see an effective globally coordinated pause if Anthropic decides to unilaterally stop improving models tomorrow?"

This is a complicated, messy question. My impression is the answer is "less likely".

The alternative to a pause today is to continue gaining market share, and (hopefully) leverage that in order to deepen relationships with other relevant companies and governments, continue advocating publicly from a position of undeniable credibility, continue collecting the best talent in one place, etc.

If Anthropic had pulled the brakes 6 months ago, I don't think we'd feel any safer today.

P.S. I don't have any insider knowledge at all; I'm merely speculating about others' intentions based on publicly available info.

It seems to me that a unilateral pause would have a higher chance of working later, when more researchers at more labs are concerned about alignment, rather than now.

It seems important that they paused the Mythos release based on specific and easily understood concerns, and it seems to be holding up that they were right. I'm not sure a pause based on vague concerns followed by restarting after not making much progress would send the same signal.

Pausing a release is much less important than pausing development.

I agree that Mythos is a much clearer case, but the stakes were much lower. I mentioned it to show that AI labs can send strong signals and change a lot in politics, so "we can't do anything, because if we don't do it, someone else will, we won't be at the forefront any longer and nothing else changes" is clearly wrong.

To me, there's nothing "vague" about losing control of an advanced AI. We don't know what the specifics are, how and when this is going to happen. But if we knew, it would be too late. We do know that as long as we haven't solved alignment, it is suicidal to build an uncontrollable AI, and this is where Anthropic is heading.

Yes, it isn't clear at all that such a bold move as a unilateral pause would achieve anything. But that is no excuse not to try it if we don't have any better idea how to prevent a catastrophe and all of humanity is at stake.

As a father of three, I find it really hard to prevent myself from getting much more emotional in my writing.

My disagreement here isn't about whether AI labs should stop developing AGI. I just don't think pausing would send the same signal to outsiders without the clear (partially solvable) situation they had with Mythos.

>The decision to not publish a new model, thereby possibly giving up some revenue and market share, was a very strong signal

It was a strong signal of Anthropic's beliefs, but if that were the only signal, politicians would be mostly dismissing it as Anthropic's arrogance or hype.

The DoW flip-flop was a much more unusual and hard-to-dismiss signal. There have also been unusual signals from financial institutions with really large security budgets telling the Treasury Dept that Mythos is a big deal.

“Anthropic seems to be the most safety-concerned lab right now”

You think so?

To me, at their scale and level of backing, it’s easy to lean towards skepticism about their concern. As surface-level as it may sound (LTBT or not), it might be a good idea not to fully rule out that this may be a stunt in service of that trillion-dollar valuation in a few months.

It's hard to judge Anthropic's calls as good or bad, but I do think that I'm getting kind of annoyed with their constant paternalistic posture over AI. Everything they've said about Mythos and their internal models mostly decomposes to: Mythos-class models are too dangerous to trust the public with, but we're fully trusting an arbitrary set of companies, who happen to be among our biggest potential customers, with that same model.

According to Ryan Shea, in March 2026 xAI was three months or less behind, not 7 months behind as Mollick and Wilderford imply (alas, @Zvi decided to quote them instead of Ryan!). While we don't know anything about Grok 5's release date, or potential plans to release Grok 4.4, I suspect that Grok 4.3 wasn't^[1] a major advancement over 4.20.
xAI is far less willing to cooperate than the Big Three. For example, xAI didn't even condemn Chinese efforts to distill American models and didn't even participate^[2] in METR's most recent evaluation of whether a model can start rogue internal deployment.

Therefore, the only thing that Anthropic might do is to ensure that xAI doesn't deploy its newer models, even internally, unless thorough testing is done.

^[1] How do we ensure that all models are evaluated much more thoroughly than they are now? For example, the Groks after Grok 4 stayed unevaluated by METR. Grok 4.3 is entirely unevaluated by EpochAI. Therefore, we have to rely on aggregations like Artificial Analysis or AI IQ.
^[2] Meta, on the other hand, did participate.

I don't know enough about xAI, but it seems to me that if they really were that close, they wouldn't sell a large part of their compute to Anthropic. And Elon Musk has at least said often enough that he thinks superintelligence is very dangerous, so maybe he would cooperate if someone got really serious about pausing.

As for the METR evaluation, I think it is possible that xAI did participate and chose to leave after the internal eval, but before the results were published, in which case their participation would have been kept confidential.

xAI has been hemorrhaging senior talent (particularly founders in the pretraining area; they still have post-training people). So either they were further behind than that, or Elon Musk is hard to work for, or both.

It's also notable that they've been leasing compute to competitors, which is unusual if they have a good use for it themselves.

They all are, aren’t they? At this point it feels like they hire someone who can steer the ship into the least headwind, then fire them. It’s a scary race that ultimately factors in the entire economy; wars will be fought with these corporations, however good-willed their employees may be. The big picture looks quite bleak considering the effort is focused on a competitive race instead, because by now they all know you can’t patch things up or govern anything that’s outpacing your safeguard mechanisms, particularly since almost all of them don’t know how their systems’ neural networks communicate. I know this sounds doomy, but I really hope I’m wrong.