It would have been relatively easy to tell a story ex-ante about how the Pentagon dispute wouldn't hurt their revenue.
Doesn't seem as easy to tell a story about how pausing all work related to model improvements would have the same effect.
I think announcing and following through with a unilateral pause is a one-time-only, irreversible move with a very low chance of cascading into a global pause.
I agree there's a lot of uncertainty here. But what are the alternatives? Do we really believe that Anthropic will solve alignment in time, while nobody else (or at least not whoever surpasses them) will be able to? This seems highly implausible to me. If alignment is just a matter of having a smart-enough AI working on it (which I doubt very much), then whoever becomes the new leader will solve it that way (however "bad" they are, they have no interest in losing control over their AI either). If it is really hard to solve, then it doesn't matter whether Anthropic or someone else pushes us over the cliff.
If they really believe what they say in that post, they should pause and at least try to save our future.
Therefore, the only thing that Anthropic might do is to ensure that xAI doesn't deploy its newer models, even internally, unless thorough testing is done.
How do we ensure that all models are evaluated much more thoroughly than they are now? For example, the Groks after Grok 4 stayed unevaluated by METR. Grok 4.3 is entirely unevaluated by EpochAI. Therefore, we have to rely on aggregations like Artificial Analysis or AI IQ.
Meta, on the other hand, did participate.
I don't know enough about X.AI, but it seems to me that if they really were that close, they wouldn't sell a large part of their compute to Anthropic. And Elon Musk has at least said often enough that he thinks superintelligence is very dangerous, so maybe he would cooperate if someone got really serious about pausing.
As for the METR evaluation, I think it is possible that X.AI did participate and chose to leave after the internal eval, but before the results were published, in which case their participation would have been kept confidential.
In their new post on recursive self-improvement, Anthropic argues that a pause in frontier AI development is needed, but unfortunately, they can't pause on their own, because of less cautious actors:
As many have pointed out, this reads a lot like lip service. But it sounds plausible: Anthropic seems to be the most safety-concerned lab right now, so the future would look worse if they weren't in the lead anymore because they paused unilaterally and a less cautious actor overtook them, right?
I think this is fundamentally wrong, because it ignores many of the actual or possible effects of a unilateral pause.
Mythos seems to have been a wake-up call for many, especially in governments around the world. For example, in response to Mythos, the president of the German Bundesamt für Sicherheit in der Informationstechnik, Claudia Plattner, called for a German AI Safety Institute - something I have always thought was necessary, but wouldn't have deemed very likely before.
It probably wasn't the hacking capabilities of the new model alone that caused such a stir, but rather the fact that Anthropic chose not to publish it immediately and instead launched Project Glasswing. This could be seen as a clever PR stunt in the run-up to the planned IPO, but I believe it was the correct thing to do and was mainly driven by real concerns. The decision not to publish a new model, thereby possibly giving up some revenue and market share, was a very strong signal that caused a lot of discussion and change in the political landscape.
Now assume that Anthropic unilaterally declares that they are pausing capabilities development, say, for three months, and will instead put every resource they have into advancing AI safety for that time. They even offer options for outsiders to verify this. They publish a statement declaring that there is now a significant risk of accidentally creating an uncontrollable AI, and they ask the other labs to pause development as well and join forces to improve AI safety techniques.
This is of course a highly speculative scenario, but I think this would put enormous pressure on OpenAI and Google DeepMind to follow their lead. After all, both Sam Altman and Demis Hassabis have said things like "if the others stop, we would stop too" in the past. It would be another wake-up call for politicians, making it very clear that the AI race is a real threat to humanity and regulation is urgently needed.
Other labs, like Meta, X.AI and the Chinese labs, might be less inclined to follow suit. But I think the danger of them catching up significantly in such a short time is low. The Chinese government has indicated in the past that they are willing to regulate AI development, so this could even open a window of opportunity for starting serious talks about global regulation.
Would this move hurt Anthropic's IPO plans? Maybe, but I'm not sure. In the past, whenever they did something that seemed to hurt their revenue, like resisting the push by the Secretary of War to accept "any legal use" for Claude, it seems to have helped them more than hurt them. Anthropic is now seen as the "adult in the room", the most trustworthy and the most valuable AI lab. A unilateral pause may convince at least some investors that they should be taken even more seriously.
Of course, given that they acknowledge
from a moral standpoint, a unilateral pause would be the only correct move in my opinion.