How much of what happened is driven by the vestiges of the DoW-Anthropic dispute, or the general animosity of the admin and various players towards Anthropic? I don’t know, and neither do you.
This seems like the core crux to me? If this is about Anthropic's models being dangerous, then this is clearly a huge power grab that signals the imminent hyper-militarization of the LLM space. If this is about Anthropic being dangerous, then we should expect the Trump admin to throw the natsec kitchen sink at Anthropic and leave other labs mysteriously untouched, or at least relatively untouched.
It is even crazier how many people think this is a response to Anthropic saying that their models are dangerous, rather than a response to the Anthropic models actually being dangerous, and that this is good and right that Anthropic be punished for that.
My friend, it is Trump 2, and you yourself admit that it's hard to know whether this isn't an extension of the DoW dispute. For what it's worth, David Sacks claims otherwise and says he hopes Anthropic will "easily resolve" the issue. On the other hand, his conditions for resolution are "solve jailbreaks," which implies to me that he either does not actually expect them to resolve the issue easily or is shockingly naive. I won't pretend to know what's going to happen here, beyond noting that the Lindy criterion seems relevant: the longer this goes on, the more likely it is to continue going on. The second-order effect, that other firms will try to hide their model capabilities, suggests to me that this might be yet another misjudgment by the Trump admin even if their purpose is to punish Anthropic, and that they'll reverse course once Twitter finishes pointing out all the ways this move could set off a doom spiral in which further model investment past the Mythos capability threshold no longer seems worth it.
There is so much talk of model sandbagging that it's wild to me when people fail to apply the same thought to the AI companies themselves. This creates an incentive for AI labs to underreport capabilities, or worse, to under-test in the first place. You can't report what you don't find, and you won't find what you don't look for.
If you are a ‘middle power’ that is not America or China, and you now realize that these decisions will be made mostly without caring about you, what do you do now?
What would even help? Having your own ‘European’ model only helps if it is substantially stronger than Opus 4.8 and GPT-5.5.
We know what some of those "middle powers" are actually doing. Some of their orgs are launching "RSI on a medium-sized compute" projects and expect to have a shot at overcoming the leaders who might be getting too comfortable in their current paradigm.
The most prominent of those efforts is the new Sakana division launched a week ago: https://sakana.ai/rsi-lab/. They explicitly disclaim competing on compute (italics are mine):
As the world enters the era of artificial intelligence, Japan has a unique opportunity to reclaim its position at the frontier of global innovation. However, to achieve global leadership in AI and scientific discovery, we cannot simply stick to the conventional approach of brute-forcing monolithic models. We must leapfrog the current paradigm.
History shows us how Japan’s historical dominance in manufacturing was not achieved through abundant natural resources but by fundamentally redesigning the institution of the factory floor. Through the philosophy of continuous, compounding self-improvement, Japan created systems that achieved more with less.
This same principle applies to intelligence itself. Human cognition did not emerge from limitless resources; it was forged through the open-ended, compounding process of evolution operating under strict constraints. Similarly, building AI in Japan provides the ultimate design constraint. Rather than relying on brute-force scaling, we are driven to pursue elegance, adaptability, and autonomy.
Earlier, some strong-looking "RSI on a medium-sized compute" projects were launched in the UK. The most notable is probably Recursive Superintelligence, Inc., which has Jeff Clune, the author of the "AI-generating algorithms" paradigm, and which has even convinced Peter Norvig to do research for it (he left Google in February and started working for Recursive in March). Recursive raised a Series A of 650 million at a 4.65B valuation in May.
Unfortunately, the safety postures among those projects are very uneven and are quite likely to be insufficient.
We reviewed a demonstration of this specific technique being used to identify a small number of previously known, minor vulnerabilities. These vulnerabilities all appear relatively simple, and we have found that other publicly-available models are able to discover them as well without requiring a bypass.
To me, this seems like potentially pretty misleading communication from Anthropic and somewhat silly.
Amazon claims that they used a technique that successfully bypasses Fable's safeguards to get a response from Mythos, which IIUC Anthropic does not dispute. For the question of whether the safeguards work, it is probably immaterial whether the response is something you could have gotten from other models. The value of this demonstration is that it shows the safeguards did not work, not that any harm could have been or was done in this specific case. Unless Anthropic makes the (imho very implausible) claim that the safeguards are designed to detect whether a particular vulnerability is previously known (or minor), this technique is evidence that an attacker could totally have used it for vulnerabilities that other models could not find.
Given the track record of this administration and the precedent of unilateral action without deliberation or public input (the prior action on SCR and the Executive Order), I have no expectation that this will be the end of the government's attempts to control AI for their own purposes.
This is the opening salvo, a trial balloon to see how much pressure they can exert and how much control they can maintain to promote their personal interests, not to serve the public good. This is the same administration that dismantled USAID; global welfare is not in their vocabulary.
The issuance of a takedown order without clearly communicated terms is a feature, not a bug. The less clear they can get away with being on exactly where the lines are, the further they can push that line.
Don't conflate "has some legitimate security concern" with "acting in the best interest of the nation/humanity." What we're seeing is the continuation of short-sighted reactionary power plays, not the reasoned leadership necessary to actually govern effectively.
This is not a reason to celebrate. It is one of the last warning shots.
It is valid to conclude, as Liron Shapira does here, ‘I want actions like this so much in general that I am happy to see the precedent set even if the implementation was both spiteful and extremely stupid.’
I do not share that opinion.
You reference the wrong Liron tweet here. You should reference this one: https://x.com/liron/status/2065607736822264300
The one you reference is a later retweet/reply in that same thread agreeing with a much more nuanced take by Eliezer:
So please stop tweeting about how I must be celebrating this. I'm not one of the kids who immediately goes into overacted victory paroxysms about any hits on a perceived enemy. I care about the effect on where things end up a year later, and that's a little harder to know the first day, you know?
It'd be profoundly sad if this leads to only Americans being allowed to legally access frontier models. It helps safety very little, while it hurts people like me a lot.
My read is that the intent is to apply export controls, and a flimsy justification was chosen in order to create the broadest possible justification for export controls on AI software generally.
If David Sacks is telling the truth about why it happened, and it will in fact be walked back if "fixed", this is dumb for a lot of reasons.
If only the US is capable of pushing the frontier forward at the present moment, and the US just stopped the top lab and created enough ambiguity to slow down the ones immediately behind, then we are in a pause, and, provided more action follows if someone looks like they're catching up, the pause is durable.
For the people who want LLM-AI stopped out of fear of superintelligence, right at this moment, things look great, stuff has stopped, and there is a path to keeping it stopped.
Anthropic has not demonstrated that their safeguards work reliably enough in practice, and thus I am not very sympathetic to their perspective on the merits of the case (although procedurally this is obviously bad from the Trump admin). It seems good if the result of this is that the burden of proof of safety shifts towards the companies.
Yes, external testing has not found a universal jailbreak according to the model report, but it's pretty unclear that the focus on universal jailbreaks is sufficient. No external party has tested Anthropic's models (or other labs' models, for that matter) AND their defense-in-depth approach to explicitly certify that they work sufficiently well; instead it's isolated fishing for universal jailbreaks, and otherwise taking Anthropic at their word. I don't think a convincing case has been made that universal jailbreaks are necessary to cause harm with a model. In fact, it seems very possible that an attacker could use a custom jailbreak that works for a given environment and/or pivot to different jailbreaks when one stops working.
No good policy gets announced shortly after 5pm eastern on a Friday.
Here we go again.
The Once And Future Fable
The United States Department of Commerce, as per a letter from Commerce Secretary Howard Lutnick, apparently in response to a narrow jailbreak identified by Amazon, has classified Fable 5 and Mythos 5 as being subject to US export controls. That explicitly means cutting off access to all ‘foreign nationals,’ even within the United States, even if they are Anthropic employees.
Since Anthropic has no means to verify citizenship at this time, that meant a complete shutdown of the model, at least for the time being.
The justification for this appears to be rather flimsy, at best, and based on a lack of understanding of what a jailbreak even is or how defense in depth works.
That left Anthropic with no option but to entirely withdraw it from the market, at least for the time being, since they have no way to verify who is and is not a United States citizen.
Anthropic is either lying, or the jailbreaks were harmless, not even mostly harmless.
I believe this is correct. GPT-5.5 can find the same exploits that got Fable labeled with export restrictions. So either this is arbitrary and capricious, or who is next?
I presume that those issuing this order knew what the short term result would be, but with this group you can never be sure.
Fable 5 will (almost certainly) return in AI: Endgame, Part 1. Release date unknown.
I am not taking the position that Anthropic is clearly right in the dispute about the facts, although it would be weird for them to lie about it given the truth will soon come out.
I am, however, taking the position that the implementation method chosen by the government, with no warning, was deeply terrible, even given our options with our current very terrible level of relevant state capacity, and that it reflects malice, a deep misunderstanding by decision makers of how jailbreaks and cyber security work, or some combination of the two.
No matter what else we do, we now badly need to build relevant state capacity and a relevant legislative framework for government oversight, and to educate key decision makers, so that this type of thing does not happen again.
This Action And Its Implementation Are Absurdly Stupid
If you take the action at face value, rather than as an attempt to lash out at Anthropic, there is no way to pretend this is not deeply, deeply stupid.
We are also doing this at the same time that we are actively relaxing export controls on selling chips to China, in order to ensure the ‘American tech stack.’
If this move had been executed in a sane way, and had come with a ramping up of chip export controls, I would at least understand that as a coherent position.
David Sacks Offers The Official Steelman
David Sacks seems to be continuing to speak on behalf of the Administration here, and Sriram pointed to this as well. So I presume this is the official story.
If it is true, then this could be resolved reasonably quickly. Once you cut out all the rhetoric and feigned surprise, this boils down to:
I’m very curious why Amazon’s Andy Jassy was so concerned while Anthropic wasn’t. My hunch (no private info) is that Jassy is mostly concerned in general, rather than about this particular jailbreak, and that got conflated somewhere.
It should be straightforward for Anthropic to block any particular exploit, even if the issue is minor, even if it also exists in GPT-5.5, and even if blocking that exploit is thus unnecessary and kind of stupid.
If indeed it is essentially harmless and this statement is in good faith, it should also be easy to sort this out, and convince the White House to lift the control.
It would be classic Trump Administration to do this to light a fire under Anthropic and force them to handle an issue Anthropic thinks is dumb, and to establish they mean business, in which case yes quickly backing down is very possible.
True.
Anthropic worked with trusted partners and the government regarding the deployment of Fable and Mythos.
It is Anthropic’s ‘responsibility to patch,’ but this framing inherently treats any vulnerability, no matter how small, as something that must be patched. That is not how LLMs work. You cannot ever have a usable LLM with no vulnerabilities to adversarial attack. So the question is the nature and degree of severity.
Again, we assume this is Amazon. I have no private information.
One question is, did the Administration say ‘take it down, make this level of exploit impossible or we will export control you?’ Or did they simply request a fix? What exactly was the ask?
Because it involves, according to Anthropic, zero marginal increase in capability of operation of a cyber weapon. David Sacks knows this. He is free to say he disagrees about the nature of the jailbreak, but the idea that ‘any operability of a cyber weapon’ must necessarily be a ‘serious’ vulnerability implies that such a vulnerability exists in GPT-5.5, which the government has not asked be ‘fixed’ or taken down.
So what exactly is the request?
If the request is ‘ensure no one can ever use this for any operability of a cyber weapon versus not having access to an LLM’ or an assurance that we will never see another jailbreak? Then that is impossible and Sacks knows this.
Anthropic is clearly balancing safety against commercial offering. The only way to offer a fully safe model is to offer a fully useless model. David Sacks knows this and has reliably been on the other side of this argument except when it hurts Anthropic.
The substantive claim here is that the Admin did this in response to Anthropic’s intransigence. There are any number of reasons there could be a clash here.
Again, ignore the feigned bewilderment. The substantive statement is that if Anthropic remedies the specific issue, the export control would be lifted quickly. That may or may not be possible depending on whether the requested fix is sane.
I don’t see anyone at Anthropic drawing such connections, and I notice Sacks is not saying Anthropic is attempting to draw such connections. This is a very good sign, and would be regardless of whether or not there are connections.
Could Anthropic Offer A Technical Way Out?
It’s about as good a statement as we could hope to get from David Sacks.
The bottom line here is he is saying: Fix the issue. But what issue, exactly?
Many have had to deal with the pointy-haired boss on things like this. I’m thinking of a particular name from a past job right now (RIP, sir) and so are most of you.
If that’s all this is and the demand is well-specified and contained, then yeah, ‘fix’ it, quickly, even if it’s expensive and dumb, and work on the better fix, or convincing the admin that this was a silly concern, or both, later.
Even if, instead of 95% of cases, we end up with 90% of cases for a bit, that’s way way better than 0% of cases. I miss my Fable.
The Problem
There is one big potential problem.
Is Anthropic being told ‘fix this particular jailbreak?’ If so, easy, done by Monday morning, and then we see if the government still wants time to harden its security.
However, if Anthropic is being told ‘fix all jailbreaks at this level and assure us there will never be another one’ then that is impossible, especially if the level is ‘things GPT-5.5 can already do without much effort.’
Those giving that order may or may not understand what they are asking for. They also might understand exactly what they are doing. We cannot know.
Even in the best case, does this all leave the government way too much room to be arbitrary and capricious, and make companies worried about what is coming next, in the exact ways Sacks warns about whenever the company in question is not Anthropic? Yes, but that is life in 2026. There was never going to be a way fully around that, and again Sacks is being as reasonable as one could hope here if this is sincere (we shall see).
The Other Way Out
Axios claims that this pause could be on the order of a few weeks, in order to lock down the national security apparatus, after which the restriction would be lifted.
Axios says the government tried to ‘pause the release’ of Fable, which could mean the release on Tuesday, or it could, as per Sacks, mean suspending the release after Amazon’s finding pending a fix. I presume it probably means what Sacks said.
UK AISI
One thing that might be missing in all this is that the most important jailbreak so far came from UK AISI, who did demonstrate a substantial jailbreak as per Fable’s model card, and who said they were making progress towards a universal jailbreak.
Is it possible UK AISI is actually the ‘trusted partner’ here, or that this is otherwise playing a larger role as a background fact?
My presumption is that UK AISI did not actually make new progress on this, and that they were not directly involved, because Anthropic would have been informed of this, and they would be reacting very differently to a universal jailbreak. They’ve reliably treated ‘universal’ breaks as a vastly different category. But it is worth flagging.
Warning Shots Fired
If you are a ‘middle power’ that is not America or China, and you now realize that these decisions will be made mostly without caring about you, what do you do now?
What would even help? Having your own ‘European’ model only helps if it is substantially stronger than Opus 4.8 and GPT-5.5.
Anton thinks this is a bad thing, but that is a good version, because it is a bad version.
The problem is, what can Europe do about it? The realistic answer is not much, other than negotiate for what access it can get, especially for vital security interests, try to establish leverage and integration and goodwill, and hope for the best.
Well Did You Lead Him On? What Were You Wearing?
The correct thing to say when someone does something in a crazy way is ‘that was crazy, stop, walk that back, and if necessary maybe let’s figure out a better way.’
It is crazy how some types will think that, because Anthropic supports the general idea of some regulations on AI development, that they deserve whatever they get, and that you should cheer on any such action, however bone-headed.
It is even crazier how many people think this is a response to Anthropic saying that their models are dangerous, rather than a response to the Anthropic models actually being dangerous, and that this is good and right that Anthropic be punished for that.
Some even say ‘Anthropic should be happy about this,’ and no, this is not a strawman; another example I saw is Kevin Fisher.
There are tons of ‘he asked for this’ comments everywhere. Tap the topic sign.
Sometimes those people will either say that Anthropic was doing this out of some sort of marketing hype and it is all lies, or that Anthropic had a responsibility to lie about it or at least keep their f***ing mouths shut, or that they are pursuing some sort of 4D-chess regulatory capture business strategy. Serves them right, either way, say such folks. In case it wasn’t obvious, no, none of that is true.
Then there are those who think that if you call for [X] at all, then you must support any and all implementations of [X], and cannot ever go ‘BUT NOT LIKE THAT.’
This is like saying that you complained about the house being too cold, so you have no right to complain when someone burned down your house.
Then some are just profoundly hypocritical and nihilistic.
I am done with treating the nihilist position as if it is not exactly what it is.
Some People Have Principles
I want to shout out to Adam Thierer. We agree on many non-AI things, and because we have different views of AI as a technology we strongly disagree on AI policy.
What I appreciate about Adam, and some others like him, is that even against local interest he stands up for his principles, and rejects nihilism, and correctly judges the action on its merits, and sees this correctly as part of a scary pattern of behavior towards secretive authoritarian control over American AI that will damage our ability to move forward.
You can see in the comments how many people instead embrace nihilism.
Cause You’re Living In (At Least) One
We are definitely in a science fiction story. The question is which ones.
What Happens Now?
It is a regular thing for the Executive Branch of the United States Government, these days, to issue declarations of policy that are, to use the technical term, absolutely bonkers and stunningly destructive with no reasonable way to implement them, often without stopping to realize what they are doing.
It is also a regular thing for them to then quietly walk those policies largely or entirely back, once the consequences become clear, leaving only relatively minor total devastation in their wake.
Alas, it is also a regular thing for them to leave at least a substantial portion of the new stupid and destructive policy in place indefinitely, and sometimes we keep all of it, or they even keep going further.
Or Anthropic could give the White House what it wants, no matter who is right about whether doing so makes any sense.
We are not short on examples of any of this.
One thing that must now be considered is that many employees of OpenAI, Google and Anthropic, and other AI labs, are not United States persons.
If we drive all foreign talent out of our AI labs, and otherwise actually go down the current road, that is one of the few things that could put China and other competitors back in the game in earnest, both slowing us down and speeding them up.
At Anthropic, Amanda Askell and Andrej Karpathy are examples of employees who suddenly are unable to work with Claude Mythos 5, even after Anthropic sorts out a new access control system.
We should worry about many things coming next. This might accelerate even beyond citizenship, and start requiring security clearances, further locking out key talent. The government might seize or forcibly purchase equity shares in the labs or otherwise move into direct supervision of them.
OpenAI and Google are some of the few who can potentially make a difference to what happens next.
Yes, there are versions of this kind of coordination, both with the government and across the labs, that could be net good for outcomes, but given actions so far that seems highly unlikely to be the case.
Another fun dynamic is giving labs a reason to downplay or hide the capabilities of their models, to avoid such restrictions. Can’t tell if joking:
Tyler Cowen offers thoughts, pointing out that sometimes it is hard to solve for the equilibrium. Indeed, I would say that in many future such situations we should worry that perhaps no equilibrium exists, or no equilibrium that does not involve sacrifice of sacred values. His list of government considerations is good given the quick turnaround, and he is right that ‘no jailbreaks’ is not a standard one can meet.
Oh How The Vibe Vibers Have Vibed
Another vibe shift, I presume?
Yes, the Department of War is still the Department of Warring:
How much of what happened is driven by the vestiges of the DoW-Anthropic dispute, or the general animosity of the admin and various players towards Anthropic? I don’t know, and neither do you.
There is definitely still a contingent of people who are determined to be anti-Anthropic, some of whom are still Big Mad over only getting 95% access to Fable, and some of whom have pre-existing hatred for various other reasons, and who have the vibe of trying to steer the vibes by aggressively negatively vibing.
The vibes will presumably shift again, many times, in rapid succession.
We Now Know We Can Sometimes Do Things At Least?
It is valid to conclude, as Liron Shapira does here, ‘I want actions like this so much in general that I am happy to see the precedent set even if the implementation was both spiteful and extremely stupid.’
I do not share that opinion.
One should assume, by default, that bad things are quite bad, and the US government going in and wrecking things and potentially taking a large step towards de facto taking over the labs and ordering them around without even any understanding of the good reasons to do that, on the level of ‘not know what a jailbreak is,’ is a bad thing.
I can see this ultimately ending up being a good thing, and there are good aspects to be sure, but the default is that a bad thing ends up being a bad thing. If in the end you hit a bank shot, great. Don’t count on it. Most chess is played in at most two dimensions, and at most you get three.
Thus I mostly agree with Eliezer Yudkowsky here, although I have a number of other considerations in the bad column as well.
This is in many ways the opposite of trying to wisely coordinate for safety. This is cutting everyone else loose and preparing to race while shooting yourself in the foot.
It certainly is a negative update about the competence of the current government, and yes, yes, I know.
Another chess move is that this further alienates Anthropic from the government, no matter the motivation behind the action.
We have no idea how this will play out. They could walk this back. They could double down and really mean it. They could learn from their mistake and do something smarter. Indeed do many things come to pass.
I don’t expect it to go well. Still: reserve judgment. Too soon to tell.
The Lighter Side
Time to remake this ad, as suggested by Divyansh Kaushik?