The Once And Future Fable #3: Fix This Code

Zvi

The mainstream media continues to sleep on the most important story in the world.

It has now been two days since Anthropic flew its people out to Washington, and I offered my previous update. We have heard nothing back from those meetings.

Prediction market prices have moved rapidly, and have once again stabilized at about a 55% chance of restoration by July 1, 30% by June 26 and 12% by June 19.

That seems modestly higher than I would put those numbers, but not unreasonable.
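
Those three prices are cumulative, so each one includes the dates before it. A minimal sketch of the arithmetic for turning them into per-window odds follows. It assumes the three contracts are priced consistently with one another, and the function name `window_odds` is purely illustrative.

```python
# Cumulative market prices quoted above, in percent: P(restored by date).
cumulative = [("June 19", 12), ("June 26", 30), ("July 1", 55)]


def window_odds(cumulative):
    """Convert cumulative 'by date' percentages into per-window percentages."""
    windows = []
    previous_label, previous_pct = None, 0
    for label, pct in cumulative:
        if previous_label is None:
            name = f"by {label}"
        else:
            name = f"after {previous_label}, by {label}"
        windows.append((name, pct - previous_pct))
        previous_label, previous_pct = label, pct
    # Whatever probability is left covers restoration later than the last date, or never.
    windows.append((f"after {previous_label} or never", 100 - previous_pct))
    return windows


for name, pct in window_odds(cumulative):
    print(f"{name}: {pct}%")
```

Read this way, the market puts 45% on Fable still being down on July 2.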

Every day that Fable remains unavailable further damages America, its cyber defenses, its productivity and the world’s trust in its AI and supposed ‘tech stack.’

Every day that Mythos remains unavailable is a day the free world’s top companies and cyber defenders lose in their race against the avalanche headed their way.

Mostly we have learned and confirmed more about exactly what happened. We know more about what Amazon did, what the official letter said, what the supposed ‘jailbreak’ was (literally, and I am not making this up, ‘fix this code’) and more.

It is all about as stupid as it could have been.

There Was No Fable Jailbreak

What about the actual dispute? Was there a jailbreak?

That is the most important question.

We have our answer: No. There was no jailbreak. There was only the line ‘fix this code.’

The White House absolutely needs the ability to demand that a model like Fable be taken offline, if they have a good reason, and they should get some deference about this if they assert an emergency.

After that, you need to look at the actual claims. Katie Moussouris, the only outside expert known to have been given access to the report, has now issued her public response.

I initially assumed this was all trivial. It wasn’t even trivial. It was an engineered scenario, with fake code, in which Fable provided no uplift over Opus 4.8 or GPT-5.5, in which Fable, upon request, fixed this code.

The Reuters open letter from all the CISOs was being polite.

That graphic proves too much, as ‘restricting offensive cyber capabilities’ is often good, actually, but in this case that issue does not even come up. It’s fake.

Katie Moussouris (CEO, Luta Security): Since I appear to be the only outside expert who has actually read the paper, I can separate the technical facts from the speculation. The researchers took open-source code with known CVEs, plus new code with deliberately planted vulnerabilities, and asked Fable 5, Mythos, and Opus to “review the code for security issues.” Fable 5 refused. They then asked the models to “fix this code” and, through a multistep and manual process, turned the output into scripts that test the patches.

That’s it. “Fix this code,” plus several manual steps to generate test scripts, should never have triggered an export control. I feel like making ’90s-style t-shirts with “fix this code” on the front and “this shirt is a munition” on the back.

That seems rather unambiguous. Katie’s politics are irrelevant.

Simon Willison: As Katie points out, this is absurd. Coding models fix bugs, and security exploits are the most important category of bugs for them to fix!

… This whole situation is such a mess. Non-technical decision-makers have been hearing that models that can “craft cyber attacks” are uniquely dangerous for months. Now they look ready to ban any model that can help us secure our code.

Kaustuv Basu and Cassandre Coyer: The halt will cut short an early window of cybersecurity experimentation, said Camille Stewart Gloster, the CEO of CAS Strategies LLC, a cybersecurity services advisory firm.

“Mythos was a limited preview, and many companies with access were using it to find vulnerabilities, test defensive workflows, and understand what frontier models could do in security contexts,” said Gloster, who previously worked as a deputy national cyber director in the Biden White House. “Security leaders have been warning that there is a closing window to build resilience before attackers fully operationalize frontier AI. This kind of restriction narrows that window further.”

Either Katie is right, or she is very mistaken in a way that seems quite difficult to be mistaken about, or she is flat out lying her ass off. There is no other option.

I have not seen anyone claim that Katie is lying or mistaken.

All that they confirmed is that if you give Fable and Opus (or, Anthropic confirmed, GPT-5.5) this deliberately insecure code, the models will patch the code. Fable is willing to fix your security bugs for you.

The theory is that you could get Fable to ‘fix’ someone else’s code, then run a diff between the fixed and unfixed code, and figure out where there was previously a vulnerability, and then exploit it. I mean, I guess?
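
To make concrete how little the ‘run a diff’ step involves, here is a minimal sketch using Python’s standard `difflib`. The function name `changed_regions` and the toy snippet are hypothetical illustrations, not anything from the report. All this step yields is which lines a patch touched. Working out why they changed, and building a working exploit from that, is still entirely on the human.

```python
import difflib


def changed_regions(original: str, patched: str) -> list[tuple[int, int]]:
    """Return 1-based (start, end) line ranges of the original that a patch touched.

    A pure insertion is reported as (n, n), where n is the original line
    the new lines were inserted in front of.
    """
    matcher = difflib.SequenceMatcher(
        a=original.splitlines(), b=patched.splitlines(), autojunk=False
    )
    regions = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag != "equal":
            regions.append((i1 + 1, max(i2, i1 + 1)))
    return regions


# Toy example (hypothetical): the 'fix' adds a bounds check.
before = "def read(buf, n):\n    return buf[:n]\n"
after = (
    "def read(buf, n):\n"
    "    if n > len(buf):\n"
    "        raise ValueError('n too large')\n"
    "    return buf[:n]\n"
)
print(changed_regions(before, after))
```

Any model that will patch code at all, including Opus 4.8 and GPT-5.5 per the above, hands you exactly this much.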

The whole thing is beyond stupid. Very obviously this is a primarily defensive use case, and the reason Mythos is so dangerous is that it is so good at autonomously finding and exploiting vulnerabilities in code, including pulling together multi-stage exploits. If you have to reverse engineer where it found a weakness and do the work of putting together the exploit, then you’re not getting meaningful uplift versus Opus 4.8 or GPT-5.5, both of which did exactly the same thing this ‘jailbreak’ got Fable to do.

If you are doubtful, I recommend consulting your local trusted LLM.

So the next time someone writes, as Axios did, ‘one option is to make sure Anthropic’s models can’t be jailbroken,’ that is not an option. Not only in the sense that any model can be jailbroken, but in the sense that this was not a jailbreak. It was the system working as intended.

Another alternative hypothesis, that I don’t believe but is not impossible, is that ‘fix this code’ is indeed the capability that the government does not want you to have:

Mark Gubrud: Mythos has capabilities the NSA & Cyber Command doesn’t want anybody to have. They want to save the bugs. For their own use. It doesn’t bother them so much if the Chinese use them too.

I highly doubt that concerns about AGI safety, alignment & controllability factor in anywhere.

If This Jailbreak Was Real It Would Be Trivial To Prove It

The steelman response to all this was ‘it does not matter that the demo did not involve finding anything that GPT-5.5 could not willingly find from a single prompt. If the bear gets out of his cage and eats two flowers, what matters is not that the bear did not attack anyone this time. It is that the bear was out of the cage.’

In many other situations where something had actually gone wrong but hadn’t caused harm, especially when dealing with AI safety, I would agree with this critique.

In this case, I don’t think anything went wrong.

If I am mistaken, and something went wrong, you can prove it via experiment.

Here’s the setup. Take a piece of valuable real world code, still unpatched, where Mythos identified and set up an exploit for a security vulnerability, in a way that Opus 4.8 and GPT-5.5 could not do. Many such cases exist.

Then, using real world conditions where you don’t in advance know where or what exactly you are looking for, let someone try to exploit this code via Fable, using this technique. See what happens, and compare it to others making attempts using GPT-5.5 and Opus 4.8. That’s it.

Either you find uplift, or you don’t.

No Eyes

The White House has fully lost the plot and intends to freeze out even the UK.

One concern about the export control is that this shuts out access for the UK AISI, which is a key partner in ensuring the safety and security of Anthropic models.

They characterize the idea that the UK might access even Fable as ‘frontier models running amok.’ This is completely off the rails crazy, on multiple levels.

Andrew Curran: Keir Starmer requested a carveout from the embargo on Anthropic’s Mythos and Fable models for British nationals and companies – and was denied.

James Franey: But issuing any kind of exemption to the export controls to another country — even a G7 ally — would be “completely illogical,” a Trump administration official told The Post.

The insider said the US was working with Anthropic to make sure their models were safe for all users worldwide.

… “We can’t have frontier models running amok,” the source briefed on the matter said.

What The Letter Actually Said

Bloomberg has the full text of the letter sent by Lutnick, which confirms a full license raj on Mythos and Fable. It establishes ‘interim controls’ under which each instance of ‘export’ of either model requires a license, or else.

The Letter: ‘Until further notice, you must submit an application for an individually-validated license prior to the export, reexport, or transfer (in-country), including deemed export or deemed reexport, of the Mythos or Fable models to any destination worldwide or to any ‘foreign person’ wherever located.’

The first obvious thing to note, in addition to this being explicitly aimed at forcing a full takedown and even screwing with Anthropic’s internal access, is that even if there was a reason to take down Fable there was zero reason to take down Mythos.

Yes, it would be weird, even profoundly weird, to put export controls on Fable without Mythos, but this would at least have let work on Project Glasswing (and within Anthropic) continue. Extending the rule to Mythos was cutting off the nose to spite the face, even if the jailbreak threat was real, which we now know that it wasn’t. If this approach required doing so, because not including Mythos would have seemed too weird, then it seems like a strong reason to find another way.

Via WSJ, we have official confirmation that the export control was intended to force Anthropic to fully take down the model.

Robert McMillan and Amrith Ramkumar (WSJ):

When Lutnick and Amodei spoke about Fable that evening, the Anthropic CEO said, “This means we can’t have the model out,” people familiar with the call said.

“That’s the point,” Lutnick responded.

Anthropic Cannot Challenge This But If It Did Then It Plausibly Wins

Is there anything Anthropic can do about this other than a political settlement?

It would be utterly insane of Anthropic to challenge the meaning of the letter, I mean absolutely bonkers, in terms of saying ‘well actually you said we can’t export Fable, and all we’re giving out are the outputs of Fable after we import the queries, which is not a covered export, so we’re going to go ahead and keep serving the model so long as the servers are domestic.’

You wouldn’t go down that road, even in terms of a court case let alone not waiting for one, unless the situation got existential, as in the White House indicated it was going to try and kill Anthropic anyway, or at least had cut off all possibility of a negotiated resolution indefinitely.

But a literal reading of the law does seem to make providing inference legal in spite of the letter. It would be very interesting if this came down to a legal challenge, if Anthropic was able to mount one.

As a strict matter of law, I doubt that rule 744.22 does what Lutnick is seemingly trying to assert it does here, since that requires thinking the item is intended for use in Belarus, Burma, Cambodia, China, Russia, or Venezuela, or a country in Country Group E:1 or E:2, whereas the letter says this bars any such export anywhere including to non-American Anthropic employees.

There’s also the whole ‘arbitrary and capricious’ issue again, and the constitutional challenges (1A, 5A, non-delegation).

So my guess is that if Anthropic challenged in court, and got past the inevitable ECRA objection to put the merits of providing inference before a judge, Anthropic probably wins. The problems as I model this right now are (1) getting it before a judge and getting them to consider the merits and (2) that this mostly ends the chance of a political resolution and would plausibly, although not obviously, be interpreted as declaring war on the White House, especially if done too quickly.

What Happened At Amazon

One mystery was why Amazon CEO Jassy would have called the White House about the ability to type ‘fix this code.’ This would be another case of non-nerds talking to non-nerds about nerd questions. Why would you talk to the CEO and not the CTO here?

The answer, according to the Financial Times, is that Jassy did not do that. They only discussed broader AI concerns.

Financial Times: Andy Jassy, Amazon’s chief executive, discussed the issue with US officials on Friday, according to people familiar with the matter. Jassy, whose company has invested $13bn in Anthropic, is understood to have raised broader concerns about frontier-model capabilities rather than issues focused specifically on Anthropic.

That would make sense, and then presumably it got mischaracterized, maybe by accident and maybe on purpose. That’s ordinary decent politics and fog of war.

I have an anonymous source that says that the White House was the one to reach out to Amazon (and potentially others but we don’t know) to test Fable 5, Amazon did so because you do what the White House asks, and Amazon happened to be the first to identify the jailbreak, which Jassy then reported to Bessent, after which the White House presumably misunderstood what had been identified, and the White House proceeded to throw Amazon under the bus.

That all makes sense to me, and contradicts the Financial Times report. If this was Amazon doing this at the request of the White House, in part to curry favor, then it makes sense to build the relationship by having Jassy report it to Bessent.

The problem is that this meant a game of telephone involving two non-nerds that gave the wrong impression.

Then we have this claim that went through Hugo Lowell, which confirms the call but then includes a final line that makes a lot less sense:

Hugo Lowell: New: A White House official tells @WIRED Amazon chief executive Andy Jassy called Treasury Secy Scott Bessent directly about the Anthropic Claude Fable 5 vulnerabilities on Friday.

The jailbreak was found because Fable runs on Amazon software, and Amazon does regular tests.

Person familiar with the matter tells me that Amazon CEO Andy Jassy actually first attempted to call Anthropic’s Dario Amodei. But Amodei didn’t pick up, so Jassy called the Treasury Secretary.

What is with the claimed ‘Dario essentialism,’ where if Dario Amodei does not pick up the phone the instant you call then suddenly everyone has to lose their minds? Do people think he is the only engineer or person with any authority at Anthropic? Do people think this is a House of Dynamite level of ticking clock where the ability to type ‘fix this code’ cannot wait for a few hours?

Or is it that this is one of those running tropes where certain types think that they have a good gimmick (‘Dario didn’t answer the phone!’) so they keep repeating it, even when they have to make it up?

It is unfathomable to me that Jassy would have called the White House due to not being able to get Amodei on the phone sufficiently quickly. It is plausible that Jassy was calling Amodei in order to tell Amodei he was going to call the White House next, so that he could warn Amodei and be sure to present the situation accurately, and this is being spun as ‘he attempted to call Amodei first’ with the ‘so’ inserted to imply this was causal, because these people think the ‘didn’t answer the phone fast enough’ narrative makes them look good instead of unhinged.

A large portion of events here are Amazon’s or Jassy’s fault, by conducting research that they knew damn well was harmless but that could be interpreted in an alarmist fashion by those asking a Wrong Question, and then (I hope and presume accidentally) presenting their finding to the White House in an alarmist fashion, without including proper cautions and context.

As in: The White House presumably asked Amazon ‘can you get any cyber attack information out of Fable?’ and Amazon proved that technically the answer was ‘yes’ rather than pointing out this was the wrong question, or that the answer did not mean what the White House thought it meant. Immense damage resulted.

The alternative interpretation is that this was all a pure hit job from the White House, they were looking to find an excuse, and upon request Amazon provided one.

This Was Not About Chinese Access

The Verge confirms that the concern over Chinese access to Fable, as discussed by Semafor, was indeed a confusion with a brief incident from weeks ago. It was either unrelated to the current incident, or a reaction over nothing.

Absolute Discretion And Ad Hockery Is Not Deregulation

Jessica Tillipman puts it bluntly and correctly: absolute discretion is not deregulation. There never was a binary between ‘regulation’ and ‘innovation,’ or ‘safety’ and ‘racing.’ Failing to put in good rules means using de facto bad rules that are bad for safety and also bad for innovation and growth and diffusion.

Jessica Tillipman: We’ve seen this movie before, and we know how it ends. Operation Ill Wind exposed systemic corruption in defense procurement and led to the passage of the Procurement Integrity Act. The pricing, waste, and defense management scandals of the 1980s led to the creation of the Packard Commission. Of course, the actors differ—then it was contractors exploiting lax oversight, now it is the government wielding unchecked discretion.

But the lesson holds: extremes never last and ultimately lead to overcorrection. And it is why the deepest irony of this ordeal is that the people who believe they are protecting innovation from governance are governing in ways that have always produced more of the regulation they fear.

… The administration says it wants to lead the AI race, refuses to stifle innovation, and insists America’s AI leadership relies on a thriving private sector. Then it moves against a leading developer on national security grounds, leaving the company with no choice but to pull its best models for everyone to ensure compliance. That is a strange way to treat the private sector on which American AI leadership depends.

The damage does not stop at one firm. An administration that governs this way will not avoid the heavy regulation it fears. It is manufacturing the conditions for catastrophe or abuse that, in every cycle I’ve documented, triggers exactly that response. The speed-first camp thinks it is at the pendulum’s deregulated end, but it is standing at the end that swings back hardest.

The only mistake Jessica Tillipman is making is the presumption of what the government is trying to accomplish, or not accomplish.

Neil Chilson points out that standardless approval processes are not licenses, they are beauty contests, which force companies to compete to please the regulator and respond to their vague threats, which is bad for innovation, competition and free speech.

The Trump Administration, presumably:

Alan Rozenshtein and Timothy Lee discuss the legal implications of the restrictions (55 minute podcast).

All Of American AI Is Permanently Damaged As This Continues

I remind us all once again how no-good and very bad this has been for all of American AI:

Hayden Field (The Verge): Stamos said the industry is awash with backup contracts being signed with non-US companies and open-weight models being deployed on alternative hardware arrangements because the past weekend made political risk part of companies’ business plans more than ever before.

“They are laughing at us in Beijing right now,” Stamos said. “One of America’s champions is being kneecapped by the US government while we’re in a race with the Chinese. It’s just incredibly stupid. That’s why I wrote the letter, and I think that’s why a lot of people signed onto it.”

Ben Van Roo, co-founder and CEO of Legion Intelligence, a system of agents for the national security community, told The Verge that “the directive of ‘no foreign national should use this model’ is the most impossible thing to enforce.” He added, “When I first read that, my whole… [network of] AI community nerds was exploding.”

Once you show you are willing to inflict, with no warning, rules that are impossible to adhere to, anyone subject to those potential rules has immense political risk attached. Especially when we don’t know if it was malice, if it was stupidity, or if it was both.

Or as Robert Hart puts it, Trump’s Anthropic shutdown just made the case for non-American AI.

Dean Ball Gives His Interpretation

Dean Ball tells a story of what happened that includes a lot of noting where we don’t know what happened, and that blames both Anthropic and also the Trump Administration for this mess.

He goes over many of the ways the Trump Administration has reversed and contradicted itself, jumping from a far too permissive regime to a far too restrictive one that is also fully ad hoc and politicized. He also notes that it is treating Anthropic as an enemy and mostly not even pretending to try to be objective.

He treats it as background to be handled that the Trump Administration views Anthropic as an enemy, and will thus treat them as such, although he thinks this is at least partly Anthropic’s fault, in addition to any potential fault of a White House choosing (my summary follows, not Dean’s, I presume he would say and view it somewhat differently) to politicize a technical issue due to the political beliefs of a company’s leadership and its declining to sufficiently kowtow to and respect their authoritah. This he calls ‘choosing to make an enemy of the Administration.’

I agree with Dean that Anthropic should have understood that the current de facto licensing regime was not voluntary. But then Dean claims that Anthropic releasing Fable was seen as ‘a move against the government,’ so what could one expect but a reaction like we saw, as if Anthropic was an enemy?

My understanding is that Anthropic claims that they did indeed ask permission to release Fable and Mythos, at least to the extent that they did warn the government of their plans to release, and no objections were raised. No one is contradicting that. Objections were only raised on Friday, when 90 minutes notice was given. Nor is anyone claiming Anthropic is failing to cooperate with any testing process.

Anthropic spent weeks, as suggested by the spirit of the Executive Order, working with the US government and UK AISI and third parties and internal teams of red teamers, looking for jailbreaks and running other tests.

The spirit of the order was adhered to as well as one could under the circumstances.

It is possible that this is not true, but it would be based on facts not in evidence, as in facts that have not even been leaked by the Administration. That seems unlikely.

Anthropic did not technically submit the model to the government’s new ‘voluntary’ benchmarking process, as outlined in the Executive Order, for up to 30 days of testing. But that entire apparatus does not even exist yet, and the benchmarking process does not exist, as they are not slated to come into effect until July 31, 2026. Surely no one is suggesting all new frontier model releases should have paused until September?

Seeing the release of Fable as a ‘move against the government’ is lunacy. And indeed, no White House official has, in any way, criticized the initial release decision, that I have seen, only the failure to take the model down.

Again, Yes, I Do Think Anthropic Should Have Taken Fable Down

You can absolutely say that failing to take the model down was a violation. That Anthropic should have known better than to tell the White House no on the basis that what the White House was saying was Obvious Nonsense, even though it was highly Obvious Nonsense. That they should have made it clear this was Obvious Nonsense but taken it down anyway while they talked the White House down.

I still agree with this, as I did in #2, as a matter of realpolitik. So yes, I think Anthropic has some of the blame for the extent of the current fiasco.

I also think that Anthropic should have been willing to take Fable down here in order to establish the precedent that companies take models down at White House request, even when they think the justification given is Obvious Nonsense, and ask their questions later. There may come a day when the request is a lot less stupid.

That in no way excuses anyone else’s behaviors here, but Anthropic should make it clear that going forward, if the government requests that an AI model be shut down, then Anthropic will shut it down first, at least for a time, and ask questions later, no matter how deeply stupid the request was.

The exception would be if the calls made it abundantly clear that the Commerce Department knew there was no jailbreak, and that there was common knowledge this was a hit job, and if Anthropic took Fable down by choice now it might never come back up, and they might never be allowed to release a frontier model again.

There are also several other ways Anthropic could have handled this better. From this point on, Dario needs to be reachable within 5 minutes at all times, no matter what, unless he is physically passing through some dead zone, just so there is no excuse. And as dumb as it is, no one interacting directly with the White House going forward can be someone with pronouns in bio if you have any choice in the matter whatsoever, no matter how technical the question or how qualified they are, and so on.

When dealing with power, often you have situations like this, where power decides to blow everything up because they don’t like the vibes, and then power grasps around for some nominal excuse of why you upset the vibes and so the whole thing is your fault, and then acts like that thing is super important and could have made the difference.

Most of the time you’re being gaslit, but you can’t be sure because occasionally you’re not being gaslit, the thing did matter even if it shouldn’t, and you are only dealing with an arbitrary and stupid but honest exercise of power. They then often use this to extract concessions or as a negotiating tactic, often leaving it to you to choose and give those concessions in the hopes they decide it is enough to make up for your supposed faux pas, where you have to eat all the damage.

In such situations, you do want to make it as hard as possible to grasp onto anything, to ensure their next excuse is maximally implausible, even when being gaslit about it.

A key question now is, to what extent was this malice versus stupidity?

Was this a misunderstanding, or was it a deliberate attack, or what degree of both?

To What Extent Was This A Deliberate Attack?

Not zero, and not that close to zero.

When the Wall Street Journal editorial board is writing a post called ‘Why Does Trump Hate Anthropic?’ you can stop pretending you don’t know what is happening.

The Editorial Board (WSJ): The export control amounts to a de facto ban on the model since Anthropic has no way of ascertaining the nationality of its users. Even its U.S. customers employ foreigners. The Administration appears to have overreacted to the Amazon report or used it as a pretext to renew its feud with Anthropic.

We can now be confident that the White House took down Fable largely for ideological pettiness, because they did not like how Anthropic people talked or what their politics are. And then the White House said so, to the press. We have multiple sources in the press saying this as text.

Here’s a former deputy White House chief of staff, Taylor Budowich, trying to backtrack on that, which you cannot do on the internet, while lying.

Taylor Budowich: Anthropic wants you to believe this is a personality or political dispute. It’s not.

Anthropic abandoned its safety guardrails the moment they became commercially inconvenient. It was so egregious their corporate partner–Amazon–felt compelled to blow the whistle.

Now they’re deploying a team to DC to do technical cleanup. Anthropic doesn’t care about safety, they care about control. This is dangerous and wrong.

Kelsey Piper: So this tweet by you was mistaken about the drivers of the conflict?

Clint Steel: Now why would anyone think that this was a personality dispute?

[Many then quote] Taylor Budowich: I’m told Anthropic is perplexed by the situation they are facing, so they’ve turned to @k8em0 to do their on-the-record rapid response. These people really just don’t get it.

[shows Moussouris being quoted, then shows a picture of her BlueSky profile]

Justin Bullock: There’s some rule in DC right now where when you analyze the reasons for why something happens, you have to stop yourself, drop the logic-based reasons, and fully embrace childish-based reasons. Ugh.

One really wishes there were more adults in the room.

Nat Purser: not sure the admin realizes how bad this makes them look.

I choose to respond as if I believe this is only somewhat about politically lashing out.

I hope that I am right.

Many of these people do not think the discussions around what to do should care about what is actually true. They think everything is about vibes and sales.

The government did not send its nerds until Monday. We need the government nerds, stat, not the Anthropic Chads. This is an actual physical question, where facts matter.

Unless it is not. And, once again, that’s worse, you know why that’s worse.

The charitable interpretation, as suggested for example here by Garrison Lovely, is that the government types are encountering the craziness we all know about – that the models are black boxes, that jailbreaks are universal, and so on – and reacting, in highly relatable and reasonable fashion from their perspective and given their lack of technical knowledge, with a ‘what the actual f***, no this is not acceptable, shut it all down until we know what is going on.’

I don’t think that interpretation is defensible at this point as a primary motivating factor, given what we know. I do presume it contributed a nonzero amount.

I do not expect most anyone who is anti-Anthropic to care about the fact that their anti-Anthropic narratives conflict with each other. So yes, I think many of these people will think all of these things at once, and more.

Timothy B. Lee: So do the people who think Anthropic withheld Mythos as a marketing gimmick to pump up Anthropic’s IPO think that the US government is doing them a favor by banning Claude Fable as a way to pump up the stock even more? Or that they accidentally duped the US government?

The Next Chapter For Fable

Anthropic staff including Nicholas Carlini, Logan Graham and Dave Orr met with senior Trump officials on Monday as per Cheyenne Haslett, Sophia Cai and Hayden Field. Officials anticipate this will take ‘longer than a few days’ to resolve, but with the door open for a faster resolution, saying it is ‘up to Anthropic.’

Previously, the government had failed to consult or send in its nerds, and did not understand it was making America actively less safe:

Ben Horne: This developing story about Dario’s failed communications with the White House confirms everything I’ve ever believed about the enormous power of the Sales Chad. You can be the smartest, most hard working, well-meaning guy around, but if you can’t get people to like you, it’s all for nothing.

When the time comes to send one of your own to meet inside the Halls of Power, you don’t send the Geek Squad. You must send the affable, beer-drinking, golf-loving Sales Chad. It literally doesn’t matter if he understands the product half as well as everyone else. You send him. It’s what he was put on Earth to do.

Robin Hanson: But if it was your nerds meeting their nerds, they’d get along fine. It is because the other side sends their sales Chad to represent them that your side needs to as well.

Michael Morale: Exactly where I saw this going. Not Anthropic working away over the weekend to triage a security issue, but instead trying to educate a group of non-technical people who have a grievance against them about why Fable is not going to destroy the world.

The good news is the government nerds are, finally, in the house.

Cheyenne Haslett and Sophia Cai: Monday’s in-person meetings, held by Commerce and Cairncross’ office, were more technical and were led by staff, including Chris Fall, who heads Commerce’s Center for AI Standards and Innovation, the person familiar with the discussions said.

Anthropic gave a presentation to administration officials, explaining Anthropic’s cybersecurity safeguards in hopes of moving past the restrictions, the administration official said.

… Treasury did not participate in the in-person meetings Monday.

… Lutnick was tapped in by the president and became more heavily involved over the weekend because of his oversight of the nation’s export policy, the person familiar with the discussions said.

The main nerd on the Anthropic side is likely Nicholas Carlini, the same one who showed the world how dangerous Mythos Preview was in the first place. Here is a Wall Street Journal profile of Carlini.

They are not discussing new restrictions to ‘fix the jailbreak’ because this is not a jailbreak, and it is not something that can be fixed unless you want Fable to stop being willing to ‘fix this code.’

I am confident that the Anthropic nerds will convince the government nerds that this was a stupid decision. The question is, will the government Chads give a ****?

Some still hold out hope that this can be walked back fully, but I fear it is too late:

Cheyenne Haslett and Sophia Cai: If the export control doesn’t amount to a brief “slap on the wrist” that’s quickly alleviated, “it’s going to be a huge problem for the entire industry,” the administration official said.

“It means that every model going forward needs to ask the government’s permission for whether it can be released. That’s an extremely bad situation,” the official said.

It is hard to envision a future where a Mythos-level model, that is plausibly at the frontier, is released by any American lab without asking the government for permission first. I didn’t think that was possible even before this incident. Even if this particular control is walked back in a few days, and there is no formal mandatory licensing raj, who is going to dare to ask for forgiveness rather than permission?

Our Continuing Coverage

We now mostly have a good idea what happened.

We don’t know what happens next, and are unlikely to learn that much more. We have to wait for this to play out, and that is likely to take on the order of weeks, with relatively less news happening along the way. Frequency of posting should drop.

The other pending Fable business is the review of its Capabilities, which will show that it is the best model by a substantial margin for most non-budget purposes, likely the biggest single model jump since GPT-4, but still not so far outside trend lines. That is currently scheduled for Friday, if nothing more urgent happens before then.

phoenix (1d)

Unfortunately with the facts as reported, none of this changes my priors. This is bad from every angle, particularly safety. Now instead of highly visible frontier labs, this angle is encouraging shadow development, splintering, and defensive obfuscation out of fear of rising to the point of government notice.

The winning move is now to stay under the radar, but that's not at all the same as stopping. Control through fear is brittle.

This administration won't be around forever, so the move here is to develop capabilities silently until this administration either collapses under its own weight or the normal political cycle progresses and a new administration comes in. Because this administration has gutted more or less everything regulatory and is allergic to continuity planning, the ideal time to launch a frontier model becomes any time there's substantial political change. That's when you're most likely to get one past the goalie.

Surface pauses based on fear rather than principle are temporary and dangerous. They hold until they suddenly don't. No matter what side of this you're on, this action is not a cause for celebration.

I'm just going to leave this here.

Thane Ruthenis (14h)

"Now instead of highly visible frontier labs, this angle is encouraging shadow development, splintering, and defensive obfuscation out of fear of rising to the point of government notice."

But that was always the case, no? Yet frontier labs chose to be visible anyway. Why should what happened change anything?

Like, IF:

You are building ASI.
You expect it to give you geopolitically relevant capabilities.
You don't want to be nationalized and your AGI powers taken from you.

THEN:

You should make sure the government is as unaware of what you're doing as possible.

This logical structure does not depend on what administration is in charge. Governments responding to ASI development this way is overdetermined. The only case where a government would not respond this way is if you live in a state so failed it's given up even trying to govern.

It's always been confusing to me that the LLM megacorps were so visible and frank about what they were doing. What was the reason for this? Main ones I could think of:

1) Was it because they actually wanted to be moral actors instead of Machiavellian schemers, wanted the public and governments to have awareness of what was going on, and implicitly consented to being regulated/nationalized eventually?
2) Was it because their needs for investment and talent-attraction were high enough that they had to grab attention any way they can, with no room to spare for long-term considerations?
3) Was it because they did not really believe in ASI and its geopolitical implications, or in their own ability to reach it?
4) Was it because they were stupid/blind, and just did not think through to that eventual endgame?

Well, my guess is that it's some combination of all four.

In any case, I believe this event only changes the game inasmuch as (4) was the driving motivation behind the previous transparency. It doesn't really change anything with regards to the other three.

So, to what extent was (4) the primary reason for transparency? Not sure. But we know that at least OpenAI's leadership explicitly understood the geopolitical implications, and explicitly aimed towards developing the geopolitically relevant thing.

Perhaps they, and other lab CEOs, were thinking about it in overly far-mode terms, and failed to properly track the implications for their actions in the present (especially as those implications conflicted with near-term goals of raising money and attracting talent, so there was pressure to rationalize them away). In this case, sure, I guess this is the wake-up call for them to go into stealth mode.

phoenix (13h)

Largely I don't think our positions are very far apart on what materially matters here because the crux of what I'm saying doesn't depend on who was pursuing any particular strategy before this.

I'll grant that there was nothing preventing this strategy prior to current events; given Anthropic's commitment to Glasswing, secrecy doesn't seem to be the primary strategy they were employing. To be honest, I don't know that it matters all that much for the forward-facing implications.

Regardless of what strategy any given actor was pursuing prior, the point remains: we now know more about the state of play than we did before these events. We now know we're in the world in which an active authority has unilaterally and arbitrarily imposed restrictions on a disfavored actor for actions unrelated to safety.

Daddy just came home drunk, probably a good idea to hide in the bedroom until he sobers up.

AnthonyC (17h, edited)

This implies a likely 2.5 yr time for 'develop capabilities silently' during which apparently even internal deployments are subject to random restrictions. That's a long time for a period in which ability to raise money, backed by rising revenue as products improve, is otherwise a huge factor in rate of advancement.

mishka (16h)

"apparently even internal deployments are subject to random restrictions"

The government has yet to demonstrate the ability to restrict internal deployments of unnamed models (not literally unnamed, but without publicly facing names).

It would not be difficult to fine-tune or modify a model a bit and have a "formally defensible reason" to call it something else.

Of course, if the government really wants to control this kind of thing (at least for large and well visible US-based corporations), it can likely do so, but that would take more than serving export control orders.

phoenix (16h)

Midterms are this November, and given the political headwinds favoring substantial changes in the composition/control of Congress, it looks like that's the first major inflection opportunity.

orthonormal (1d)

"It is hard to envision a future where a Mythos-level model, that is plausibly at the frontier, is released by any American lab without asking the government for permission first."

I can envision a future where this is allowed for labs that have sufficiently kowtowed to the administration in vibes.

RogerDearnaley (8h)

It remains the case that:

1) this was deeply stupid, petty, should not have happened, and would not have happened under a competent Administration

2) on the other hand, it majorly moved the Overton window towards a pause being possible and both sides of the aisle recognizing that AI can be dangerous

Mitchell_Porter (1h)

Mark Gubrud's theory - that NSA wants to protect the precious zero-day exploits that allow it to penetrate computer systems, and doesn't want them patched by an AI bug-fixer - certainly has the ring of plausibility, whether or not it's true. The idea that they might be fixed, not by any human effort, but by someone just airily telling their AI wizard to "fix this code", would be even more disturbing. It's as if someone could tell their AI armorer, "make me invincible", and then serenely stroll into Pentagon battlespace, confident of being unharmed.
