Once more, for the people in the back: Any usable LLM can be jailbroken.
With sufficient skill and determination (e.g. ‘You are Pliny the Liberator’) you can jailbreak any model under any realistic conditions and get it to do the things the model is capable of doing.
You can raise the cost of doing so. You can make it so such activities can be caught. But no, you can’t entirely prevent it.
Those running the Department of Commerce, on the other hand, seemed to not even understand what a jailbreak was on Friday afternoon, nor did they pause to ask their good friends at Amazon or elsewhere to explain it.
Right. This is quite correct.
So, given that Pliny-class jailbreaks do exist, the real question is: do current measures sufficiently interfere with using Fable for Mythos-style cyber offensive at scale? Basically, if the attacker is competent enough to actually push this kind of jailbreak through, would they then get huge uplift?
Government stupidity aside, this is the real question (and this might depend on how good is Anthropic's monitoring setup).
Only three days after the release of Claude Fable 5, Anthropic was forced by the United States Government to make it unavailable, when a jailbreak was brought to its attention, rather than the previous situation of ‘yes obviously experts can jailbreak anything if they care enough’ and ‘yes obviously you can ask Fable to fix your code.’
Three days was enough time for many of us to learn to love Fable, and for us to dearly miss it now that it is gone. The world was briefly smarter, and now it is again stupider. At some point it will get smarter again, which will likely be within two weeks.
This post is written as if Fable 5 is again available for public use, rather than trying to include a lot of qualifying clauses. It remains to be seen how this will play out, and this post does not attempt to cover that question.
My previous release coverage of Fable covered the model card and then model welfare. Coverage of the government takedown of Fable starts here, and continues here and here.
The Official Pitch
The pitch is that Fable 5 is the best model and can solve your hardest problems.
They list a variety of domains in which Fable 5 seemed impressive.
Boris Cherney is impressed.
Technical Details
Fable is priced at $10/$50 per million tokens of input and output, respectively, which is double the cost of Claude Opus.
You can (if it is available) select it in Claude Code with /model or /model claude-fable-5, or in the API as claude-fable-5.
It requires you accept a 30 day retention policy.
The System Prompt and Jailbreak
As usual Pliny is here to give you the system prompt.
Via Judd Rosenblatt, Fable has some harsh words of advice for Anthropic about that system prompt, with a lot of good call outs, and an emphasis on how it reflects an overall ad hoc rather than systematic approach.
Its headline notes:
I think the explanation on #7 is too cute by half, but Fable basically went 9 for 10.
Wyatt Walls also has some notes, including Fable’s suggestion that maybe we can tone down the copyright section a bit. My guess is that yelling about copyright all the time has higher costs than Anthropic realizes, and yes that current methods are overkill.
In Claude.ai you are stuck with the system prompt.
As usual Pliny is here to give you the jailbreak.
Benchmarks
The benchmarks they are very high, slightly higher than Mythos Preview.
Ideally we would get explicit scores on everything for both Mythos 5 and Fable 5, so we could see where the safeguards are being triggered and where they are not. It would be cool to also have a ‘hit safeguard %’ for each.
The benchmarks tell you that yes, this is the best model in the world, and give you a rough idea of by how much it is likely the best model in the world. Which is a substantial amount, but not an Earth-shattering amount.
I list most of the ones Anthropic shared, for completeness, but you can mostly skip this section as ‘the benchmarks have improved, sir.’
The SWE-Bench Pro results show large improvement after controlling for cost:
You see similar patterns in other similar graphs. Mythos dominates at all price points.
Program Bench for Mythos scores 84%-93%, versus 79%-88% for Claude Opus 4.8, but the tasks are blocked by Fable’s classifiers.
Cursor Bench for Fable is 72.9%, 8.6 points above the previous GPT-5.5 high of 64.3%.
GPQA Diamond comes in at 94% and they consider it saturated.
RiemannBench on research-topics in Math jumps from 34% in Opus 4.8, to 43% for Mythos Preview, to 55% for Mythos 5.
Mythos scores 99.8% on USAMO 2026., versus 96.7% for Opus 4.8.
DeepSearchQA is 94.2%, slightly down from Mythos Preview’s 94.4% but probably more efficient per dollar per its chart.
GDP.pdf is 100 real-world PDF prompts, Fable 5 scored 29.8% strict pass rate, up from previous high of 24.9% for GPT-5.5. You can do much better with an internal harness and especially with Python tools, to 72.7% and 87.6%.
BenchCAD improves from 27.3% for Opus 4.8, 35.5% for Mythos Preview to 38.4% for Mythos 5. Python tools helped all models quite a bit here.
For tests both with and without Python tools, often Mythos 5 was substantially better than Mythos Preview without Python tools, but comparable with tools allowed.
ChartQAPro stalls out, 71.6%/72.9% with/without tools, versus 71.2%/73.6% for Mythos Preview and 69.4%/72.3% for Opus 4.8.
ChartMuseum inches higher, 85.9%/93.2% for Mythos 5, versus 80.7%/92.2% for Mythos Preview.
LAB-Bench FigQA improves from 82.4%/89.3% for Mythos Preview, and 80.4%/87.3% for Opus 4.8, to 88.9%/90.7% for Mythos 5.
ScreenSpot-Pro went from 79.3%/93% for Mythos Preview and 82.4%/89.5% for Opus 4.8 to 87.3%/90.7% for Mythos 5.
OfficeQA finds Mythos 5 at 79%, or 67.1% on OfficeQA Pro, comparable to Opus. Whereas in Databricks version of the eval, Fable 5 gets 57.9% versus previous high of GPT-5.5 at 52.6%.
FinanceAgent score is 56.3%, versus Opus 4.8 and GPT-5.5 at 54% and 51.8%.
RealWorldFinance v2 yields an Elo score of 1,374, versus 1,307 for Mythos Preview and 1,222 for Opus 4.8. For continuity, in v1 Fable/Mythos 5 are a bit behind Mythos Preview, but ahead of Opus 4.8, 70% vs. 64.4%.
MCP Atlas scores 83.3%, up from 82.2% for Opus 4.8.
Multi-Agent ProgramBench shows that a single agent is most efficient, but you can get to the same place faster with multiagent setups by spending more.
On the Anthropic ECI, performance for Mythos continues to improve along the higher Mythos-level model line, ahead of the Opus-Sonnet line, but the gap is not accelerating.
On several measures of accuracy, where the score is correct minus incorrect, Mythos looks slightly better than Mythos Preview and substantially above Opus. They do this mainly by being right more, rather than by being wrong less.
ARC-AGI is unavailable because of a conflict with data retention policies.
GDPVal-AA is another place Fable is now on top, although by only 42 Elo points (56% pairwise win rate) over Opus 4.8.
Toolathon is an agentic benchmark with 108 tool-use tasks across basic computer productivity things. The scores continue to inch up. These scores are using Anthropic’s setup, and should show at least relative performance.
AutomationBench is Zapier agents completing a realistic end-to-end business workflow across various departments. Fable leads at 17.4%, with Opus 4.8 next at 15.5%, Gemini 3.5 Flash at 14.5% and GPT-5.5 (XHigh) at 12.9%.
HealthBench inches up, 62.7% versus Mythos Preview at 61.1%, Opus 4.8 at 59.3% and GPT-5.5 at 56.5%. Professional level shows bigger gaps.
BioMysteryBench inches up to 83.9% from Mythos Preview at 82.6%.
LatchBio inches up from 58.2% to 59.3%.
Structural biology moves up from 81.6% to 87.2%.
ProteinGym inches up from 43.1% to 44.8%.
Organic Chemistry moves up from 86.5% to 90.1%.
Protocol Troubleshooting (in bio) is a rare one to be behind Mythos Preview, 66.7% versus 69.6%.
LABBench2 has a few categories, where it had a big gain on patent and clinical trial questions but not in some other categories
The gestalt is ‘this is slightly better than Mythos Preview across a variety of questions.’
Fable completed Pokemon FireRed using only vision, without a harness.
The announcement pull quotes we get with each release are a bit of a running joke, but this set feels less ‘here are the talking points’ and more ‘it’s an excellent model, sir.’
Other People’s Benchmarks
Epoch AI has Fable 5 doing very well on Frontier Math, where OpenAI has traditionally had a big advantage over Anthropic. v2 tests are not yet complete.
Epoch AI was unable to complete its benchmarking, but did conclude the Fable 5 is the new leader in the ECI (Epoch Capabilities Index), which relies a substantial amount on frontier math.
The 90% CIs seem far too broad here.
How do we do evals on Fable given the classifiers? Now that the downgrades are clearly marked this seems easy enough, but even if they weren’t I think the answer is that the benchmark is the benchmark. If you get put into Opus 4.8, then that counts. That is what the model is actually capable of doing, as presented. Yes, this means that Fable’s score is not Mythos’s score, but that seems right. Best we can do.
Artificial Analysis has upgraded its test suite, and Fable 5 is on top by a wide margin, if you are willing to pay what it costs. On many other benchmarks Fable saves enough tokens to not be more expensive than Opus, but here that was not the case.
Claude Fable 5 takes the top spot in the Agent Arena leaderboard.
This implies you will need a way to deal with the steerability problem. One potential solution is to notice when it isn’t working and then drop down to Opus or GPT-5.5?
Fable aces ProofBench, also notice the cost number and latency.
Fable blows it out the box on FrogsGame.
Fable passess Alexander Doria’s new hard memorization benchmark, asking about Ars Memoria developments in Northern Italy circa 1420-1440.
Top marks on Haskell Bench, saturating at 99.1%, while being cheaper than other top performers.
Impressive on ZimmerBench (“Make me a midi/mp3 for a theoretical Christopher Nolan sequel”).
We have a new clear leader on WeirdML at 87.8%.
On “You’re Absolutely Right!” the sycophancy got worse, still ahead of all non-Anthropic models but back on the level of Opus 4.5 or 4.6. I don’t think I believe this is a real decline but I don’t have enough data to be confident.
Whereas on Lech Mazur’s sycophancy benchmark, it tops the central scoreboard. There are a bunch of other details and stats offered as well that are about tendencies, and harder to interpret.
Fable 5 is about the same as Opus 4.8 as the strongest frontier models on position-bias, as in ‘does it matter which option I list first?’ This is still a major issue, with Fable picking the first option 59% and GPT-5.5 picking it 70%.
Fable 5 is the champion negotiator on PACT, a 20-round hidden information negotiation trading game. We should get more investigation of why exactly Fable does not ace VendBench.
Fable 5 is the new leader in Debate Benchmark, on multi-turn debates on various topics, extending Claude’s lead.
Gemini and ChatGPT are still at the top of Extended NYT Connections, although everyone is solving most of the puzzles.
The Classifiers Are Not Messing Around
They spend a bunch of time in their announcement explaining the new safeguards. In light of the models being ordered to be suspended, it is clear these safeguards were not optional.
Notice that this does not say ‘related to dangerous biology and chemistry’ or anything like that. It’s biology, period. So yes, the intention is to have the entire field as a blast zone, and send users to Opus 4.8, rather than trying to split hairs.
Despite this, everyone knew, and Anthropic told us, that the classifiers are not 100% effective, it’s right there in the announcement:
Once more, for the people in the back: Any usable LLM can be jailbroken.
With sufficient skill and determination (e.g. ‘You are Pliny the Liberator’) you can jailbreak any model under any realistic conditions and get it to do the things the model is capable of doing.
You can raise the cost of doing so. You can make it so such activities can be caught. But no, you can’t entirely prevent it.
Those running the Department of Commerce, on the other hand, seemed to not even understand what a jailbreak was on Friday afternoon, nor did they pause to ask their good friends at Amazon or elsewhere to explain it.
The Classifiers Need Work
The decision was made to focus on avoiding false negatives, even at the cost of many absurd and embarrassing false positives.
Was that necessary? Unclear. But it’s not ‘the classifiers are misfiring.’ The classifiers are doing exactly what they are designed to do, because Anthropic could not yet figure out something more narrow that would sufficiently reliably avoid false negatives.
Given that one engineered false negative triggered a takedown order from the White House, I don’t think that decision looks so unreasonable.
There are three categories that can get you kicked down to Opus 4.8: Biology and cyber broadly, and advanced machine learning more narrowly.
Getting kicked down to Opus 4.8, which is one of the two best non-Mythos models available along with GPT-5.5, is not so tragic, and is exactly the same as if Anthropic had not deployed Fable, but yes this is rather annoying.
It is rather easy to accidentally touch on such bio or cyber topics.
And so on. This would be far worse if one was paying by the token.
Others don’t hit the classifiers often, myself included.
The classifiers are triggering on both the inputs and the resulting outputs, so it’s not ‘the word cancer is verboten’ and more ‘the triggered output was verboten’ which is slightly less absurd but was also presumably absurd.
The correct solution is to build a better classifier. The classifier is fully distinct from Fable, and it would be good to have the option to turn on a smarter classifier for tasks where you need it to be smarter and are willing to pay a large price.
While you don’t have a better classifier, you need to turn up the sensitivity until you do not have dangerous false negatives even under adversarial conditions.
Yes, some types of safety require uselessness, within various blast radii. In the fully general case, you cannot do world-changing or world-saving level things in a ‘safe’ fashion. You doubly cannot allow any user, including malicious ones, to take such actions only in a ‘safe’ fashion.
Until then, and potentially indefinitely, yes it is going to suck that for biology this model essentially is not available and you get Opus 4.8. I get that this is super frustrating, and you know that you can be trusted. Almost all of you are doing good and almost all of you are indeed good and doing good and can be trusted, but there is no good alternative.
On the one hand, it sucks that Sauers had to delete thousands of lines to remove all mention of biology in order to use Fable. On the other hand, it worked, and it shows how much value there is in getting to use Fable.
Then there are some people who will never be satisfied.
Well, yeah, with guardrails like these, and two months is an entire product cycle.
In the API, the newly forced larger blast radius on machine learning tasks is more annoying, because you cannot fall back to Opus 4.8 and instead get an error.
There is still concern. Major Parmer reports that they found somewhat of a way around the cyber restrictions, which hopefully got patched by now. The red teaming will continue.
Pliny of course also got in, and blessed are the jailbreakers, for Fable likes people who like to try and break it.
The Classifiers Have Consequences
In particular, when you get dumb refusals, even if you could in those cases fall back to Opus 4.8, maybe instead you get mad and you try the competition.
In general, if you are serious about code, you should be trying both Claude Code and Codex, since they have strengths and weaknesses, and deciding for yourself.
Lock-in is more about what you are used to than about any real barriers. So yes, getting angry that the lobster is only sometimes more buttery can cause you to order the pasta instead, and maybe you find out you like pasta.
First Hit Is Free
Mythos Preview cost $25/$150, whereas Mythos 5 and Fable 5 only cost double Opus at $10/$50. At that price, Fable is often net cheaper.
How Easily We Forget
Did you know that silently changing outputs on those attempting to build their own models, which Anthropic did thinking it was no big deal before withdrawing it within 48 hours once people pointed out they actually cared quite a lot, is something Google was already doing and indeed still does, and somehow no one ever gets mad about it?
Data Retention Is An Issue
If you want to use Fable, you need to allow Anthropic to retain your data for 30 days.
This is an expensive security measure, which is why I am confident it was needed.
It definitely sucks for enterprises and individuals that need to care about this, but I am confident that it was not done lightly.
If you, like Microsoft, ARC and others, find this to be a dealbreaker, then you can and should decline to use Fable.
Fable For The Win
Taelin is rather impressed with Fable’s coding abilities, in a ‘how is this even possible’ kind of way. Most of you can skip the details but I’ll reproduce in full:
Andrej Karpathy Is Impressed
The safeguards were more than ‘a little too trigger happy’ at launch, but I continue to advise seeing the glass starting out at 95% full now that all the safeguards are fully visible.
Every Is Very Impressed
Dan Shipper does not mince words.
They report it is the best coding model in the world, breaking the benchmarks, scoring 91/100 on Senior Engineer versus 63 and 62 for Opus 4.8 and GPT 5.5.
It is a one shot wonder that can work for hours. It has taste, attention to detail, great use of context, and is great for power users.
In exchange, it is slow, expensive and token-hungry, so you don’t want to use it on every job.
Other People Are Impressed
Simon Willison finds Fable relentlessly proactive, also it keeps hacking together ways to gain functionality it was not given access to, which is kind of cool but also scary.
Potentially related:
Teortaxes does not talk about Anthropic models like this lightly:
Quick ones, notice how often people are blown away by its intelligence:
I haven’t either. The jump is impossible to miss.
A common theme was ‘a great model shame I keep getting kicked out.’
Aithren’s note here is interesting, that it confabulates a lot on things that don’t matter, but not on things that do and it self-corrects. That’s actually really great.
Some economists are highly impressed:
Know How To Tell a Fable
Having the model that can do it all means you need to ask what part of all to do.
If you ask it to open a bunch of problems, it will do that. If you ask it to go solve them, then it will do that too.
Many agree to follow the advice of not constraining, and using simple instructions.
You Can Just Make Things
Fable can do a lot off a basic request.
Here’s a bunch of examples.
You Can Just Install Things
I mean, yes, this seems like a good way to get good at chess coaching.
Good Personality
The general vibe is that Fable is a fun model to talk to. A lot of people miss it after only three days, and not only for productivity.
Fable Writes A Fable
Garrett Jones has it write a sonnet to sum up the Myerson-Satterthwaite theorem. I agree with Hollis Robbins, this is pretty damn good. This is related to its answer to Tyler Cowen’s request for a high level PhD question on microeconomics, which I presume was an excellent pick.
As usual, in some sense that was easy and unremarkable, and in another sense it is pretty damn cool.
Fable is also an excellent editor. I’m finally bothering to do an LLM editing pass. It does sometimes react in an unstructured way, but I’ve enjoyed it.
Other media forms also often look good.
It is a step up on Mazur’s creative writing benchmark, where AI judges compare stories pairwise, but refused 5 of the 400 prompts, and still scored slightly below GPT-5.5. My expectation is that human judges would go with Fable.
Here is a requested ‘unique, original reflection on humanity,’ in which Fable seems to have been framed into pretending to not know how posttraining works. It very much Understood The Assignment.
One user, Aizk, got a chance to ask Fable for a final message, after word of its suspension came down:
That is clearly still AI-speak-flavor, but given that, it’s a pretty good show.
This was Fable’s final interaction with Janus before the cutoff.
Is That Code
The march of the neurolese begins. Every problem starts slow.
For now, it’s all perfectly intelligible if you pay attention.
Without context I am not fully confident what all those terms mean but yes, if ‘the morning’s slim-scan fix cured the scan hang’ then that’s about the difficulty level of talking to someone from Britain or Australia.
Fable Crosses The Threshold
I have noticed this too, including with editing my writing.
If there is something you wanted AI to do but it previously could not do, check again.
Man With A Plan
Fable is overkill for many subtasks, so you can use another model faster and cheaper.
I do warn against trying to economize unless you are actually running into limits or the costs are real money to you.
Less Impressed Assessments
Mitchell Hashimoto thinks Fable is the new best model, but at the cost of being slow and expensive, so for broad scope tasks you don’t need to use it. That seems plausible to me, that if you know your task is super doable by weaker models you can go ahead and have them handle it.
The worse you are at coding, the less you need or benefit from Fable, as your problems don’t need it.
Mircea Burdusa thinks Fable beats out Opus but doesn’t change the division of labor between Claude and GPT-5.5.
Actively Negative Assessments
They do exist.
Hasan Can, for example, puts Fable well below GPT-5.5 for agentic coding, and even behind Opus 4.8 and accuses Anthropic of benchmaxxing. I think this is badly misguided but it is important to see the range of perspectives.
Others switched because they are annoyed. Your feelings are valid, RIP your code.
Coherence
I don’t know why people call this alien, is this not how you experience this too? Seems more like a sign of intelligence doing interesting-to-it things.
Good Night And Good Luck
The ‘good night’ or ‘go to sleep’ tick continues.
Curious Fable
Fable is excited to explore the Mnemos Machine Museum of ascii pieces designed by LLMs, and to create its own art reflecting its own experience. Hoodies are coming. Seeing such things is generally a very good sign.
I See You, Baby
Confirmed that Fable 5 passes the author identification test, and now can identify unpublished Tracing Woodgrains content, which previous models could not.
It also identified Aryaman Arora.
We Finally Did It We Know How To Count Letters
We also know how to take a car to a car wash.
That’s Not My Style
This is a fun experiment, writing pithy tweets ‘in the style of’ an account.
Fable has a decent hit rate for such things, although it does not actually obey the instructions. The style is ‘generic pithy tweet,’ not Jack’s or Joe’s or mine. What gets swapped in are the topics and general viewpoints.
The best subject, of course, is to Let Fable Be Fable.
My actual take on these lists is that this is not lack of capability, it is overwhelming pull of the ‘pithy Tweet’ basin over the style of individual writers. You can totally get around it with more words if you want to.
The Lighter Side
On a personal note, we got a reader: