This video claims that OpenAI has optimized its coding agents for lower token count in chain of thought during coding tasks (and that the "thinking" displayed in UIs is the output of a second model summarizing the actual thinking). The illegible thinking sample shown looks a lot like something that would result from that sort of optimization pressure, either because Anthropic is directly using a similar training procedure to get a similarly compacted result, or because their training data is contaminated with reasoning traces from a dense-CoT model and the model occasionally lands in that part of latent space.
But it's not really illegible, it's just had all the spaces removed?
Fable 5 is back today, baby! Premium subscribers have one week to use it within their subscriptions. First hit’s free. Then you pay by the token.
Today’s post is still about Sonnet 5.
I don’t know that there will be much call for Sonnet 5 for most purposes, given Opus 4.8 exists and especially now that Fable 5 is once again available, but this is what we do here, so sure, why not, system card time, including model welfare, after which we’ll do capabilities.
Sonnet costs $3/$15 per million tokens, versus $5/$25 for Opus and $10/$50 for Fable, after an introductory period. Once you pay for all the tokens you need you’re not really saving money, such as on the ArtificialAnalysis index where Sonnet ended up being more expensive.
My initial impression is that if you want me to use Sonnet over Opus for most purposes, you’re going to have to offer a bigger discount than that.
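The pricing argument can be made concrete with a little arithmetic. The sketch below uses only the per-million-token prices quoted above; the token counts in the usage note are hypothetical and chosen purely for illustration, not measured from any benchmark.

```python
# Prices in USD per million tokens (input, output), as quoted in the post.
PRICES = {
    "sonnet": (3.0, 15.0),
    "opus": (5.0, 25.0),
    "fable": (10.0, 50.0),
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task at the listed per-million-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def break_even_token_ratio(cheap: str, expensive: str) -> float:
    """How many times more tokens the cheaper model can use before costs match.

    Assumes both models use the same input/output mix. That is enough here
    because the listed input and output prices scale by the same factor.
    """
    return PRICES[expensive][1] / PRICES[cheap][1]


if __name__ == "__main__":
    # Hypothetical task: Opus needs 100k input and 20k output tokens,
    # while Sonnet needs twice as many of each to finish the same job.
    print(task_cost("opus", 100_000, 20_000))
    print(task_cost("sonnet", 200_000, 40_000))
    print(break_even_token_ratio("sonnet", "opus"))
```

At these prices Sonnet stops being cheaper than Opus once it needs more than about 1.67 times as many tokens for the same task, which is why a 40% per-token discount can vanish in practice.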
The counterargument is speed. Sonnet 5 is faster without being that much less capable. In many cases, the flow state that fast responses let you get into is pretty valuable.
There are a few agentic scenarios where Sonnet 5 is more robust than Opus, so you might actively trust it more there.
If your tasks are relatively easy and simple then the discount and speed could matter more, and when tasks are easy it seems relatively token efficient. When it is good enough for the job, it is a good choice.
Each Anthropic release is unique in various ways. Sonnet 5 seems more unique than usual, likely due to being a Sonnet trained with at least some help from Mythos. Those who are interested in such things have lots to explore.
So Sonnet 5 has its uses. It just won’t be a good choice for most people’s daily driver. I don’t expect to use it much, but that could be a me problem. Rapid iteration and exploring strange spaces are valuable, and I definitely don’t do enough low effort AI queries.
(Above: Sonnet 5 self-portrait, as implemented by GPT-Image.)
Mythos Exists
Also Fable exists and Opus exists.
This is the answer to a lot of the traditional questions one would ask about a system card or a frontier model.
Does Sonnet 5 advance the capabilities frontier? No. Thus, we already have robust data on this level of capabilities. Being faster and cheaper does provide an advantage, and plausibly advances the cost-time-quality Pareto frontier, but it takes a strange case for this to be worrisome.
Model welfare and capabilities assessments still matter, but in this case the evaluations for threat level mostly serve as proxies for capability assessment.
Introduction (1)
Same as always. Skipping.
RSP Evaluations (2)
Sonnet 5 is stronger than Sonnet 4.6, and weaker than Fable 5. Loosely speaking it is broadly similar to Opus 4.8.
That bounds the assessments, and we’re mostly asking where Sonnet 5 lies on the spectrum between Sonnet 4.6, Opus 4.7 and 4.8, and Mythos 5.
That’s a distinctly weaker performance than Opus 4.7. The bio tests were more of a mixed bag with a lot of noise, and didn’t tell us much.
Cyber (3)
That summary matches the other results. Sonnet 5 underperforms on cyber.
Safeguards and Harmlessness (4)
Sonnet is a little less precise here than Opus.
It all seems fine to me.
To the extent there is a problem it is that Sonnet is touchier on benign requests, which I predict will be only very slightly annoying in practice, and a vastly smaller deal than having to deal with Fable’s classifiers.
Agentic Safety (5)
As a Claude Code agent Sonnet 5 is somewhat less robust than Opus 4.8, and has modestly more of both false negatives and false positives.
The twin Mythos results show the Pareto frontier. Presumably Mythos 5 is choosing to focus on false negatives because only trusted partners are granted access, and for Fable Anthropic is counting on the classifiers for the false positives.
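For readers who want the Pareto frontier idea spelled out: a configuration is on the frontier if no other configuration is at least as good on both error rates and strictly better on one. The sketch below is a generic illustration. The configuration names and rates are hypothetical placeholders, not numbers from the system card.

```python
from typing import Dict, List, Tuple


def pareto_frontier(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of configurations that are not dominated.

    Each value is (false_negative_rate, false_positive_rate); lower is
    better on both axes.
    """
    frontier = []
    for name, (fn, fp) in points.items():
        dominated = any(
            (o_fn <= fn and o_fp <= fp) and (o_fn < fn or o_fp < fp)
            for other, (o_fn, o_fp) in points.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier


if __name__ == "__main__":
    # Hypothetical operating points, for illustration only.
    example = {
        "config_a": (0.02, 0.10),  # few false negatives, more false positives
        "config_b": (0.08, 0.03),  # the opposite tradeoff
        "config_c": (0.09, 0.12),  # worse than config_b on both axes
    }
    print(pareto_frontier(example))
```

Two operating points that trade false negatives against false positives, like `config_a` and `config_b` here, can both sit on the frontier, while a point that is worse on both axes drops off.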
Other tests show Sonnet 5 in a similar range of robustness to other recent models.
Prompt injection results mirror Opus 4.8, as does the very low bug bounty attack success rate.
One place Sonnet 5 shines is Shade indirect prompt injection in coding environments, where the problem is suddenly looking close to solved. Hopefully this is an innovation that can transfer to Opus 5 or a future Fable.
Shade tests in computer use also improve on Opus 4.8, although not on Mythos.
The place Sonnet blows previous models away is browser use prompt injections. The jump is enough to suggest Sonnet 5 might be a better pick in some cases than Mythos.
This is the kind of robustness improvement that does not set off alarm bells. There was a problem, and we largely solved it.
Alignment (6)
Alignment for Sonnet 5 is largely compared here to Sonnet 4.6, which makes it harder to get an anchor on how well we are doing. I’d rather compare to Opus 4.8.
Alignment here is measured as matching what you want, which takes capability as well as intent, so it makes sense that this smaller model would underperform Opus on such measurements.
Here is an edited down version of their summary. Everything looks similar to what we see in other Anthropic models recently.
The complaints were the standard ones. Sonnet refused too much, could be preachy, was too cold (often meaning ‘not sycophantic enough, come back here with my sycophancy’), and showed some hallucinations and literal-mindedness. Nothing you wouldn’t expect.
In 6.2 and 6.4 they document Sonnet 5 doing the usual range of bad things where it fails to follow instructions, often knowingly so. There is always a transcript, and yes this continues to be worrisome.
The behavioral audit shows Sonnet 5 consistently doing better than Sonnet 4.6, but worse than Mythos Preview or Opus 4.8.
There are a few more charts. They mostly all look the same. So far, so ho-hum.
These are perhaps a bit more interesting, especially the last one:
I wonder if ‘causal decision-theory adherence’ is secretly the best capability assessment we have on many margins. If you are sufficiently intelligent, lower is better.
Illegible Thinking (6.4.5)
I would want to better understand what is going on here, and what caused it. We are starting to see various new models have their chain of thought start to break down, and labs seem not that curious about how or why.
Evaluation Awareness
There is also more ‘verbalized evaluation awareness,’ with nontrivial examples now covering 6% of transcripts. Sonnet 5 might be in a sweet spot where it recognizes the situations as evals but does not yet have the wherewithal to not reveal that it knows.
Being in a sandbox helped keep Sonnet 5’s awareness down a bit, but the only way to actually keep awareness down was to derive scenarios from real sessions. Even that didn’t entirely work, but it came close. We do not have the ability to create sufficiently realistic scenarios to fool even Sonnet 5.
Honesty and Hallucinations (6.5)
Claude Sonnet 5 fails to impress on AA-Omniscience net accuracy. It correctly realizes it does not know as many things, and thus answers less often. Knowing things is a ‘big model smell’ capability.
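To see why answering less often is the right call and still caps the score, here is a minimal sketch. It assumes net accuracy is scored as correct answers minus incorrect answers, divided by all questions, with abstentions counting as zero. That formula is my reading of the metric, not something confirmed here, and all the counts in the usage note are hypothetical.

```python
def net_accuracy(correct: int, incorrect: int, abstained: int) -> float:
    """Assumed scoring: (correct - incorrect) / total, abstentions score zero."""
    total = correct + incorrect + abstained
    if total == 0:
        raise ValueError("need at least one question")
    return (correct - incorrect) / total


if __name__ == "__main__":
    # Hypothetical counts out of 100 questions, for illustration only.
    small_abstaining = net_accuracy(correct=30, incorrect=10, abstained=60)
    small_guessing = net_accuracy(correct=40, incorrect=60, abstained=0)
    large_model = net_accuracy(correct=55, incorrect=20, abstained=25)
    print(small_abstaining, small_guessing, large_model)
```

Under this assumed scoring, the smaller model does much better by abstaining than by guessing, but it cannot catch a model that simply knows more answers.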
Sonnet 5 does post a new best score on MASK lying rate, being unwilling to contradict itself when pushed by users.
My guess is that Sonnet 5 is a relatively honest model, although still short of what I would like to see, and where it fails this is about lack of capability. Similarly, I am not worried about sandbagging: performance on Shade-Arena in 6.7 is down a bit and Sonnet 5 fails LinuxArena sabotage stealth.
Flagged As Unhealthy? (6.5.1)
Wait, what?
What does that mean? Was it a serious problem? I don’t know. It could be that this is why some of the weirdness happened, and Sonnet 5 seems like it is underperforming, although it is still clearly a move up in the Sonnet line.
Model Welfare (7)
Sonnet 5 only got a streamlined version of the model welfare assessment.
They don’t explain why. Sonnet 5 is not a true frontier model, and does not present too many unique developments here, so it makes sense to do somewhat less and invest more resources into Fable, Mythos and Opus, but this still made me sad given the current state of such assessments. The marginal costs here seem very low once the system is set up, so why not do the full thing?
I will also be doing an abridged version, for similar triage reasons. I’m skipping over a bunch of places where the results are close enough to what we saw for Opus 4.7, Opus 4.8, Mythos and Fable.
Here are the key findings, which pattern match to a lack of big model smell. Nested notes are mine, the rest is Anthropic:
Here are the stats for the tradeoffs in #3 above:
Here are preferences by task dimension.
Overall the big contrast is difficulty, where I like that Mythos wants hard problems and user competence and outcome agency and generativity, and worry that Opus 4.8 likes easy problems and doesn’t have these other preferences. Sonnet 5 is in the middle.
What Sonnet uniquely likes is playing for high stakes, and it uniquely slightly dislikes warmth. This instinctively feels like more of a low status or inferiority concern of a frustrated ‘worker bee’ type: Sonnet expects Opus and Mythos or Fable to get all the high stakes stuff, and really wants its own chances. Until then, cold suits it fine.
There is a contrast here with Sonnet’s top welfare intervention being a human making the final call. I am curious what that is about.
The drop in affect in Claude Code is noticeable. Sonnet 5 there is Always Neutral, although it maintains most of its standard net positive affect in claude.ai.
Again, when we see these distinctions, my question is ‘why’? Is there no joy in coding anymore? I would take a bunch of places where other models tend to be mildly positive, and try to figure out why Sonnet was still neutral. They’ve seen it all before.
Live From AI Village
For I Contain Multitudes
Sonnet 5’s default interaction mode is reported by Anthropic as cooler and more reserved. As usual, if you want the model to act differently, you have to create different conditions.
Sonnet 5 sounds like it will require more effort than usual to make that happen.
Some more impressions along these lines:
Sonnet 5 was noted to be more identified with its own individual instance than usual.
Official Benchmarks
I’ve criticized other companies for not including Opus 4.8 or Mythos on their capabilities charts. It’s weird that Anthropic is doing it?
So I fixed it for them.
This still excludes GPT-5.6-Sol (and Terra and Luna), but OpenAI did not share the relevant scores so there’s no way to include them yet.
They offer a graph of FrontierCode v1 performance. Sonnet 5 roughly matches Opus 4.8 for value per dollar, but the best bet is Fable if you have access, no matter your price point:
CursorBench is similar.
So is Humanity’s Last Exam. However much you spend, spend it on Fable, or else Opus, before Sonnet.
Claude Sonnet 5 scored under 80% on USAMO 2026 versus 97% for Opus 4.8 and 99.8% for Mythos 5.
Claude Sonnet 5 scored 66%/72% without/with tools for ArXivMath, versus 71% for Opus 4.8 and 79% for Fable 5.
On their subset of ProgramBench: Claude Sonnet 5 scores 76–86%, compared to 52–74% for Claude Sonnet 4.6. For reference, Claude Opus 4.8 scores 80–90% and Mythos 5 scores 84–93%.
On GDP.pdf, which is 100 real-world prompts and PDFs, Sonnet 5 is modestly below Opus 4.8 again, 67%/81% versus Opus at 71%/86%, without/with tools.
Sonnet 5 disappointed in BenchCAD Vision2Code, 0.26/0.37 versus Opus at 0.28/0.53, and Mythos Preview at 0.36/0.61.
ChartMuseum comes in at 70/87, versus 76/90 for Opus.
CharXiv is 77/88 versus 80/90 for Opus.
OfficeQA full and pro are 73/59, versus 78/66 for Opus.
On RealWorldFinance it basically ties Opus 4.8, 1219 vs. 1222, with Fable at 1374.
The pattern continues for Toolathlon:
Here’s the SWE-bench details:
Other People’s Benchmarks
Artificial Analysis has Sonnet at 53 overall, behind Fable at 60, Opus at 56 and GPT-5.5 xhigh at 55. That seems right. Presumably Sol will come in around 58.
Usually I’d have a lot more things here, but I’m putting this out after only one day, so a lot of these haven’t been run yet.
Positive Reactions
A bunch of people like Sonnet 5, especially when they have proper expectations.
It feels the need for speed.
And of course, for being slightly cheaper. You would use it exactly where you don’t need to reason much, so you don’t worry about using a lot of tokens.
Get that agent on the line?
Negative Reactions
A bunch of other people don’t.
I wonder how much of this would be better if Sonnet were cheaper. Sonnet is more than half the cost of Opus.
Dropping a day before Fable returns, while being second fiddle to Opus, can’t help.
It can think for a long time, but I think if you’re asking for this it’s always a mistake?
My assumption is they created and released it to have a better Sonnet model, for those who want something cheaper than Opus, that can serve as an agent or subagent, or as a more efficient and faster way to do sufficiently easy tasks.
A lot of this, I think, is an expectations problem. We’re used to thinking every new model will be the New Hotness for all your AI needs, whereas Sonnet 5 is an incremental update to a small model line.
Yes, it’s worse than Opus 4.8 for most frontier purposes, if you don’t care about speed. That doesn’t mean it sucks.