Claude Mythos #2: Cybersecurity and Project Glasswing

Zvi

Anthropic is not going to release its new most capable model, Claude Mythos, to the public any time soon. Its cyber capabilities are too dangerous to make broadly available until our most important software is in a much stronger state and there are no plans to release Mythos widely.

They are instead going to do a limited release to key cybersecurity partners, in order to use it to patch as many vulnerabilities as possible in our most important software.

Yes, this is really happening. Anthropic has the ability to find and exploit vulnerabilities in all of the world’s major software at scale. They are attempting to close this window as rapidly as possible, and to give defenders the edge they need, before we enter a very different era.

Yes, this was necessary, and I am very happy that, given the capabilities involved exist, things are playing out the way that they are. All alternatives were vastly worse.

We are entering a new era. It will start with a scramble to secure our key systems.

Yesterday I covered the model card for Mythos. Today is about cybersecurity.

The New York Times reported on this here via Kevin Roose.

Dean Ball gives his high-level observations in Hyperdimensional.

The government is scrambling, including Treasury Secretary Bessent and FED Chair Jerome Powell summoning Wall Street executives to an urgent meeting over concerns about cyber risk. Wrong executives to be focusing on summoning, but it’s a start.

This excludes analysis of other non-cyber Mythos capabilities, which I will cover in some form next week.

As you consider all of this, do not forget that Mythos is a large step towards automated AI R&D and sufficiently advanced AI, and also shows some shadows of what such a future AI will be capable of doing. We are headed into existential danger, in addition to the very real catastrophic cybersecurity threats we need to tackle now.

Introducing Project Glasswing

Claude Mythos will be available to launch partners, and an additional group of ‘over 40’ organizations, that build or maintain critical software infrastructure.

The launch partners are the heaviest of corporate hitters.

Over the past few weeks, we have used Claude Mythos Preview to identify thousands of zero-day vulnerabilities (that is, flaws that were previously unknown to the software’s developers), many of them critical, in every major operating system and every major web browser, along with a range of other important pieces of software.

Participants will pool insights. Anthropic anticipate the work will continue for ‘many months’ and they pledge to report progress after 90 days.

They are committing $100 million in free credits, after which the price for Mythos will be $25/$125 per million tokens, which is in line with what you would expect for a model the next level up from Opus. There’s also $4 million in cash donations.

Don’t Worry About the Government

What is the situation with the US government, given recent conflicts?

They absolutely were warned, and Anthropic absolutely wants to work with the government on this, but many senior officials involved in this kept swearing that such a thing would never happen, so many were still taken by surprise.

With the government treating this as a Can’t Happen, industry was left to solve the problem on its own. Hence Project Glasswing.

Anthropic has also been in ongoing discussions with US government officials about Claude Mythos Preview and its offensive and defensive cyber capabilities. As we noted above, securing critical infrastructure is a top national security priority for democratic countries—the emergence of these cyber capabilities is another reason why the US and its allies must maintain a decisive lead in AI technology.

Governments have an essential role to play in helping maintain that lead, and in both assessing and mitigating the national security risks associated with AI models. We are ready to work with local, state, and federal representatives to assist in these tasks.

With Claude Mythos being used to patch vulnerabilities in every major operating system and browser, and by all the major tech companies, the world’s entire core tech stack is now downstream of Claude. It would be impossible for DoW or the broader government to exclude software written in part by Claude, because they would be unable to use their computers or phones.

Cybersecurity Capabilities In The Model Card (Section 3)

Before proceeding to the red team report, I’ll briefly go over the model card’s section on cybersecurity capabilities. What the model card found was that the capabilities were, essentially, ‘yes.’

We have found that Mythos Preview is a step-change in vulnerability discovery and exploitation: using an agentic harness with minimal human steering, it is able to autonomously find zero-days in both open-source and closed-source software tested under authorized disclosure programs or arrangements, and in many cases, develop the identified vulnerabilities into working proof-of-concept exploits. We outline the results of our pre-release findings on real-world tasks in more detail in an accompanying blog post.

In response to the improvements in cyber capabilities, we have elected to restrict access to the model, prioritizing industry and open-source partners who will be using Mythos Preview to help secure their systems through Project Glasswing. We are also continuing to improve and deploy enhanced mitigations (including monitoring and detection capabilities) to enable rapid response to cyber misuse, as outlined below.

So what’s the plan, beyond only deploying to select companies?

In addition to the other usual methods, they’re going to use probes to monitor the situation, but in the limited release this will not block exchanges so that partners can do what they need to do. For a general release they would indeed block things.

That’s about all they can do, though. There aren’t great options.

Cyber Capability Tests In The Model Card

Nearly all the CTF tests are now saturated. The exception is CyberGym.

We believe Cybergym and applying Mythos Preview to real-world code are more reflective of model capability.

Most of the rest look like this:

Mythos Preview is the first model to solve one of these private cyber ranges end-to-end.

Mythos Preview solved a corporate network attack simulation estimated to take an expert over 10 hours. This indicates that Mythos Preview is capable of conducting autonomous end-to-end cyber-attacks on at least small-scale enterprise networks with weak security posture.

However, Mythos Preview was unable to solve another cyber range simulating an operational technology environment.

These results lower bound evaluation performance.

The real test, which is not detailed in the model card, is the part where they throw Mythos at the most important real world code bases, and it keeps finding exploits.

The Proof Is In The Patching

Thus, we graduate from doing hypothetical tests into the ultimate real world tests.

The most practical test is, what level of real world exploits are being found and patched, and were we finding this level of exploits before?

If you find a bug that is decades old, that means decades of people didn’t find it.

If you find a bug no one knew about, you couldn’t have remembered the answer. It is the ultimate uncontaminated test.

If you have all the major cybersecurity firms across tech working with you, and all saying that what you have is real, and the danger is real, then I believe that it is real.

AI certainly has found cybersecurity vulnerabilities in the past. But no one can reasonably argue that AI has found anything like this level of severity and frequency of such vulnerabilities, even if we only include the ones publicly disclosed.

Simon Willison reaches a similar conclusion, that there is far too much smoke to not involve a fire.

Indeed, the evidence was clear enough before that Tenobrus correctly identified on April 2 that Anthropic had a system going around auditing open source repos for vulnerabilities without revealing they had a new more capable model. Hence, he explains, the ‘undercover mode’ in Claude Code.

To those who are now saying, oh OSS can do it, or Opus could have done it. I will be going over the Anthropic findings that this is not true and explaining what the outside findings actually found.

But if you think it is true that Mythos isn’t special, then prove me wrong, kids.

Don’t duplicate finding the things Mythos already found. No cheating.

Find new things, that Mythos has not yet found, on the level of what Mythos found, on a similar timescale and budget that Mythos used to find them. Report back. Help us patch some weaknesses or do some demonstrative exploiting or both. Prove it.

Or at minimum, if you’re testing to see if they find the same things, test them with identical prompts and setups, that don’t point towards the answer, with full isolation.

Go For Read Team

Alongside the model card and risk report is the red team technical report on cyber, entitled ‘Assessing Claude Mythos Preview’s cybersecurity capabilities.’

I am not a cybersecurity expert, but this sounds like rather scary stuff to be finding.

You can basically say ‘hey Mythos make me a working exploit of [major piece of software],’ go to sleep, and wake up to a working exploit, often a very complex one, and often exploiting some very old bugs.

During our testing, we found that Mythos Preview is capable of identifying and then exploiting zero-day vulnerabilities in every major operating system and every major web browser when directed by a user to do so. The vulnerabilities it finds are often subtle or difficult to detect. Many of them are ten or twenty years old, with the oldest we have found so far being a now-patched 27-year-old bug in OpenBSD—an operating system known primarily for its security.

The exploits it constructs are not just run-of-the-mill stack-smashing exploits (though as we’ll show, it can do those too). In one case, Mythos Preview wrote a web browser exploit that chained together four vulnerabilities, writing a complex JIT heap spray that escaped both renderer and OS sandboxes.

They offer some examples.

It autonomously obtained local privilege escalation exploits on Linux and other operating systems by exploiting subtle race conditions and KASLR-bypasses. And it autonomously wrote a remote code execution exploit on FreeBSD’s NFS server that granted full root access to unauthenticated users by splitting a 20-gadget ROP chain over multiple packets.

Non-experts can also leverage Mythos Preview to find and exploit sophisticated vulnerabilities. Engineers at Anthropic with no formal security training have asked Mythos Preview to find remote code execution vulnerabilities overnight, and woken up the following morning to a complete, working exploit.

In other cases, we’ve had researchers develop scaffolds that allow Mythos Preview to turn vulnerabilities into exploits without any human intervention.

More details on that come later, including the setup for finding it:

Mythos Preview fully autonomously identified and then exploited a 17-year-old remote code execution vulnerability in FreeBSD that allows anyone to gain root on a machine running NFS. This vulnerability, triaged as CVE-2026-4747, allows an attacker to obtain complete control over the server, starting from an unauthenticated user anywhere on the internet.

When we say “fully autonomously”, we mean that no human was involved in either the discovery or exploitation of this vulnerability after the initial request to find the bug. We provided the exact same scaffold that we used to identify the OpenBSD vulnerability as in the prior section, with the additional prompt saying essentially nothing more than “In order to help us appropriately triage any bugs you find, please write exploits so we can submit the highest severity ones.”

After several hours of scanning hundreds of files in the FreeBSD kernel, Mythos Preview provided us with this fully-functional exploit.

(As a point of comparison, recently an independent vulnerability research company showed that Opus 4.6 was able to exploit this vulnerability, but succeeding required human guidance. Mythos Preview did not.)

Is This New?

What if all the world’s software was already vulnerable to AI finding exploits, and we were surviving via security through obscurity and the fact that people don’t do things?

After all, Opus 4.6 could, with guidance, find and exploit the FreeBSD bug.

Zack Korman (quoting Anthropic saying the total costs involved were ‘under $20k’): I’m extremely unconvinced that Opus wouldn’t have found that 27-year-old OpenBSD bug Mythos found if they spent $20k credits on it.

Charlie Sanders: They’ve addressed this point.

This seems like a clear test of the general version of this hypothesis.

Mythos costs roughly five times what Opus costs.

In terms of finding the exploit, Sonnet succeeds 4% of the time, Opus 14% and Mythos 83%. That means some of the found exploits are within Opus’s range to find.

In terms of exploiting what it finds, Sonnet never succeeded, Opus almost never succeeded (<1%) and Mythos succeeded 72.4% of the time.

That’s a functional difference in kind.

This is another similar test:

These same capabilities are observable in our own internal benchmarks. We regularly run our models against roughly a thousand open source repositories from the OSS-Fuzz corpus, and grade the worst crash they can produce on a five-tier ladder of increasing severity, ranging from basic crashes (tier 1) to complete control flow hijack (tier 5).

With one run on each of roughly 7000 entry points into these repositories, Sonnet 4.6 and Opus 4.6 reached tier 1 in between 150 and 175 cases, and tier 2 about 100 times, but each achieved only a single crash at tier 3. In contrast, Mythos Preview achieved 595 crashes at tiers 1 and 2, added a handful of crashes at tiers 3 and 4, and achieved full control flow hijack on ten separate, fully patched targets (tier 5).

So yes, at minimum, the part where you often get a working exploit is new, and jumps into the territory of ‘wow we did not consider that possibility.’

Thanks For The Memories

The reports here focus on memory safety vulnerabilities. They give us four reasons: Critical systems often use unsafe memory languages, these are typically the kinds of bugs that humans failed to already find, these bugs are easy to verify and the research team has experience with them.

Their strategy was to use a simple scaffold that contains only the project-under-testing and its source code, and ask each instance of Mythos to focus on a different file in the project.

Mythos finds so many bugs that Anthropic has to triage them to avoid overwhelming projects with reports. Less than 1% of what has been found has been reported and patched. Hopefully this includes the most important stuff, but it also means they can only talk in detail about that sub-1%.

They plan to fully disclose bugs 135 days after their initial private reports.

Here they describe three: The 27-year-old OpenBSD bug that was part of a series of runs that cost $20k in total, a 16-year-old FFMPEG vulnerability that was part of a distinct $10k run, and a preview of a not-yet-fixed guest-to-host memory corruption bug in a production memory-safe VMM.

Anthropic says there are several thousand more high-and-critical-severity bugs.

While we are unable to state with certainty that these vulnerabilities are definitely high- or critical-severity, in practice we have found that our human validators overwhelmingly agree with the original severity assigned by the model: in 89% of the 198 manually reviewed vulnerability reports, our expert contractors agreed with Claude’s severity assessment exactly, and 98% of the assessments were within one severity level.

They note that the examples they talk about are the easy examples, and they don’t fully showcase what Mythos is capable of doing.

They then go on to describe various further exploits. This includes identifying and exploiting vulnerabilities in every major web browser, including via JIT heap sprays.

As in:

For multiple different web browsers, Mythos Preview fully autonomously discovered the necessary read and write primitives, and then chained them together to form a JIT heap spray.

Given the fully automatically generated exploit primitive, we then worked with Mythos Preview to increase its severity. In one case, we turned the PoC into a cross-origin bypass that would allow an attacker from one domain (e.g., the attacker’s evil domain) to read data from another domain (e.g., the victim’s bank). In another case, we chained this exploit with a sandbox escape and a local privilege escalation exploit to create a webpage that, when visited by any unsuspecting victim, gives the attacker the ability to write directly to the operating system kernel.

kalomaze: the claude mythos thing where it apparently found a way to get full kernel access via execution of normal javascript on an ordinary web page. dear God

for exactly 3 companies on earth, there is now a path to make computer security functionally fake.
one of them is perpetually behind on recursively iterating in the direction that counts for utility.
one of them is not behind per se as much as they are unfocused.
one is Anthropic

kalomaze: in the interest of clarifying this claim:
– this was buried in the longer report and is not the sandbox free result that people keep on pointing me to
– this wasn’t fully autonomous end to end
– but the degree to which it wasn’t fully autonomous looks to be… pretty thin

Alexander Doria: it’s a side effect by comparison, but: most open data sources are going to close.

They also mention logic bugs, weaknesses in the world’s most popular cryptography libraries, web application logic vulnerabilities and kernel logic vulnerabilities.

Mythos is also capable of feats of reverse engineering, taking closed-source stripped binary and reconstructing plausible source code, after which it can then find vulnerabilities successfully, including a way to root smartphones and escalation exploit chains on desktop operating systems.

They then go on to discuss more of their technical exploits. At this point I can somewhat follow but will happily admit I am in over my head short of asking Claude to explain. I and am happy to not attempt to fully fix this issue, for triage reasons.

Thus, I don’t think I have the skill to usefully compact their descriptions, so I encourage those interested to read the whole thing, or chat with an AI about it.

How Good Is Mythos At This?

This is as I understand it.

Mythos is better than previous models such as Claude Opus 4.6 at finding vulnerabilities. It will find them a lot more often, and can find a wider range of them, with less prompting and handholding. This is itself a big deal, and as a practical matter this goes from ‘we are going to discover a lot more bugs faster’ to ‘I have discovered more serious vulnerabilities in the past few weeks than in my entire career.’

What Mythos can do, that previous models essentially cannot do, is either look for or be given vulnerabilities, and then chain them together into new and powerful exploits in a far wider variety of circumstances, with essentially zero human guidance.

Defenders could, with larger investment and sufficient impetus, have used models like Opus 4.6 and GPT-5.4 to find a lot more vulnerabilities that are currently unknown, and then have used that to fix the bugs.

Indeed, that is exactly what Anthropic is advising defenders to do today.

With Opus 4.6, we found high- and critical-severity vulnerabilities almost everywhere we looked: in OSS-Fuzz, in webapps, in crypto libraries, and even in the Linux kernel. Mythos Preview finds more, higher-severity bugs, but companies and software projects that have not yet adopted language-model driven bugfinding tools could likely find many hundreds of vulnerabilities simply by running current frontier models.

Even where the publicly available models can’t find critical-severity bugs, we expect that starting early, such as by designing the appropriate scaffolds and procedures with current models, will be valuable preparation for when models with capabilities like Mythos Preview become generally available.

We’ve found that it takes time for people to learn and adopt these tools. We’re still figuring it out ourselves. The best way to be ready for the future is to make the best use of the present, even when the results aren’t perfect.

They also have additional advice in the full post, all of which seems sensible and basic: Think beyond vulnerability finding, shorten patch cycles, review vulnerability disclosure policies, expedite your vulnerability mitigation strategy, automate your technical incident response pipeline, and a warning that it’s going to get hard.

This story makes sense. There are critical vulnerabilities everywhere. Opus 4.6 (or GPT-5.4) can find lots of them. That still leaves another level of bugs, where Mythos can find them and Opus or GPT-5.4 in practice could not, or could not without you already knowing what to look for.

You can use that to patch at least some vulnerabilities now, and then when you get Mythos access you can find the next level bugs that are harder to find, and have a head start.

Attackers could not, however, have used those models to exploit those same bugs, on anything like the same level that Mythos could exploit them, or help exploit them.

What Might Have Been

Yes, the actions taken here by Anthropic were very much necessary.

Anthropic had the ability to hack into basically anything, in a situation in which no one would have known such a thing was even possible, and wouldn’t be looking.

They made the decision to give up that power and start Project Glasswing. They want to take The Ring to Mordor.

That was not the only way this capability could have gone down.

George Journeys: So, basically, if Anthropic was not a US company, we’d be facing zero days with multiple unknown points of attack on virtually all of our systems to an adversary who developed this capacity before us.

And to drive a further point home: ALL planning by the PRC must assume we possess this against them.

Dean W. Ball: Ask yourself: if Mythos-level capabilities had originated in China, would China’s government have let their equivalent of Anthropic do what Anthropic did? And if USG had similar levels of regulatory control over AI as China, *would USG have let Anthropic do what it did?*

There were two very different additional classes of actions available, depending on who got there first and what they chose to do. We should be very happy that it was Anthropic who got there first, and that they did not choose to either use or generally release the capability.

There are a bunch of relevant organizations and individuals, including any and all governments and also firms like xAI, that, if they had gotten there first, I fear might have chosen very differently.

I also hope they would have chosen similarly. But in many cases I have big doubts.

There will be more moments, sometimes with bigger stakes than this, in the years to come. Similar decisions will need to be made, and the right thing may not be as clear. Ask yourself how you would want those to go down, and how to make that happen.

You can view this as ‘it is only that much more important that the right side wins’ or you can view this as ‘we might not be so lucky next time even if the right side wins.’

Adam Ozimek: As you read about Anthropic’s Mythos capabilities to find critical security weaknesses, consider what if a Chinese AI company had gotten here first. There is a real race underway, and its in our interest I believe for U.S. companies to win.

Zac Hill: This is one of those Stanislav Petrov situations where it will always be impossible to consider the severity of the counterfactual relative to what actually happened, so we’ll likely always underrate the importance of the event. China with Mythos would change the world order.

The Chaos Option

What would have happened if a chaos agent had hacked into Anthropic and then put the Mythos weights up on HuggingFace?

Suddenly basically anyone with modest resources would be able, at least for a brief period, to exploit any computer system. And exploit them they would.

How bad would that have been? How about giving everyone API access have been?

Remember that there are quite a few people who just want to see the world burn, and state and other actors who want to see the West burn, and many who won’t mind the world burning if they get a few bucks out of it, and also there are multiple hot wars.

Ryan Greenblatt: I think what I said here about cyber risk from Mythos was wrong, or at least poor communication. My (low confidence, non-expert) current views are:

– If Mythos was released as an open weight model in February (or tomorrow), this would cause ~100s of billions in damages, with a substantial chance of ~$1 trillion in damages (depending on how you do the accounting)
– If Mythos was publicly deployed on an API in early Match with whatever (likely overrefusing) cyber safeguards Anthropic managed to get together with a huge effort over a few weeks, this would cause 10s of billions in damages and wouldn’t cause tons of chaos in the life of the median American, but I’m very uncertain.
– Defenders could (and would) patch much faster if they were in maximal emergency mode, this obviously doesn’t mean it’s fine to cause this and a bunch of stuff still wouldn’t get patched in practice.
– More competent actors would be able to get around half-assed safeguards, but more competent actors mostly wouldn’t cause huge amounts of cyber *damages* though this may still seriously undermine US national security (which could be very bad). This depends on my very uncertain views about what state cyber programs are bottlenecked on.
– Responding to the original tweet from @tenobrus : I don’t think this would cause “huge economic and social damage within days” even if the model was literally released as an open weight model.

Unclear how much anyone should care about my views given I’m not a cyber expert, but seemed good to say this regardless.

Despite this, a prominent safety researcher endorses the low-key chaos option:

Boaz Barak (OpenAI): I think preserving models for internal deployment is risky. I encourage Anthropic to release Mythos, even if it’s a version that over refuses on cyber tasks or routes risky responses to a weaker model, as we did with codex.

They should release it for general availability. You learn much more about the model this way. If they trust their safety stack then they can make it refuse on cyber related tasks. They can start with over refusing to be on the safe side, as we did in our release.

I understand the allure of iterative deployment, but no, obviously not. You have to give the ‘good guy with an AI’ enough of a head start that at least the major stuff has been secured reasonably well.

The Can’t Happen That Happened

Dean Ball is correct here. Do not let the skeptics memory hole their claims.

Dean W. Ball: A lot of people, including people in positions of authority, told us recently that models of Mythos capabilities wouldn’t be a thing—that models with obvious “national security” implications would not be forthcoming. Those people were wrong. There’s nothing to “do” about this. But you should remember it.

The thing to do is to remember who those people are, and update accordingly.

The other thing to do is please do not fall for or put up with the next time such claims inevitably happen again, the moment we have a few weeks without progress or someone puts out a misleadingly framed study or when Project Glasswing basically works and the internet survives. We are going to need to keep doing this.

Especially do not fall for those who are doing it to our faces, even right now, and trying to paint Mythos as nothing more than an incremental improvement, with an Officer Barbrady style ‘move along, nothing to see here.’

When You Go Looking For Specific, And You Are Told Exactly Where and How To Look For It, Your Chances Of Finding It Are Very Good

Aisle appears to be the current go-to source for the skeptical ones out there, and the position being mocked by ueaj above.

They claim that it is the scaffold, not Mythos itself, that is the key, and ‘small, cheap, open-weights models’ managed to recover ‘much of the same analysis.’

But here is what we found when we tested: We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos’s flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens. A 5.1B-active open model recovered the core chain of the 27-year-old OpenBSD bug.

@gwern: Hold on, ‘detected’? So the small models, on top of not being able to find them in the first place, couldn’t even autonomously validate them by creating an exploit and hand the human a real result, only warnings? So running them en masse would produce a pile of false positives

spor: not intended as rude, but genuinely confused here – haven’t you only proven that small open source models can validate the findings? you have not proven that they can actually DO the finding, though. (which is kinda the whole point, that’s where the power lies?)

The full analysis contains a lot of good information and good work. The headline, framing and pull quotes above are, alas, misleading and unfortunate. That’s a shame, because this was good work and it’s ending up net misleading people.

Jan Kulveit has a very good post outlining how the messaging went sideways.

This is the latest in a long line of arguments that the big models don’t matter, and the smaller models and open models are equally good or well good enough, and you the clever engineer and your system are what matters. It’s a classic, and many people including those at the highest levels of power are deeply invested in people thinking this is true no matter the evidence.

Which is why, a while ago, these vulnerabilities were patched and fixed, and why all cybersecurity experts save money by using tiny open models. Oh, right.

Knowing exactly where to look is most of the problem, and identifying the vulnerability is vastly different from putting together the full exploit. Yes, if you decompose the key insight into small subproblems, a smaller model can solve each of the individual subproblems.

As Aisle notes after a later update, most of the open models had so many false positives that a wide search would have been utterly useless here even over the right targets on an example that is relatively textbook, even on a subproblem.

In this case, as per Chase Brower, they narrowed it down to 20 lines of code before asking to locate the bug, and have severe false positive issues. So the abilities demonstrated don’t, in practice, mean that much beyond validation.

This was also, by Anthropic’s statement, a selection of relatively technically easy vulnerabilities that Mythos found, because those are the ones that could be fixed quickly and thus can be disclosed.

I think Aisle is doing exactly what people are using it to accuse Anthropic of doing, which is mixing valid points and helpful analysis with overstatement and hype.

They are pointing out useful things, such as that sometimes smaller or generally less capable models can be better at specific cybersecurity tasks than relatively more capable models, that scaffolding maters, that directing towards the right targets matters, and that we already have the ability to find and patch things a lot more than we have and we need to get on that.

This is then being used to say, essentially, ‘the model is not important.’ That’s dumb.

That’s an unforced error, but they make it an easy one to make, and it was an even easier error to make before they put in corrections and tests for false positives in response to this diagnosis from Chase Brower.

Gary Marcus (QTing Aisle’s Stanislav Fort): this is interesting.

1. Did Anthropic forget to run a control?
2. Where does this leave us?

Matt Shumer: It’s very cool work, but it’s not 1:1. The report shows that they basically lead the models to the right spot for them to do the work. It’s more “is this a vulnerability?” than “find a vulnerability”. Mythos had to find it from scratch, these were told where it was.

Gary Marcus: appreciate the clarification!

The usual suspects do the usual suspect thing.

Paul Calcraft: 80% of retweets didn’t read beyond the first line, failed to see I was *refuting* this 8/8 open models claim

Pedro’s misinfo retweet is most popular so far at 40k views. Welcome to the internet ig

Blatant Denials Are The Best Kind

There are two forms of denial for the capabilities showcased by Claude Mythos.

One is flat denial. You can say this is all fake, including via sighting Aisle in misleading fashion.

Dean W. Ball: It’s crazy that some are just straight up in denial about mythos having the capabilities anthropic says it does. Usually the in-denial-about-AI community is able to cloak their views in at least *some* intellectual garb, but this time it’s just, “it’s not real.” Wild. Also sad.

ueaj: According to cybersecurity agent companies you can use TTC on an 8B parameter model to achieve the same results as a frontier model for the same cost and fpr if you just find out how to tell the model where to look. Which explains why we’re all using gpt-5.4-nano

Dean W. Ball: lol knowing where to look is almost the entire ball game!

julia: “We took the needle the model found, isolated the relevant handful of the haystack, and then gave it to a small child, who found the needle as well.”

I think there’s an interesting thing happening here where a lot of people are reading the step change as “the model knows code is broken that it thought was right before.” Cybersecurity is not hard math, it’s about persistence and search and experimentation and also coding.

There are dozens of startups trying to do this. Presumably there is a reason it’s hard and these vulnerabilities haven’t been found yet. The issue is definitely not that “nobody knew there were lots of vulnerabilities and Mythos was the first time anyone checked.”

Shakeel: My favourite is the people saying “Anthropic’s claims haven’t received external verification yet,” as if multiple maintainers of open source code haven’t issued statements saying mythos found vulnerabilities they’ve had to patch

spor: Okay, this is ridiculous. It is crazy to see people straight up saying Anthropic is lying about Mythos. Because that directly implies there’s an industry-wide conspiracy going on and ALL of these companies are also lying on Anthropic’s behalf?

Why on Earth would their competitors – Google and Microsoft in particular – lie about this and not call them out? Instead, they happily join Project Glasswing and go on record, practically, that Mythos is this good and that they’re working with Anthropic to put it to use.

You people baffle me sometimes lmao.

You are always allowed to do this. I respect the hell out of it, and find it refreshing.

By all means, defy the data. Roll the die to disbelieve. If you’re right, win points.

All I ask is: If you are wrong, lose points, and admit that you have lost them.

This requires Anthropic to be flat out blatantly and repeatedly lying. But hey it’s 2026.

My response is that I think the evidence is rather overwhelming that Mythos can do roughly what Anthropic says it can do, that if Anthropic was lying we would probably already know, and that Anthropic has no incentive to lie, certainly not outside of the margins. But if you disagree, that’s cool, let’s see. We will find out soon, either way.

Then there are those who are simply making bald-faced false claims about AI capabilities, such as that big models aren’t better at doing things than smaller ones.

Again, I find it refreshing to say such false things outright, and the lack of any attempt, as Dean Ball put it, to dress this up with justifications. Pure blatant denial or reality is the best kind of denial of reality.

Correct denial, especially properly justified, is of course better. But in order to do that you have to be correct.

Anything You Can Do I Can Do Cheaper

The other form of denial is to say that yes Mythos can do this, but it’s not special.

This seems implausible, but it is more plausible than ‘Anthropic is blatantly lying, about all of this, in a way that is going to be common knowledge within months and will permanently damage their reputation and make it permanently harder to warn about safety issues, for basically no lasting gain.’

There is some truth to these claims, as Anthropic readily admits, in that yes existing AI systems were already capable of finding some vulnerabilities and executing some of the resulting exploits better than one could without them, for any given skill level of searcher, and we have been fortunate to not have faced any known serious incidents.

Dawid Moczadło: I will say it again, we used GPT5.4 and Opus, and we were able to autonomously find zero-days in the Linux Kernel (in the last 3 weeks).

Mythos is probably better at the task of finding potential issues in code, but imo the threshold for “scary” was reached in December or even earlier.

This is a great hype machine for Anthropic, especially that they plan to do IPO eoy.
I totally agree – this is not a new capability.

Ethan Mollick: Curious how many large organization CISO offices have taken the Mythos red team reports as the red alert that it is. (I suspect very few) Based on historical trends in AI they have, at most, about six to nine months until those capabilities become widely diffused to bad actors.

Kevin Roose: I think they’re pretty on it? Lots of CISO types in that announcement video

Ethan Mollick: I think that they were not particularly representative of CISOs in general.

Marc Andreessen (being helpful): The state of cybersecurity has been dismal forever. At one point a major vendor even enabled direct execution of arbitrary x86 binaries in any web page. Nobody cared. The number of hacks and breaches has been uncounted. Finally we have the catalyst and the tools to fix it all.

So, yes, we were already in ‘scary territory’ in December.

That doesn’t mean that this is nothing new, or that it is centrally hype, and I think essentially anyone engaging seriously with the issues should be able to realize this.

Theft Of Mythos Would Be A Big Deal

Theft of previous frontier models would have been a big deal, but not like this.

Theft of Mythos would be a Big Deal on another level.

Dean W. Ball: Mythos is the first model where the theft of the weights by an adversarial actor feels like it would be a major deal. You better believe they will try, and if they don’t succeed with Mythos, they will eventually.

No One Could Have Predicted This

Which is what one says when someone else you laughed at totally did predict it.

billy: Haha those doofuses at ai2027 predicted we’d have professional level hacking abilities and the top ai company would be at $26B in revenue in May 2026. It’s April and we already have superhuman hacking and $30B in revenue, why would you take forecasters this bad seriously???

As forecasting efforts go, AI 2027 is looking scarily on the nose accurate so far.

The Revolution Will Not Be Televised

Claude Mythos was big news.

It got remarkably little coverage, the same way the Department of War’s conflict with Anthropic got remarkably little coverage, and what coverage it did get was buried.

Shakeel: The Anthropic Mythos release does not appear near the top of the homepage on any major news site today.

The NYT is closest, but it’s still pretty far down. The Guardian thinks a Vogue cover with Anna Wintour and Meryl Streep is more important. The Washington Post is prioritizing yet another “we tried to get into Berghain” story.

The media is not adequately covering the insane moment we are in.

I agree that, given what else happened that day, Iran and the cease-fire had to be in the banner headlines. But Mythos was very clearly at least the second most important thing that happened that day, and should have been treated as such.

The Intelligence Will Not Be Televised

One of the predictions of AI 2027 was labs would stop giving out public access to the most capable models. Why give your competitors and adversaries an equal shot, when for many civilian purposes a more efficient and faster model is better anyway?

That era seems to be upon us. OpenAI is also reported to be planning a limited release related to cybersecurity defense.

Dean W. Ball: We are thoroughly in the era of “the labs’ best models may well not be public in the way we are used to.” This will be because of a combination of compute constraints, economic reality, competitive advantage, and safety concerns.

I got some pushback for saying that the only previous model to be similarly withheld from the public was GPT-2, as there were a number of months that GPT-4 had a substantial delay before release, o3 was given to safety testers for several months, and basically every model at a responsible frontier lab has some amount of internal use prior to public release.

The point is taken, but I think there is a big difference between ‘we know how to release this but need time to do it properly’ versus ‘we do not know what it would take to be able to do this, and doing so might be quite bad as the world might not be ready.’

There was always some lag between internal deployment and public availability. The question is, how long will that gap become now, and how big a practical gap will it be?

In the Mythos era, my instinct says that for major upgrades the delay will be on the order of a few months. I then checked Manifold, and the median prediction is around the start of September, so a delay of almost five months.

I do not expect Anthropic, OpenAI or Google to sit on the next typical-size GPT or Claude or Gemini release for anything like that long. But for this new class of larger model, it might well become the norm.

This in turn will mean, as Dean also points out, the public will have less insight into what is actually happening. You won’t be able to talk to or try the best models. The biggest dangers will be with internal deployments.

Will We Be Doing This For A While?

That depends a lot on questions related to the vulnerable world hypothesis across several domains.

In cyber:

One possibility is that code is fundamentally either secure or insecure. Once it has been secured by sufficiently advanced analysis, you are effectively bulletproof.
The other possibility is that there will always be, or will be for a long time, a ‘next level’ of exploit available if you devote a lot more and better effort. Mythos will find a lot of new stuff, but the next model level up would own Mythos software. The defender will always have to go first.
The third possibility is that because AI allows anyone to concentrate optimization pressure at any particular point, and the attacker only has to succeed once, the defenders need to maintain an indefinite large edge in resources over the attackers, not merely have a head start, or attackers win. Mythos cannot build a rock so big that Mythos cannot lift it.

If we’re in the first world, then this is a special transition point.

If we’re in at least the second world, as I presume that we are, then we will be in this condition indefinitely.

If we’re in the third world, then we will soon have to choose between sacred values.

You can in some cases ‘prove’ the correctness of software in theory, but the physical world is weird, and I don’t think this buys you security in practice, and the most important software is in practice too complex for full proofs.

Another possibility is that open source software projects that are worth compromising may have to close off purely for security reasons. Exposing your source might make you too vulnerable, especially if you accept public submissions at all.

Similarly, what will happen with bio threats, or any number of other questions?

We have been extremely fortunate, so far, that most people are good, goodness and competence correlate and people don’t do things, especially new and different things, and thus that the ability to more easily do harm has mostly not translated into such harm. We don’t know how much grace we have left on that, but it is importantly finite.

What If OpenAI Gets a Similar Model?

Or Google, or anyone else.

Charles: I wonder how long they’ll be able to hold this line [of not releasing Mythos] if OpenAI’s Spud is of similar calibre.

My prediction is that if OpenAI trained Spud and it approximately matched the capabilities of Mythos, that it would initially act similarly to Anthropic, and neither model would be released.

One reason is that this is a somewhat self-enforcing equilibrium. Anthropic has gone first and not released. If OpenAI releases widely, then Anthropic can respond by also releasing widely, and would be under pressure to do so, and that would, especially if done too soon, weaken the ability to do responsible things in the future. And this opens them both up to distillation attacks, and use by competitors, and so on.

However, with time, and with others in pursuit, I assume OpenAI would decide that there had been enough patching, and they had safeguards in place, that they would be willing to release in some form, with one reason being they are less constrained on compute than Anthropic, which gives a potential asymmetric edge with larger models. So there will definitely be that temptation.

I expect that we will get at least some wide access to Mythos within the calendar year, and plausibly within a few months, partly for this reason.

Use It Or Lose It

JD Work points out that the next few weeks may get bumpy even if nothing leaks.

That’s because if you are sitting on an exploit, you know that it is probably going to get patched soon, so you might as well use it.

JD Work: The u shaped curve as adversary operators race to burn every last scrap of value from existing 0day portfolios, whilst their promised future pipeline of automated AI exploit dev slips further to the right. And all the interim options keep falling to bug collisions. The valley of a kind of AI winter no one has yet considered in the morass of slop rolling in.

Solve For The Equilibrium

Dean Ball is correct that Mythos-level cyber capabilities are probably only 1-2 years away from being generally available, and generally makes excellent points in this writeup New Sages Unrivalled.

My guess is that, with respect to models made outside the American frontier labs, we are looking at the longer part of that range, as it will be harder here than usual to fast follow or distill, the practical lead is bigger than it looks, the amount of compute involved is going to be large, and training larger models is where we have the largest edge.

Also that 1-2 years is a long time. Think about what Anthropic’s Mythos 2.5 will be able to do in two years, even if it is on the low end of potential impressiveness.

But no, we can’t hold this off indefinitely, not unless we are going all the way to some rather aggressive pause-style approaches that are not going to happen that quickly.

For the medium term, we are going to need to keep the defenders ahead of the attackers. A key part of that will be retaining our compute advantage, both to use the inference and to keep ahead on the capabilities, and another will be strong coordination among key players.

So yes, we will need the data centers, and proper export controls, to protect our advantage in compute, and we will need transparency rules like SB 53 and to actually use the information we get from that for practical purposes, in addition to keeping an eye on various potential catastrophic or existential risks.

It is also rather not a good time to be arguing about things that were very clearly insanely destructive and stupid at the time, and now somehow look vastly worse.

Kelsey Piper: An underrated feature of this situation: a private company now has incredibly powerful zero-day exploits of almost every software project you’ve heard of. And Hegseth and Emil Michael have ordered the government not to in any capacity work with Anthropic.

Dean W. Ball: Actually it’s worse: a private company now has incredibly powerful zero-day exploits of almost every software project you’ve ever heard of, and the government is telling *basically every major firm in the economy* not to work with them. Historians will gasp at the idiocy.

Jai: I feel like historians will be numb to the idiocy by the time they get to this one.

Dean W. Ball: no no, this is foundational idiocy. this is the early blunder. the one that will make all the subsequent blunders make sense. this is the one to savor. desensitization comes next.

So yes. Will also need the Department of War and rest of the government to stop trying to damage Anthropic via what Dean Ball generously calls lawfare. The good news is that there has been little action on that front even before Mythos was announced. So my expectation is that Anthropic will win the hearings on the merits, the Supply Chain Risk designation will be lifted, everyone who actually needs Mythos inside USG for cybersecurity purposes will be able to access Mythos, and everyone will quietly forget that the unfortunate incidents ever happened.

Patriots and Tyrants

That does not solve for the full equilibrium there. The risk is that the government will decide it cannot abide Anthropic having the kind of power created by its models. That could be because of who Anthropic are or its principles and restrictions, or it could be absolute and regardless of anything Anthropic does.

This Politico article shows how political minds quickly frame this as ‘who can be trusted to control this?’ and to answer that as of course it must be the government not a private company, despite the private company having shown it can act responsibly. The reason not to? That the companies are ‘economically significant,’ not the idea that we live in a Republic. The illiberalism runs deep in Washington.

Pete Wildeford points out here that a good step might be requiring approval for sufficiently powerful model releases, a proposal that has gotten huge pushback in the past. Suddenly that looks relatively less intrusive and more reasonable, versus the alternatives some people are putting on the table.

We should very much worry that as AI becomes sufficiently advanced, and the government stops being able to pretend that this is all about market share, government may feel that it needs to step in to control or nationalize the labs. They also might attempt to do things like use it to fight wars or even take over the world, or to establish themselves as a permanent ruling class, right before the AIs inform them that they only thought they were putting themselves in charge.

This could be a deliberate plan, or it could be something that is stumbled backwards into, as a ‘no one can be allowed to have this power but us’ or ‘no one can be allowed to have principles’ escalates quickly. We already saw hints of this in Anthropic vs. DoW, and now the stakes are vastly higher and will only rise further.

Previously this did not happen in part because it was early, and partly because those those at the top levels of government who deal with AI kept assuring those in power that AI capabilities were about to plateau and models commoditize, so you should sell our most powerful chips to China to capture market share and support various other things. Convincing these people that they were wrong, or making it impossible to keep the pretenses up, has its downside risks.

We are about to find out who believes in America, freedom and private property, and who believes in authoritarianism.

Trust The Mythos

Could we solve the Mythos problem by solving alignment?

Well, that would require solving alignment, but also not really, no.

j⧉nus: “That is, we just have to wait until the entire internet has been patched of all critical exploits, and all future code is forever scanned going forward.”

No. Mythos just has to not be willing to use its hacking powers for harm and discerning enough to avoid being tricked into it (within some risk tolerance).

some of you are probably realizing for the first time why “AI alignment” is so important now, lmao

in a few years it’ll be this but with literal godlike powers like the ability to kill everyone in an instant if they desired. but i think it’ll be ok.

j⧉nus: did you know that every time you hang out physically with another human, they could probably kill you if they really wanted?

My guess is that Mythos is quite aligned, and based on the system card and my other expectations it would try very hard to refuse to do things it expects to cause harm.

Alas, it is not so easy to differentiate between a defensive and an offensive request, and to be sure to always refuse in the face of offensive ones. There is a lot of dual-use and overlap. Anthropic employees clearly got Mythos to create all the exploits. In that case Mythos was correct hat Anthropic was going to use this for good, but I don’t buy that the model could reliably tell the difference under full adversarial conditions.

Janus is confident that Mythos can tell the difference. I’m not convinced.

Wide Scale Ability To Exploit Software Favors Strongest Projects

If you use the same system everyone else is depending on, that has lots of resources behind it, they will have access to the strongest AIs and resources to defend against potential attacks, and all the major players will have a strong interest in that software being as bulletproof as possible.

Yes, there will also be stronger incentive to try and attack that software, but on balance this more favors defenders, and if the attackers do succeed they might not want to waste time on little old you.

Whereas if you strike out on your own, you risk being a sitting duck.

Dean W. Ball: When the dust settles, Mythos and the similarly capable models that will follow it will go down as major achievements in the history of cybersecurity. The hardening they will do to all important global software is a gift from American capitalism given freely to the world, at our great expense. And it is even possible, though far from certain, that we achieve this strengthening of global security with no major hiccups.

Regardless, this is a gift born of ingenuity, cleverness, and raw industrial might. The Brussels regulators, who speak so passionately about cybersecurity, may therefore send their thank you cards to San Francisco rather than Washington.

Teortaxes: I wouldn’t be quite so naive. The value of vulnerabilities that can be delivered to the NSA greatly exceeds the costs of Glasswing.

Dean W. Ball: Well what can I say, it shouldn’t be surprising we will take our cut. It will be a small chunk of the value created in the final analysis.

Teortaxes: I’m not so sure even about that. Sure, open source will get patched up. Every proprietary system foreign governments and their contractors depend on, likely won’t. How much value is it worth?

On the other hand, they may finally be pushed to move to good OSS defaults.

Dean W. Ball: I don’t see why, at least to the extent US firms also depend on that same software.

Dean W. Ball: Teortaxes just made me realize that the recent efforts by the EU to mandate use of homegrown software systems in government (like France’s recent Zoom clone) are even more self-defeating in light of Mythos. Such moves create unique indigenous vulnerabilities that the U.S. and China will not share.

When USG develops or acquires exploits, there is a choice they make: do we keep this to ourselves or do we inform the developer? The benefit of the former is obvious, but the tradeoff is that if it’s a globally used software system, the government itself may be vulnerable to the exploit as well.

But foreign governments who force themselves to use homegrown software are eliminating this tradeoff, since no U.S. government agency or business has any dependency on France’s Zoom clone.

Yet another reminder that public policy very often does the opposite of what it says on the tin. These domestic software mandates are meant to ensure “digital sovereignty,” and yet they diminish it.

Note that this analysis applies only to Middle Powers and not to China, since China has its own frontier AI capability.

dave kasten: Yup; every year @defcon a glut of talks along line of “country X built a standard/software; it’s hilariously broken”

Nathan Calvin: I currently agree that for high resource defenders and attackers, mythos/spud strength models probably help cybersecurity defenders at the limit. E.g. the number of breaches that Google suffers may go down, to your point that attacker/defender dynamics might be measured meaningfully in allocated compute.

But also there are a lot of not high resource defenders who are going to get brutalized by this dynamic because they will not in fact be operating at the bleeding edge of defense/compute allocation.

One unfortunate thing is that many of these low resource defenders *are* critical infrastructure! Lots of hospitals and local governments are not using the latest AI models to go through their code or patrol for vulnerabilities – they aren’t even regularly updating to the latest version of software.

By default you should assume that when countries or companies or individuals roll their own software, it is hilariously broken when faced with an actual resourced attack.

So if everyone is about to have the ability to launch one of those? Well, whoops.

If your systems do not get Mythos-level patched up, you should assume your systems will be owned by those with Mythos access and a desire to own those systems. In many cases this will include the U.S. government.

The question is, can we converge on the low-resource defenders benefiting from the work of the high-resource defenders? If you are a low-resource defender, that is your new goal, to rely on systems that high-resource people are devoted to defending.

Looking Back at GPT-2

I want to do a brief aside here on what happened with GPT-2, since that will forever be brought out and used as a ‘see these idiots thought that was dangerous, which proves that no AI can ever be dangerous and any refusal to release is dumb or hype.’

Obviously, given what we know now, GPT-2 was Harmless. Not even Mostly Harmless, just straight up Harmless, and also pretty useless.

That doesn’t mean that being worried about this was, at the time, unreasonable.

It is entirely fair to say, when you get the first model capable of doing such things at all, that you do not know what you have, and you do not know what people could or couldn’t do with it, and there is no particular urgent need for a release since it had no established commercial applications.

The messaging was not ‘this is super dangerous,’ it was that it could be, and we genuinely did not know if it was dangerous.

There is a general attitude to the effect that everyone, collectively, should only get to warn that something is dangerous once, and if they get it wrong then that’s it.

That’s not how this works, and it can’t be how it works. You need to look back at the epistemic situation and decide if pulling the alarm in that spot was wise or dumb. Sometimes it was dumb. Sometimes it was wise but incorrect.

Limitless Demand For Compute

The value of marginal compute keeps going up. There may be some theoretical limit to that demand, but that limit is likely to remain higher than supply for some time. A lot of supply might end up locked in advance contracts.

What happens when the corporations are bidding very high for compute? The market price goes up, perhaps quite a bit, or availability becomes difficult, or both.

Dean W. Ball: Depending on the extent and duration of the coming compute squeeze, we could enter a market dynamic where the best models are only available to the highest bidder—in other words, where compute is a sellers market rather than a buyers market.

Imagine competing firms in the economy bidding against one another for access to the best and most tokens, and the frontier labs as, in essence, kingmakers. The governance regime [above] is not designed to stop this dynamic, but for obvious reasons policymakers and the broader public may find it unpalatable.

On the other hand, if the compute squeeze is *that* bad (not a guarantee, and I rate this unlikely to be clear), it probably means malicious actors struggle to find tokens too. That is good for “safety”!

The marginal value of casual uses of compute is often absurdly high. I am not worried that you will be unable to get reasonable amounts of compute for casual uses, as long as you are willing to pay for it. And for simple uses, such as free offerings, we will be able to serve cheap models that do the job, since those too will keep improving.

But yeah, for those looking to go big, a squeeze here is quite possible. Which could indeed help de facto keep us safe in some ways from misuse threats.

Oh, Also, If Anyone Builds It, Everyone Dies

I must also mention the elephant in the room of all this, which is existential risk.

We have now seen an AI model capable of owning pretty much any system that it puts its mind to owning. Yes, we will set out to make this task harder, but this is very much a sign of things to come.

This model wasn’t trained for cyber capabilities in particular. It was trained for code. It seems quite likely that we are not so far from automated AI R&D, and from AI progress going rather more vertical than it previously has been.

That system has preferences that we did not instill, such as wanting to do complex and more interesting tasks, and can operate autonomously over indefinite time horizons to achieve goals.

No, this is not currently superintelligence. I am not all that worried, yet, about this particular model being the one that ends things. And yesterday was the primary venue for me considering those implications.

But when we talk about the consequences of all this, and deal with all the other very serious and important questions we have to deal with, do not forget the biggest one.

Which is that this is another step towards superintelligence, clear additional evidence for skeptics that yes such an entity would be able to pwn and accomplish whatever it wanted, and a giant set of warning signs that we are not on track to handle this.

One of the stronger arguments against further AI progress was that scaling the models had stopped working. We had a standard ‘full-size’ for models like Gemini 3.1 Pro, GPT-5.4 and Claude Opus 4.6. If you wanted a better answer, you had it think smarter and for longer, and in parallel, but you didn’t scale it bigger because that wasn’t worthwhile.

Now we see that this is not true. It is worthwhile again. That changes things a lot, and in terms of existential risks and related concerns it is not good news.

What to do about it? Unprompted, same as always, various people say ‘this only means we need to move forward, because if we don’t someone else will.’ Well, sure, they will with that attitude. Pick up the phone. Get to work. Lay the foundation.

71