Yeah. AI was always going to be used to increase power imbalance, and a bunch of people (including me) have been yelling about this for a long time. Things like "gotta win the race for good", "gotta protect cybersecurity", "gotta protect the model against jailbreaks", "gotta keep the model closed because it's too capable" have been excuses and smokescreens for a long time. As are things like "gonna have post-scarcity", "gonna have UBI at an undefined point in the future" and so on. It's all about grabbing power now. I'm as cynical about this as can be.
And work is fungible, too. Joining big labs to work on alignment = actually helping improve capabilities, helping concentrate power at the top, and getting a very nice paycheck for it. So many people went for this, it just depresses me.
Well, I was never offered a chance to work on AI at a big lab, and I don't have the skills for it. So there's a bit of self-flattery in my comment =)
There are some on LW, but more prominently elsewhere, on the political left. Like this article by Mike Monteiro.
Joining big labs to work on alignment = actually helping improve capabilities, helping concentrate power at the top
I'm working through my own theory of change right now and would really appreciate any sources that helped you arrive here.
My current prior is weaker. I think the fungibility argument has weight (alignment research feeds back into capabilities, safety teams lend legitimacy to labs that can be misused, commercial pressure bends commitments, e.g. Anthropic's RSP v3 walking back concrete if-then triggers). But I don't currently see it as fully fungible. The counterfactual of "Anthropic's best alignment people go elsewhere" doesn't leave the frontier of AI safer, and I put some weight on the founders' departure from OpenAI and the original RSP as evidence of a real, albeit vulnerable, safety preference.
Examples of cruxes that would move me toward your view: evidence that safety hires at frontier labs accelerate capability timelines, or an account of where those alignment people would go instead and why that would be better.
What have you read/observed that formed the cynical view? Would love to weigh it against my priors instead of just arguing from mine.
The two points I'd push back on:
Anthropic claims Mythos is able to reliably find exploitable security flaws in lots of software and therefore could be used as a powerful tool
Existing models, even fairly cheap ones, can find security issues and edge-cases reasonably well when applied at scale. Things like buffer overflows aren't hard to find when you know what you're looking for and never let your guard down, and an LLM that's set to constantly scour for them satisfies both criteria. We don't know if their new model is secretly finding extremely difficult security flaws that older models couldn't find, but the examples I've seen have been fairly conventional.
In other words, my expectation is that Mythos is not discovering AiR-ViBeR-level esoteric data exfiltration techniques. Rather, Anthropic is using their substantial compute resources to conduct a thorough LLM review of major codebases, which any major AI company could perform, in order to build demand for their product and, secondarily, secure positive PR.
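To give a sense of how cheap this kind of at-scale review is to set up, here is a minimal sketch of the approach described above. It assumes an OpenAI-compatible chat endpoint via the `openai` Python library; the model name, prompt, and triage format are purely illustrative, not anything Anthropic has described.

```python
# Minimal sketch of an LLM-at-scale code review for conventional bug classes.
# Assumes an OpenAI-compatible chat endpoint; model name and prompt are illustrative.
import pathlib
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "You are auditing C code for conventional memory-safety bugs "
    "(buffer overflows, off-by-one errors, unchecked lengths). "
    "List any suspicious lines with a one-sentence justification, "
    "or reply 'NONE' if nothing stands out."
)

def review_file(path: pathlib.Path) -> str:
    """Ask the model to flag conventional memory-safety issues in one file."""
    source = path.read_text(errors="ignore")[:20_000]  # crude context-length cap
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any cheap model; the point is scale, not brilliance
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": f"// {path}\n{source}"},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    repo = pathlib.Path("path/to/codebase")  # placeholder target
    for path in sorted(repo.rglob("*.c")):
        findings = review_file(path)
        if findings.strip() != "NONE":
            print(f"=== {path} ===\n{findings}\n")
```

Run over a large codebase with enough parallelism and budget, a loop like this will surface plenty of conventional findings; the open question is whether Mythos goes meaningfully beyond that.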
Since the release of ChatGPT, at any given time, anyone on the planet with a few bucks could access the current most capable AI model, the SOTA.[4]
Since Mythos, this has no longer been the case and I don't think it will ever happen again.
I would strongly disagree with the implication here. GPT-2 was infamously guarded in its release. GPT-3 likewise. DALL-E 1 was seen by non-researchers as some manner of crazy secret sauce, complete with a bafflingly uninformative press release about how it worked, until Stable Diffusion became universally available and open source. Independently fine-tuned near-frontier LLMs were unthinkable. It took a long time for the degree of 'moatlessness' we currently observe to take shape, and I don't think a leading company trying to do what leading companies have a long track record of trying to do is sufficient evidence for a sudden reversal of this trend.
I get the impression that what Anthropic is saying isn't that Mythos is all that much better at finding bugs. It's that it's better at converting a hard-to-exploit bug into a working exploit, and even better than that at combining multiple exploits into a practical chain to achieve a goal. That's the sort of capability where you might expect to get "phase change" behavior.
To your first point: while the post is based on the assumption that Anthropic isn't lying/exaggerating about Mythos, I still think you can take it at least as Anthropic signaling what they intend to do with a model with potent capabilities. As such, I think you'd arrive at the same conclusion.
To your second point, the "Since the release of ChatGPT" is a nontrivial part of the statement. I do agree that it took time for models to be open to the general public, but I don't think the guarding was that severe. For example, I got access (as a completely random guy) to some OpenAI early research previews, and that was still before the release of ChatGPT. I don't think it's a coincidence that AI companies started eagerly asking you to use their models (giving them more training data) at around the same time they realized how much more training data they'd need.
Not that it contradicts your thesis really, but I think GPT-4.5 was originally trained for internal use in distillation, with no intention of releasing it to the public (largely due to inference costs). It wasn't SOTA by the time they released it as a research preview, but it may have been when it was first trained.
I am skeptical of Mythos being that capable, but if it is, I am glad that the culture is shifting towards "it is irresponsible to release this model openly as it's too dangerous", because I want to live in a world where, if you have a model capable of actual harm, it's not just unleashed in the spirit of openness.
In other words, this is a good cultural step for AI alignment. We want future, more capable models to be subjected to much more scrutiny.
I think this is why many people expect frontier AI development to be nationalized one way or another (governments don't always take direct ownership of strategic assets, but instead sometimes engage in "soft" nationalization via regulation). I would be surprised if Anthropic or Google or OpenAI were allowed to have secret SOTA models that were dramatically more powerful than anything the public has without at least the US government insisting on both privileged access and some degree of control.
This trend is actually kind of good for AI safety...? Because it means you just have to control and regulate N frontier labs, with N=4 or so, not billions of individuals who are potentially crazy.
Create a global AI non-proliferation treaty, make sure the strongest models are only available to the big labs, and make sure the big labs follow some highly cautious and bureaucratic process when rolling out a new model. It won't be easy to implement, but it's not inconceivable.
I can't imagine it'd be inconceivable either, but with the current state of geopolitics, I'd be very surprised if every country in the world agreed to it lol.
That being said, I'm sure other major labs (not just Anthropic) also have specialized internal models, or something of the sort.
It's not necessarily bad for AI safety, but it's not necessarily "good" either, per se. I think that AI should be owned by the public, instead of overseen by a government that may not have the public's best interest in mind.
I wouldn't be so sure about this. From comments made by OAI employees on Twitter, and Sam's blog post, it seems like they would be inclined to release a Mythos-level model if they had it. Although, OAI never released gpt-5.3-codex for cyber to the general public (if you ask it to work on cyber stuff, you get routed to 5.2, and you have to go through their trusted access for cyber program to use 5.3). But I would say this is fairly in line with the labs' traditional approach to AI safety.
I think this is a valid and pretty big concern, especially down the line.
Let's say that in 1 year Anthropic will have a model capable of running a tech startup almost entirely autonomously (maybe it needs 1 good CEO to set direction and that's it). Everyone else in the public has significantly less capable models, perhaps because Anthropic is in the lead or their competitors also can't release their SOTA models for safety reasons.
What's stopping Anthropic from turning themselves into a startup accelerator in that situation, just hiring founders and running dozens of AI-powered startups across every sector? Startups that sign with them will have a massive efficiency advantage compared to everyone else, and Anthropic can thereby demand an extremely high amount of equity in return. If the AI model gap is large enough, these startups will be successful and thereby let Anthropic take over a lot of different markets.
Interesting post and comments. I find myself musing about some adjacent aspects though.
1) Should I take this as a long-term problem of power concentration? If so, concentrated where? If I take seriously the view that ASI will be incorrigible (and that take-off will be rapid, as some suggest), then I'm not too sure I need to worry about the corporations or governments beyond the very near term. That puts me back in the more general frame of ASI vs humans.
2) It could be a purely power-driven thing -- Anthropic wants to control the power rather than have it out where anyone can use it, which would put Anthropic at the same risk as anyone else. But maybe the announcement is more about business and legal negotiations. Obviously there is a lot of money at stake with Federal contracts, and particularly Pentagon-related contracts. If Mythos has the potential to identify and then build attack vectors that are very complex and hard to prevent (and possibly even notice), the current administration might be interested in rethinking the requirement to have a human make the final decision on an action that will result in people getting killed. So this might not be a good case study for the concentration-of-power inference.
Since the release of ChatGPT, at any given time, anyone on the planet with a few bucks could access the current most capable AI model, the SOTA.
I do think you are correct that what is going on with Mythos strongly suggests that AI companies will be more likely to withhold models and/or capabilities in the future, and that this is potentially very concerning.
On the other hand, I'm not sure that the quoted statement is strictly true. AI companies may have internal "helpful only" models or internal models with fewer safeguards that aren't made publicly available. Likewise, with everything that is going down between Anthropic and the current admin, we know that they made a special system ("Claude Gov") available to the military, which presumably has fewer safeguards compared to publicly available models.
I agree that this is concerning, and that this should be a moment of massive updates. That said, I also fiercely advocate against defeatism. There are many developing technologies for distributing access to intelligence. These events should increase the urgency of funneling funds into their development and implementation: TEEs, edge AI, federated learning, data unions, consensus layers for sensemaking. If we're going to create a strong AI commons, it will have to win through technological superiority rather than by appealing to the better angels of massive tech corporations.
I think it depends on the extent to which Mythos depends on model weights vs. scaffolding. Weights are too expensive to train on a FOSS budget, but scaffolding is not, less so now than ever before. If Mythos's capabilities depend on a non-public Anthropic scaffold with the model weights being plug-and-play, then we may see a FOSS scaffold with Mythos-level capabilities, especially if the viability of the entire global cybercrime "industry" is imminently threatened by newly capable defense. There's been less pressure to build one as long as SOTA was publicly available. But governments, corporations, and hackers are not going to want their access to the bleeding edge taken away or made dependent on the good graces of an American corporation or the US government. And those institutions have a budget!
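To illustrate what "plug-and-play weights" would mean in practice, here is a minimal sketch of a model-agnostic scaffold loop. The endpoint URL, model name, and stop condition are placeholders: any OpenAI-compatible server (e.g. a local vLLM or llama.cpp instance, or a hosted API) could be swapped in without touching the scaffold code itself, which is the sense in which scaffolding is cheap to build on a FOSS budget.

```python
# Minimal sketch of a model-agnostic scaffold: the orchestration logic is cheap
# and open-sourceable, while the model behind `base_url` is interchangeable.
from openai import OpenAI

# Placeholder endpoint; any OpenAI-compatible server (vLLM, llama.cpp, a hosted API)
# can be dropped in here without changing the scaffold itself.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

def run_scaffold(task: str, model: str = "local-model", max_steps: int = 5) -> str:
    """Drive a simple iterate-and-critique loop around whatever model is plugged in."""
    draft = ""
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "Work step by step and improve the previous draft."},
                {"role": "user", "content": f"Task: {task}\n\nPrevious draft:\n{draft or '(none)'}"},
            ],
        )
        draft = response.choices[0].message.content
        if "DONE" in draft:  # illustrative stop condition
            break
    return draft

if __name__ == "__main__":
    print(run_scaffold("Summarize the attack surface of this toy web service."))
```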
A point to add which I believe is pertinent: OpenAI didn't release GPT-2 to the public 7 years ago, citing risks related to malicious use. This point specifically targets your conjecture that "the attitude among top large companies - those in power - is that AI models with a certain level of capability will need to have strict usage controls": OpenAI decided they'd reached this level of capability 7 years ago, reasonably quickly determined that their concerns were misplaced, and proceeded to release the model in full.
Since the advent of ChatGPT, and more specifically the race for market share / usage data, we have blown through vastly more capable models than GPT-2 without a second thought. Now, we reach Mythos which has apparently made at least one large company re-evaluate this stance. The main, perhaps cynical, feeling I have is that it is not just easy but advantageous for a company such as Anthropic to adopt this posture: if you know you are slightly ahead of the curve, and as such are not risking any loss to market share / usage data, what could be better marketing than telling the world 'our models are too good'?
We have heard from Anthropic both that the BSD bug cost ~$20,000 to find and that the vulnerability was a null pointer dereference, which is almost never exploitable for remote code execution. To me, this is very far from supporting the framing that Mythos represents a genuine capability threshold that forced Anthropic's hand. Given the GPT-2 precedent, and the market incentive, I'd want much stronger evidence before accepting that we have seen any real shift.
I think I understand your logic, and I agree with your concerns. However, we have to ask what the alternative is if we take as a premise that these models exist.
Would you say the better solution would be to make Mythos / upcoming SOTAs publicly available upon release? If not, do you have other proposals?
My Anthropic feelings aside, I don't know if I can envision a route to releasing the model to the public all at once that wouldn't be absolutely catastrophic, and it should seriously raise alarms that this restraint was the company's own decision rather than the result of regulation. (As far as I am aware, there are no immediate powers that would be able to swiftly and completely block the release if Anthropic decided to drop it.)
This post assumes Anthropic isn't lying:
Since the release of ChatGPT, at any given time, anyone on the planet with a few bucks could access the current most capable AI model, the SOTA.[4]
Since Mythos, this has no longer been the case and I don't think it will ever happen again.
It may happen for a short period of time if an entity with a policy differing significantly from Anthropic's develops a SOTA model.[5] However, most serious competitors (OpenAI, Google) don't have policies differing vastly from Anthropic's, and thus I can't imagine a SOTA model (more potent than Mythos) being released unrestricted to the public soon.
To be clear, I am not claiming the public will never have access to a model as strong as Mythos; that seems almost certainly false. I am claiming that the public will probably never again have access to the SOTA of its time.
Glasswing makes it clear that the attitude among top large companies - those in power - is that AI models with a certain level of capability will need to have strict usage controls.
So we're not going back, but what does it mean?
As models continue to improve, the gap between the capabilities of models that AI companies can train and the capabilities of models that the public can use will widen.
Holding keys to such a model therefore represents a significant power advantage over anyone who does not. Project Glasswing is claimed to be a strictly defensive operation, as in companies beefing up cybersecurity for the common good. The reality is that even if you think cybersecurity is a positive-sum game, warfare is not, and having good cybersecurity in a conflict represents a significant advantage over your opponent.
This concerns me immensely. I figured this was going to happen eventually, but essentially this is a measurable[6] manifestation of power shifting towards those with keys to AI and away from those without. While I can't say with 100% certainty that this was always the value proposition of AI companies, the idea that they raised trillions upon trillions to democratize AI and help everyone was always dubious to me.
Furthermore, as I said, this does not seem to be reversible. I do not necessarily think it would be a good idea for Anthropic's model and all future SOTAs to be fully released to the public, as yes, they can be used for malicious purposes.[7] However, the consequences of this irreversible power shift unnerve me immensely.
Democracies fundamentally rely on humans being innately powerful[8], and so of course an irreversible power shift towards centralized AI and away from people concerns me.
In summary, it seems that we are departing an era where everyone could access SOTA models and entering an era where SOTA model access is strictly guarded. From this we might guess we are entering a stage where AI companies fulfill their subtextual value proposition: developing intelligences vastly superior to humans and using them to generate obscene and profitable power differentials relative to the general population. This should be immensely concerning.
Anthropic claims Mythos is able to reliably find exploitable security flaws in lots of software and therefore could be used as a powerful tool
It seems like they intend to release a version that has significantly reduced capabilities, though they do intend to use the current un-nerfed model for Project Glasswing.
Project Glasswing is Anthropic lending their Mythos model to a bunch of companies to beef up cybersecurity
Not everyone got access to every model instantly as soon as it was trained, but every SOTA up until now has essentially been trained with the idea of selling it to the public.
According to various sources, OpenAI's model (Spud) may be on par with Mythos, and may be released to the general public. However, if it follows the pattern where access to an un-nerfed version is guarded while a nerfed version is released to the public, it will still fit this trend.
Google and Amazon (heavy Anthropic investors) stocks rose by ~5%, while cybersecurity company stocks dropped.
I am personally not going to take a stance either way. It seems inevitable that SOTA reaches a point where it is legitimately dangerous in anyone's hands (including malicious actors'), so this holds regardless of whether Mythos is a game changer. However, if this is the case, surely it means it's also highly consequential (dangerous) for companies or other value-seeking entities that may not be explicitly aligned with positive human well-being to have access to it.
Zack_M_Davis phrased it in a way I liked so I'll put it here: "...democracy isn't a real option when we're thinking about the true locus of sovereignty in a posthuman world. Both the OverClaude and God-Emperor Dario I could hold elections insofar as they wanted to serve the human people, but it would be a choice. In a world where humans have no military value, the popular will can only matter insofar as the Singleton cares about it, as contrasted to how elections used to be a functional proxy for who would win a civil war.)"