AI Researchers and Executives Continue to Underestimate the Near-Future Risks of Open Models

Andrew Dickson

Note: This post is part of a broader series of posts about the difficult tradeoffs inherent in public access to powerful open source models. While this post highlights certain dangers of open models and discusses the possibility of global regulation, I am not, in general, against open source AI, or supportive of regulation of open source AI today. On the contrary, I believe open source software is, in general, one of humanity’s most important and valuable public goods. My goal in writing this post is to call attention to the risks and challenges around open models now, so we can use the time we still have before risks become extreme, to collectively explore viable alternatives to regulation, if indeed such alternatives exist.

I recently finished reading Dario Amodei’s “The Adolescence of Technology”, and overall, I loved it. The essay offers a prescient and captivating picture of the AI risks we are likely to face in the next 1-5 years based on the rapid evolution of AI, as well as some sensible proposals for defense. However, there is a major blind spot in Amodei’s account of this next phase of AI progress – namely, not once in the nearly 20,000 word essay does Amodei mention open source AI or open models, or include any discussion of open models at all in the picture he paints of the future.

This trend of leading AI researchers and executives choosing to omit open models from their near-future forecasts of AI risks, is not new – for example, I raised similar concerns with Daniel Kokotajlo et. al.’s “AI 2027”. But it is nonetheless problematic that the trend continues, because any account of the future that avoids discussing open models also inevitably avoids discussing the fact that we have no plan at all for defense against many of the most serious AI risks, when they arise from such models.

In the remainder of this piece I will make the argument that the omission of open models from near future forecasts by thought-leaders in AI matters a lot. There are many ways in which open models will be incredibly important to the future of AI risks and defenses, but by far the greatest issue with omitting them is that the existence of open models is quite likely to undermine most or all of the defenses proposed by Amodei in his essay.

Why Defense Against AI Risks from Open Models is Hard

There are several key features that make defense against AI risks from open models especially difficult.

1. Guardrails Can Be Easily Removed

One approach that companies like Anthropic frequently use to defend against AI risks in closed models is to build guardrails into their systems that severely constrain the behavior of the model itself. An example of this is Claude’s “Constitutional AI”, which Amodei discusses extensively in his essay as a key source of defense against risks like loss of control and misuse for destruction.

Unfortunately, guardrails like Constitutional AI (and similar finetuning or RLHF-based safeguards) offer little to no defense in the case of open models. One main reason for this is that many companies developing open models have typically included few significant guardrails in the first place. But the bigger issue is that even if guardrails are built into open models when they are released, today’s open-weight models remain vulnerable to fine-tuning that can remove or severely compromise such guardrails with relative ease. And there is no evidence that new approaches to training will be robust to such attacks in the future.

2. Use Cannot Be Monitored

Another strategy that is common to many of the defenses outlined in Amodei’s essay, is the strategy of directly monitoring end users’ interactions with the models, to identify and block concerning patterns of use as a separate step from the inference itself. For example, in the section “A Surprising and Terrible Empowerment” Amodei explains how Anthropic uses classifiers as an additional layer of defense to prevent Claude from replying to users prompts where dangerous misuse is suspected by the model – for instance, a request where the output of the model contains instructions on how to develop bioweapons. He writes,

But all models can be jailbroken, and so as a second line of defense, we’ve implemented… a classifier that specifically detects and blocks bioweapon-related outputs. We regularly upgrade and improve these classifiers, and have generally found them highly robust even against sophisticated adversarial attacks. [1]

Unfortunately, just as with guardrails, such classifiers cannot provide meaningful protection against misuse in open models – in this case, because if the user simply runs the open model on hardware they control, there is nothing to prevent them from disabling any classifier-style output filters and viewing model output for whatever prompts they wish. In such a scenario, there is no way for the creator of the model (or any other third-party) to monitor or prevent such scenarios of dangerous misuse in open models running on user-controlled hardware.

3. Bad Actors Have Access By Default

A third strategy that is common to many of the defenses that Amodei proposes, is attempting to restrict various types of bad actors from gaining access to powerful AI capabilities in the first place. For example, with respect to synthetic biology-related risks, he writes:

Advances in molecular biology have now significantly lowered the barrier to creating biological weapons (especially in terms of availability of materials), but it still takes an enormous amount of expertise in order to do so. I am concerned that a genius in everyone’s pocket could remove that barrier, essentially making everyone a PhD virologist who can be walked through the process of designing, synthesizing, and releasing a biological weapon step-by-step…. Most individual bad actors are disturbed individuals, so almost by definition their behavior is unpredictable and irrational—and it’s these bad actors, the unskilled ones, who might have stood to benefit the most from AI making it much easier to kill many people. [1]

With closed models like those developed by Anthropic, the model weights are stored securely on the company’s servers and by default the company gets to choose the conditions under which end users are allowed to utilize their capabilities – including whether to allow access at all. This default is important, because it means that, fundamentally, Anthropic is in a position to block any users who are violating its terms of service, or are using the models in dangerous ways.

However, the opposite is true in the case of open models, which are distributed globally and downloadable anonymously, since there is no way to prevent bad actors from gaining access and using such models for whatever they wish. While restricting the access of bad actors to models is a viable strategy for defense in closed models like Claude, it is not a viable strategy for defense in open models, because bad actors have access by default. From a high-level, this is the main reason that nearly all of the defenses Amodei argues for in his essay fail to work in open models – namely, the defenses Amodei proposes all make the assumptions that bad actors won’t have access to the model weights.

What Could Go Wrong?

So if it’s true that the defenses Amodei proposes in his essay are largely unworkable in open models, then what does the near future AI risk landscape really look like, assuming models like DeepSeek and Quen continue to be widely available and continue to lag the capabilities of the very best closed models by only 6-12 months, as they have in recent years?

Loss of Control

In a piece I wrote last year, “We Have No Plan for Loss of Control in Open Models”, I lay out the case that even if companies like Anthropic take control-related risks very seriously and develop all the defenses that Amodei describes in his essay, this will still be insufficient to manage the more general problem of loss of control on a global scale. The reason is that even if companies like Anthropic develop powerful defenses that enable them to maintain control of their internal AI systems like Claude, such defenses do nothing to prevent loss of control in powerful open models which will undoubtedly be deployed on a global scale, by a wide variety of actors, many of whom will likely put few or no control-related defenses in place. If we believe that loss of control of powerful AI systems is a risk that should be taken seriously – and most AI researchers do – we should be extremely concerned about the possibility of loss of control in open models, given that we have essentially no plan in place or defenses available to address that risk.

AI-Assisted Bioweapons and New Technology Development

Today, arguably the most urgent catastrophic AI risks are “misuse for destruction” risks – for example, the use of AI for bioweapons development, or potentially for developing dangerous “black or gray ball” technologies like mirror life. And evidence of this continues to mount – last year, researchers working with the best closed models inside frontier labs found that they can already outperform virologists in troubleshooting procedures and questions related to the kind of practical lab work required for creating and disseminating dangerous pathogens in the real world. Dan Hendrycks and and Laura Hiscott summarize the findings:

Across multiple biology benchmarks, LLMs are performing near expert level or higher. The [Virology Capabilities Test] results do not arrive in a vacuum, but as another data point in a growing field of benchmarks. For instance, on the Weapons of Mass Destruction Proxy (WMDP), which tests conceptual knowledge required for hazardous uses including bioweapons development, o1 scores around 87 percent. The baseline set by human experts is 60 percent. Since WMDP concentrates on theory, questions could still be raised around the practical applicability of LLMs that score highly on it. The VCT, with its complementary focus on addressing issues in the wet lab, appears to address those doubts. [15]

Policy researchers are also becoming increasingly concerned about such risks. For example, in January of this year, The Center for Strategic International Studies published a comprehensive study titled “Opportunities to Strengthen U.S. Biosecurity from AI-Enabled Bioterrorism” which surveys a wide range of ways in which recent advances in AI models are rapidly lowering the barriers to planning and executing biological attacks and developing epidemic and pandemic-scale pathogens. According to the study:

1. Popular large language models (LLMs) could soon drastically lower the informational barriers for planning and executing biological attacks. Recent assessments of LLMs and other commercial AI capabilities indicate that models are “on the cusp” of meaningfully helping novices develop and acquire bioweapons by providing critical information and step-by-step guidance.

2. Future AI biological design tools (BDTs) could assist actors in developing more harmful or even novel epidemic- or pandemic-scale pathogens. Rapid advancements in state-of-the-art BDTs—illustrated by the foundation model Evo 2—point to a world in which more capable models could help develop new or enhanced pathogens and evade existing safeguards. [16]

As we have seen in previous sections, while safety mechanisms like Constitutional AI and classifiers can help prevent dangerous misuse in closed models like Claude, there are no such defenses available to prevent bad actors from accessing similar capabilities in open models, many of which have few, or no guardrails at all.

Surveillance and Authoritarian Control

In section 3 of his essay “The Odious Apparatus: Misuse for Seizing Power”, Amodei describes the near-future risks we face from state and corporate actors using powerful AI tools to impose forceful control over large populations. As he points out, such impositions of power could take many forms, including AI-powered mass surveillance, fully autonomous combat systems, AI-powered government propaganda and more. The picture that Amodei presents is complex and many-layered and is made more complicated by the fact that these risks could come from many actors, including authoritarian superpowers like the CCP, democracies competitive in AI, non-democratic companies with large data centers, and possibly even AI companies themselves.

The set of defenses he proposes to address these risks is equally multi-layered. However, the most common denominator to Amodei's proposals is that he believes that we must strive to prevent authoritarian regimes (and would-be regimes) from gaining access to powerful AI in the first place. As just one example of how we might do this, he writes,

First, we should absolutely not be selling chips, chip-making tools, or datacenters to the CCP. Chips and chip-making tools are the single greatest bottleneck to powerful AI, and blocking them is a simple but extremely effective measure, perhaps the most important single action we can take. It makes no sense to sell the CCP the tools with which to build an AI totalitarian state and possibly conquer us militarily. [1]

While Amodei may or may not be correct that US export controls are necessary, the issue with his analysis is that he presents export controls as far too decisive and impactful an intervention. He also fails to acknowledge that China has been extremely successful at developing and rolling-out AI-powered authoritarianism, even in the presence of such controls.

In fact, it’s possible export controls may even have accelerated Chinese innovation in AI – at least in some ways – as Jennifer Lind writes in the February edition of Foreign Affairs,

Starting in 2022, the United States and other countries imposed export controls on cutting-edge chips to slow the pace of China’s AI development. But these policies have also galvanized Chinese innovation. In 2025, Chinese AI company DeepSeek unveiled its R1 model, which performed comparably to top U.S. large language models despite being trained on a fraction of the chips typically used by rivals. [22]

The key point is, if the risk we’re worried about is AI-powered surveillance and totalitarian control in China and countries like it, then export controls are simply nothing like a sufficient defense against that risk.

On the contrary, China, Russia and other authoritarian governments around the world are already successfully roll out AI-powered surveillance and authoritarianism, using open models like DeepSeek and Kimi. These models are close to the frontier of capability in any case and there is little evidence that additional export controls on China would significantly slow the rollout of global AI-powered authoritarianism. And this will be even more true if AI companies like Anthropic continue to partner with some of the most notorious authoritarian regimes in the world around the development of powerful AI.

Global Surveillance and High-Tech Panopticon

Given the seriousness of AI risks from open models (and the lack of good defenses against them) it is reasonable to ask why so many researchers and thought-leaders fail to include any discussion of open models in their discussions of near-future AI risks. To try to answer this question, I have participated in a number of conversations with such thinkers in an effort to better understand their point of view. In these conversations, by far the most common argument is that closed models will simply be so far ahead during the times that matter the most, that any threat that open models might pose will be easily neutralized by AI companies or governments controlling more powerful closed models at that time.

One public instance of such an exchange was with Daniel Kokotajlo in the comments to my critique of his AI 2027, where we discuss this position. I write (replying to a previous comment of Kokotajlo’s):

So to make sure I understand your perspective, it sounds like you believe that open models will continue to be widely available and will continue to lag about a year behind the very best frontier models for the foreseeable future. But that they will simply be so underwhelming compared to the very best closed models that nothing significant on the world stage will come from it by 2030 (the year your scenario model runs to), even with (presumably) millions of developers building on open models by that point? And that you have such a high confidence in this underwhelmingness that open models are simply not worth mentioning at all. Is that all correct?... [2]

To which Kokotajlo replies:

We didn't talk about this much, but we did think about it a little bit. I'm not confident. But my take is that yeah, maybe in 2028 some minor lab somewhere releases an open-weights equivalent of the Feb 2027 model (this is not at all guaranteed btw, given what else is going on at the time, and given the obvious risks of doing so!) but at that point things are just moving very quickly. There's an army of superintelligences being deployed aggressively into the economy and military. Any terrorist group building a bioweapon using this open-weights model would probably be discovered and shut down, as the surveillance abilities of the army of superintelligences (especially once they get access to US intelligence community infrastructure and data) would be unprecedented. And even if some terrorist group did scrape together some mirror life stuff midway through 2028... it wouldn't even matter that much I think, because mirror life is no longer so threatening at that point. The army of superintelligences would know just what to do to stop it, and if somehow it's impossible to stop, they would know just what to do to minimize the damage and keep people safe as the biosphere gets wrecked…. [2]

As we can see from Kokotajlo’s reply, the reason the authors of AI 2027 believe that open models will be largely irrelevant to the future of AI risks, is that (they believe) closed models in the hands of global superpowers will be powerful enough to directly neutralize any threat that open might models pose.

While I can understand this perspective, it is far from obvious to me that things will play out this way. At a minimum, Kokotajlo’s position appears to depend on the assumption that a democratic superpower like the United States will roll out a globally ubiquitous system of government surveillance and military intervention on most or all open source AI users in the world, (perhaps similar to a “lite” version of Nick Bostrom’s “high-tech panopticon”) in just the next two years (i.e. rollout completed by 2028 or so). If true, why is this not mentioned at any point in the account of the future that the authors of “AI 2027” present? It seems like a significant detail, especially since there are many events that occur after the year 2028 in their account of the future which appear to contradict the idea that this level of monitoring of technologists is in place globally. And more critically, we should also be asking: is a near-term rollout of global high-tech surveillance with military intervention something that is realistic or desirable at all?

The practical reality is that few AI leaders today are willing to publicly advocate for global surveillance initiatives of the sort described by researchers like Kokotajlo and Bostrom, especially in the near-term. And in many cases thought leaders are much more likely to argue for the opposite. For example, in “The Adolescence of Technology”, Amodei makes the case that AI surveillance by major governments, including democracies, is something we must be very cautious of. He writes,

The world needs to understand the dark potential of powerful AI in the hands of autocrats, and to recognize that certain uses of AI amount to an attempt to permanently steal their freedom and impose a totalitarian state from which they can’t escape. I would even argue that in some cases, large-scale surveillance with powerful AI, mass propaganda with powerful AI, and certain types of offensive uses of fully autonomous weapons should be considered crimes against humanity. More generally, a robust norm against AI-enabled totalitarianism and all its tools and instruments is sorely needed. [1]

While I strongly agree with Amodei’s take on the risks and dangers of global surveillance solutions, the problem with him taking this stance is that there are no proposals currently on the table for how to deal with escalating threats from open models, other than something like global surveillance or high-tech panopticon. The elephant in the room with near-future forecasts like Amodei’s and Kokotajlo’s is that there may be no way for us to avoid an AI-powered catastrophe – like an AI-engineered pandemic, or loss of control of a powerful AI system – without significantly compromising many of the rights and freedoms we hold most dear. Both authors’ near-future forecasts conveniently avoid this unfortunate difficulty by simply omitting any discussion of open models at all.

Closed Models Are Not Far Enough Ahead

In addition to the question of whether global surveillance solutions would be a good thing or a bad thing, we also have the much more practical question of whether such solutions could be rolled out in time. The argument of researchers like Kokotajlo tends to be that panopticon can be rolled out in almost no time (e.g. weeks or months) during a “fast takeoff”-style “intelligence explosion”, because, in his words “There's an army of superintelligences being deployed aggressively into the economy and military.” [2]

But I tend to doubt this claim for a number of reasons. If we look at the world today (February 2026), as discussed above, we are already facing real-world evidence of uplift capabilities for bioweapons development in leading models. Therefore it is worthwhile to ask, how long would it take us to roll out the kind of global surveillance that researchers like Bostrom and Kokotajlo contemplate if we had to do so today? The answer is “a very long time” and the reason is that the bioweapon risk is already here in an early form, but the “army of superintelligences” is nowhere to be found.

The even bigger issue though, is that it is simply not realistic from an international relations standpoint for any single country to roll out a full program of global surveillance and military intervention unilaterally, even if powerful superintelligence gave it the physical or technical capability to do so. While policy researchers at the Brookings Institute have recently made policy recommendations for what the beginnings of a collaboration related to global surveillance could look like between the US and China, the tepid nature of such proposals (e.g. “First, China and the United States can revive intergovernmental dialogue on AI” [25]) serves more to highlight how difficult a real collaboration around a global surveillance program would be, rather than supporting the claim that such a collaboration is likely to materialize quickly.

Based on these difficulties, it should be clear that we cannot count on global surveillance or high-tech panopticons to serve as reliable defenses against AI risks from open models – at least not in the short term. There simply isn’t enough time. Real AI risks in open models are already emerging, and closed models simply aren’t far enough ahead and aren’t providing enough superpowered capabilities to stop them.

Open Models Are An Important Public Good

Whenever I participate in conversations about global surveillance and panopticon-style solutions with researchers like Kokotajlo, I also realize how close we may be, as a global technology community, to losing access to open models and open source AI for good. It’s important to recognize how tragic this outcome would be, since open models currently serve as one of the few checks and balances on the incredible power that the frontier labs are amassing – a power that threatens to centralize control of the future of AI in the hands of a small circle of billionaires and tech elites.

As important as it is that we avoid near-term threats like bioterrorism, cyber warfare and loss-of-control, we must be equally concerned with avoiding a future where a small group of tech elites or or wealthy individuals with the first access to powerful AIs are able to lock-in their power for the long-term – or perhaps forever. Amodei himself acknowledges this in his essay, writing,

Broadly, I am supportive of arming democracies with the tools needed to defeat autocracies in the age of AI—I simply don’t think there is any other way. But we cannot ignore the potential for abuse of these technologies by democratic governments themselves….Thus, we should arm democracies with AI, but we should do so carefully and within limits: they are the immune system we need to fight autocracies, but like the immune system, there is some risk of them turning on us and becoming a threat themselves. [1]

He is not wrong. And yet, at the same time we are facing catastrophic risks from open models, where most of the proposals currently on the table to address them involve exactly the instruments of control that Amodei fears.

What Can Be Done About AI Risks in Open Models?

This is the hard question. And the evidence that it is hard is that we still have no workable proposals for how to defend humanity against catastrophic risks from open models. The defenses outlined by Amodei in “The Adolescence of Technology” are mostly ineffective against open models, so the closest thing to a proposal we have are these ideas of global surveillance or high-tech panopticon. But such proposals come with their own risks of AI-powered authoritarian lock-in. And on top of that, there are real doubts about whether such solutions could be rolled out and enforced on a global scale in time. Meanwhile, the first versions of risks like AI-accelerated bioweapons development and AI-powered authoritarianism are already present in the real world today [15][16][22][23].

While we don’t have good answers to these questions yet, we can no longer shy away from an honest discussion of risks from open models in our near-future forecasts of AI progress. Whether the risk is a loss of control, dangerous misuse like bioweapons development, or use by authoritarian regimes for oppression, it must be clear by now that there is no one person or company or even government that can unilaterally provide sufficient defenses on their own against AI risks from open models. Given the above, it is deeply problematic that forecasts like “The Adolescence of Technology” and “AI 2027” have chosen to completely omit any discussion of open models (and open source AI) from the accounts they give of the future. Doing so sends a message to policymakers and the general public that the only AI models that matter for AI risks are those inside frontier labs, when nothing could be further from the truth.

If Daniel Kokotajlo and the other authors of “AI 2027” believe that rapid rollout of a high-tech global surveillance system with military enforcement will be required by 2028 to avoid a catastrophic bioweapons attack based on open models, then they must be explicit about this in the picture of the future they present in their piece.

And we must hold Dario Amoedi to the same standards of realism in his essay “The Adolescence of Technology”. In the essay, Amodei states that his goal is “.... to confront the rite of passage [of developing powerful AI] itself: to map out the risks that we are about to face and try to begin making a battle plan to defeat them.” [1] If we take this at face value, then his omission of any discussion of open models is unconscionable. Because open models present a number of urgent and potentially catastrophic AI risks – and Amodei’s “battle plan” offers no defenses that can address them.

References

[1] The Adolescence of Technology

[2] It Is Untenable That Near-Future AI Scenario Models Like “AI 2027” Don't Include Open Source AI

[3] AI 2027

[4] We Have No Plan for Preventing Loss of Control in Open Models

[5] LLM Guardrails: A Detailed Guide on Safeguarding LLMs

[6] Constitutional AI: Harmlessness from AI Feedback

[7] Evaluating Security Risk in DeepSeek and Other Frontier Reasoning Models

[8] BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B

[9] On Evaluating The Durability Of Safeguards For Open-Weight LLMs

[10] AI jailbreaks: What they are and how they can be mitigated

[11] Next-generation Constitutional Classifiers: More efficient protection against universal jailbreaks

[12] Open-weight models lag state-of-the-art by around 3 months on average

[13] The Alignment Problem from a Deep Learning Perspective

[14] The Vulnerable World Hypothesis

[15] AIs Are Disseminating Expert-Level Virology Skills

[16] Opportunities to Strengthen U.S. Biosecurity from AI-Enabled Bioterrorism

[17] Deep Research System Card

[18] Biology AI models are scaling 2-4x per year after rapid growth from 2019-2021

[19] AI can now model and design the genetic code for all domains of life with Evo 2

[20] AI and biosecurity: The need for governance: Governments should evaluate advanced models and if needed impose safety measures

[21] The New AI Chip Export Policy to China: Strategically Incoherent and Unenforceable

[22] China’s Smart Authoritarianism

[23] From predicting dissent to programming power; analyzing AI-driven authoritarian governance in the Middle East through TRIAD framework,

[24] Leaked Memo: Anthropic CEO Says the Company Will Pursue Gulf State Investments After All,

[25] AI risks from non-state actors

[-]Mordechai Rorvig4d10

This is a helpful analysis and rundown. I've seen quite a bit of annoying marketing of open models in different places, which doesn't take into account any of these problems, some of which are so obvious. I hadn't thought about Panopticon as being the de facto solution to this. I also like your point, "Bioweapon risk is already here in an early form, but the “army of superintelligences” is nowhere to be found." While I have plenty of interest in more speculative, forward-looking analyses, those also have to get jerked back down to reality pretty hard when we have evidence that contradicts them.

LESSWRONG
LW