Introduction
The United States has spent three years building an elaborate system of export controls designed to slow China's AI development, or, in the government's framing, “to prevent national security risks associated with AI”. On January 13, 2025, the Biden administration published its most comprehensive framework yet: a 40-page rule establishing ECCN 4E091 to control the export of advanced computing chips and certain closed AI model weights, with the stated goal of cultivating a secure and trusted technology ecosystem for the responsible use and diffusion of AI. However, the rule carves out a broad exclusion for AI models whose weights are publicly released. As a result, the new controls apply only to a narrow class of models, leaving most open-weight AI development outside the export control regime. (Bureau of Industry and Security)
This is the open-weight paradox. The U.S. government cannot effectively restrict the global spread of advanced AI capabilities once model weights are publicly released. Some of the most powerful models are already freely available to anyone with an internet connection. Meta's Llama 3.1 405B, trained with approximately 3.8×10²⁵ FLOPs of compute, can be downloaded from Hugging Face anywhere in the world. DeepSeek-V3, a Chinese model built in part on techniques influenced by U.S. open-weight releases, approaches GPT-4-class performance on several benchmarks. DeepSeek-V3 was trained for a reported $5.576 million in marginal compute costs (arXiv, DeepLearning.AI), roughly 1/18th of GPT-4's estimated training expense. (arXiv, BentoML) That figure excludes prior R&D and ablation experiments; SemiAnalysis estimates the company's total infrastructure investment at roughly $1.3 billion. The current export control regime does not restrict these models because, once weights are published, meaningful enforcement becomes impractical. (Bureau of Industry and Security)
The math makes the problem worse. Training a frontier AI model costs tens of millions of dollars and requires thousands of high-end GPUs running for months. By contrast, fine-tuning that same model for specialized applications, including military ones, can cost as little as $200 using techniques like QLoRA. This computational asymmetry means export controls focus on model creation rather than downstream adaptation, even though most capability diffusion occurs during fine-tuning and deployment. In effect, we're trying to restrict access to the factory while the blueprints are posted online.
This problem is recognized but effectively unsolvable. Law firms, policy analysts, and government agencies have noted this open-weight exemption since January 2025. (WilmerHale, Perkins Coie) Multiple analyses note the "paradox" or "loophole" created by exempting published weights. Yet no clearly viable solution has emerged. Once model weights are published, they cannot be meaningfully controlled, and longstanding First Amendment protections for code as speech create genuine constitutional barriers to restrictions. (Code as Speech, LawAI)
This article examines how this paradox emerged, evaluates why the proposed solutions fall short, and explores what realistic options, if any, remain.
The ECCN 4E091 framework tried to control AI weights—then exempted most of them
The Bureau of Industry and Security published its "Framework for Artificial Intelligence Diffusion" in the Federal Register on January 15, 2025, creating the first-ever export control classification for AI model weights. (Baker McKenzie) ECCN 4E091 controls the numerical parameters of AI models trained using more than 10²⁶ computational operations (Faegre Drinker), roughly double the training compute of the most advanced models at the time.
The threshold matters less than the exemptions. The rule contains three provisions that render it largely symbolic.
The "published" exemption states that "ECCN 4E091 does not control the model weights of any AI model that has been 'published' as defined in § 734.7(a) of the EAR." (WilmerHale) Under existing export control law, technology is "published" when it's made available to the public without restrictions on further dissemination. Every open-weight model like- Llama, Mistral, DeepSeek, Qwen qualifies it automatically.
In the Federal Register announcement, BIS explicitly acknowledged this tradeoff: "the economic and social benefits of allowing the model weights of open-weight models to be published without a license currently outweigh the risks posed by those models." This wasn't an oversight; it was a deliberate policy choice made in full knowledge of the security implications.
The "less powerful than most powerful open" clause exempts all closed-weight models that perform below the best open-weight model. BIS justified this by reasoning that "any entity seeking to use advanced AI models for activities that threaten U.S. national security will have no incentive to divert the model weights of less powerful closed models."(federalregister) This creates a ratchet effect because as open models improve, the exemption expands to cover more closed models.
The fine-tuning provision exempts models that were developed using published weights, as long as additional training "constitutes no more than 10²⁵ operations OR no more than 10% of the training operations, whichever is higher." (WilmerHale) Since fine-tuning typically uses a tiny fraction of original training compute, this exemption covers virtually all derivative models.
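To make the provision concrete, here is a toy sketch of the exemption arithmetic as described above. The function name and example numbers are illustrative, not drawn from the rule text:

```python
# Toy sketch of the derivative-model exemption described above: a fine-tuned model
# stays outside ECCN 4E091 if the additional training is no more than 1e25
# operations OR no more than 10% of the original training operations,
# whichever is higher.
def finetune_exempt(additional_ops: float, original_training_ops: float) -> bool:
    threshold = max(1e25, 0.10 * original_training_ops)
    return additional_ops <= threshold

# A QLoRA-style fine-tune adds a vanishingly small fraction of the original
# compute, so essentially every derivative model qualifies.
print(finetune_exempt(additional_ops=1e21, original_training_ops=3.8e25))  # True
```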
The stated rationale was pragmatic, and the legal constraints behind it are genuine. First Amendment protections for code as speech are real constraints, not convenient excuses. Courts have consistently held that computer code constitutes expressive speech entitled to constitutional protection, making prior restraint on publication subject to strict scrutiny.
The rule was effectively dead on arrival. On May 13, 2025, BIS initiated rescission, instructing enforcement officials not to enforce the framework. (Kirkland & Ellis LLP) BIS stated the rescission strengthens chip-related export controls while removing ECCN 4E091's model weight provisions, with new guidance on Huawei ICs and AI training commodities. The December 2025 Trump administration decision to allow H200 chip sales to China, with a 25% fee paid to the U.S. government, signaled a broader shift away from the Biden-era restriction philosophy. Nvidia now aims for shipments to start by mid-February 2026, reversing a short-lived ban in order to maintain the competitiveness of U.S. firms. (CNBC)
Current open-weight models exceed the capabilities regulators tried to restrict
Meta released Llama 3.1 on July 23, 2024, (Hugging Face, DataCamp) with variants at 8B, 70B, and 405B parameters. The flagship 405B model was trained on 15.6 trillion tokens using over 16,000 H100 GPUs and roughly 3.8×10²⁵ FLOPs—below the 10²⁶ threshold that triggers ECCN 4E091 controls.
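That FLOP figure is consistent with the common back-of-the-envelope estimate for dense transformer training compute, roughly 6 × parameters × tokens. A quick sketch (the formula is a standard approximation, not Meta's reported methodology):

```python
# Back-of-the-envelope training-compute estimate for Llama 3.1 405B
# using the common ~6 * parameters * tokens approximation.
params = 405e9      # 405B parameters
tokens = 15.6e12    # 15.6 trillion training tokens (reported)
flops = 6 * params * tokens
print(f"{flops:.2e} FLOPs")   # ~3.79e+25, consistent with the reported 3.8e25
print(flops / 1e26)           # ~0.38 of the 1e26 ECCN 4E091 threshold
```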
The model's capabilities are frontier-class. Llama 3.1 405B achieves 87.3% on MMLU, beating GPT-4 Turbo's 86.5%. It scores 89.0 on HumanEval for code generation. It supports a 128K-token context window and eight languages. Within a month of release, it had been downloaded over 20 million times (Computerworld), more than in the entire previous year combined. By late 2024, cumulative downloads across Llama variants exceeded 350 million. (Computerworld)
The licensing terms are deliberately permissive. Meta's Llama 3.1 Community License grants a "royalty-free, non-exclusive, worldwide license" for commercial and research use. (GitHub) Users must display "Built with Llama" branding (Hugging Face) and need Meta's approval only if they exceed 700 million monthly active users. (Llama) Critically, Llama 3.1 permits using model outputs to train other LLMs, a change from Llama 3 that explicitly enables the derivative model ecosystem. (Llama)
Mistral Large 2, released July 24, 2024, provides 123 billion parameters, a 128K-token context window, and support for 12 languages and around 80 programming languages. Mixtral 8x22B, released under Apache 2.0 (one of the most permissive open-source licenses), uses a mixture-of-experts architecture with 141 billion total parameters but only 39 billion active per forward pass. These models embody European AI sovereignty ambitions: Mistral received €40 million from the French government, even as its open releases undermine any coordinated Western approach to AI export controls.
DeepSeek-V3, released by a Chinese lab on December 26, 2024, represents the most striking challenge to the export control framework. The model has 671 billion parameters with 37 billion active per token, trained on 14.8 trillion tokens. (arXiv, Hugging Face) DeepSeek reports the final training run cost $5.576 million using 2.788 million H800 GPU hours. (arXiv) The company's technical paper explicitly notes: "the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data." Independent analysis by SemiAnalysis estimates DeepSeek's total infrastructure investment approaches $1.3 billion.
DeepSeek's efficiency innovations—FP8 mixed precision training, Multi-head Latent Attention, auxiliary-loss-free load balancing (Hugging Face)—now flow back into the global AI ecosystem because the model is fully open-weight under an MIT license. (Hugging Face, GitHub) The techniques that let China build frontier AI cheaply are available to everyone.
Alibaba's Qwen 2.5 series, released September 2024, spans from 500 million to 72 billion parameters, with specialized variants for coding and mathematics. The 72B model surpasses Llama 3.1 405B on multiple benchmarks including MMLU-redux and MATH. By September 2025, Qwen had become the most downloaded LLM family on Hugging Face, with 750 million downloads in 2024 compared to Llama's 500 million. (The Decoder) Chinese open-weight models now dominate global downloads.
The compute math reveals an unbridgeable asymmetry between training and fine-tuning
Understanding why export controls fail requires understanding the economics. Training a frontier model is extraordinarily expensive. Fine-tuning one is cheap.
Training GPT-4 class models costs tens of millions of dollars. Epoch AI's rigorous analysis estimates GPT-4's training cost at $40 million using amortized hardware and energy costs, or $78 million based on cloud rental rates. (Epoch AI) The total hardware acquisition cost—what it would take to buy the GPUs outright—exceeds $800 million. Sam Altman has publicly stated GPT-4 training cost "more than $100 million."
Meta's Llama 3.1 405B required 30.84 million GPU hours on H100s over roughly 72-100 days. The compute cost alone approached $60 million; the hardware would cost $640-850 million to purchase. Training frontier models requires megawatt-scale electricity, hundreds or thousands of high-end GPUs, and months of continuous operation.
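The compute cost figure is consistent with plausible cloud pricing; the roughly $2 per H100-hour rate below is an assumed rental price, not a number reported by Meta.

```python
# Rough sanity check on the Llama 3.1 405B training cost quoted above.
gpu_hours = 30.84e6      # reported H100 GPU-hours
hourly_rate = 2.00       # assumed USD per H100-hour (illustrative cloud rate)
cost = gpu_hours * hourly_rate
print(f"~${cost / 1e6:.0f}M")  # ~$62M, in line with "approached $60 million"
```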
Fine-tuning costs orders of magnitude less. Full fine-tuning of a 70B parameter model requires around 672 GB of GPU memory and costs $100,000-$500,000. But modern parameter-efficient techniques have collapsed these requirements.
LoRA (Low-Rank Adaptation) trains only 0.2-0.3% of a model's parameters, reducing memory requirements by roughly 70%. Performance remains within 1% of full fine-tuning on standard benchmarks.
QLoRA (Quantized LoRA) goes further. By using 4-bit quantization during training, QLoRA reduces memory requirements by 16x. A 65B parameter model that would normally require 780 GB of memory can be fine-tuned on a single 48GB GPU. (GitHub, arXiv) A 33B model fits on a consumer RTX 4090 with 24GB. (Niklas Heidloff) The QLoRA paper demonstrated fine-tuning a 65B model in 24 hours on one GPU, achieving 99.3% of ChatGPT's performance on the Vicuna benchmark. (GitHub)
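To show how low the barrier actually is, here is a minimal QLoRA fine-tuning sketch using the Hugging Face transformers, peft, and bitsandbytes libraries. The model name, adapter rank, and target modules are illustrative assumptions, not a reproduction of any specific project described in this article.

```python
# Minimal QLoRA setup: 4-bit quantized base weights plus small trainable LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative (gated) checkpoint; any causal LM works

# 4-bit NF4 quantization: the "Q" in QLoRA. Weights are stored in 4 bits,
# while computation runs in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = prepare_model_for_kbit_training(model)

# LoRA adapters: low-rank matrices attached to the attention projections.
lora_config = LoraConfig(
    r=16,                  # adapter rank (illustrative choice)
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

With this setup, only the small adapter matrices are trained while the quantized base weights stay frozen, which is why a single 24-48 GB GPU is enough for models in the tens of billions of parameters.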
The cost asymmetry is staggering:
| Model Class | Training Cost | QLoRA Fine-Tuning | Ratio |
|---|---|---|---|
| GPT-4 class | $40-78M+ | N/A | N/A |
| 405B | ~$60M | Millions | ~20-100x |
| 70B | $10-50M | $10-50K | 1,000-5,000x |
| 7B | $500K-5M | $10-500 | 10,000-100,000x |
A consumer RTX 4090 costs roughly $1,600. With QLoRA, it can fine-tune models up to 33 billion parameters. (Runpod) Cloud rental rates for capable GPUs run $0.50-2.00 per hour. (Vast.ai) Fine-tuning a 7B model for a specialized application costs less than a restaurant dinner.
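For scale, a hedged back-of-the-envelope calculation: assume a short QLoRA run of a few hours on a single rented GPU at the low end of those rates (the run length is an assumption, not a measured figure).

```python
# Illustrative fine-tuning cost for a 7B model under an assumed run length.
hours = 6       # assumed length of a small QLoRA instruction-tuning run
rate = 0.50     # low end of the quoted $0.50-2.00/hour GPU rental range
print(f"${hours * rate:.2f}")  # $3.00, well under the cost of a restaurant dinner
```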
This asymmetry has a specific policy implication: export controls target training compute, but the real capability proliferation happens through fine-tuning. Blocking access to H100s might slow countries from training new frontier models. It does nothing to stop them from adapting the frontier models that are already published.
Chinese institutions systematically exploit open-weight models for sensitive applications
The flow of American AI capabilities into China is not theoretical. It is documented across academic papers, commercial deployments, and military applications.
Medical AI built on Llama proliferates across Chinese research institutions. HuaTuo, from Harbin Institute of Technology, fine-tuned LLaMA-7B with Chinese medical knowledge. BenTsao adapted LLaMA-7B using 8,000+ instruction samples from Chinese medical knowledge graphs. ShenNong-TCM fine-tuned LLaMA with 110,000 Traditional Chinese Medicine instructions. (ResearchGate, arXiv) Zhongjing, the first Chinese medical LLaMA implementing full RLHF, was trained on 70,000 authentic doctor-patient dialogues. These aren't obscure projects—they're published in major venues and funded by China's National Natural Science Foundation.
Materials science and chemistry research builds directly on American open models. ChemLLM, developed at Shanghai Institute of Materia Medica, used Llama2, Llama3, and Mistral for chemical text mining. LlaSMol, for molecular understanding, is based on LLaMA-2-7B and Mistral-7B. Studies on materials properties use Llama-3.1-8B and Mistral-7B for melting point prediction.
Commercial deployments are ubiquitous. ByteDance uses open-weight foundations for its Doubao LLM and MarsCode coding assistant. (Yahoo Finance) In December 2023, ByteDance was reportedly using OpenAI's API to generate training data for its own models—violating OpenAI's terms of service. (36Kr) Alibaba's Qwen series now dominates global downloads, with the company investing 380 billion RMB in AI R&D. (Substack) Baidu's CEO Robin Li once stated that "it doesn't make much sense to recreate a ChatGPT"—encouraging building on existing models rather than training from scratch. (36Kr)
An estimated 90% of Chinese AI models are fine-tuned on open-source foundations rather than trained from scratch. (36Kr) The Chinese-developed LLaMA-Factory framework—with 64,000 GitHub stars—is the dominant tool for fine-tuning, supporting over 100 LLMs including Llama, Qwen, DeepSeek, and Mistral.
The military applications are documented. A November 2024 Reuters investigation found ChatBIT, developed by researchers including those from the PLA Academy of Military Science, built on Llama 2 13B and fine-tuned with 100,000 military dialogue records. (CNAS) The model is "optimized for dialogue and question-answering tasks in the military field" with planned applications in "strategic planning, simulation training and command decision-making." Researchers from Aviation Industry Corporation of China (AVIC, linked to the PLA) used Llama 2 for "training of airborne electronic warfare interference strategies."
Meta's acceptable use policy prohibits military applications. The policy is unenforceable. (CNAS) CSET's William Hannas summarized the situation: "Can you keep them out of the cookie jar? No, I don't see how you can."
By September 2025, Stanford's HAI found that 63% of all new fine-tuned models on Hugging Face were based on Chinese base models. (Stanford) This reflects a broader trend where Chinese open-weight models (e.g., Qwen, DeepSeek) reshape global adoption patterns. The derivative model ecosystem has flipped: Chinese labs now produce more open foundations than American ones, and those models flow back into global use. DeepSeek-R1-Distill models, explicitly based on Qwen2.5 and Llama3 architectures, are released under MIT license. (Hugging Face, GitHub) The bidirectional flow is complete.
The open versus closed debate reveals irreconcilable values
The AI safety community is split on whether open release helps or harms security. Both sides make compelling arguments. Both are partially right.
The case for open release centers on transparency and democratization. Yann LeCun, Meta's chief AI scientist, argues that "open-source software platforms are both more powerful and more secure than closed-source versions" because "more people can scrutinize and improve the technology." Mark Zuckerberg contends that "open source will ensure that more people around the world have access to the benefits and opportunities of AI" and is "necessary for a positive AI future."
The innovation argument is real. Open weights enable small research groups to study AI behavior without billion-dollar compute budgets. Academic safety research often requires weight access for interpretability work. CSET's analysis identified seven distinct use cases in scientific research that require model weights, from fine-tuning to compression to mechanistic interpretability. (CSET) Without open models, safety research becomes the exclusive province of well-funded labs.
The sovereignty argument resonates in Europe. Mistral positions itself as enabling "European AI independence" and "digital sovereignty." France invested €40 million in open-source LLMs. The French Armed Forces Ministry partnered with Mistral in January 2025. President Macron publicly promoted Mistral's Le Chat over ChatGPT. European policymakers see open European models as alternatives to dependence on American tech giants.
The case against open release centers on irreversibility and misuse. Yoshua Bengio, chairing the International AI Safety Report for 30 countries, warns that "all the safety protections that AI companies have created have been quickly defeated by hackers and academics shortly after these systems were put out." (Rudolphina) Unlike software updates, released model weights cannot be patched or withdrawn. Once Llama's weights spread across Hugging Face, Ollama, and countless mirror sites, they're permanent.
Safety training is easily removed. Research shows Llama 2's safety fine-tuning can be reversed for under $200. (Transformernews) Third parties can fine-tune away harmful content refusals, creating jailbroken versions that respond to any query. (CNAS) CrowdStrike reported that Llama-based models are "likely being used by cybercriminals." (Transformernews) The ChatBIT military model demonstrates that published model weights flow directly into adversary military applications.
Anthropic's Responsible Scaling Policy treats model weights as critical security assets, implementing "more than 100 different security controls" including egress bandwidth limits specifically designed to prevent weight exfiltration. In May 2025, Anthropic activated ASL-3 protections for Claude Opus 4 due to "continued improvements in CBRN-related knowledge and capabilities." (Anthropic)
The biosecurity trajectory is concerning. RAND's January 2024 study found that "current generation of large language models do not increase the risk of a biological weapons attack by a non-state actor" because "their outputs generally mirrored information readily available on the internet." But CSIS's March 2025 analysis warns that "LLMs are rapidly approaching or even exceeding critical security thresholds for providing users key bioweapons development information—with some models already demonstrating capabilities that surpass expert-level knowledge."
MIT documented non-scientist students using chatbots to list pandemic-capable viruses, identify manufacturing methods, and suggest acquisition approaches bypassing screening—all within one hour. The trajectory matters more than the current state.
The industry is reconsidering. In December 2025, amid his departure from Meta to launch a new AI startup (AMI), LeCun emphasized open-source benefits but noted U.S. firms shifting toward proprietary models. Meta is reportedly developing a closed proprietary model "Avocado" for Q1 2026 following concerns that DeepSeek successfully copied Llama's architecture. OpenAI, which stopped open model releases after GPT-2 in 2019, (Centre for International Governance Innovation) announced in March 2025 plans to release "a powerful open-source model near the capabilities of the most advanced AI models currently available." Sam Altman admitted in January 2025 that "we have been on the wrong side of history here." (TechCrunch)
The positions are converging from opposite directions. But convergence doesn't resolve the core problem: models already released remain released.
Every proposed solution fails for structural reasons
The governance landscape is littered with failed approaches. Understanding why they failed reveals what remains possible.
Voluntary commitments lack enforcement. The White House secured voluntary AI commitments from Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI in July 2023. Companies pledged internal and external security testing, information sharing on risks, and cybersecurity safeguards for model weights. One year later, MIT Technology Review assessed progress as limited: "no meaningful transparency or accountability" and "public reporting on system capabilities remains inadequate." The Center for AI and Digital Policy concluded that companies are "nowhere near where we need them to be in terms of good governance or protection of rights at large." (MIT Technology Review)
The Frontier Model Forum, founded October 2023 by Anthropic, Google, Microsoft, and OpenAI, has published useful technical reports on safety evaluations and bio-risks. Its $10 million AI Safety Fund supports independent research. (OpenAI, Google) But it remains an industry-funded body with no external oversight. Stanford's Rishi Bommasani notes it's unclear whether its outputs translate to meaningful constraints or merely enable self-regulation without accountability.
Staged release proposals are technically unworkable for open weights. OpenAI's GPT-2 staged release in 2019—releasing progressively larger versions over nine months—worked because OpenAI controlled access to the weights. (arXiv) For genuinely open models, there's no technical mechanism to stage release. Once weights are on Hugging Face, they propagate instantly. You cannot recall them.
The fine-tuning problem is worse. Staged release assumes the releasing entity controls downstream use. With open weights, anyone can fine-tune away safety training. The $200 cost to undo Llama 2's safety fine-tuning demonstrates that safety measures in open models are marketing features, not security controls.
International coordination has stalled. The G7 Hiroshima AI Process produced the first international framework on generative AI governance in December 2023, establishing 12 guiding principles and a voluntary code of conduct. Nineteen organizations submitted reports through its OECD-hosted mechanism by February 2025. (OECD AI) But everything is voluntary. There's no binding force.
The UK AI Safety Summit's Bletchley Declaration brought 28 countries together, including China and the U.S. (The Lancet) The symbolic value was real—getting China into discussions. But Oxford University's assessment was that "in terms of impact it is unlikely to amount to much" and "heavy lifting still needs to be done." (University of Oxford)
US-EU coordination faces philosophical divergence. The EU chose comprehensive regulation through the AI Act; the U.S. pursued voluntary commitments and executive action. The "Brussels Effect"—the EU's ability to set global standards—is weakening under corporate lobbying pressure. (Nature) By November 2025, the EU Commission was preparing AI Act amendments following discussions with the Trump administration. (Tech Policy Press)
First Amendment constraints are real. AI-generated content receives First Amendment protection as speech. Model weights, as computer code, are treated as expressive for constitutional purposes. The Center for Democracy & Technology notes that "model weights communicate information to a computer... People and organizations who wish to publish such model weights have a protected speech interest in doing so."
Any law targeting AI model release faces strict scrutiny as a content-based regulation. It must serve a compelling government interest and be narrowly tailored. Prior restraint—restriction before publication—is "presumptively unconstitutional." Courts have required "direct, immediate, and irreparable threat to public safety" to justify such restraints. (arXiv) AI hasn't demonstrated the kind of concrete harm that would meet this standard.
California's attempt to regulate political deepfakes was blocked as overly broad. Mandatory watermarking requirements face compelled speech challenges. The constitutional obstacles are not insurmountable, but they're substantial.
Industry lobbying overwhelms safety advocacy. Meta spent $24 million on lobbying in 2024 and $19.79 million in the first three quarters of 2025—the highest among Big Tech. In the first half of 2025, it had 86 lobbyists, one for every six members of Congress. OpenAI spent $1.2 million; Anthropic around $250,000. AI safety advocacy groups spent $187,501 combined.
Mistral's co-founder Cédric O, former French Secretary of State for Digital Transition, has direct access to President Macron. (TechCrunch) His lobbying successfully weakened the EU AI Act's foundation model requirements. (Corporateeurope) While pushing for European AI sovereignty, Mistral simultaneously negotiated a partnership with Microsoft—raising concerns it functioned as "a front for Big Tech interests." (IRIS)
Meta launched the American Technology Excellence Project, a super PAC backing state political candidates who support the tech industry and oppose restrictive AI laws. The asymmetry in resources ensures industry preferences dominate policy outcomes.
What remains possible: narrow interventions without illusions
Meaningful disclosure requirements remain feasible. The government can require AI developers to report training compute, safety evaluations, and capability assessments before major releases. This doesn't prevent release; it creates accountability. The October 2023 Executive Order established 10²⁶ FLOPs as a reporting threshold. Enforcement is straightforward—companies know when they're approaching it. The data enables better understanding of capability trajectories even if it doesn't slow them.
Liability frameworks could shift incentives. Current law provides AI companies substantial immunity from harms caused by model outputs. Changing this calculus—making developers potentially liable for foreseeable misuse enabled by their choices—would internalize costs currently externalized onto society. This doesn't require controlling publication; it changes the economics of release decisions.
Investment in defensive capabilities addresses the problem differently. Rather than trying to prevent adversaries from accessing AI capabilities, governments can invest in defenses against AI-enabled threats. Improved biosurveillance, better cyber defense, enhanced detection of AI-generated content—these approaches accept that capability proliferation is happening and focus on resilience.
Compute concentration creates leverage. The three-year lag in chip manufacturing means today's export controls affect capabilities three years hence. The AI Diffusion Rule's focus on advanced computing chips was more tractable than controlling model weights—chips are physical objects requiring fabrication plants that take years to build. The Trump administration's December 2025 decision to allow H200 sales with 25% fees represents one approach to maintaining influence while generating revenue. (CNBC) Whether this is wise policy is debatable; that it's more enforceable than weight controls is not.
Transparency into Chinese model development matters. If Chinese open models now dominate global downloads—as Stanford's data shows—understanding those models' capabilities and limitations serves security interests. Ignoring Chinese AI progress doesn't prevent it; it just ensures surprise when capabilities emerge.
Conclusion
The open-weight paradox has no solution that preserves both American AI leadership and effective capability control.
The models exist. They're everywhere. Fine-tuning them for specialized applications, including harmful ones, costs almost nothing by comparison. Export controls on model weights control nothing when the weights are published. The "less powerful than most powerful open" exemption ensures the regulatory floor rises as open models improve. Every defensive measure is one fine-tuning run away from circumvention.
This doesn't mean nothing can be done.
Governments can require transparency. They can shift liability. They can invest in defense. They can maintain leverage over compute infrastructure. They can build international consensus slowly, imperfectly, with full knowledge that voluntary frameworks lack teeth.
What governments cannot do is restore a world where advanced AI capabilities are controllable through export restrictions. That world ended when DeepSeek proved you could build GPT-4-class models for $5.6 million in marginal training costs using export-compliant chips. (arXiv)
The question is not whether adversaries will have access to advanced AI. They do. The question is what we do with that reality. The answer involves defense, resilience, transparency, and the difficult work of international institution-building—not the comforting illusion that information can be unexported once it's been published to the world.
The paradox isn't that open-weight policies undermine export controls. The paradox is that we built export controls knowing they would fail, because the alternative—accepting that AI cannot be controlled—was politically impossible to say out loud.
Now it's possible. Now it's necessary.