Epistemic status: Confident in the diagnosis that voluntary governance cannot survive the competitive dynamics frontier labs now operate under, and that the missing piece is authority to block a release. I'd appreciate feedback on the regime design below and on the broader risks and challenges of a mandatory regulatory approach.
Summary
Voluntary Responsible Scaling Policies are breaking under competitive pressure. RSP commitments are weakening and capabilities are outrunning the evaluations meant to bound them.
The failure is a coordination problem, not insufficient goodwill or insufficient transparency.
The minimum intervention that supplies the missing forcing function is mandatory pre-deployment safety compliance: external standards, independent audit, authority to block a release that fails the standard, and post-deployment incident reporting.
California is the right jurisdiction to start. Most US frontier labs are headquartered there, and the state has repeatedly set standards that propagated upward (CARB, CCPA, SB 53).
Mandatory compliance is not in tension with a pause. It builds the institutional capacity a pause would require if the evidence later demands one.
The governance spectrum
Five positions on frontier AI governance, ordered by degree of constraint:
Voluntary self-governance. Responsible Scaling Policies and similar lab commitments. The status quo, and the one this piece argues is breaking.
Incentive-based approaches. Compliance made cheaper than non-compliance: the GPAI Code of Practice, Dean Ball's private governance proposal, Virginia's IVO. Discussed further below.
Transparency and accountability rules. Disclosure, incident reporting, and whistleblower protection. California's SB 53 is a pure disclosure regime. New York's RAISE Act goes further by prohibiting deployments that create "unreasonable risk of critical harm" — but the developer writes its own protocol and defines "reasonable," and enforcement is after the fact. Neither sets a pre-deployment safety standard or empowers anyone to block a release.
Mandatory safety compliance. Government-defined minimum standards with an enforcement regime. Deployment requires a positive finding. The EU AI Act already requires conformity assessment for high-risk AI. Bengio has argued for mandatory government oversight of frontier development including registration and licensing.
Pause or ban. Capability-indexed moratorium, ideally via binding international agreement. PauseAI, MIRI's 2025 treaty call. Safest on paper. Advocates agree it requires binding US–China participation at a minimum.
The right point depends on the magnitude of the risk, the costs of the intervention, and the feasibility of the policy. As capabilities increase, the appropriate point should move toward more constraint. The question this piece addresses is which point is needed today.
Why mandatory compliance is the right floor
Labs can already identify increasing risk in their own systems. They cannot act on that risk unilaterally without surrendering competitive ground. Voluntary frameworks cannot resolve that asymmetry because they provide no binding standard against which any lab's decision can be judged. The problem is not insufficient awareness, insufficient transparency, or insufficient goodwill. It is the absence of a forcing function that overrides race dynamics.
Mandatory compliance is the minimum intervention that supplies that forcing function. Standards are set externally. Deployment is gated on an independent finding against those standards. The standard cannot be negotiated down under competitive pressure.
Lighter regulation does not include a binding standard and so cannot do this work. A pause can, but at costs and feasibility problems discussed below.
This is not a novel idea. The EU AI Act already requires conformity assessment for high-risk AI systems. Brundage et al. lay out a detailed framework for frontier AI auditing, defining assurance levels, scoping principles, and the institutional infrastructure needed for rigorous third-party assessment. The building blocks exist in the literature and in law. What's missing is a concrete proposal that assembles them into a workable regime for frontier AI.
Regime design
The minimum viable regime has four elements:
A defined safety standard that frontier developers must meet.
A pre-deployment audit against that standard, conducted by an independent body.
Regulatory authority to block a release that cannot meet the standard.
Post-deployment incident reporting to catch risks that surface in the wild.
The regime applies initially to the largest frontier developers. This is the smallest package that does the work voluntary governance cannot.
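To make the package concrete, here is a minimal sketch of how the four elements might compose into a single deployment gate. Every type and field name is hypothetical, not a reference to any existing statute or tooling:

```python
from dataclasses import dataclass, field
from enum import Enum


class Finding(Enum):
    PASS = "pass"                   # safety case meets the standard
    FAIL = "fail"                   # deployment is blocked
    INSUFFICIENT = "insufficient"   # evidence too weak to decide; treated as a block


@dataclass
class SafetyStandard:
    """Element 1: an externally defined standard, versioned so it can tighten or relax."""
    version: str
    required_assurance_level: int   # higher capability -> higher required level


@dataclass
class AuditReport:
    """Element 2: the independent auditor's finding against the standard."""
    auditor_id: str
    standard_version: str
    finding: Finding


@dataclass
class Regulator:
    """Elements 3 and 4: blocking authority plus post-deployment incident intake."""
    standard: SafetyStandard
    incidents: list = field(default_factory=list)

    def may_deploy(self, report: AuditReport) -> bool:
        # Deployment requires a positive finding against the *current* standard.
        # A stale audit or an inconclusive finding blocks by default.
        return (report.standard_version == self.standard.version
                and report.finding is Finding.PASS)

    def report_incident(self, description: str) -> None:
        # Element 4: incidents feed back into the next standard revision.
        self.incidents.append(description)
```

The design choice worth noting is the default: an inconclusive finding blocks. That is what gates deployment on positive safety evidence rather than on the absence of proof of harm.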
Design principles:
Standards scale with capability. More powerful systems face more stringent requirements.
Deployment is gated on independent audits, not self-certification.
The developer's path to approval is producing safety evidence, not negotiating the standard down.
Standards can tighten or relax as the science evolves, tracking evidence rather than ideology.
The standard should be operationalized as a structured safety case against a defined risk tolerance. Aviation, pharmaceuticals, and nuclear power all followed this pattern. None had mature safety science at inception. They set the standard and relied on the pressure of the standard to produce the science. Aviation safety improved because regulators required it, not because airlines volunteered.
The regulated artifact should be the system as the covered developer provides it: the model, its scaffolding, and the affordances they make accessible.
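A structured safety case is, at bottom, a claims-and-evidence tree evaluated over the regulated artifact. A minimal sketch of that skeleton follows; real safety-case notations (such as GSN) are far richer, and every name here is hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    description: str   # e.g. an evaluation run, red-team report, interpretability result
    independent: bool  # produced or replicated outside the developer?


@dataclass
class Claim:
    """One node in the safety case: a claim, its supporting evidence, and sub-claims."""
    statement: str
    evidence: list[Evidence] = field(default_factory=list)
    subclaims: list["Claim"] = field(default_factory=list)

    def supported(self) -> bool:
        # A claim holds only if it has at least one piece of independent evidence
        # and every sub-claim holds. (Real assurance levels would demand more.)
        has_evidence = any(e.independent for e in self.evidence)
        return has_evidence and all(c.supported() for c in self.subclaims)


@dataclass
class RegulatedArtifact:
    """The audited unit: the model plus scaffolding plus exposed affordances."""
    model_id: str
    scaffolding: list[str]   # e.g. tool use, code execution, browsing
    affordances: list[str]   # what the developer actually makes accessible


@dataclass
class SafetyCase:
    artifact: RegulatedArtifact
    top_claim: Claim          # "residual risk is within the defined tolerance"

    def clears_audit(self) -> bool:
        return self.top_claim.supported()
```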
Scope should start narrow — the ten to twelve largest AI developers, roughly those spending at least $10B/year on AI R&D:
Hyperscalers with frontier programs: Google, Microsoft, Meta, Amazon.
Pure-play frontier labs: OpenAI, Anthropic, xAI.
Largest Chinese developers: ByteDance, Alibaba, Tencent, Baidu.
A narrow initial scope is pragmatic, not a claim that downstream actors are safe. Frontier developers define each new capability ceiling, and their spend is observable enough to enforce against, which makes them the right first target. But capability migrates downstream through distillation, fine-tuning, and open-weight release, sometimes faster than spend thresholds would capture. The regime needs expansion triggers beyond spend: demonstrated capability by a mid-tier developer, release of a capable open-weight model, or documented misuse incidents should each expand coverage. Starting narrow gets the regime operating; dynamic triggers prevent a ceiling where risk escapes regulation because it moved below the spend threshold.
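The trigger logic in the preceding paragraph can be stated precisely. A minimal sketch, with the spend floor taken from above and every other field invented for illustration:

```python
from dataclasses import dataclass

SPEND_THRESHOLD_USD = 10e9  # the illustrative $10B/year floor from above


@dataclass
class Developer:
    name: str
    annual_ai_rd_spend_usd: float
    demonstrated_frontier_capability: bool  # e.g. frontier scores on tracked evals
    released_capable_open_weights: bool
    documented_misuse_incidents: int


def covered(dev: Developer) -> bool:
    """Coverage is the OR of the spend floor and the dynamic triggers,
    so risk that moves below the spend threshold stays in scope."""
    return (dev.annual_ai_rd_spend_usd >= SPEND_THRESHOLD_USD
            or dev.demonstrated_frontier_capability
            or dev.released_capable_open_weights
            or dev.documented_misuse_incidents > 0)
```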
What this is not: a ban on research, a compliance mandate over downstream users or startups, or a regime that requires perfect safety science before it can function. When the safety case for a deployment is strong, the audit clears it. When the case is weak for the capability profile at hand, deployment is delayed until the developer demonstrates the risk is adequately managed.
By requiring safety cases, this proposal would delay releases until their safety cases reach high quality. It creates direct incentives for developers to invest in the evidence that would resolve uncertainty, and it prevents deployment of systems whose risks the developer cannot yet characterize.
The capture concern
Regulatory capture is an important structural objection to mandatory compliance. Incumbents often prefer binding regimes because compliance costs entrench them against smaller competitors, and Anthropic's explicit call for external governance can be read cynically through that lens. Capture is a design constraint to work against continuously, not a binary risk to resolve once.
Specific defenses:
Keep scope narrow. A high spend threshold captures only the largest organizations and leaves startups, academics, and downstream users untouched. That avoids the regime becoming a compliance moat.
Measure auditors against structurally different benchmarks. Track auditor performance against competing auditors, government safety institutes (UK AISI, US CAISI), and substantiated third-party bounty findings; a sketch of the comparison follows this list. Government institutes do not face fee-for-service drift. Bounties capture risk that surfaces only under decentralized probing.
Let standards move both ways. Independent expert committee reviews thresholds on a regular cadence, with public comment and published dissents on both tightening and relaxation. This limits the one-way ratchet incumbents would prefer: standards that freeze the market in its current shape and slow competitors.
Don't route enforcement through a single door. Whistleblower protection with anti-SLAPP shields, bounties for substantiated third-party findings, criminal liability for false safety cases.
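The benchmarking defense can be made mechanical. A sketch of the comparison, assuming auditors' findings can be normalized into comparable records (designing the real metric is part of the open work noted below):

```python
from dataclasses import dataclass


@dataclass
class AuditorRecord:
    auditor_id: str
    findings: set[str]  # risk findings this auditor surfaced for a given system


def missed_findings(auditor: AuditorRecord,
                    references: list[AuditorRecord]) -> set[str]:
    """Findings surfaced by any structurally different reference
    (peer auditors, government institutes, bounty programs) that
    this auditor missed. A persistently large miss set is the capture
    signal: a fee-paid auditor drifting toward its client."""
    union_of_others = set().union(*(r.findings for r in references))
    return union_of_others - auditor.findings
```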
These defenses are directions, not a finished design; each needs more detailed work to become a concrete proposal.
Why not lighter-touch regulation
SB 53 and RAISE make it harder to hide safety failures after the fact. Whistleblower protections give insiders a legal path. These are real obligations backed by law. But they regulate disclosure, not safety. Under RAISE a company only needs to implement a policy it deems "reasonable" to address the risk of critical harms. A company in full compliance with every transparency mandate on the books could still ship a model its own safety team believes is dangerous. No law requires meeting a safety standard before deployment.
Incentive-based approaches (GPAI Code of Practice, Dean Ball's private governance, Virginia IVO) share a real insight: compliance can be made cheaper than non-compliance. Two years ago with weaker frontier systems, the combination of mandatory transparency and voluntary incentives might have been adequate. That window has closed. Frontier models now demonstrate situational awareness, strategic deception in controlled settings, and autonomous capability that the labs' own safety teams describe as alarming.
What light-touch regulation lacks is the authority to block a release. Transparency surfaces risk; it does not prevent it. Mandatory compliance supplies the missing authority and defines the standard the developer must meet to clear the gate.
Why not a pause
This is the objection I take most seriously, so I want to engage it in some depth. Pause advocates are not wrong about the risk. The policy question is about sequence and incentives.
The coordination requirement is the first obstacle but not the only one. Dean Ball has argued that operationalizing a pause in policy requires a global enforcement authority with its own risks, and that the verification mechanisms such an authority would need do not yet exist. Building that enforcement regime is plausibly a larger undertaking than building a mandatory compliance regime itself.
The operational requirements of pause proposals are extensive. MIRI's November 2025 treaty paper prohibits concentrations above 16 H100-equivalents outside monitored facilities (roughly $500K of hardware), caps FLOP-denominated training runs, and extends upstream to publication bans and legal prohibition of "dangerous AI research" broadly construed. PauseAI's proposal primarily calls for a ban on giant training runs, but still asks for publication limits on training algorithms and runtime improvements that could enable capability jumps or decentralized training, citing biosecurity as precedent.
The sub-threshold exemption is where these regimes break. MIRI excludes consumer devices to keep the proposal politically tractable. That exemption carves out exactly the failure mode their own theory of risk requires them to cover. Prime Intellect's INTELLECT-2 was a 32B-parameter model trained via globally distributed reinforcement learning over the public internet in 2025, with inference workers running on consumer boxes with four RTX 3090s. Decentralized training works now. The infrastructure is public. Communication-efficient training keeps lowering the bandwidth cost. Closing the gap requires either extending monitoring to consumer hardware or restricting the algorithms themselves. MIRI does call for banning such decentralized training, but it is not clear how the ban could be enforced.
The research-restriction path is MIRI's explicit answer and PauseAI's partial one via the biosecurity analogy. The analogy doesn't carry. Gain-of-function research policy is contested, has been rolled back repeatedly, and holds mostly because few people want to do the work. Transformer-quality ML research is published on arXiv by thousands of researchers, and the important papers are not recognized as capabilities-critical until years later. Attention Is All You Need (Vaswani et al. 2017) was about machine translation; no review committee at the time would have flagged it. A review regime covering computation-adjacent research is unprecedented in any technology; nuclear classification covers specific weapons-design information, not physics research. Systems research, hardware improvements, and general-purpose algorithm advances can all accelerate AI capabilities. So can cognitive neuroscience.
I don't accept the framing that a pause is easier to coordinate than shared safety testing. Safety standards build on existing models of regulatory harmonization: mutual recognition of certifications, Basel-style coordination, aviation safety agreements. Countries can adopt compatible standards at different paces without requiring simultaneous action. A pause requires a novel treaty regime for an unprecedented moratorium, with every participant facing strong incentives to defect. Getting the US and China to agree to stop developing advanced AI is far less plausible than getting them to require independent review of safety cases. Convincing both countries to halt semiconductor development is less plausible still.
The visibility argument also cuts the other way from how it is usually framed. A pause, especially a unilateral one, pushes frontier development into the channels where external visibility is weakest: state-controlled programs, clandestine commercial projects, or rogue actors. Mandatory safety compliance keeps development with visible commercial actors, under inspection, with published safety cases. If the worry is that a powerful system ends up deployed without adequate safety investment, you want development to stay above-ground where external pressure can reach it.
Both the US and China have genuine incentives to develop safe AI rather than unsafe AI. Neither government benefits from a system that is uncontrollable or that produces catastrophic incidents. Standards are a plausible convergence point for that reason. The argument that both sides will race ahead regardless of safety rests on a view I don't share: that either government considers unsafe AI a win. I think we can actually make the case to both that it isn't.
On standards themselves: there is genuine uncertainty about how to do AI assurance to a high confidence level. That is an argument for mandatory high standards, not against them. A high-confidence safety case cannot rest on a few benchmark evaluations — eval awareness, sandbagging, and saturation of existing benchmarks are already documented. The standard has to be independent review of structured safety cases, with assurance levels that a handful of scripted tests cannot clear. Scope should include fine-tuning and scaffolding affordances, not just the model as originally trained. And both loss of control and the development of a superintelligent system can and should be framed as unacceptable risks: outcomes a safety case must affirmatively rule out.
The current industry allocation to safety is a small fraction of R&D at most labs and a somewhat larger fraction at a few. Relative to the stakes the labs themselves describe, it's a joke. A mandatory standard that requires high-confidence safety cases creates direct financial pressure to close that gap. Imagine a world where 50% of frontier-lab R&D goes to building and stress-testing safety cases, alignment research, evaluation science, and red-teaming infrastructure. That world probably exists only in a regime that requires it. A pause creates no such pressure; when it lifts, safety science will have advanced at best more slowly than today's already inadequate pace.
I think the underlying world-view difference with pause advocates is this: the strongest form of the pause argument rests on a view that there is no near-term path to developing AI safely, so any amount of delay is worth it at any cost. My view is that we should slow down and maximize the chances of developing AI safely — and that increasing the stringency of safety standards is probably the best available mechanism for stopping if we get clear evidence the technology cannot be deployed safely. That mechanism is nearly identical to a pause at the upper end of its range; it just gets there through a standard that ratchets rather than a binary ban.
Two other factual points worth stating directly. Frontier risk is emerging incrementally through steady gains in capability, including developer productivity and AI research capability; the main risk surface is the frontier models produced by the largest labs, which is exactly what a narrow mandatory regime covers. And the United States is currently driving the race. "Missile gap" rhetoric about China racing ahead does not match the behavior of US labs and US policy; if we wanted to slow the race, we have more leverage than we admit.
The coalition that would support a pause if it were achievable should support mandatory compliance now. It is the fastest path to the institutional capacity, expert ecosystem, audit infrastructure, and political legitimacy that would make a pause possible later if the evidence demands one. It also creates the regulatory machinery that makes a pause implementable rather than merely declarative.
The public prefers safety regulation to bans and bans to no rules across every major survey on the question. The political theory under which a draconian moratorium is achievable but mandatory standards are not has the evidence inverted.
Feasibility in the time available:
State-level legislation moves faster than federal. RAISE passed in a single session; SB 53 was enacted roughly eighteen months after its predecessor SB 1047 was introduced. On novel regulatory terrain, that is fast.
The foundational infrastructure exists. Apollo, METR, and others already conduct frontier evaluations. UK AISI and US CAISI are building capacity. A mandatory regime mobilizes existing infrastructure rather than starting from zero.
A narrow initial regime covering the ten-to-twelve largest developers does not require a large new agency to begin operating.
The realistic alternative to mandatory compliance on this timescale is not an international pause treaty. Pause advocates themselves acknowledge that's implausible on short timescales, and the political coalition for a hard pause is nowhere near mature enough to deploy in the window the labs themselves say matters. The political coalition for mandatory standards at the state level is already building.
None of this is a substitute for the broader work pause-adjacent advocates do. Raising public awareness of catastrophic risk, building international coalitions to prepare for ASI, and advocating for off switches are complementary to mandatory safety compliance, not alternatives. The institutional capacity a mandatory regime builds is capacity those efforts also need.
There's also an open question about when upstream controls on training and internal deployments become necessary. A model capable of recursive self-improvement is a risk event at training, not deployment: there is no way to safely train an ASI-capable model even if deployment is gated. Pre-deployment audits are the right floor now, and the institutional capacity they build is the same capacity upstream controls will require.
Why California
California has repeatedly set standards that propagated upward. CARB vehicle emissions standards became the de facto national floor, with other states adopting California standards directly and federal standards following. CCPA shaped every subsequent state privacy law and influenced federal privacy debate. Even in frontier AI, SB 53 became the foundation for New York's RAISE Act.
A California licensing regime reaches most leading US actors directly, given that 75% of US AI investment is in California. It does not require federal preemption or international coordination as preconditions. Virginia's IVO shows state-level AI governance is already building bipartisan momentum.
State action also builds the coalition, expert institutions, audit ecosystem, and political legitimacy that a federal regime or international agreement would require. This is how ambitious regulatory regimes actually get built. State leadership pairs with advocacy for federal measures like import restrictions, trade negotiation, and sectoral regulation to close the gaps state authority alone cannot reach.
Open questions
I don't have a finished framework. The open questions I find hardest:
The right quantitative risk tolerance. What probability of what harm, over what timeframe, should gate deployment? (A placeholder sketch follows this list.)
How audits should handle saturated evaluations and evaluation awareness. Mythos's 29% reasoning-about-being-tested rate is not a solved problem.
How to set meaningful standards for capabilities we cannot yet reliably measure. There is real tension between requiring rigorous evidence and the fact that the most concerning capabilities are the ones evaluation science has most difficulty measuring.
The right relationship between state licensing bodies and government safety institutes (CAISI, UK AISI). Advisory? Co-regulator? Sole arbiter on certain classes of evaluation?
How to bootstrap auditor accreditation. The current auditor ecosystem is small and dependent on lab contracts.
How to structure the interstate compact that turns California adoption into a national floor without triggering federal preemption.
Whether and how the regime should address foundational training runs in addition to deployment — pre-deployment audits catch deployment risk, not necessarily training risk.
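On the first open question, even a placeholder shows what the gate looks like once numbers are chosen. Every value here is invented for illustration; choosing the real ones is exactly the open question:

```python
# Hypothetical tolerance: the audited safety case must bound the probability
# of a critical harm over a fixed deployment review window.
TOLERANCE = {
    "critical_harm_probability": 1e-4,  # illustrative, not a recommendation
    "window_months": 12,                # illustrative review window
}


def within_tolerance(p_harm_over_window: float) -> bool:
    """The gate: the independently reviewed safety case must bound
    P(critical harm within the review window) at or below tolerance."""
    return p_harm_over_window <= TOLERANCE["critical_harm_probability"]
```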
These open questions are not reasons to wait. They are reasons to start building institutional infrastructure now, so standards are refined with real data from real audits rather than designed in the abstract.
Conclusion
The frontier labs are telling us in their own documents that voluntary self-governance is no longer sufficient, and they cannot fix it alone because of the coordination problem. Self-governance is failing because it cannot survive the competitive dynamics it operates under.
Mandatory safety compliance is the most promising and urgent next step. It addresses the failure the labs describe: in a competitive market with no external gate, no individual lab can afford to delay deployment of a system it suspects is unsafe. An external regime with the authority to deny a release removes that trap. It also builds the institutional capacity any stronger policy would need.
The window is narrow. Frontier capability is advancing fast enough that the same lab CEOs who describe the coordination problem are also saying recursive self-improvement is one to two years out. If that timeline is even roughly right, delay in building institutional capacity is not costless.
Pushback welcome, particularly from people working on AI policy, evaluation science, audit design, or state regulatory practice. I'm also looking to connect with anyone interested in collaborating on a concrete California proposal: draft statute, auditor accreditation, division between state licensing and federal sectoral tools.
This piece is also published on LinkedIn with additional framing for a policy audience. The implications of the recent advances that make this urgent are discussed in Meditations on Mythos.