Bridging AI governance and deployment with policy derived executable rules

Gautam

Rejected for the following reason(s):

No LLM generated, heavily assisted/co-written, or otherwise reliant work.

Read full explanation

Hi all, we recently published a paper on operationalizing AI policy by turning policy prose into testable, machine-readable rules that downstream enforcement and evaluation tools can use. We will be presenting this more on this work at the AAAI AI GOV 26 workshop next month!

TLDR; Many governance frameworks and standards produce useful guidance, but that guidance still has to be translated into explicit, checkable constraints before it can be enforced or evaluated in real systems. Our work proposes an extensible way to make that translation more systematic

The gap we are trying to close

AI governance today is largely expressed in prose. Major sources such as the EU AI Act, the NIST AI RMF, sector guidance, and enterprise standards articulate obligations and prohibitions in natural language, and they do so deliberately. Their aim is to remain broadly applicable across contexts, which means they often specify what should hold in principle without committing to a concrete, system-specific pass fail check.

In deployment, however, governance becomes actionable only when it is instantiated as something an engineering workflow can run and verify. The practical tooling ecosystem reflects this. Runtime guardrails, evaluation harnesses, and policy engines such as OPA and Rego can enforce constraints and measure compliance, but they largely presuppose that the relevant constraints already exist as structured rules and tests. Closing that gap requires expert labor. Practitioners must interpret policy clauses, resolve scope and exceptions, determine what evidence counts, and translate the result into executable checks. This translation layer becomes a persistent bottleneck. It is slow, error-prone, difficult to scale across documents and domains, and costly to maintain as policies and systems evolve.

Policy→Tests (P2T) is our attempt to make that missing layer explicit and reproducible by providing a bridge from policy prose to normalized, machine-readable rules suitable for downstream enforcement and evaluation.

What P2T does

P2T converts policy documents into a normalized corpus of atomic rules that downstream systems can consume. It is a pipeline plus a compact JSON DSL.

At a high level, the pipeline

ingests and chunks policy text into addressable spans with stable provenance
mines candidate spans likely to contain obligations and prohibitions
performs schema-guarded extraction into a fixed JSON rule format
judges and minimally repairs extractions when fields are missing or inconsistent
deduplicates paraphrases so reviewers see a canonical set of obligations
tags testability and suggests evidence channels that could verify compliance
optionally generates benign and adversarial examples for rules that support I/O checks
optionally runs consistency checks to surface contradictions early

The DSL is intentionally compact. Each extracted rule records provenance plus fields that tend to matter for enforcement

hazard
scope (actor, data domain, context)
conditions and exceptions
requirement
severity
testability and evidence signals (for example I/O check, log check, config check, CI gate)

This is not meant to replace existing guardrail or evaluation frameworks. The goal is to produce reusable rule artifacts that those systems can compile into runtime rails, policy engine constraints, or evaluation prompts.

Why this matters for alignment work

Many alignment and safety debates turn on a common operational question. How do we ensure that deployed systems adhere to constraints that are normative, continually revised, and often specific to a particular domain or institution.

Training-time alignment methods such as RLHF or constitutional approaches can shape baseline behavior by internalizing a relatively stable set of principles. They are valuable, but they offer limited leverage for one recurring need in practice, namely verifying compliance with a growing set of concrete obligations that appear in governance documents and change over time. By contrast, runtime guardrails and evaluation suites can provide explicit verification, but only after those obligations have been rendered into checkable form.

P2T targets that missing step. The framework aims to make the translation from policy language to checkable rules scalable and repeatable, while preserving provenance so that each extracted rule remains auditable back to the clause that motivated it.

If the goal is governance in deployment rather than governance on paper, then a pipeline that can regularly recompile updated policy text into testable artifacts is a plausible part of the broader solution.

What we evaluated

We evaluate P2T on a small but deliberately heterogeneous corpus spanning general governance frameworks, sector guidance, and enterprise standards. Since there is no widely accepted benchmark for policy-to-rule translation, we anchor evaluation in adjudicated human annotations and quantify quality using field-level metrics that reflect how well extracted rules capture the intended structure.

These extraction metrics are only meaningful if the resulting rules improve downstream behavior, so we also run a targeted impact study using NeMO guardrails. We select a small set of HIPAA-derived rules, implement them as runtime guardrails for a generative agent, and compare the guarded system to an otherwise identical baseline. Compliance is assessed by measuring violation rates with an LLM judge calibrated against gold criteria, and we stress test both systems on clean prompts as well as obfuscated and compositional variants. Across these settings, the guarded agent violates the rules far less often, including under obfuscation and composition.

Limitations and what I think is most uncertain

Ambiguity and context dependence are real. Some clauses require external context that is not present in the span, and a compact schema cannot represent every nuance (temporal constraints, probabilistic thresholds, complex cross references).
LLM extraction can hallucinate plausible rules. We mitigate this with schema checks, judge and repair, evidence gating, deduplication, and optional consistency checks, but high-stakes usage still needs human oversight.
Testability tagging is important and easy to get wrong. A policy clause can sound “testable” while being operationally vague once you try to define an oracle.

What we would love feedback on

Representation: is a compact rule schema like this a useful intermediate artifact, or do you think most real deployment work needs richer semantics from the start
Failure modes: what classes of policy language do you expect to break extraction reliably, especially around exceptions, cross references, and implied scope
Integration: if you have experience with eval harnesses or runtime guardrails, what would make a policy-derived rule corpus most useful to you
Benchmarks: any policy corpora or safety benchmarks you think would be valuable stress tests for this kind of pipeline

Thank you so much for reading! If you have experience translating policy language into runnable checks, we would be grateful for your perspective on where this could fit into your workflow, and which parts of the process you expect to be most bottlenecked or prone to failure.

Paper : https://arxiv.org/abs/2512.04408

Code: https://anonymous.4open.science/r/ExecutableGovernance-for-AI-DF49/

LESSWRONG
LW