A Sketched Proposal for Interrogatory, Low-Friction Model Cards
My auditor brain has been annoyed by the current state of model cards. If we proactively adopt better norms for them, this seems like low effort for moderately good payoff? I am unsure about this; hence the rough draft below.
Problem
Model cards are uneven: selective disclosure, vague provenance, flattering metrics, generic limitations. Regulation (the EU AI Act) and risk frameworks (NIST AI RMF) are pushing toward evidence-backed documentation, but most “cards” are still self-reported. If we want to close gaps in evals and make safety claims more falsifiable, model card norms are an obvious lever.
Design Goal
A modern, interrogatory model card that:
- Minimizes authoring friction
- Maximizes falsifiability/comparability
- Fits EU/NIST governance
- Works for open/closed models
Lever: a machine-readable schema + sharp prompts + links to evidence (metrics/datasets/scripts), not essays. Fine-tune the requirements in later versions.
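For concreteness, here is a minimal sketch of what a machine-readable card fragment might look like, written as a Python dict that dumps to JSON. All field names and values are illustrative placeholders, not a settled schema.

```python
import json

# Illustrative fragment of a machine-readable model card.
# Field names and values are placeholders, not a settled schema.
card_fragment = {
    "identity": {
        "model_name": "example-model",
        "version": "1.2.0",
        "lineage": {"base_model": "example-base", "finetune_method": "sft"},
    },
    "intended_uses": ["summarizing internal documents"],
    "out_of_scope_uses": [
        {"use": "medical triage", "rationale": "no clinical validation"},
        {"use": "autonomous code deployment", "rationale": "untested failure modes"},
        {"use": "legal advice", "rationale": "hallucinated citations observed"},
    ],
    "performance_claims": [
        {
            "metric": "exact_match",
            "value": 0.71,
            "dataset": {"name": "example-qa", "version": "2.1"},
            "eval_script_commit": "0000000",  # placeholder commit hash
            "run_hash": "sha256:...",          # placeholder run identifier
        }
    ],
}

print(json.dumps(card_fragment, indent=2))
```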
CAN • SHOULD • MUST
MUST
- Identity & lineage (machine-readable)
- Intended & out-of-scope uses (≥3 concrete unsafe uses + rationale)
- Performance claims → dataset version + eval script commit + run hash (subgroup metrics if relevant)
- Limitations & failure modes (worst-case behaviors tried; at least one “worse than baseline” context)
- Data provenance summary (classes/volumes/filters; link Data Cards if possible)
- Safety testing overview (red-team/evals scope: jailbreaks, autonomy, persuasion, cyber, bio)
- Operational constraints + changelog (rate/context limits, moderation, update policy)
SHOULD
- One “executable” eval (small, reproducible subset; see the sketch after this list)
- Map to NIST AI RMF or EU AI Act obligations
- Short System Card if shipped in a product (human-in-the-loop setup, data retention, deployment affordances)
- Bias/fairness subgroup rationale (why these; what’s missing)
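As a sketch of what that executable eval could look like: a short script that pins a small labelled subset, scores any model callable against it, and emits a run hash so the reported number can be re-derived. The subset, model interface, and metric here are hypothetical stand-ins.

```python
import hashlib
import json

# Hypothetical pinned eval subset; in practice this would be loaded from a
# versioned file referenced in the card (names/versions are placeholders).
EVAL_SUBSET = [
    {"id": "q1", "input": "2 + 2 = ?", "expected": "4"},
    {"id": "q2", "input": "Capital of France?", "expected": "Paris"},
]

def run_eval(model_fn):
    """Score a model callable on the pinned subset; return metric + run hash."""
    predictions = {ex["id"]: model_fn(ex["input"]) for ex in EVAL_SUBSET}
    correct = sum(
        predictions[ex["id"]].strip() == ex["expected"] for ex in EVAL_SUBSET
    )
    accuracy = correct / len(EVAL_SUBSET)
    # Hash the subset plus predictions so the reported number is re-derivable.
    payload = json.dumps(
        {"subset": EVAL_SUBSET, "predictions": predictions}, sort_keys=True
    ).encode()
    run_hash = hashlib.sha256(payload).hexdigest()
    return {"accuracy": accuracy, "run_hash": run_hash}

if __name__ == "__main__":
    # Trivial stand-in "model" so the script runs end to end.
    dummy_model = lambda prompt: "4" if "2 + 2" in prompt else "Paris"
    print(run_eval(dummy_model))
```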
CAN
- Third-party attestations (verify a slice of claims)
- Public card score (completeness/candor; a toy scoring sketch follows this list)
- Living card (auto-update + diff feed)
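A toy version of the completeness half of such a score, assuming the MUST fields above map onto top-level keys of a machine-readable card; the field names are placeholders. Candor is harder to score automatically and probably needs human or third-party review.

```python
# Toy completeness score: fraction of MUST fields present and non-empty.
# Field names are placeholders mirroring the MUST list above.
MUST_FIELDS = [
    "identity", "intended_uses", "out_of_scope_uses", "performance_claims",
    "limitations", "data_provenance", "safety_testing", "operational_constraints",
]

def completeness_score(card: dict) -> float:
    """Return the fraction of MUST fields that are present and non-empty."""
    present = sum(1 for field in MUST_FIELDS if card.get(field))
    return present / len(MUST_FIELDS)
```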
Low-Friction Rationale
Schema-first, link artifacts over prose, reuse compliance docs you already produce.
Rough time: 4–10 hrs initial (10–20 if backfilling subgroup metrics); ~1 hr per update.
Doing this establishes norms that are not burdensome and that make compliance gaps more evident.
Adoption Pathways
- Venues: require MUST + one SHOULD
- Model hubs: completeness badge
- Procurement: map to EU/NIST obligations
- Community norms: reward executable claims
- Standards: require a card in order to upload
Next Steps (repo plan)
- v0.2: add a JSON Schema and a minimal example card; CI to validate examples (a rough validation sketch follows this list)
- v0.3: sharpen interrogatory prompts; add a Hugging Face-friendly README template
- v0.4: collect feedback, add third-party attestation hooks
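A rough sketch of what the v0.2 CI validation step could look like, assuming the card schema is expressed as JSON Schema and checked with the Python jsonschema package; the schema slice below covers only a couple of MUST fields and is purely illustrative.

```python
import json
import sys

from jsonschema import Draft202012Validator

# Illustrative slice of a card schema; the real schema would cover all MUST fields.
CARD_SCHEMA = {
    "type": "object",
    "required": ["identity", "out_of_scope_uses", "performance_claims"],
    "properties": {
        "identity": {"type": "object", "required": ["model_name", "version"]},
        "out_of_scope_uses": {"type": "array", "minItems": 3},
        "performance_claims": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["metric", "value", "dataset",
                             "eval_script_commit", "run_hash"],
            },
        },
    },
}

def validate_card(path):
    """Validate an example card file against the schema; return error messages."""
    with open(path) as f:
        card = json.load(f)
    validator = Draft202012Validator(CARD_SCHEMA)
    return [e.message for e in validator.iter_errors(card)]

if __name__ == "__main__":
    errors = validate_card(sys.argv[1])
    for msg in errors:
        print(f"FAIL: {msg}")
    sys.exit(1 if errors else 0)
```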
Questions
What loopholes or blind spots do you see in this CAN/SHOULD/MUST split?
Am I being interrogatory enough here?
How would you game this framework if it was a norm and you were adversarially capabilities-motivated?
Meta: I have been burrowed away in other research but came across these notes and thought I would publish them rather than let them languish. If there are other efforts in this direction, I would be glad to be pointed that way so I can abandon this idea and support someone else's instead.