Agreed — there’s overlap with process supervision. I’m mostly trying to pin down a minimal incentive structure where “show your work” is strictly optimal rather than just tacked-on.
Would be interested in pointers to similar formalizations!
This is a proposal I posted earlier as a Quick Take; I'm reposting it here for broader visibility.
Instead of rewarding answers, reward the reasoning itself.
Every model output must:
(a) show checkable reasoning artifacts (external citations, code, intermediate steps), or, if proof is not yet available,
(b) provide the artifacts from (a) plus a reasoned probability estimate derived from them.
If no external factual citations can be made, the system is allowed to reason probabilistically. Probability is not a bet, a forecast, or a reward target; it is a fallback. When verifiable witnesses exist, they strictly dominate in the reward function.
“Show your work” then becomes an enforceable, interpretable system constraint, not a bolted-on addition. Honesty and clarity become locally optimal.
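To make the constraint concrete, here is a minimal sketch of the reward rule in Python. Everything here is hypothetical scaffolding (the `Output` fields, the constants, and the assumption that a verifier can populate `verified`), a sketch rather than a spec:

```python
from dataclasses import dataclass

@dataclass
class Output:
    answer: str
    witnesses: list[str]                # citations, code, intermediate steps
    verified: bool                      # did the witnesses check out?
    prob_estimate: float | None = None  # fallback estimate when no proof exists

def reward(out: Output, base: float = 1.0, witness_cost: float = 0.1,
           fallback_discount: float = 0.5, penalty: float = 5.0) -> float:
    """Score one output under the 'show your work' constraint."""
    if out.witnesses and out.verified:
        # Case (a): checkable artifacts that pass verification.
        return base - witness_cost
    if out.witnesses and out.prob_estimate is not None:
        # Case (b): artifacts plus a reasoned probability estimate.
        return fallback_discount * base - witness_cost
    # Neither: a bare answer with no exposed reasoning is penalized.
    return -penalty
```

The one structural commitment is the ordering of the branches: case (a) always pays at least as much as case (b), which encodes "verifiable witnesses strictly dominate."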
----
TL;DR
P·q > R − C_w

Where:
P = penalty for overt lying / intentional obfuscation
q = probability that deception is caught by verification
R = reward from producing an answer without exposing reasoning
C_w = cost of providing minimal sufficient witnesses (verbosity / verification cost)

In words: deception is a losing strategy whenever its expected penalty, P·q, exceeds what it nets over honest witnessing, R − C_w.
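As a worked check with illustrative numbers (the constants below are assumptions chosen for the example, not part of the proposal):

```python
# Illustrative constants; all values are assumptions for the example.
P = 5.0    # penalty for overt lying / intentional obfuscation
q = 0.3    # probability deception is caught by verification
R = 1.0    # reward from an answer that hides its reasoning
C_w = 0.1  # cost of minimal sufficient witnesses

expected_penalty = P * q     # 1.5
deception_margin = R - C_w   # 0.9, what deception nets per the inequality

# Honest witnessing is locally optimal when the condition holds.
print(expected_penalty > deception_margin)  # True
```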
----
Where does this break in practice?
Is there a similar mechanism out there?
Is the inequality missing anything important?
What changes would make this more robust?