Payor's Lemma in Natural Language

Andrew_Critch

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Preceded by: Modal Fixpoint Cooperation without Löb's Theorem

It turns out Payor's Lemma and its proof can be explained in natural language even more easily than Löb's Theorem. Here's how.

Imagine a group of people, and let denote the statement "everyone in the group cooperates". Payor's Lemma says the following:

Lemma: If $⊢ □ (□ x \to x) \to x$ , then $⊢ x$

First, let's unpack the meaning of the assumption in words:

" $□ x$ " may be interpreted as saying "the group verifies (on the basis of logic) that it will cooperate" or "cooperation is believed".
" $□ x \to x$ " is a statement of trustworthiness: if the group verifies that it will cooperate, then it actually will cooperate. Because a formal verifier can have bugs in it — or, because a group of people can fail to understand itself — this is a non-trivial claim about the group.
" $□ (□ x \to x)$ " says "the group verifies that it's trustworthy" (in the specific sense of trustworthiness above).
" $□ (□ x \to x) \to x$ " says "the group will cooperate on the basis of verified trustworthiness", i.e., "if the group verifies that it's trustworthy, then it will cooperate".
" $⊢ □ (□ x \to x) \to x$ " says "it's verified that the group will cooperate on the basis of verified trustworthiness"

Now let's work through the proof in words, too! I'll omit saying "it's verified that..." each time, which is what $⊢$ means.

$⊢ x \to (□ x \to x)$ , by tautology ( $A \to (B \to A)$ ). This says:
"If the group cooperates, then it's trustworthy" (in the specific sense of trustworthiness about cooperation defined above).
$⊢ □ x \to □ (□ x \to x)$ , from 1 by $□$ necessitation and distributivity. This says:
"If the group verifiably cooperates, it's verifiably trustworthy."
$⊢ □ (□ x \to x) \to x$ , by assumption. This says:
"Assume the group will cooperate on the basis of verified trustworthiness."
$⊢ □ x \to x$ , from 2 and 3 by modus ponens. This says:
"The group is trustworthy."
$⊢ □ (□ x \to x)$ , from 4 by $□$ necessitation. This says:
"The group is verifiably trustworthy."
$⊢ x$ , from 5 and 3 by modus ponens. This says:
"The group cooperates."

Continuing to use "trustworthiness" in the sense above, the whole proof may be summarized as follows:

"If a group verifiably cooperates, it's verifiably trustworthy (to itself). Assume the group cooperates on the basis of verified trustworthiness. Then, it also cooperates on the basis of verified cooperation (a stronger condition), which is what trustworthiness means. Therefore, the group is trustworthy, hence verifiably trustworthy (assuming we concluded all this using logic), hence the group cooperates (by the assumption)."

LESSWRONG
LW

Payor's Lemma in Natural Language

60

Ω 27

New to LessWrong?

60

Ω 27