Modal Fixpoint Cooperation without Löb's Theorem

Andrew_Critch

Modal Fixpoint Cooperation without Löb's Theorem

by Andrew_Critch

4 min read5th Feb 202332 comments

133 Ω 57

Payor's LemmaDecision TheoryLöb's TheoremLogic & Mathematics World Modeling

Frontpage

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Followed By: Payor's Lemma in Natural Language

TL;DR: This post introduces a novel logical approach to achieving group-scale cooperation, based on modal fixpoint theory. This approach is both easier to understand and roughly 3x more efficient than previous approaches that factored through Löb's Theorem, measured in terms of the length / complexity of the proofs involved.

The following theorem was inspired by Scott Garrabrant, and uses a lemma of James Payor in place of Löb's Theorem to prove cooperation between a group of agents. I'll state the theorem for three agents because that's most illustrative of what's going on:

Theorem: Suppose and $C$ are agents that return "true" to signify cooperation and "false" to signify defection. Let $E = A \land B \land C$ , so $E$ is the statement that "everyone cooperates". Let $□_{A}$ , $□_{B}$ , and $□_{C}$ denote proof systems that extend Peano Arithmetic, let $□_{E} X$ stand for $□_{A} X \land □_{B} X \land □_{C} X$ , and suppose the agents behave in a manner satisfying the following conditions:

$⊢ □_{A} (□_{E} E \to E) \to A$
$⊢ □_{B} (□_{E} E \to E) \to B$
$⊢ □_{C} (□_{E} E \to E) \to C$

Then it follows that $⊢ E$ , i.e., everyone cooperates (provably!).

(Intuitively, the strategy of the agents in this theorem is to check that the group is trustworthy in a certain way before joining (cooperating with) the group. The theorem shows that the collective check on trustworthiness nests inside itself in a way that self-validates and yields cooperation.)

Proof:
$⊢ □_{A} (□_{E} E \to E) \land □_{B} (□_{E} E \to E) \land □_{C} (□_{E} E \to E) \to A \land B \land C$ , by combining 1, 2, and 3 with $\land$ .
$⊢ □_{E} (□_{E} E \to E) \to E$ , from 4 by the definition of E and $□_{E}$ .
$⊢ E$ , by the lemma below with $x = E$ .

[end proof]

The key to the theorem is the following lemma, due to James Payor:

Lemma: If $⊢ □ (□ x \to x) \to x$ then $⊢ x$ .

Historical note: Originally this lemma assumed $⊢ □ (□ x \to x) \leftrightarrow x$ , but then I realized only the forward implication is needed, so I rewrote it.

Proof: The proof uses the same modal rules of inference for $□$ as Löb's theorem, namely, necessitation and distributivity:

$⊢ x \to (□ x \to x)$ , by tautology ( $A \to (B \to A)$ ).
$⊢ □ x \to □ (□ x \to x)$ , from 1 by $□$ necessitation and distributivity.
$⊢ □ (□ x \to x) \to x$ , by assumption.
$⊢ □ x \to x$ , from 2 and 3 by modus ponens.
$⊢ □ (□ x \to x)$ , from 4 by $□$ necessitation.
$⊢ x$ , from 5 and 3 by modus ponens.

[end proof]

Sweet! In comparison to Löb's Theorem, two things are beautiful about Payor's lemma:

It sidesteps the use of an auxiliary fixpoint $⊢ Ψ \leftrightarrow (□ Ψ \to x)$ , by examining a proposition of interest ( $x$ ) that itself has the fixpoint structure needed to self-validate; and
It also allows the construction of unexploitable modal agents without Löb's Theorem, as in the theorem above.

Discussion

In the proof of the Theorem, you might be wondering if it really makes sense to be thinking of $□_{E}$ as a logical system of its own. It doesn't need to be, but the answer is yes if $□_{A}$ , $□_{B}$ , and $□_{C}$ are all finite extensions of PA. Then the axioms of $□_{E}$ are just [the conjunction of axioms of $□_{A}$ ] $\lor$ [the conjunction of axioms of $□_{B}$ ] $\lor$ [the conjunction of axioms of $□_{C}$ ].

You also might wonder if an alternative approach to group cooperation might be to instead use the following strategies:

$⊢ A \leftrightarrow □ (□ A \to B \land C)$
$⊢ B \leftrightarrow □ (□ B \to A \land C)$
$⊢ C \leftrightarrow □ (□ C \to A \land B)$

Then you'd be right! Here it also follows that $⊢ A \land B \land C$ . However, the proof involves a lot more nesting, with A thinking about what B's thinking about what C's thinking about (etc.), and it's not as easy or short as the proof of the Theorem above.

Writing implications backwards

If you think of $□ (□ x \to x)$ in Payor's Lemma as "defining" $x$ as a program, it's fun to write all the implication arrows from right to left, so they look a little like variable assignments. (And, under the Curry-Howard correspondence, they do actually correspond to maps.) Here's the lemma and proof again, written that way:

Lemma: If $⊢ x \leftarrow □ (x \leftarrow □ x)$ then $⊢ x$ .

Proof: The same as above, but with the arrows written leftward!

$⊢ (x \leftarrow □ x) \leftarrow x$ , by tautology ( $(A \leftarrow B) \leftarrow A$ ).
$⊢ □ (x \leftarrow □ x) \leftarrow □ x$ , from 1 by $□$ necessitation and distributivity.
$⊢ x \leftarrow □ (x \leftarrow □ x)$ , by assumption.
$⊢ x \leftarrow □ x$ , from 2 and 3 by modus ponens.
$⊢ □ (x \leftarrow □ x)$ , from 4 by $□$ necessitation.
$⊢ x$ , from 5 and 3 by modus ponens.

[end proof]

I find the above somewhat more elegant than the version with rightward arrows, albeit a less standard way of writing.

Conclusion

In my opinion, what's great about the lemma and theorem above is that they're both relatively short and simple (relative to proving and using Löb's Theorem), and they allow a proof of unexploitable group cooperation that's roughly three times shorter than than one that starts by proving Löb's Theorem (only ~6 lines of logic, vs ~18).

PS James says his next idea will be even better ;)

New to LessWrong?

Getting Started

FAQ

Library

Payor's LemmaDecision TheoryLöb's TheoremLogic & Mathematics World Modeling

Frontpage

133 Ω 57

Mentioned in

169Acausal normalcy

80Self-Referential Probabilistic Logic Admits the Payor's Lemma

69Probabilistic Payor Lemma?

60Payor's Lemma in Natural Language

43Some constructions for proof-based cooperation without Löb

Load More (5/6)

New Comment

32 comments, sorted by

top scoring

Click to highlight new comments since: Today at 8:21 AM

[-]JenniferRM1y266

This is pretty sweet. Thanks for posting it. It feels like something that academia should be able to cite, but I don't know how they would. Is there a version of this on arxiv maybe?

[-]SamEisenstat1yΩ162410

Here's a simple Kripke frame proof of Payor's lemma.

Let be a Kripke frame over our language, i.e. $W$ is a set of possible worlds, $R$ is an accessibility relation, and $⊩$ judges that a sentence holds in a world. Now, suppose for contradiction that $W ⊩ x \leftrightarrow □ (□ x \to x)$ but that $W ⊮ x$ , i.e. $x$ does not hold in some world $u \in W$ .

A bit of De Morganing tells us that the hypothesis on $x$ is equivalent to $\neg x \leftrightarrow ◊ (□ x \land \neg x)$ , so $u ⊩ ◊ (□ x \land \neg x)$ . So, there is some world $v$ with $u R v$ such that $v ⊩ □ x \land \neg x$ . But again looking at our equivalent form for $\neg x$ , we see that $W ⊩ \neg x \to ◊ \neg x$ , so $v ⊩ □ x \land ◊ \neg x$ , a contradiction. $□$

Both this proof and the proof in the post are very simple, but at least for me I feel like this proof tells me a bit more about what's going on, or at least tells me something about what's going on that the other doesn't. Though in a broader sense there's a lot I don't know about what's going on in modal fixed points.

Kripke frame-style semantics are helpful for thinking about lots of modal logic things. In particular, there are cool inductiony interpretations of the Gödel/Löb theorems. These are more complicated, but I'd be happy to talk about them sometime.

[-]James Payor1yΩ7100

Thanks! I also wrote up my proof in another comment, which should help triangulate intuition.

[-]SamEisenstat1yΩ790

Nice, I like this proof also. Maybe there's a clearer way to say thing, but your "unrolling one step" corresponds to my going from to $v$ . We somehow need to "look two possible worlds deep".

[-]James Payor1yΩ14227

Also, here's a proof that a bot is never exploited. It only cooperates when its partner $B$ provably cooperates.

First, note that $A \to □ A$ , i.e. if $A$ cooperates it provably cooperates. (Proof sketch: $A \leftrightarrow □ X \to □ □ X \leftrightarrow □ A$ .)

Now we show that $A \to □ B$ (i.e. if $A$ chooses to cooperate, its partner is provably cooperating):

We get $A \to (□ □ A \to □ B)$ by distributing.
We get $A \to □ □ A$ by applying internal necessitation to $A \to □ A$ .
By (1) and (2), $A \to □ B$ .

(PS: we can strengthen this to $A \leftrightarrow □ B$ , by noticing that $□ B \to □ (□ A \to B) \leftrightarrow A$ .)

[-]James Payor1yΩ690

A point of confusion: is it enough to prove that ? What about $A \to B$ ? I'm not sure I can say this well, but here goes:

We might not be able to prove $A \to B$ in the theory, or even that $\neg □ ⊥$ (which would mean "there are no proofs of inconsistency"). But if we believe our theory is sound, we believe that it can't show that a copy of itself proves something false.

So $A \to □ B$ tells us that if $A$ is true, the theory would show that a copy of itself proves $B$ is true. And this is enough to convince us that we can't simultaneously have $A$ true and $B$ false.

[-]Andrew_Critch1yΩ240

This is cool (and fwiw to other readers) correct. I must reflect on what it means for real world cooperation... I especially like the A <-> []X -> [][]X <-> []A trick.

[-]James Payor1yΩ587

In case this helps folks' intuition, my generating idea was something like: "look at what my opponent is thinking, but assume that whenever they check if I cooperate, they think the answer is yes". This is how we break the loop.

This results in a proof of the lemma like thus:

(given)
$x \leftrightarrow □ (□ x \to □ (□ x \to x))$ (unrolling one step)
$□ x \to □ (□ x \to x)$ is straightforward intuitively, and we can get there by applying box-distributivity to $□ (x \to (□ x \to x))$

(EDIT: This is basically the same proof as in the post, but less simple. Maybe the interesting part is the "unroll once then break the loop" intuition.)

Relatedly, with two agents $A$ and $B$ , we can have:

$A \leftrightarrow □ (□ A \to B)$ (given)
$B \leftrightarrow □ (□ B \to A)$ (given)
$A \leftrightarrow □ (□ A \to □ (□ B \to A))$ (substituting $B$ into $A$ )
$□ A \to □ (□ B \to A)$ is straightforward as before

[-]James Payor1yΩ460

(In fact, I think "assume they will think I cooperate" turns out to be too strong, and leads to unnecessary defection. I'm still working through the details.)

[-]Amalthea1yΩ010

So, ideally you would like to assume only

□A→B
□B→A

and conclude A and B ?

[-]James Payor1yΩ687

If I follow what you mean, we can derive:

So there's a Löbian proof, in which the provability is self-fulfilling. But this isn't sufficient to avoid this kind of proof.

(Aside on why I don't like the Löbian method: I moreso want the agents to be doing "correct" counterfactual reasoning about how their actions affect their opponent, and to cooperate because they see that mutual cooperation is possible and then choose it. The Löbian proof style isn't a good model of that, imo.)

[-]orthonormal1yΩ250

Very cool! How does this affect your quest for bounded analogues of Löbian reasoning?

[-]Andrew_Critch1yΩ240

I'm working on it :) At this point what I think is true is the following:

If ShortProof(x \leftrightarrow LongProof(ShortProof(x) \to x)), then MediumProof(x).

Apologies that I haven't written out calculations very precisely yet, but since you asked, that's roughly where I'm at :)

[-]James Payor1yΩ230

It looks like you're investigating an angle that I can't follow, but here's my two cents re bounded agents:

My main idea to port this to the bounded setting is to have a bot that searches for increasingly long proofs, knowing that if it takes longer to find a proof then it is itself a bit harder to reason about.

We can instantiate this like:

The idea is that if there is a short way to prove that the opponent $B$ would cooperate back, then it takes just a constant $C$ steps more to prove that we cooperate. So it doesn't open us up to exploitation to assume that our own cooperation is provable in $i + C$ steps.

The way in which this works at all is by cutting the loop at the point where the opponent is thinking about our own behaviour. This bot cuts it rather aggressively: it assumes that no matter the context, when $B$ thinks about whether $A$ cooperates, it's provable that $A$ does cooperate. (I think this isn't great and can be improved to a weaker assumption that would lead to more potential cooperation.)

If you construct $B_{n}$ similarly, I claim that $A_{k}$ and $B_{n}$ mutually cooperate if $k$ and $n$ are large enough, and mutually defect otherwise.

Similarly, I claim $A_{k}$ can mutually cooperate with other bots like $B_{n} (A) = □_{n} A (B_{n})$ .

[-]tailcalled1y44

At first glance, it seems like thinking of E as a proof system requires A, B and C to be finitely axiomatizable.

But I suppose that can be fixed, by instead giving E an infinite family of axioms:

For any finite sets of axioms a, b, and c of A, B, and C, E would have the axiom (conjunction of a) or (conjunction of b) or (conjunction of c).

(This itself is a technique I've never seen before. Might be interesting to think about whether it has any other applications. 🤔)

[-]Vladimir_Nesov1y50

This itself is a technique I've never seen before. Might be interesting to think about whether it has any other applications.

It reminds me of this. Given two sets of polynomials and $T$ , a set of polynomials whose vanishing set is the union of vanishing sets of these two is ${s t | s \in S, t \in T}$ . For the axioms from the post, $E$ has the class of models that is the union of the classes of models for $A$ , $B$ , and $C$ .

[-]Andrew_Critch1y20

Actually the interpretation of \Box_E as its own proof system only requires the other systems to be finite extenions of PA, but I should mention that requirement! Nonetheless even if they're not finite, everything still works because \Box_E still satisfies necessitation, distributivity, and existence of modal fixed points.

Thanks for bringing this up.

[-]Gurkenglas1y2-2

https://en.m.wikipedia.org/wiki/Compactness_theorem

(or if you like arcane ramblings some of whose fragments might suddenly make sense within a year, https://ncatlab.org/nlab/show/compactness+theorem)

[-]Vladimir_Nesov1y02

The thing in the post is an infinite conjunction. ~~What you propose doesn't work, it lets a single proof use axioms from multiple source systems, making it too strong.~~ (Edit: This is wrong, everything is fine except my arguments in this subthread that I've now struck through.) What we need is the intersection of the three theories, or triples of proofs.

But really, there is no need for to be a proof system, as the intersection of those three it has necessitation and distributivity, and that's all the lemma needs.

[-]tailcalled1y42

The thing in the post is an infinite conjunction.

No, the thing in the post is three infinite conjunctions, disjuncted together.

The issue with infinite conjunctions is that they are not valid in typical logics, since typical logics only permit finite-length sentences.

What you propose doesn't work, it lets a single proof use axioms from multiple source systems, making it too strong.

No, the "or"s between the conjunctions means that a proof has to work using axioms from only one of the source systems. The OP does the same (since "or" is the same as ).

In fact, if we pretend that infinite conjunctions make sense, then we can prove my axioms from OP's axioms. For any triple of axiom sets $(a, b, c)$ , we can simply project those out from the OP's axiom to prove my corresponding axiom.

[-]Vladimir_Nesov1y00

No, the thing in the post is three infinite conjunctions

Well yes, I meant any one of them is an infinite conjunction, which is the strange thing that some logics won't permit.

For any triple of axiom sets , we can simply project those out from the OP's axiom to prove my corresponding axiom.

~~This doesn't work~~, for example if all three systems have tautology ( $1$ ) as an axiom, let $b = c = 1$ and let $a$ be a singleton. ~~Then at least all axioms of $A$ become axioms in your system~~ (edit: wrong), which clearly shouldn't happen for $E$ , and can't be derived from that disjunction of infinite conjunctions. Like, $X$ doesn't follow from $X \lor Y$ .

[-]tailcalled1y64

This doesn't work, for example if all three systems have tautology () as an axiom, let $b = c = 1$ and let $a$ be a singleton. Then at least all axioms of $A$ become axioms in your system, which clearly shouldn't happen for $E$

No. For each axiom a of A, "a or tautology" becomes an axiom. But "x or tautology" is itself a tautology for any x, so this isn't a problem; "a or tautology" does not imply "a".

[-]Vladimir_Nesov1y60

That's true. Sorry for not being careful, everything checks out. Something seems to have short-circuited in my mind.

[-]Christopher King1y10

Wouldn't this also let you prove "not E"? 🤔 I think this system might be inconsistent.

EDIT: nvm, I guess it's assumed that the agents are some kind of FairBot (https://www.lesswrong.com/posts/iQWk5jYeDg5ACCmpx/robust-cooperation-in-the-prisoner-s-dilemma#Previously_known__CliqueBot_and_FairBot), which introduces an asymmetry between cooperate and defect.

[-]James Payor1y87

Intuitively, each agent in the group verifies that "if there is some argument compelling to everyone that we all cooperate, then I think everyone cooperates".

But implicit in the math is "if I can prove that there is no such argument, then the hypothetical is moot, and I cooperate anyway". !!

Is this a feature or a bug? Well I think this is part of how the group is ruling out a consistent world in which .

EDIT: Okay so Lob's theorem says you can never prove in the theory that there are no proofs of $X$ , due to the technicality in which $□ X$ means "a proof of any size". See the followup comments.

Alas I don't understand the $□ (□ E \to E)$ version well enough to give a compelling account of how it works.

[-]Daniel Kokotajlo10mo40

So... it's part of the setup that all of these agents will:
--Cooperate if they can prove that there is some argument compelling to everyone that everyone cooperates (because then they prove that everyone cooperates, and that includes them, and their proof system isn't mistaken?)
--Cooperate if they can prove that there is no such argument.
--Else defect.

Am I getting that right?

[-]James Payor10mo50

For the setup , it's bit more like: each member cooperates if they can prove that a compelling argument for "everyone cooperates" is sufficient to ensure "everyone cooperates".

Your second line seems right though! If there were provably no argument for straight up "everyone cooperates", i.e. $□ (□ E \to ⊥)$ , this implies $□ (□ E \to E)$ and therefore $E$ , a contradiction.

Also I think I'm a bit less confused here these days, and in case it helps:

Don't forget that " $□ P$ " means "a proof of any size of $P$ ", which is kinda crazy, and can be responsible for things not lining up with your intuition. My hot take is that Lob's theorem / incompleteness says "with finite proof strength you can only deny proofs up to a limited size, on pain of diagonalization". Which is way saner than the usual interpretation!

So idk, especially in this context I think it's a bad idea to throw out your intuition when the math seems to say something else. Since the mismatch is probably coming down to some subtlety in this formalization of provability/meta-methamatics. And I presently think the quirky nature of provability logic is often bugs due to bad choices in the formalism.

[-]Daniel Kokotajlo10mo50

Thanks for the reply. I'm a bit over my head here but isn't this a problem for the practicality of this approach? We only get mutual cooperation because all of the agents have the very unusual property that they'll cooperative if they find a proof that there is no such argument. Seems like a selfless and self-destructive property to have in most contexts, why would an agent self-modify into creating and maintaining this property?

[-]James Payor10mo122

(Thanks also to you for engaging!)

Hm. I'm going to take a step back, away from the math, and see if that makes things less confusing.

Let's go back to Alice thinking about whether to cooperate with Bob. They both have perfect models of each other (perhaps in the form of source code).

When Alice goes to think about what Bob will do, maybe she sees that Bob's decision depends on what he thinks Alice will do.

At this junction, I don't want Alice to "recurse", falling down the rabbit hole of "Alice thinking about Bob thinking about Alice thinking about--" and etc.

Instead Alice should realize that she has a choice to make, about who she cooperates with, which will determine the answers Bob finds when thinking about her.

This manouvre is doing a kind of causal surgery / counterfactual-taking. It cuts the loop by identifying "what Bob thinks about Alice" as a node under Alice's control. This is the heart of it, and imo doesn't rely on anything weird or unusual.

[-]James Payor10mo132

I tried to formalize this, using as a "poor man's counterfactual", standing in for "if Alice cooperates then so does Bob". This has the odd behaviour of becoming "true" when Alice defects! You can see this as the counterfactual collapsing and becoming inconsistent, because its premise is violated. But this does mean we need to be careful about using these.

For technical reasons we upgrade to $□ A \to B$ , which says "if Alice cooperates in a legible way, then Bob cooperates back". Alice tries to prove this, and legibly cooperates if so.

This setup gives us "Alice legibly cooperates if she can prove that, if she legibly cooperates, Bob would cooperate back". In symbols, $□ (□ A \to B) \to A$ .

Now, is this okay? What about proving $□ A \to ⊥$ ?

Well, actually you can't ever prove that! Because of Lob's theorem.

Outside the system we can definitely see cases where $A$ is unprovable, e.g. because Bob always defects. But you can't prove this inside the system. You can only prove things like " $□_{k} A \to ⊥$ " for finite proof lengths $k$ .

I think this is best seen as a consequence of "with finite proof strength you can only deny proofs up to a limited size".

So this construction works out, perhaps just because two different weirdnesses are canceling each other out. But in any case I think the underlying idea, "cooperate if choosing to do so leads to a good outcome", is pretty trustworthy. It perhaps deserves to be cached out in better provability math.

[-]Vladimir_Nesov1y20

How do you figure?

[-]Christopher King1y3-2

If A doesn't think "everyone cooperates", then A won't cooperate, right? Then by Lob's theorem applied to A, A won't cooperate.

Moderation Log