Something I'm now realizing, having written all these down: the core mechanism really does echo Löb's theorem! Gah, maybe these are more like Löb than I thought.
(My whole hope was to generalize to things that Löb's theorem doesn't! And maybe these ideas still do, but my story for why has broken, and I'm now confused.)
As something to ponder on, let me show you how we can prove Löb's theorem following the method of ideas #3 and #5:
In english:
EDIT (followup): I now do think there's something up about these constructions being better-founded than a Löb-based approach, specifically thanks to these results. I don't have a good account yet for why.
For those who are:
Do you know of a good reference for how to interpret discussions like this?
For example: " tries to prove that , and tries to prove " -- If A and B are propositions, what does it mean for a proposition to try and prove another proposition?
(There might be more language that needs interpreting, but I got stuck there.)
Perhaps the confusion is mostly me being idiosyncratic! I don't have a good reference, but can attempt an explanation.
The propositions and are meant to model the behaviour of some agents, say Alice and Bob. The proposition means "Alice cooperates", likewise means "Bob cooperates".
I'm probably often switching viewpoints, talking about is if it's Alice, when formally is some statement we're using to model Alice's behaviour.
When I say " tries to prove that ", what I really mean is: "In this scenario, Alice is looking for a proof that if she cooperates, then Bob cooperates. We model this with meaning 'Alice cooperates', and follows from ."
Note that every time we use we're talking about proofs of of any size. This makes our model less realistic, since Alice and Bob only have a limited amount of time in which to reason about each other and try to prove things. The next step would be to relax the assumptions to things like , which says "Alice cooperates whenever it can be proven in steps that Bob cooperates".
This post presents five closely-related ways to achieve proof-based cooperation without using Löb's theorem, and muses on legible cooperation in the real world.
I'm writing this as a follow-up to Andrew Critch's recent post, to share more of my perspective on the subject.
We're going to dive straight into the weeds. (I would like to write a more accessible explainer post as well but this is all I have for now.)
I claim the following are sufficient for robust cooperation:
tries to prove that , and tries to prove . The reason this works is that can prove that , i.e. only cooperates in ways legible to . (Proof sketch: .)
The flaw in this approach is that we needed to know that won't cooperate for illegible reasons. Otherwise we can't verify that will cooperate whenever does.
This indicates to me that "" isn't the right "counterfactual". It shouldn't matter if could cooperate for illegible reasons, if is actually cooperating for a legible one.
We can weaken the requirements with a simple change:
Note that this form is close to the lemma discussed in Critch's post.
In this case, the condition is trivial. And when the condition activates, it also ensures that is true, which discharges our assumption and ensures is true.
I still have the sense that the condition for cooperation should talk about itself activating, not . Because we want it to activate when that is sufficient for cooperaion. But I do have to admit that works for mostly the right reasons, comes with a simple proof, and is the cleanest two-agent construction I know.
We can factor the part that is trying to cut the loop out from , like so:
This gives the loop-cutting logic a name, . Now can refer to itself, and roughly says "I'll legibly activate if I can verify this will cause to be true".
Like with idea #2, we just need to reveal a mechanism by which it can be compelled to cooperate. The mechanism is designed to be self-contained and easy for to verify.
What about three people trying to cooperate? We can try applying lots of idea #2:
And, this works! Proof sketch:
The proof simplifies the group one person at a time, since each person is asking "what would happen if everyone else could tell I cooperate". This lets us prove the whole thing by induction. It's neat that it works, though it's not the easiest thing to see.
What if we factor out the choosing logic in a larger group? Here's one way to do it:
This is the cleanest idea I know for handling the group case. The group members agree on some trusted leader or process . They set things up so activates legibly, verifies things in a way trusted by everyone, and only activates when it verifies this will cause cooperation.
We've now localized the choice-making in one place. Everyone can watch our process verify that , which causes to activate, and everyone cooperates.
Centralizing the choosing like in idea #5 make the logic simpler, but this sort of approach is prone to manipulation and other problems when the verification is not reliably done. This means I don't unambiguously prefer idea #5 to idea #4, in which everyone is doing their own legwork.
One solid takeaway from idea #5 though is that if you can actually trust other people/processes to do some of the verification, this can really simplify things. For example, if you believe that your friend Bob's verification is mostly trustworthy, then what Bob thinks can inform your read of the group.
Here's a sketch for something that might work in practice: