Sep 19, 2014

8 comments

I'm not sure if this post is very on-topic for LW, but we have many folks who understand Haskell and many folks who are interested in Löb's theorem (see e.g. Eliezer's picture proof), so I thought why not post it here? If no one likes it, I can always just move it to my own blog.

A few days ago I stumbled across a post by Dan Piponi, claiming to show a Haskell implementation of something similar to Löb's theorem. Unfortunately his code had a couple flaws. It was circular and relied on Haskell's laziness, and it used an assumption that doesn't actually hold in logic (see the second comment by Ashley Yakeley there). So I started to wonder, what would it take to code up an actual proof? Wikipedia spells out the steps very nicely, so it seemed to be just a matter of programming.

Well, it turned out to be harder than I thought.

One problem is that Haskell has no type-level lambdas, which are the most obvious way (by Curry-Howard) to represent formulas with propositional variables. These are very useful for proving stuff in general, and Löb's theorem uses them to build fixpoints by the diagonal lemma.

The other problem is that Haskell is Turing complete, which means it can't really be used for proof checking, because a non-terminating program can be viewed as the proof of any sentence. Several people have told me that Agda or Idris might be better choices in this regard. Ultimately I decided to use Haskell after all, because that way the post will be understandable to a wider audience. It's easy enough to convince yourself by looking at the code that it is in fact total, and transliterate it into a total language if needed. (That way you can also use the nice type-level lambdas and fixpoints, instead of just postulating one particular fixpoint as I did in Haskell.)

But the biggest problem for me was that the Web didn't seem to have any good explanations for the thing I wanted to do! At first it seems like modal proofs and Haskell-like languages should be a match made in heaven, but in reality it's full of subtle issues that no one has written down, as far as I know. So I'd like this post to serve as a reference, an example approach that avoids all difficulties and just works.

LW user lmm has helped me a lot with understanding the issues involved, and wrote a candidate implementation in Scala. The good folks on /r/haskell were also very helpful, especially Samuel Gélineau who suggested a nice partial implementation in Agda, which I then converted into the Haskell version below.

To play with it online, you can copy the whole bunch of code, then go to CompileOnline and paste it in the edit box on the left, replacing what's already there. Then click "Compile & Execute" in the top left. If it compiles without errors, that means everything is right with the world, so you can change something and try again. (I hate people who write about programming and don't make it easy to try out their code!) Here we go:

main = return ()

-- Assumptions

data Theorem a

logic1 = undefined :: Theorem (a -> b) -> Theorem a -> Theorem b logic2 = undefined :: Theorem (a -> b) -> Theorem (b -> c) -> Theorem (a -> c) logic3 = undefined :: Theorem (a -> b -> c) -> Theorem (a -> b) -> Theorem (a -> c)

data Provable a

rule1 = undefined :: Theorem a -> Theorem (Provable a) rule2 = undefined :: Theorem (Provable a -> Provable (Provable a)) rule3 = undefined :: Theorem (Provable (a -> b) -> Provable a -> Provable b)

data P

premise = undefined :: Theorem (Provable P -> P)

data Psi

psi1 = undefined :: Theorem (Psi -> (Provable Psi -> P)) psi2 = undefined :: Theorem ((Provable Psi -> P) -> Psi)

-- Proof

step3 :: Theorem (Psi -> Provable Psi -> P) step3 = psi1

step4 :: Theorem (Provable (Psi -> Provable Psi -> P)) step4 = rule1 step3

step5 :: Theorem (Provable Psi -> Provable (Provable Psi -> P)) step5 = logic1 rule3 step4

step6 :: Theorem (Provable (Provable Psi -> P) -> Provable (Provable Psi) -> Provable P) step6 = rule3

step7 :: Theorem (Provable Psi -> Provable (Provable Psi) -> Provable P) step7 = logic2 step5 step6

step8 :: Theorem (Provable Psi -> Provable (Provable Psi)) step8 = rule2

step9 :: Theorem (Provable Psi -> Provable P) step9 = logic3 step7 step8

step10 :: Theorem (Provable Psi -> P) step10 = logic2 step9 premise

step11 :: Theorem ((Provable Psi -> P) -> Psi) step11 = psi2

step12 :: Theorem Psi step12 = logic1 step11 step10

step13 :: Theorem (Provable Psi) step13 = rule1 step12

step14 :: Theorem P step14 = logic1 step10 step13

-- All the steps squished together

lemma :: Theorem (Provable Psi -> P) lemma = logic2 (logic3 (logic2 (logic1 rule3 (rule1 psi1)) rule3) rule2) premise

theorem :: Theorem P theorem = logic1 lemma (rule1 (logic1 psi2 lemma))

To make sense of the code, you should interpret the type constructor Theorem as the symbol ⊢ from the Wikipedia proof, and Provable as the symbol ☐. All the assumptions have value "undefined" because we don't care about their computational content, only their types. The assumptions logic1..3 give just enough propositional logic for the proof to work, while rule1..3 are direct translations of the three rules from Wikipedia. The assumptions psi1 and psi2 describe the specific fixpoint used in the proof, because adding general fixpoint machinery would make the code much more complicated. The types P and Psi, of course, correspond to sentences P and Ψ, and "premise" is the premise of the whole theorem, that is, ⊢(☐P→P). The conclusion ⊢P can be seen in the type of step14.

As for the "squished" version, I guess I wrote it just to satisfy my refactoring urge. I don't recommend anyone to try reading that, except maybe to marvel at the complexity :-)

EDIT: in addition to the previous Reddit thread, there's now a new Reddit thread about this post.