F*, Coq, Agda and Idris
I reviewed some of these programming languages in the past. LEAN (sometimes spelled L∃∀N) made my list and got some attention but you left it out. Idris and Coq caught my eye, and I chose to explore Coq more for reasons of ecosystem maturity.
But when I tried to do almost ANYTHING in Coq I found the going very very very slow. Like.. the law of excluded middle isn't in there by default, which I discovered after hours of painful debugging attempts on an early exercise (proving the infinitude of primes) that I thought would be fast.
Do you have a favorite? Do you know of good tutorials for that favorite?
One way in which "if it compiles, it works" can be false is that sometimes "works" is ill-defined. For instance, suppose you're working on audio compression software. "It works" means that (1) the output is much smaller than the input, and (2) when you play it back, to most listeners it sounds just like the input did. #1 could be formalized in a way a compiler could enforce (though in some cases what you need is that the output is small for realistic inputs which isn't so straightforward), but #2 is about what happens in humans' ears and brains and is absolutely not the sort of thing a compiler could check.
Other examples: You're developing a computer game; "works" means that people enjoy playing it. You're designing a word processor; "works" means that it's not too laborious or confusing for users to write documents that look the way they want and continue to look the way they want as they edit them. You're writing a program that plays go; "works" means that it can beat top human players, or that its analysis is useful for top human players looking over their games. You're writing a trading algorithm; "works" means that when you set it loose on real markets it gains rather than loses money.
What all these examples have in common is that "works" involves something about what happens when the program interacts with human beings, and human beings are complicated. (There are other complicated things besides human beings, and there are probably some good examples that don't involve humans, but humans are really complicated, and there are a lot of us, and programs tend to be written for humans to use, and we interact with other humans ... so human interaction of some sort is probably the most important source of hairiness that compilers couldn't enforce handling correctly even in our dreams.)
I also agree with everything you said and want to add to Yair's point that there usually is an identifiable part of the problem that involves often complex models and data structures and their modification and interaction with them via a set of primitives (an API). It would be cool if we could improve that part. It would also be cool if we could improve on the derivation and checking of the real-world-interactive part. But that requires different methods.
I think this is the key insight from this post:
The first thing to realize is that if a programmer writes a program, he thinks it works. In other words he has an informal proof in his mind that the program displays some specific properties.
I wouldn't call it "an informal proof" exactly but I agree what goes on in the mind minds of programmers - modeling, concept processing, pattern matching, simulation - is somewhat close to it.
The difficulty is getting this out and into the program. Types are what seems to have worked best so far to help with it but maybe something better can be found. Or better ways to infer and display types.
Instead of talking about the halting problem, you might want to invoke Rice's Theorem:
All non-trivial, semantic properties of programs are undecidable [Wikipedia].
A semantic property of a program is one that depends only on its behavior, not on it's syntax. So "it's source code contains an even number of for-loops" is not a semantic property; "it outputs 7" is. The two "trivial" semantic properties that the theorem are excluding are very trivial: the semantic property "true" that is by definition true for all programs, and the semantic property "false" that is by definition false for all programs.
I agree with your main point that the halting problem (and Rice's Theorem) are overstated. While there exist programs that halt (or output 7) that can't be proved easily in whatever proof system you're using, they're mostly bad programs. If your programs works but requires transfinite induction to prove that it works, you should rewrite it to be easier to prove correct. Or maybe you're legitimately doing something very complicated.
We need to focus on proving the properties that the user is interested in, rather than the ones the compiler is interested in.
I think existing proof systems (e.g. Coq) already do this, unless I'm misunderstanding what you're asking for?
For example, here's a type signature that only says you're getting an integer out:
Fixpoint mystery (n : nat) : nat
And here's one that says you're getting a prime out (for an appropriate definition of the prime
type):
Fixpoint mystery (n : nat) : prime
If you care that mystery
produces a prime number, you use the second signature and go through the trouble of proving that it produces a prime. If you don't, you use the first signature.
You might object that while prime
is optional, nat
is not. You still have to prove that your program type checks and halts (modulo coinduction, which lets you express non-termination). But the alternative is a program that might crash, and if it crashes then it definitely doesn't work! Or if you really want to allow it to crash, you can express that too. Have everything return an option
, and propagate them up to the toplevel, and print "oops, something went wrong" if it gets there:
Fixpoint mystery (n: nat) : option nat
Tldr; there's a lot of leeway in what you choose to prove about your program in a proof assistant.
I will pick a rather large nit: "for example a web server definitely doesn't halt" is true, but for this to be surprising or interesting or a problem for Turing reasons, it just means you are modelling it incorrectly. Agda solves this using corecursion, and the idea is to use a data type that represents a computation that provably never halts. Think infinite streams, defined as "an infinite stream is a pair, $S_0 = (x, S_1)$, where $S_1$ is an infinite stream". This data type will provably keep producing values forever (it is "productive"), and that's what you want for a web server.
That's fair.
I think a better way of putting my point is, can we express the particular behaviors I'm interested in proving, and only have the compiler care about those, rather than checking everything it might be interested in.
So why don't we have type systems where "if it compiles, it works"?
Because there is no information in the program about what constitutes "working". We do have type systems where "if it compiles, it never crashes", but "not crashing" is a very weak form of "works".
Some languages let the programmer write assertions about what the program should do, even compiler-checkable assertions, but that just pushes the problem up one level: there is no information available to the compiler about whether these are the right assertions.
Nothing stops me from writing A+B when what I wanted was A*B.
Nothing stops me from writing A+B when what I wanted was A*B.
That's perfectly true, but if express at a high level my requirements, it would be theoretically possible for a compiler to tell me that using A*B
will not achieve those requirements, halting problem notwithstanding.
Isn't that just pushing the problem up a level with assertions? It's only an advance if the assertions are somehow more perspicuous to the developer writing them. In fact, finding such assertions, ways of thinking about the task, is a large part of the development process.
A program has necessary complexity proportional to how complex it is to even express the task it needs to carry out, and implementational complexity, related to the difficulty of actually implementing that in practice. The total complexity is the sum of the two.
For example Conways game of life is extremely simple to state, and it should be relatively simple to express the relationship between the current state and the next to a compiler. There is a wealth of literature on how to actually implement it efficiently however, which is far from simple.
By having the compiler prove that your actual code matches the specification you provide, you can safely make the implementation as complex as you like whilst knowing it still matches the spec. For problems with low implementational complexity, that's not a problem that needs solving, but for those with high implementational complexity it is.
Think of it like unit tests - unit tests can often take far longer to write than the program itself, but once you have them you can much more confidently change the program knowing you want adversely affect the behavior you're interested in. Similarly a good spec of a program can take a long time to write, but in our theoretical language, you will now be able to safely change the program knowing the compiler will catch you with certainty if you make any mistakes.
Recently seen on an otherwise excellent blog post:
Now he is correct that:
In other words there will always be programs that work, but you can't prove they work. A simple example would be:
This is a working program if RunSomeProgram runs forever and so the illegal operation is never reached, but the halting problem means that for any compiler there will always be values of
ArbitraryProgram
which run forever but the compiler doesn't know this is the case.But that's not the original statement:
This is trivially false. Consider for example AddLang. AddLang takes a comma separated list of integers as input, and prints the sum as the output. E.g
1,2,3
will output6
. It's trivial to write a compiler for that language such that if it compiles, it works.Of course that's being a bit unfair. As is clear from the followup, he was talking about a general purpose typing system - i.e. a typing system for a Turing complete language, not a DSL.
For that though, we first need to define "works". I think a reasonable definition of "works" is "works as the user intended it to". Now a clever programmer could definitely write a program which works as the client intends it to if and only if the Goldbach conjecture is true, which would be an example of a program that might work, but wouldn't compile. However that doesn't much limit us - after all just write it in a way that doesn't require solving the Goldbach conjecture. We're more interested in whether there are classes of programs we would want to write, but such a type system just wouldn't let us?
The first thing to realize is that if a programmer writes a program, he thinks it works. In other words he has an informal proof in his mind that the program displays some specific properties. Now there's 2 possibilities:
a) He was wrong, and the program doesn't actually work.
b) He was right, and the compiler should be able to validate that proof.
So for any reasonable program a human user would actually write, a compiler could theoretically validate that it does what the user wants it to do.
So the halting problem is kind of a red herring when it comes to the limits of type systems. Yes of course there are programs which work, but you can't prove work. But the exact same applies to humans as well as computers, so the programs humans write tend to provably work.
So why don't we have type systems where "if it compiles, it works"?
Firstly there's been a huge amount of research into such languages - for example F*, Coq, Agda and Idris. They are pretty close to this level, but they aren't perfect yet. If I were to identify the main challenges they would be: