Ganesalingam and Gowers recently released an arXiv preprint of their article "A fully automatic problem solver with human-style output", about a theorem-proving program they have written that generates human-readable output. Gowers had blogged about the program in April 2013; in March 2013 he had solicited responses from readers to answers generated by the program and by humans, without explicitly revealing that one of the answerers was a machine.

There do exist theorem-proving systems such as Coq. In fact, it was announced about a year ago that the proof of the Feit-Thompson odd-order theorem, which originally appeared in a 255-page journal article in 1963, had been completely verified using Coq. The Ganesalingam-Gowers program differs in that its output is a human-style proof (for better or worse), whereas Coq proofs are written in a formal language syntax (see here for a small part of the proof of the Feit-Thompson theorem in Coq).
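
For flavor, here is what a formal proof script looks like. This is a trivially small Lean example written for this post, not part of the Feit-Thompson formalization (which is in Coq and vastly larger), but the contrast with human-style prose is the same:

```lean
-- A toy Lean proof by induction, shown only for flavor: formal provers
-- consume scripts like this rather than human-style mathematical prose.
theorem zero_add' (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ n ih => rw [Nat.add_succ, ih]
```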

The arXiv preprint contains details of the program and of how the authors believe it improves on existing programs. One way they believe their program is better than past provers is that it more explicitly encodes heuristics for picking promising lines of reasoning.
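
The paper's own mechanisms are more elaborate, but as a rough sketch of what "explicitly coding for promising lines of reasoning" can mean, consider best-first search over proof states, where a hand-written scoring function decides which state to expand next. The `expand` and `promise` callbacks below are hypothetical stand-ins for a prover's inference rules and heuristic, not anything taken from the paper:

```python
import heapq

def best_first_prove(start, is_proved, expand, promise, max_steps=10_000):
    """Expand the most 'promising' proof state first.

    `expand(state)` yields successor states (one inference step each);
    `promise(state)` is a numeric heuristic score. Both are assumed,
    hypothetical callbacks, not the Ganesalingam-Gowers mechanism itself.
    """
    frontier = [(-promise(start), 0, start)]  # max-heap via negated scores
    tie = 1                                   # tiebreaker keeps heap comparisons numeric
    seen = set()
    for _ in range(max_steps):
        if not frontier:
            return None                       # search space exhausted
        _, _, state = heapq.heappop(frontier)
        if is_proved(state):
            return state
        if state in seen:
            continue
        seen.add(state)
        for nxt in expand(state):
            heapq.heappush(frontier, (-promise(nxt), tie, nxt))
            tie += 1
    return None                               # gave up within the step budget
```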

I'd appreciate comments from LessWrong readers about the specifics of the Ganesalingam-Gowers program, and about any broader ramifications that the ability to create such a program has for predictions related to artificial intelligence.


Very interesting. I'm actually most struck by the potential this system might offer for math education, with its human-readable, human-like proofs. Math games have so far focused on teaching relatively mechanical skills like arithmetic and algebraic manipulation, because it's easy to automatically verify the correctness of those, but maybe something like this could be used to teach the creative side of mathematics as well? Let the student dabble with various kinds of proofs, let the system tell them which ones are correct, and have it provide guidance if the student is stuck.

If theorem proving were as popular as chess, some Deep Blue or "MathRybka" program would have solved a Millennium Prize problem by now.

I suspect you are factually mistaken and would expect to find more resources going into theorem-proving.

Hmm, yeah, you are right. Rybka, Deep Fritz, and their predecessors look like a smaller effort than ATPs, actually. And other chess programs are at about the same level (and close to the top human players' level, apparently?). Both areas look from the outside like good targets for narrow AI, but apparently there is something about theorem proving that requires some poorly formalized part of human reasoning.

Human chess also involves loads of poorly formalized reasoning, but chess is a vastly smaller search space. There is no game you can play which gets you to Fermat's Last Theorem.

lmm:

You start with the axioms, and each turn you perform one of a small set of well-defined moves. You win if you reach a certain statement. I don't see any fundamental difference from a game (though admittedly we don't know whether winning is possible if the theorem hasn't been proven yet).
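
Here's a minimal sketch of that framing, under toy assumptions: states are sets of derived statements, the only "move" is modus ponens (with an implication encoded as a `(p, '->', q)` tuple), and you win by deriving the goal. Real logics have much richer rule sets, but the game structure is the same:

```python
def modus_ponens_moves(known):
    # From a fact p and a rule (p, '->', q), one legal move derives q.
    for fact in known:
        for rule in known:
            if isinstance(rule, tuple) and rule[0] == fact and rule[1] == "->":
                yield rule[2]

def prove(axioms, goal):
    known = set(axioms)
    while goal not in known:
        new = set(modus_ponens_moves(known)) - known
        if not new:      # no legal moves left: this game is lost
            return False
        known |= new
    return True          # reached the goal statement: game won

# Starting from "a" with rules a -> b and b -> c, the goal "c" is a win:
print(prove({"a", ("a", "->", "b"), ("b", "->", "c")}, "c"))  # True
```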

That works for finitely axiomatized theories. If you have axiom schemas, as in standard ZF(C) or PA, then there is no small set of well-defined moves.

The instances of the axiom schemas are countable, which is small enough for the procedure described in the grandparent to work. Unless I'm not understanding something?
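
Concretely, a schema contributes countably many axioms (one per formula), so a proof-search procedure can still dovetail through them. A toy sketch, with a made-up `formula(k)` standing in for a real Gödel-numbering decoder:

```python
from itertools import count

def formula(k: int) -> str:
    # Hypothetical stand-in for decoding the k-th formula with one free
    # variable from its Goedel number; here we just pretend it's "P_k".
    return f"P_{k}"

def induction_instances():
    # The PA induction schema yields one axiom per formula. Since the
    # formulas are countable, the instances can be enumerated, so a
    # prover can interleave them with its other moves.
    for k in count():
        phi = formula(k)
        yield f"({phi}(0) & forall n. ({phi}(n) -> {phi}(n+1))) -> forall n. {phi}(n)"
```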

[anonymous]:

The size of the search space and lack of any reasonable heuristic.

V_V:

Yes.

In particular, generalized chess (chess on an N×N board, with an appropriately scaled number of pieces) is EXPTIME-complete: any instance of a binary decision problem in EXPTIME can be reduced to a generalized chess position such that White (respectively Black) can win iff the answer to that instance is Yes.

The complexity of theorem proving depends on the formal system. There are some logics where theorem proving is PSPACE-complete, and PSPACE is contained in EXPTIME.
Other interesting logics are undecidable, so they can't be reduced to a game of generalized chess, although semidecidable logics can be reduced to other, more complex games.

V_V:

We know very good heuristics for chess; that's not the case for general theorem proving.

Then what do high school geometry instructors teach?

V_V:

Mostly they teach various theorems and their proofs, so that you can "stand on the shoulders of giants", and they try to train your "intuition" (incommunicable heuristics) through exercises in proving simple theorems.

More advanced math courses teach explicit heuristics (in the form of "proof strategies": general patterns that are often found in proofs), which can be programmed into an automated theorem prover to some extent, but they are not good enough to make up for the lack of human-level intuition.
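
As a toy illustration of a strategy being "programmed", here is a sketch with made-up formula encodings `("and", A, B)` and `("implies", A, B)`; each strategy reduces a goal to subgoals, much like a prover's tactics:

```python
def apply_strategies(goal, assumptions=frozenset()):
    if goal in assumptions:              # the goal is already an assumption
        return True
    if isinstance(goal, tuple):
        if goal[0] == "and":             # strategy: prove both conjuncts
            return (apply_strategies(goal[1], assumptions)
                    and apply_strategies(goal[2], assumptions))
        if goal[0] == "implies":         # strategy: assume LHS, prove RHS
            return apply_strategies(goal[2], assumptions | {goal[1]})
    return False                         # no strategy applies: give up

# "p implies (p and p)" is provable with just these two strategies:
print(apply_strategies(("implies", "p", ("and", "p", "p"))))  # True
```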

On the other hand, we know excellent heuristics for chess, in the form of position evaluation functions that give a higher score to positions which are more likely to lead to a victory. The chess piece relative value heuristic, for instance, is extremely simple and yet very effective. High-level computer chess programs use much more complicated evaluation functions, but piece relative value is usually their dominant component.
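
For instance, here is the piece relative value heuristic in a few lines, with a made-up board representation (a list of (piece, color) pairs); real engines add positional terms on top of this material count:

```python
PIECE_VALUE = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}

def material_eval(pieces):
    # Sum standard material values: positive favors White, negative Black.
    score = 0
    for piece, color in pieces:
        value = PIECE_VALUE[piece]
        score += value if color == "w" else -value
    return score

# Example: White is up "the exchange" (rook for knight):
position = [("K", "w"), ("R", "w"), ("K", "b"), ("N", "b")]
print(material_eval(position))  # 5 - 3 = +2 in White's favor
```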

Also, high school geometry is much simpler than mathematics in general, and there are good automatic theorem provers for geometry.