A gentleman by the name of Dan Willard, who recently passed away, was a pioneer in a relatively unexamined area of mathematics: mathematical frameworks that are just weak enough to slip out from under Gödel's second incompleteness theorem - that is, frameworks that can prove their own consistency.  He pursued this for a reason which may be of interest here: creating AI that can verify itself.
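(For reference, the theorem being dodged says, roughly: for any consistent, recursively axiomatizable theory $T$ containing enough arithmetic,

$$T \nvdash \mathrm{Con}(T).$$

Willard's self-verifying systems instead satisfy $T \vdash \mathrm{Con}(T)$, by deliberately falling below the "enough arithmetic" threshold.)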

I stumbled across him for an entirely different reason, related to my username.  (I noticed that, insofar as the universe is a mathematical system, things seem like they might get kind of weird if Gödel applies.  I had this weird idea about a perfectly bijective mathematical framework and started investigating whether or not it would be subject to Gödel - answer: unclear, because I found this guy instead and started thinking about his ideas.)

Willard's approach is of course one option - creating mathematics that simply lacks the precise machinery that runs into Gödel, namely addition and multiplication as provably total operations (his systems have them, but they're defined in a way that avoids the issue; a sketch below).  The price is that the approach is necessarily incomplete - that is, it cannot prove every true statement about the arithmetic of the natural numbers.
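Roughly, and going off the standard description of Willard's systems rather than the papers themselves (so treat this as a sketch): subtraction and division are the primitive operations, and addition and multiplication exist only as relations defined on top of them, something like

$$x + y = z \;\iff\; z - y = x$$

$$x \times y = z \;\iff\; (y \neq 0 \rightarrow z \div y = x) \wedge (y = 0 \rightarrow z = 0)$$

The system can work with these relations, but it cannot prove that a suitable $z$ always exists - in particular, it cannot prove that multiplication is total - and that unprovability is exactly what keeps it outside the reach of Gödel's argument.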

But since no consistent, recursively axiomatizable (compressible to a finite algorithmic basis) mathematical system can prove every true statement of arithmetic, we can't exactly hold that against it.  (And also, I'm left with a personally annoyed sense of "Wait, that was always an option?")

Thus, contrary to a naive understanding of Gödel, we can in fact create a mathematical framework that can prove its own consistency!

This doesn't imply an easy solution to the Halting Problem, which would be the natural next step; but we could cripple a programming language in a similar fashion to dodge the usual diagonalization arguments - say, by assigning each function a fixed non-negative integer index and only permitting a function to call functions with a strictly lower index, as in the sketch below.
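Here is a minimal sketch of that restriction in Python (my own illustration, not anything from Willard; the decorator and the index bookkeeping are invented for the example).  Every function is registered at a fixed index, and a call is only allowed if it strictly decreases the index, so call chains have bounded depth and recursion - direct or mutual - is impossible by construction.  Note that loops inside a single function can still run forever, which is why this alone settles nothing.

```python
# Each function gets a fixed non-negative index and may only call functions
# with a strictly smaller index. Call chains therefore strictly descend,
# which bounds their depth and rules out direct or mutual recursion.

_call_stack = []  # indices of the functions currently executing

def indexed(n):
    """Register a function at index n and enforce the descending-call rule."""
    def wrap(fn):
        def guarded(*args, **kwargs):
            if _call_stack and n >= _call_stack[-1]:
                raise RuntimeError(
                    f"{fn.__name__} (index {n}) called from a function of "
                    f"index {_call_stack[-1]}; calls must strictly descend"
                )
            _call_stack.append(n)
            try:
                return fn(*args, **kwargs)
            finally:
                _call_stack.pop()
        return guarded
    return wrap

@indexed(0)
def double(x):
    return 2 * x

@indexed(1)
def quadruple(x):
    return double(double(x))  # allowed: index 1 -> index 0

@indexed(1)
def sideways(x):
    return quadruple(x)       # rejected at runtime: 1 -> 1 is not a decrease

print(quadruple(3))           # 12
```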

Is the halting problem decidable for such a language?  Well, if you can express the Goldbach Conjecture in it, I'm going to lean towards "No."  The way that addition and multiplication are defined may preclude expressing the Goldbach Conjecture, but at first glance it doesn't look to me like that's the case.  (There are some modifications to the Turing machine model I have considered that look like they might make the problem tractable, including the above indices, but they also complicate the heck out of any proofs, as they prevent induction within the framework.  Also, they're necessarily Turing-incomplete.)
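For concreteness, the sentence in question, written in ordinary first-order arithmetic (in a Willard-style system, $+$ and $\times$ would have to appear as defined relations rather than function symbols, which is exactly the expressibility question above):

$$\forall n \,\Big( \big(n > 2 \wedge \exists k\, (n = k + k)\big) \rightarrow \exists p\, \exists q\, \big(\mathrm{prime}(p) \wedge \mathrm{prime}(q) \wedge n = p + q\big) \Big)$$

where $\mathrm{prime}(p)$ abbreviates $p > 1 \wedge \forall a\, \forall b\, (p = a \times b \rightarrow a = 1 \vee b = 1)$.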

There is thus a trade-off between something that rhymes with "power" and something that rhymes with "safety".  However, there's not necessarily any reason that the mechanisms responsible for decision-making need to be as powerful as the mechanisms responsible for reasoning.  It seems preferable to have consistent preferences rather than the ability to express more powerful preferences - and the former isn't necessarily incompatible with more powerful reasoning capabilities.

So, as a path forward, give consideration to Willard's approach, at least with respect to decision theory: perhaps the most useful mathematical framework in which to express decision theory is not the most powerful one, but the provably self-consistent one.

Comments (2)

"AI that can verify itself" seems likely doable for reasons wholly unrelated to metamathematics (unlike what you claim offhandedly) since AIs are finite objects that nevertheless need to handle a combinatorially large space. This has the flavor of "searching a combinatorial explosion based on a small yet well-structured set of criteria" (ie the relatively easy instances of various NP problems), which has had a fair bit of success with SAT/SMT solvers and nonconvex optimizers and evolutionary algorithms and whatnot. I don't think constructing a system that systematically explores the exponentially big input space of a neural network is going to be too hard a challenge.

Also, has anyone really constructed a specific self-verifying theory yet? (From the English Wikipedia article, if I understand correctly, it seems Dan Willard came up with a system in which subtraction and division are the primitive operations, addition and multiplication are defined in terms of them, and it is impossible to prove that multiplication is a total function.)

The issue arises specifically in the situation of recursive self-improvement: you can't prove self-consistency in mathematical frameworks of "sufficient complexity" (that is, frameworks containing enough arithmetic, in the sense Gödel's theorem requires).

What this cashes out to is that, considering the AI as a mathematical framework, and the next generation of AI (designed by the first) as a second mathematical framework, you can't actually prove that there are no contradictions in an umbrella mathematical framework that comprises both of them, if they are of "sufficient complexity".  Which means an AI cannot -prove- that a successor AI has not experienced value drift - that is, that the combined mathematical framework does not contain contradictions - if they are of sufficient complexity.
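In Gödel-theorem terms, the obstruction is roughly this (a sketch, with $T_1$ the creator's framework and $T_2 \supseteq T_1$ the combined creator-plus-successor framework, both assumed consistent, recursively axiomatizable, and containing enough arithmetic):

$$T_2 \supseteq T_1 \;\Rightarrow\; T_1 \vdash \mathrm{Con}(T_2) \rightarrow \mathrm{Con}(T_1)$$

since any contradiction derivable in $T_1$ is also derivable in $T_2$.  So if $T_1$ could prove $\mathrm{Con}(T_2)$, it could prove its own consistency, which the second incompleteness theorem rules out.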

To illustrate the issue, suppose a powerful creator AI designing its successor; the successor, presumably, is more powerful than the creator in some fashion, so there are areas of the combinatorially large space that the successor can explore (in a reasonable timeframe) but that the creator cannot.  If the creator could prove there are no contradictions in the combined mathematical framework - then, supposing its values are embedded in that framework in a provable manner, it could be assured that the successor has not experienced value drift.


Mind, I don't particularly think the above scenario is terribly likely; I have strong doubts about basically everything in there, in particular the idea of provable values.  I created the post for the five or six people who might still be interested in ideas I haven't seen kicked around on Less Wrong for over a decade.