Paul Christiano has devised a new fundamental approach to the "Löb Problem" wherein Löb's Theorem seems to pose an obstacle to AIs building successor AIs, or adopting successor versions of their own code, that trust the same amount of mathematics as the original.  (I am currently writing up a more thorough description of the question this preliminary technical report is working on answering.  For now the main online description is in a quick Summit talk I gave.  See also Benja Fallenstein's description of the problem in the course of presenting a different angle of attack.  Roughly the problem is that mathematical systems can only prove the soundness of, aka 'trust', weaker mathematical systems.  If you try to write out an exact description of how AIs would build their successors or successor versions of their code in the most obvious way, it looks like the mathematical strength of the proof system would tend to be stepped down each time, which is undesirable.)

Paul Christiano's approach is inspired by the idea that whereof one cannot prove or disprove, thereof one must assign probabilities: and that although no mathematical system can contain its own truth predicate, a mathematical system might be able to contain a reflectively consistent probability predicate.  In particular, it looks like we can have:

∀a, b: (a < P(φ) < b)          ⇒  P(a < P('φ') < b) = 1
∀a, b: P(a ≤ P('φ') ≤ b) > 0  ⇒  a ≤ P(φ) ≤ b

Suppose I present you with the human and probabilistic version of a Gödel sentence, the Whitely sentence "You assign this statement a probability less than 30%."  If you disbelieve this statement, it is true.  If you believe it, it is false.  If you assign 30% probability to it, it is false.  If you assign 29% probability to it, it is true.

Paul's approach resolves this problem by restricting your belief about your own probability assignment to within epsilon of 30% for any epsilon.  So Paul's approach replies, "Well, I assign almost exactly 30% probability to that statement - maybe a little more, maybe a little less - in fact I think there's about a 30% chance that I'm a tiny bit under 0.3 probability and a 70% chance that I'm a tiny bit over 0.3 probability."  A standard fixed-point theorem then implies that a consistent assignment like this should exist.  If asked if the probability is over 0.2999 or under 0.30001 you will reply with a definite yes.

We haven't yet worked out a walkthrough showing if/how this solves the Löb obstacle to self-modification, and the probabilistic theory itself is nonconstructive (we've shown that something like this should exist, but not how to compute it).  Even so, a possible fundamental triumph over Tarski's theorem on the undefinability of truth and a number of standard Gödelian limitations is important news as math qua math, though work here is still in very preliminary stages.  There are even whispers of unrestricted comprehension in a probabilistic version of set theory with ∀φ: ∃S: P(x ∈ S) = P(φ(x)), though this part is not in the preliminary report and is at even earlier stages and could easily not work out at all.

It seems important to remark on how this result was developed:  Paul Christiano showed up with the idea (of consistent probabilistic reflection via a fixed-point theorem) to a week-long "math squad" (aka MIRI Workshop) with Marcello Herreshoff, Mihaly Barasz, and myself; then we all spent the next week proving that version after version of Paul's idea couldn't work or wouldn't yield self-modifying AI; until finally, a day after the workshop was supposed to end, it produced something that looked like it might work.  If we hadn't been trying to solve this problem (with hope stemming from how it seemed like the sort of thing a reflective rational agent ought to be able to do somehow), this would be just another batch of impossibility results in the math literature.  I remark on this because it may help demonstrate that Friendly AI is a productive approach to math qua math, which may aid some mathematician in becoming interested.

I further note that this does not mean the Löbian obstacle is resolved and no further work is required.  Before we can conclude that we need a computably specified version of the theory plus a walkthrough for a self-modifying agent using it.

See also the blog post on the MIRI site (and subscribe to MIRI's newsletter here to keep abreast of research updates).

This LW post is the preferred place for feedback on the paper.

EDIT:  But see discussion on a Google+ post by John Baez here.  Also see here for how to display math LaTeX in comments.

Moderation Guidelines: Reign of Terror - I delete anything I judge to be annoying or counterproductiveexpand_more

I'm very happy both to see this result and the fact that prominent mathematicians such as John Baez and Timothy Gowers seem to be taking an interest in it. It considerably increases my confidence in that

  • Current MIRI-folk will be capable of producing original, valuable, solid math results
  • By producing such results, they will become capable of attracting more talented mathematicians and producing even more such results

My confidence in MIRI being successful in its mission, while still pretty low, is now much higher than it was a few days ago.

The main theorem of the paper (Theo. 2) does not seem to me to accomplish the goals stated in the introduction of the paper. I think that it might sneakily introduce a meta-language and that this is what "solves" the problem.

What I find unsatisfactory is that the assignment of probabilities to sentences is not shown to be definable in L. This might be too much to ask, but if nothing of the kind is required, the reflection principles lack teeth. In particular, I would guess that Theo. 2 as stated is trivial, in the sense that you can simply take to only have value 0 or 1. Note, that the third reflection principle imposes no constraint on the value of on sentences of L' \ L.

Your application of the diagonalisation argument to refute the reflection scheme (2) also seems suspect, since the scheme only quantifies over sentences of L and you apply it to a sentence G which might not be in L. To be exact, you do not claim that it refutes the scheme, only that it seems to refute it.

You are completely right, and thanks for the correction! A new version with this problem fixed should be up in a bit.

(The actual proof of theorem 2 establishes the correct thing, there is just a ' missing from the statement.)

Actually, I believe you've found a bug in the paper which everyone else seems to have missed so far, but fortunately it's just a typo in the statement of the theorem! The quantification should be over L', not over L, and the proof does prove this much stronger statement. The statement in the paper is indeed trivial for the reason you say.

Given the stronger statement, the reason you can't just have P have value 0 or 1 is sentences like G <=> P('G') < 0.5: if P(G) = 0, by reflection it would follow that P(G) = 1, and if P(G) = 1, then P(not G) = 0, and by reflection it would follow that P(not G) = 1.

Some technical remarks on topological aspects of the paper:

The notion of a "coherent probability distribution" and the use of the product topology on the set of all such distributions are a bit ideosyncratic. Here's a way to relate them to more standard notions: Consider a countable first-order language L and the Stone space of complete theories over it. Equip this with the Borel-σ-algebra. (1.) A coherent probability distribution is (equivalent to) a probability distribution on the resulting measure space. (2.) The (AFAICT) most commonly used topology on the space of all such distributions is the one of weak convergence of probability measures. It turns out that this is equivalent to the product topology on the coherent probability distributions.

(1.) The main unusual feature of coherent probability distributions is that they're only defined on the sentences of L, which don't form a σ-algebra. Their equivalence classes under logical equivalence form a Boolean algebra, though, which is in particular a ring of sets (in the measure-theoretic sense) of complete theories (we identify each equivalence class with the set of complete theories containing the sentences in the equivalence class). Furthermore, this ring generates our σ-algebra: by the definition of "Stone space", the sets in the ring form a base of the Stone space, and since this base is countable, every open set is a countable union of base sets (meaning that the smallest σ-algebra containing the open sets is also the smallest σ-algebra containing the base).

A coherent probability distribution, by the alternative characterization from the paper, is a finitely additive probability measure. But a finitely additive measure on a ring is already a premeasure (i.e., σ-additive on the ring) if for every descending sequence of elements of the ring, %20%3E%200) implies , and a premeasure on a ring extends uniquely to a measure on the generated σ-algebra. Now, by the assumption, we have %20%3E%200) and therefore for all ; since , this means that the family ) has the finite intersection property, and so since Stone spaces are compact and each is clopen, the intersection of all is non-empty as desired.

(2.) The Stone space of a Boolean algebra is metrizable if and only if the Boolean algebra is countable, so since we're interested in countable languages, the notion of weak convergence of probability measures on our space is well-defined. One of the equivalent definitions is that if \ge\mathbb{P}(B)) for all open sets . We want to show that this is equivalent to convergence in the product topology on coherent probability distributions, which amounts to pointwise convergence of ) to ) for all in the base (i.e., for for some sentence ).

Suppose first that \to\mathbb{P}(A)) for all base sets and let be an arbitrary open set. can be written as a countable union of base sets; since the base is closed under Boolean operations, it follows that it can be written as a countable disjoint union (let ). For any , there is an such that \le\mathbb{P}(\bigcup_{i=1}%5Em%20A_i)+\epsilon/2). By pointwise convergence, for sufficiently large we have \le\mathbb{P}_n(A_i)%20+%20\epsilon/2m) for all . Therefore,


Since this holds for all