Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

One way of viewing my recent posts is as an argument against using logic in the first place. Although I had the argument in the back of my mind while writing, I doubt it came across (partly because I don't actually agree with the conclusion). Here's how it would go.

No computable probability distribution can be coherent with respect to first-order logic. This follows from separability concerns. We're forced to put some probability on inconsistent outcomes. But really, it's much worse: consistent outcomes have measure zero in any computable probability distribution. This doesn't mean that we have to assign probability one to "false" or anything like that; the whole point is that we can't recognize all the consequences of our beliefs. Uncertainty about logical consequences translates into putting probability on logically inconsistent possibilities.
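
To make this concrete with a toy example of my own (not drawn from any particular source): suppose a bounded reasoner assigns probabilities to A, A→B, and B without ever performing the modus ponens inference. Coherence requires P(B) ≥ P(A) + P(A→B) − 1; assigning the three sentences unrelated values can violate that bound, which is the same thing as putting positive probability on the impossible world where A and A→B hold but B fails.

```python
# Toy illustration (hypothetical numbers): a bounded reasoner assigns
# probabilities to A, A -> B, and B without noticing that the first two
# entail the third.  Coherence would require
#     P(B) >= P(A) + P(A -> B) - 1,
# but treating the sentences as unrelated can violate this, i.e. put
# positive mass on the impossible world where A and A -> B hold and B fails.

p_A      = 0.9   # credence in A
p_A_to_B = 0.9   # credence in "A implies B"
p_B      = 0.5   # credence in B, assigned without doing the inference

coherence_lower_bound = p_A + p_A_to_B - 1            # 0.8
print("required:", coherence_lower_bound, "assigned:", p_B)
print("coherent?", p_B >= coherence_lower_bound)      # False: incoherent

# The probability mass that spills onto the impossible world
# {A, A->B, not B} is at least the size of the violation:
print("mass on an impossible world >=",
      round(coherence_lower_bound - p_B, 3))          # 0.3
```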

Up until I started thinking about betting on logical coins (a concept Wei Dai illustrated earlier with counterfactual mugging with a logical coin), I thought this problem would be solved by a sequence of probability distributions P_n which approach logically consistent beliefs. This approach can be augmented with other good properties such as uniform coherence. This is still the most promising overall approach. However:
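
Here is a minimal cartoon of the sort of sequence I have in mind (my own sketch, not the uniform-coherence construction itself): run a proof search for n steps, throw out the truth assignments that contradict whatever has been proven so far, and spread probability uniformly over the survivors. Each P_n is computable, and the sequence moves toward consistency as n grows, but no individual P_n is consistent.

```python
from itertools import product

# Minimal cartoon of a sequence P_0, P_1, P_2, ... approaching consistency
# (my own sketch, not the uniform-coherence construction).  Sentences are
# just named propositions; "proof search" is faked by a hard-coded schedule
# of constraints discovered at successive steps.

SENTENCES = ["A", "A->B", "B"]

# Constraints "discovered" by the prover, as predicates on truth assignments.
# Here the only logical fact is that A and A->B together force B, and we
# pretend it is found at step 2.
DISCOVERED = {
    2: lambda w: not (w["A"] and w["A->B"] and not w["B"]),
}

def P_n(n):
    """Uniform distribution over the truth assignments not yet ruled out
    by the facts proven within n steps."""
    constraints = [c for step, c in DISCOVERED.items() if step <= n]
    worlds = [dict(zip(SENTENCES, values))
              for values in product([True, False], repeat=len(SENTENCES))]
    allowed = [w for w in worlds if all(c(w) for c in constraints)]
    return {frozenset(k for k, v in w.items() if v): 1 / len(allowed)
            for w in allowed}

def prob(dist, sentence):
    return sum(p for world, p in dist.items() if sentence in world)

for n in [0, 1, 2, 3]:
    d = P_n(n)
    print(f"P_{n}(B) = {prob(d, 'B'):.3f}")   # jumps when the rule is found
```

Notice that the move from P_1 to P_2 in this cartoon is not conditioning on an event in P_1's sample space; the space of candidate worlds itself gets re-carved. That is the shape of the difficulty in point 2 below.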

  1. Problems like counterfactual mugging with a logical coin call for a logically updateless decision theory.
  2. Even for problems which do not need an updateless solution, an agent which notices that the progression from P_n to P_{n+1} is not a Bayesian update on the new information will tend to self-modify to "fix" that. Where the future self will predictably deviate from Bayesian updates on new knowledge, it will achieve lower expected utility, almost by definition; this is formalized in the Diachronic Dutch Book argument. Hence, the agent will want to get rid of all the wonderful non-Bayesian machinery for logical uncertainty. As Scott put it (in private conversation), this is the two-update problem for decision theory.
  3. If we do use a Bayesian update for logical information, we become trollable: our probability estimates can be manipulated by selective proofs (a toy illustration follows this list). Furthermore, because the set of provable things has measure zero at any time, we diverge rather than converge as we learn more and more; I have not proved that this happens in all cases yet, but the situation does not look good. The AI might think this is the highest-utility way of doing things, but that's only because the AI assigns probability zero to the way things actually are; it does not expect that a troll can manipulate it with simple proof-search strategies, and it is not able to learn better as the troll continues to be extraordinarily lucky.
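
To make the trolling worry in 3 concrete, here is a toy version of the manipulation (my own simplification, not the construction from the original trollability argument). The prior treats the target sentence A and a stock of filler sentences B_1, B_2, ... as independent fair coins. The troll only ever announces proofs of disjunctions A ∨ B_i, which it can supply because it privately knows each B_i to be a theorem; the agent, Bayes-updating on each announcement, is pumped toward certainty in A even though nothing bearing on A was ever proved.

```python
# Toy trolling simulation (my own simplification, not the original
# construction).  Prior: the target sentence A and filler sentences
# B_1, B_2, ... are independent with probability 1/2 each.  The troll
# repeatedly proves "A or B_i", which it can do because it secretly
# knows each B_i is a theorem.  The agent performs a Bayesian update
# on each announcement.

p_A = 0.5   # prior credence in the target sentence A
p_B = 0.5   # prior credence in each fresh filler sentence B_i

for i in range(1, 11):
    # Bayesian update on learning that (A or B_i) is a theorem, hence true:
    #   P(A | A or B_i) = P(A) / (P(A) + P(not A) * P(B_i))
    p_A = p_A / (p_A + (1 - p_A) * p_B)
    print(f"after round {i:2d}: P(A) = {p_A:.4f}")

# P(A) -> 1, although the troll never proved anything that bears on A.
```

Announcing proofs of the B_i themselves instead pushes P(A) back down, so the troll can steer the estimate in whichever direction it likes.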

We can solve #1 and #2 together; a way to handle logical uncertainty with a Bayesian update more or less gives us a good prior to use in an updateless way, and vice versa. The result is a probability distribution which represents its uncertainty by assigning probability to impossible possible worlds. This is already a departure from a strict view of logic, committing us to what might be called a weak paraconsistency (though we need not endorse the possibility of true contradictions). This doesn't directly bother me, though. When logical consistency and reflective consistency are in conflict, I'd prefer reflective consistency.
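
As a cartoon of what using the prior in an updateless way looks like, take counterfactual mugging with a logical coin, say the parity of some far-out digit of π (the payoffs below follow one common telling of the problem and are otherwise illustrative). One of the two "outcomes" is in fact logically impossible, but the prior still gives it probability 1/2, and the agent scores policies against that prior rather than against whatever it later deduces about the digit.

```python
# Cartoon of an updateless policy choice on a logical coin (illustrative
# numbers; the coin is "digit d of pi is even" for some fixed large d).
# Exactly one side of the coin is logically possible, but the prior
# assigns 1/2 to each, and policies are scored against that prior.

PRIOR = {"even": 0.5, "odd": 0.5}   # prior over the logical coin

def payoff(policy_pays, outcome):
    """Omega's deal: if the digit is even, Omega pays $10,000 to agents
    whose policy would have paid up in the odd case; if the digit is odd,
    the agent pays $100 iff its policy says to pay."""
    if outcome == "even":
        return 10_000 if policy_pays else 0
    else:
        return -100 if policy_pays else 0

def prior_expected_value(policy_pays):
    return sum(p * payoff(policy_pays, o) for o, p in PRIOR.items())

print("EV(pay)    =", prior_expected_value(True))    # 4950.0
print("EV(refuse) =", prior_expected_value(False))   # 0.0
# The updateless agent commits to paying.  An agent that first deduces the
# digit's parity and then updates would refuse in the odd case -- and
# thereby miss the $10,000 branch of deals like this one.
```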

There are problems with this approach, however. Besides giving up other desirable properties for logical uncertainty, this runs straight into problem #3. I don't have a convergent way of handling logical uncertainty with a Bayesian update, and there may well not be one. Even if the AI is doomed to conclude that allowing itself to be manipulable and non-convergent achieves higher expected utility, it doesn't seem like we want to accept that conclusion: from "outside", we can see that it is assigning "unreasonably" low probability to a chain of events which can actually occur. (This is unlike the usual situation with updateless decision theory, where it seems we're forced to agree with the system's estimation that it is better off self-modifying into an updateless version of itself.)

Given all of this (and assuming that certain arguments in #2 and #3 continue to check out in more sophisticated treatments), I think there is an argument for abandoning logic as a representation which shouldn't be dismissed out of hand. If logical constraints on beliefs cause so many problems, perhaps the better course is not to try to enforce them. Perhaps rational agents need not have beliefs over expressive logics of this sort.

The source of the problem is, largely, attempting to use a model whose inference procedure never terminates. Logical consequences keep unfolding forever, in a way which cannot be captured by finite probability distributions. We can't always know all the consequences of a belief, so we have to keep thinking forever in order to give it a really (reliably) appropriate probability. Why would we want to use a model like that? Might we be better off sticking to computable model classes, which we can actually use?
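
A computable model class, by contrast, is something we can simply run. Here is a minimal sketch (generic, not tied to any particular proposal): a finite set of total programs treated as predictors, with a Bayesian posterior over them. Every inference terminates; there is no unbounded unfolding of consequences.

```python
# Minimal sketch of reasoning with a computable model class: a finite set
# of total predictors and a Bayesian posterior over them (the models and
# numbers are made up for illustration).

# Each model maps the observation history to P(next bit = 1).
MODELS = {
    "always-half": lambda history: 0.5,
    "mostly-ones": lambda history: 0.9,
    "copy-last":   lambda history: 0.8 if history and history[-1] == 1 else 0.2,
}

posterior = {name: 1 / len(MODELS) for name in MODELS}   # uniform prior

def update(posterior, history, bit):
    """Bayesian update of the model weights on one observed bit."""
    likelihood = {
        name: (m(history) if bit == 1 else 1 - m(history))
        for name, m in MODELS.items()
    }
    unnorm = {name: posterior[name] * likelihood[name] for name in MODELS}
    z = sum(unnorm.values())
    return {name: w / z for name, w in unnorm.items()}

history = []
for bit in [1, 1, 0, 1, 1, 1]:
    posterior = update(posterior, history, bit)
    history.append(bit)

print(posterior)   # weights shift toward the models that predicted well
```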

This doesn't automatically solve the problem. It's not clear that it solves anything at all. Problems in bounded Solomonoff induction which were inspired by logical uncertainty can also be motivated by the more mundane problem of empirical uncertainty computed at different levels of processing power. If Solomonoff Induction is going to be the guiding theory, it's also uncomputable; like probability distributions over logic, it needs to be approximated. Depending on details, this could imply a "second update" with similar problems. For example, imagine an approximation which runs a finite ensemble of predictive models while searching for better models in the background. The competition between models in the current pool might constitute Bayesian updates on evidence; however, bringing in an entirely new model does not. Might this create a reflective inconsistency similar to the one inherent in logic? I don't know.
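
To see where the second update would sit in such an approximation, here is a generic sketch of the ensemble idea described above (the particular models, numbers, and the `admit_model` step are made up for illustration): within the current pool, weights move by ordinary Bayesian reweighting, but when the background search delivers a new model, it gets spliced in by fiat with a stipulated share of the probability. That splicing step is not conditioning on anything the old pool could express, which is exactly the structure that caused trouble in the logical case.

```python
# Sketch of the "second update" worry for an approximation that runs a
# finite pool of predictors while searching for better ones in the
# background (a generic sketch, not a specific algorithm).  Within the
# pool, weights change by Bayes; splicing in a newly found model does not.

pool = {"always-half": (lambda h: 0.5, 0.5),
        "mostly-ones": (lambda h: 0.9, 0.5)}      # name -> (model, weight)

def bayes_step(pool, history, bit):
    """First kind of update: ordinary Bayesian reweighting of the pool."""
    new = {}
    for name, (model, w) in pool.items():
        p = model(history) if bit == 1 else 1 - model(history)
        new[name] = (model, w * p)
    z = sum(w for _, w in new.values())
    return {name: (m, w / z) for name, (m, w) in new.items()}

def admit_model(pool, name, model, share=0.2):
    """Second kind of update: the background search hands over a new model,
    which gets spliced in with a stipulated share of the probability.
    This is NOT conditioning on anything the old pool could express."""
    rescaled = {n: (m, w * (1 - share)) for n, (m, w) in pool.items()}
    rescaled[name] = (model, share)
    return rescaled

history = []
for t, bit in enumerate([1, 1, 1, 0, 1, 1]):
    pool = bayes_step(pool, history, bit)
    history.append(bit)
    if t == 2:   # the background search "finds" a better model mid-stream
        pool = admit_model(pool, "copy-last",
                           lambda h: 0.8 if h and h[-1] == 1 else 0.2)

print({name: round(w, 3) for name, (_, w) in pool.items()})
```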

I think logic will most likely continue to be a good target for rational beliefs. It gives us a lot of structure for reasoning about things such as how the agent will reason about itself. It brings interesting difficulties such as this to the forefront, which other formalisms may merely obscure rather than solve. Disowning logic would be a rather dissatisfying way to reconcile the conflict between logic and probability theory. The argument here is enough to make me wonder, however. It could be that logic is somehow the wrong formalism for the job.
