
This is a probabilistic approach to logical counterfactuals inspired by the Demski prior. Given a sentence $\phi$, we wish to ask what other logical sentences are true in the world where we correctly counterfactually assume $\phi$. I spoke about this before here and here, but now I am going to approach the problem slightly differently. Instead of insisting that every other sentence $\psi$ is either true or false in the counterfactual world, I allow the counterfactual world to assign a probability $\mathbb{P}(\psi \mid \phi)$ to each sentence.

When you try to describe counterfactual worlds as sets of true and false sentences, you necessarily have very sharp boundaries. You will have, for example, counterfactual worlds where $\psi$ and $\psi \rightarrow \chi$ are true, but $\chi$ is false. Probabilities seem more realistic because they allow us to smooth the boundaries. As you look at sentences which are more and more logically distant from $\phi$, you can gradually change probabilities so that they represent the truth, rather than representing consequences of $\phi$.

Let $\mu$ be a measure on logical sentences, for example $\mu(\phi) = 2^{-\ell(\phi)}$, where $\ell(\phi)$ is the number of bits necessary to encode $\phi$. Let $T$ be the theory containing all sentences true about the natural numbers. Consider the following procedure, which computes the counterfactual probability $\mathbb{P}(\psi \mid \phi)$. This definition is only for $\phi$ which are consistent by themselves.
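To make the measure concrete: if sentences are given a prefix-free binary encoding, then $\mu(\phi) = 2^{-\ell(\phi)}$ is exactly the chance that a stream of fair coin flips begins with the code for $\phi$. Below is a minimal sketch of sampling from $\mu$ this way; the `decode` helper, mapping a complete codeword to a sentence, is my own assumption, not something defined in the post.

```python
import random

# Minimal sketch: sample a sentence with probability mu(s) = 2^(-ell(s)) by
# feeding fair coin flips to a prefix-free decoder. `decode` is an assumed
# helper that returns a sentence once the bit string read so far is a complete
# codeword, and None otherwise. If the code is not complete (total measure
# below 1), this loop can fail to halt, matching the fact that mu need not
# sum to 1.
def sample_sentence(decode):
    bits = ""
    while True:
        bits += random.choice("01")
        sentence = decode(bits)
        if sentence is not None:
            return sentence
```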

Let $T_0$ be the theory containing only the sentence $\phi$. For $n > 0$, compute $T_n$ from $T_{n-1}$ by sampling a sentence $s$ according to $\mu$. If both $T_{n-1} \cup \{s\}$ and $T \cup \{s\}$ are consistent, then $T_n = T_{n-1} \cup \{s\}$. Otherwise, $T_n = T_{n-1}$. Let $T_\omega$ be the union of all of the $T_n$. $\mathbb{P}(\psi \mid \phi)$ is the probability that $\psi$ is a consequence of $T_\omega$.
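To make the procedure concrete, here is a runnable miniature; it is my own illustration, not part of the proposal. It uses propositional logic over three atoms, where consistency and provability are decidable by brute force over truth tables. A fixed truth assignment `TRUE_WORLD` plays the role of the complete true theory $T$, the finite `POOL` stands in for the space of all sentences, and runs are truncated after finitely many samples; in the actual construction over arithmetic, none of these checks is computable.

```python
import itertools
import random

# Sentences are (name, truth-function) pairs over three atoms.
ATOMS = ("p", "q", "r")
ASSIGNMENTS = [dict(zip(ATOMS, bits))
               for bits in itertools.product((True, False), repeat=len(ATOMS))]

def atom(name):
    return (name, lambda w: w[name])

def neg(s):
    return ("~" + s[0], lambda w: not s[1](w))

def implies(s, t):
    return ("(" + s[0] + "->" + t[0] + ")",
            lambda w: (not s[1](w)) or t[1](w))

# A finite pool stands in for the space of all sentences, weighted by
# mu(s) = 2^(-length), using display length as a stand-in for code length.
_base = [atom(a) for a in ATOMS]
POOL = (_base + [neg(s) for s in _base] +
        [implies(s, t) for s in _base for t in _base if s[0] != t[0]])
WEIGHTS = [2.0 ** -len(s[0]) for s in POOL]

def consistent(sentences):
    # Satisfiability by brute force: some assignment makes them all true.
    return any(all(s[1](w) for s in sentences) for w in ASSIGNMENTS)

def proves(sentences, psi):
    # Entailment: sentences |- psi iff sentences + {~psi} is unsatisfiable.
    return not consistent(sentences + [neg(psi)])

# A fixed assignment plays the role of the complete true theory T; for a
# complete theory, "T u {s} is consistent" just means "s is true in T".
TRUE_WORLD = {"p": True, "q": True, "r": False}

def sample_theory(phi, steps=200):
    # One run of the procedure, truncated after `steps` samples: T_0 = {phi},
    # and s is added iff it is consistent with the theory so far and with T.
    theory = [phi]
    for _ in range(steps):
        s = random.choices(POOL, weights=WEIGHTS)[0]
        if s[0] in {t[0] for t in theory}:
            continue  # already an axiom; re-adding changes nothing
        if s[1](TRUE_WORLD) and consistent(theory + [s]):
            theory.append(s)
    return theory

def counterfactual_probability(phi, psi, runs=1000):
    # Monte Carlo estimate of P(psi | phi).
    return sum(proves(sample_theory(phi), psi) for _ in range(runs)) / runs

# Counterfactually assume ~p, which is false in TRUE_WORLD, and ask about q.
print(counterfactual_probability(neg(atom("p")), atom("q")))
```

Even this toy shows the intended smoothing: with $\phi = \neg p$, the true sentence $q$ is usually sampled and accepted early (it is short, so it has high $\mu$-weight), but on runs where the true sentence $(q \rightarrow p)$ is accepted first, the theory proves $\neg q$, so the estimated $\mathbb{P}(q \mid \neg p)$ lands strictly between 0 and 1.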

As it is, this procedure is not approximable, but you can make a similar thing approximable by replacing $T$ with $\mathrm{PA}$, or with a complete theory sampled from your favorite approximable distribution.
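The non-approximability comes from the consistency oracles. One standard way to get a limit-computable variant (my gloss; the post does not spell this out) is to replace each oracle call with bounded proof search, so that the stage-$n$ verdicts converge as $n$ grows. A sketch, where `find_contradiction` is an assumed bounded proof-search helper:

```python
# Hedged sketch, not the post's construction. `find_contradiction` is an
# assumed helper: it returns True iff it derives a contradiction from the
# given axioms within `max_steps` steps of some fixed proof search.
def consistent_at_stage(axioms, n, find_contradiction):
    # May wrongly say "consistent" at small n, but every inconsistency is
    # eventually found, so the verdict converges as n grows.
    return not find_contradiction(axioms, max_steps=n)

def consistent_with_PA_at_stage(s, n, pa_axioms, find_contradiction):
    # Replaces "consistent with the true theory T" by consistency with PA;
    # pa_axioms(n) is the first n axioms of a fixed enumeration of PA.
    return consistent_at_stage(pa_axioms(n) + [s], n, find_contradiction)
```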

Claim: $\mathbb{P}(\cdot \mid \phi)$ gives a coherent probability assignment which assigns probability 1 to $\phi$, and thus represents a probability distribution on complete theories.

Proof: The probability distribution on complete theories is exactly the distribution of (the deductive closure of) $T_\omega$. All we need to show is that $T_\omega$ is complete (with probability 1). Take a sentence $\psi$. Either $\psi$ or $\neg\psi$ is consistent with $T$; WLOG assume $\psi$ is consistent with $T$. Since $\mu$ gives every sentence positive measure, with probability 1 the sentence $\psi$ is eventually sampled at some time $n$. Either $\psi$ is added to $T_n$, or it is inconsistent with $T_{n-1}$. Therefore, either $T_n$ proves $\psi$, or $T_{n-1}$ proves $\neg\psi$, so $T_\omega$ proves either $\psi$ or $\neg\psi$. Note that $T_\omega$ does not necessarily contain either $\psi$ or $\neg\psi$ as an axiom.

Now I will give the reasons I am considering this proposal. None of the following is stuff I actually know to be true. I think it is plausible that my intuitions about the result of this procedure are very wrong.

It seems that true sentences will generally have high probabilities. Thus, if $\phi$ and $\psi$ are both complex sentences, and there is a simple proof of $\psi$ from $\phi$, it is likely that many true sentences will be sampled and accepted before $\neg\psi$ is considered. Thus, it seems plausible that sufficiently many simple axioms to complete the simple proof of $\psi$ from $\phi$ will be accepted before $\neg\psi$ is considered. If this happens, $\psi$ will automatically be a consequence of $T_\omega$ (and $\neg\psi$ can never be added afterwards, being inconsistent with the theory). It seems then that $\mathbb{P}(\psi \mid \phi)$ will be high.

Thus it is plausible that this proposal follows the spirit of the conjecture that simple proofs (of $\psi$ from $\phi$) correspond to legitimate counterfactuals. Note that this informal argument only goes in one direction. If there is a simple proof of $\psi$ from $\phi$, it seems likely that $\mathbb{P}(\psi \mid \phi)$ will be large, but it does not follow that $\mathbb{P}(\psi \mid \phi)$ is large only when such a simple proof exists. This is consistent with my idea of logical counterfactuals.

Proof-length-based definitions of counterfactuals usually have the unfortunate property that they depend on the formalities of our proof system, and equivalent proof systems can give very different proof lengths. Perhaps this proposal can get many of the nice properties of proof-length-based systems, while being independent of the choice between different equivalent ways to carry out proofs.

1 comment:

I am, as usual, a bit confused. If you require a sentence to be consistent with (e.g.) PA before being added to $T_n$, this proposal is unable to assign nonzero probability to the trillionth digit of pi being 2 - and conditional on the trillionth digit of pi counterfactually being 2, it is unable to go on to believe in PA.

It seems like some looser condition for adding sentences to the theory is needed. Not just as a sop to practicality, but to get at some important desiderata of logical counterfactuals.

Here's the picture of logical counterfactuals that I'm currently thinking under:

People have some method for generating mental models of math, and these mental models have independencies that the ground truth mathematics doesn't. E.g., when I imagine the trillionth digit of pi being 2, it doesn't change (in the mental model) whether the Collatz conjecture is true. In fact, for typical scenarios I consider, I can continue to endorse (in my mental model) the typical properties of the real numbers, even when considering a collection of statements, most of which are inconsistent with those properties (like assigning a distribution over some digit of pi).

This apparent independence produces an apparent partial causal graph (within a certain family of mental models), which leads to the use of causal language like "Even if one set the trillionth digit of pi to 2, it would not change the things I'm taking for granted in my mental models, nor would it change the things that would not change in my mental model when I change the setting of this digit of pi."