Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.


This is an attempt to build a basis for formalising logical counterfactuals. It's not workable as is, but I think it can potentially solve the problem of "outthinking the counterfactual".

For causal counterfactuals, we build a directed acyclic causal graph. We then modify the value of the node representing the condition, and recompute the value of the descendent nodes. I try to mirror this apporach.

The nodes of our new graph will of course be propositions, and the edges inferences. There are problems however. An inference can have multiple antecedents, and we cant just represent this by giving them all and edge to the consequent, because there can also be multiple ways to infer that proposition, and it makes much more sense for multiple incoming edges to mean that. The graph also won't be acyclic, so even if we found a way to recompute along one edge, we could get different outcomes depending on how we move through the graph. To resolve this, we need to look at logic a bit differently.

I'm using a formulation of logic as outlined by Donald Hobson here. TLDR: All inference rules have only one antecedent. They are substitutions which lead to a logically equivalent statement. Since logical equivalence is symmetrical we will make an undirected graph, which is simpler. Our graphy will have one node for every grammatical proposition of the theory, and an edge between any two conneced directly by an inference rule.

A proof of a proposition  is then a path through the graph from  to .

Logical Counterfactual of a Set Conditional

Now, how do we solve the cyclicality? By cutting of so much that you cant go around the conditioning statement. Its clear that we can't change just that statement, because you could trivially do all the same proofs from it's double negation. So to begin we will define the condition not as a single statements but a set of them that is modified. Some definitions:

: A single proposition.

: A set of propositions such that every proof of  contains at least one element of , and that there is no subset of  that can be removed without losing this property, and which doesn't contain .

: The set of all propositions that have a proof not containing any element of .

: The set of all propositions which are in  or whichs negation is in .

For , a -proof of  is a path from  to  in a modified version of the graph, where for every  and every , if there is an inference from  to , then the inference rule it uses can not be used with the same values for the parameters as in that use, and for every  has been added as an inference rule. I write  for "There is a -proof of .".

(What do I mean by parameters? Say you have the proposition like  and apply the rule  to conclude , then I say that  is a parameter and its value is "". )

Let  be a function which assigns every element of C either itself or its negation. Then if  and , then "If , then (counterlogically) ." is true, and if  , then "If , then (counterlogically) ."


We can now define the counterfactual of single proposition in terms of that of sets. For this we need to define a  and  depending on that proposition. This is where I'm stuck for now. Proceding from here would be some combination of finding restrictions on which values of  and  give defined answers, considerations from the game-theoretic applications that can further limit them (after all, it may well be that just the conditional proposition on its own is insufficient for a unique answer, and that more information about the context is needed to ask for the "correct" one), and a proof of invariance of the result in this sufficiently narrowed-down space.

If something like this is possible, it could explain why running logical inductors for a short time can often give intuitively correct answers to logical counterfactuals. If the conditional is "closer" to the consequent than to the axioms, then the proof representing a path from conditional to consequent is likely found before a proof of the consequent itself. The broad cutting of this method ensures that this second proof is never found.

I hope someone with more mathematical background knowledge (and time) than me can complete this in a fruitful way.

New to LessWrong?

New Comment
2 comments, sorted by Click to highlight new comments since: Today at 12:08 AM

Hi Bunthut,

First, I'll encourage you to have a look at material on what I thought this post was going to be about from the title: . I know about this subject primarily from (which is more concrete/mathematical than the Wikipedia article, as it's written by computer scientists rather than philosophers).

Second: If I'm understanding this correctly in my sleep deprived state, you're actually working on the exact same problem that I am in one of my papers, except that we focus on causality/counterfactuals in logic systems which correspond to programming language semantics. I got stuck on similar problemss. I can send you a preprint if you PM me.

Overall, my take is that you're getting stuck in superficial differences of formalisttm, which makes it much harder for sleep-deprived me to find the insight. E.g.: the desire to use graphs to represent proofs; with the proper conceptual toolkit, this "multiple-antecedent problem" is a non-issue. (Two solutions: (1) use hypergraphs [which we do in the current draft of the paper]; (2) join antecedents into one, as is done in proof categories.)

Hello Darmani,

I'm indeed talking about a kind of counterfactual conditional, one that could apply to logical rather than causal dependencies.

Avoiding multiple antecedents isn't just a matter of what my conceptual toolkit can handle; I could well have done two different types of nodes, that would have represented it just fine. However restricting inferences this way makes a lot of things easier. For example it means that all inferences to any point "after" come only from propositions that have come through . If an inference could have multiple antecedents, then there would be inferences that combine a premise derived from with a premise in , and its not clear if the separation can be kept here.

From the paper you linked... Their first definiton of the counterfactual (the one where the consequent can only be a simple formula) describes the causal counterfactual (well, the indeterminstic-but-not-propabilistic protocolls throw things off a bit, but we can ignore those), and the whole "tremble" analysis closely resembles causal decision theory. Now their concept of knowledge is interesting because its defined with respect to the ecosystem, and so is implicity knowledge of the other agents strategy, but the subsequent definition of belief rather kneecaps this. The problem that belief is supposed to solve is very similar to the rederivation problem I'm trying to get around, but its formulated in terms of model theory. This seems like a bad way to formulate it, because having a model is a holistic property of the formal system, and our counterfactual surgery is basically trying to break a few things in that system without destroying the whole. And indeed the way belief is defined is basically to always assume that no further deviation from the known strategy will occur, so its impossible to use the counterfactuals based on it to evaluate different strategies. Or applying it to "strategies" of logical derivation: If you try to evaluate "what of X wasnt derived by the system", it'll do normal logical derivations, then the first time it would derive X it derives non-X instead, and then it continues doing correct derivations and soon derives X and therefore contradiction.

PM is sent.