(Disclaimer: This post was written as part of the CFAR/MIRI AI Summer Fellows Program, and as a result is a vocalisation of my own thought process rather than a concrete or novel proposal. However, it is an area of research I shall be pursuing in the coming months, and so ought to lead to the germination of more ideas later down the line. Credit goes to Abram Demski for inspiring this particular post, and to Scott Garrabrant for encouraging the AISFP participants to actually post something.)
When reading Soares' and Levinstein's recent publication on decision theory, Death in Damascus, I was struck by one particular remark:
Unfortunately for us, there is as yet no full theory of counterlogicals [...], and for [functional decision theory (FDT)] to be successful, a more worked out theory is necessary, unless some other specification is forthcoming.
For some background: an FDT agent must consider counterfactuals about what would happen if its decision algorithm on a given input were to output a particular action. If the algorithm is deterministic (or the action under consideration is merely outside of the range of possible outputs of the agent), then it is a logical impossibility that the algorithm produces a different output. Hence the initial relevance of logical counterfactuals or counterlogicals: counterfactuals whose antecedents represent a logical impossibility.
My immediate thought about how to tackle the problem of finding a full theory of counterlogicals was to find the right logical system that captures counterlogical inference in a way adequate to our demands. For example, Soares and Fallenstein show in Toward Idealized Decision Theory that the principle of explosion (from a contradiction, anything can be proved) leads to problematic results for an FDT solution. Why not simply posit that our decision theory uses paraconsistent logic in order to block some inferences?
Instead, and to my surprise, Soares and Levinstein appear to be more concerned about finding a semantics for logical counterfactuals - loosely speaking, they are looking for a uniform interpretation of such statements which shows us what they really mean. This is what I infer from reading the papers it references on theories of counterlogicals such as Bjerring's Counterpossibles, which tries to extend Lewis' possible-worlds semantics for counterfactuals to cases of logical counterfactuals.
The approaches Soares and Levinstein make reference to do not suffice for their purposes. However, this is not because they give proof-theoretic considerations a wide berth, as I thought previously; I now believe that there is something that the semantic approach does get right.
Suppose that we found some paraconsistent logical system that permitted precisely the counterlogical inferences that agreed with our intuition: would this theory of logical counterfactuals satisfy the demands of a fully-specified functional decision theory? I would argue not. Specifically, this approach seems to give us a merely a posteriori, ad hoc justification for our theory - given that we are working towards an idealised decision theory, we ought to demand that the theory that supports it is fully motivated. In this case, this cashes out as making sure our counterlogical statements and inferences has a meaning - a meaning we can resort to to settle the truth-value of these counterlogicals.
This is not to say that investigating different logical systems is entirely futile: indeed, if the consideration above were true and a paraconsistent logic could fit the bill, then a uniform semantics for the system would serve as the last piece of the puzzle.
In the next year, I would like to investigate the problem of finding a full theory of logical counterfactuals, such that it may become a tool to be applied to a functional decision theory. This will, of course, involve finding the logical system that best captures our own reasoning about logical counterfactuals. Nonetheless, I will now also seek to find an actual motivation for whichever system seems the most promising, and the best way to find this motivation will be through finding an adequate semantics. I would welcome any suggestions in the comments about where to start looking for such a theory, or if any avenues have thus far proved to be more or less promising.