EDIT: Vladimir_Nesov has commented that this idea already exists under the name of "the chicken rule". I think I'll still leave this post up because I personally like the presentation, but just know that I wasn't the first person to think of it.
I believe I have found a solution to spurious counterfactuals based on proof theory. The algorithm is based on Rosser's trick.
In case math isn't your thing, skip ahead to a fun experiment where, as a human, you can try a related, if slightly silly, algorithm.
Since "spurious counterfactual" is a bit of a vague concept, its hard to say definitively that I've solved it (human error aside). However, I can at least prove that any proof of "not action A" must be longer than the proof of "action A leads to utility X", so in particular you at least can not use the principle of explosion. Logic can reach the same conclusions in different ways, but in this case, a spurious counterfactual could only be used if it is shorter than a proof using the principle of explosion. We could even require that it is much shorter at the cost of making the algorithm less time efficient.
Note that the usual solution I am aware of is "choose a random action with some small probability" so that choosing that action is no longer logically impossible. I find the proof theoretic solution I am about to describe much more satisfying though. Choosing the bad option is logically impossible, but the agent can reason about it anyways.
Description of the Algorithm
The action we are going to calculate is (to be defined shortly), represented by a Turing machine. Our options comes from some finite set S. We are trying to maximize , also represented by a Turing machine ( should calculate expected utility, but for the rest of this post I'll assume it returns a rational number).
Note in particular that does not have any parameters. It can mention via its definition (or even logically equivalent definitions), but we do not represent it as a parameter! can only influence "A priori".
We also must fix a formal system; I'll assume that it is Peano arithmetic, but other theories, such as ZFC, should work fine as well.
We have one requirement on . For any action , there must exist a rational number such PA proves that . Basically, if we know we can figure out (using PA). In particular, if halts, so does .
Here is the pseudo-code for :
procedure f: act_table :=  for s in S: i := 0 utility := None loop: for proof in PA_Proof(length := i): if proof == "f = s => u = r" for some rational r: //Comment: use quining when refering to f (including within u if necessary) utility := r if proof == "f ≠ s": return s //Comment: Rosser's trick. Should never execute, but makes it harder for PA to analyze if utility ≠ None: insert(act_table, (s, utility)) break else: i := i + 1 return first component of (max act_table by second component)
This seems like a pretty standard maximizer. It maximizes something to do with proofs, but that isn't anything special. But notice the line with the "Rosser's trick" comment. Why would we return s if we prove that f would not return s?
Assuming PA is consistent, this line will never execute. But PA doesn't know that PA is consistent. This already looks extremely difficult for PA to rule out that it will return s.
And it doesn't just look hard, we can actually quantify exactly how hard. Let's say the shortest proof of a statement (for any rational r) is of length n. Then it is impossible for PA to prove that within n steps, because then would return due to that line, making PA inconsistent (which we are assuming is not the case). This is essentially the same reasoning used in Gödel's speed-up theorem.
If we want to make it even harder for PA to prove that , we just need to have look for a proof of for even longer proofs (like n + 100 or something).
When looking for the proof of , we aren't really doing a counterfactual in the traditional sense. The agent is not considering "What if I am in a world where I choose s?" or "What if I am in a world where I learn that I choose s?". The agent is considering a material conditional. The agent is considering "What if the world I am already in is the world in which I choose s?".
For example, a logical implication might be that similar agents in similar situations would choose similar actions. That's because if they are analogous or isomorphic in certain situations, you might be able to transfer the mathematical knowledge that into a theorem that other agents act in certain ways. This in turn affects the utility: the agent might take certain actions just because it implies logically the other agents will take actions that just so happen to increase the original agent's utility.
Unfortunately, detailed analysis will depend heavily on the details of how formal system is set up. It turns out that doing what is essentially a massive "search and replace" on math itself is "complicated" to say the least. Since is just a definition, not a parameter, any branch of math that just so happens to mention (or something that is mathematically related to it) will get warped, even if the context is something entirely different than decision theory.
Humanly feasible "Algorithm": Highly Functional Decision Theory
Warning: this section contains humor
It is a cardinal sin to conflate human reasoning with mathematical formal proof systems. But what if I want to do it anyways?
Introducing Highly functional decision theory
Did you keep trying to make decisions, but the principle of explosion keeps getting in the way! You can say goodbye to this problem with one weird trick!
Highly functional decision theory: Go through your set of options, considering for each one "Assuming that I will do this option, what will my expected utility be?". However if at *any point* during your reasoning, you come to the unconditional conclusion "I *definitely* would not do option s", even if you stumbled into the conclusion accidentally, you immediately do s without any further deliberation. If this doesn't happen, perform the action that implied the highest utility.
Sounds great, right? Also, if the second clause causes you to do a weird action that causes concern to other people, I suggest an explanation like the following: "I am highly functional when it comes to decision theory; it can be hard for an outsider to appreciate such advanced decision making". Also, leave a comment on this post documenting the event.
The reason for the second clause isn't just to make you do dumb things. It is to force the assumption "I do action s" to be plausible even if s is dumb. Any attempts to dismiss the statement "I do action s" as false before algorithm finishes is rendered inconsistent. Therefore, you can't reasonably dismiss it and you must reason from the statement in a "normal" way (i.e. not using a spurious counterfactual).
Here is an example. Suppose you walking close to a lake in a park, and need to decide on the best way to enjoy the park. One possible option is spontaneously jumping into the lake to go for a swim. Your reasoning might look like this:
Assume that I will jump into the lake:
- I will gain some utility directly from the enjoyment and exercise that comes from swimming.
- This decision to go swimming increases my estimate for how much I like swimming. I must consider how the character trait of "liking swimming" will affect my utility for the rest of my life.
- The people around me might be similar to me in certain ways, such as genetically or in that they are also HFDTs (highly functional decision theorists). If this is the case, it stands to reason that they will also spontaneously jump into the lake at the same time. I must consider the opportunities, drawbacks, and risks that this will bring to my swimming adventure.
- I must also consider take into consideration that I won't choose other options for my recreational activity when calculating my utility. The effects could be similar to the above three bullet points, but inverted.
So there you have it. Feel free to give the algorithm a try and report back.
In practice, S is a finite set of finite lookup tables for how the agent should respond to information. I think there is an easy way to adapt it to functions with infinite domains, but if needs to use the lookup table an unbounded number of times, it won't be guaranteed to halt anyways.
Truth be told, I have tried the algorithm myself, and it has had promising results so far (it seems to be helpful in short-circuiting the logical flaws in "Learned helplessness" style problems). I think a more serious name could be "Logical Decision Theory" (because it is based on logical implication instead of counterfactual implication). To be clear, this would be a variety/implementation of functional decision theory. However, due to how absurd the algorithm is from a human perspective, and the terrible premise of treating humans like a formal proof system, I decided to give this section a whimsical tone.
I recommend choosing a set of acceptable options you will limit yourself too before starting the algorithm. Although it would be extremely funny, I would also feel slightly bad if you come to the conclusion "I would never scream as loud as possible in public right now". You would be forced to choose between breaking my silly algorithm (you pre-committed to follow it before hand, right?) and alerting all your fellow HFDTs in the local area to your presence.
You might say "but wait, I would just be choosing to swim because of the decision theory, not because I wanted to!". If you argued that you committed a crime because a decision theory made you do it, would that hold up in a court of law? If not, why should that style of argument hold up in your mind-courts?
Enumerating these is left as an exercise for the reader, but I will list at least one risk to get you started: one of the other HFDTs could declare war on the local fish population, and you could get caught up in the conflict. (Guess what happens if you say "but HFDTs definitely wouldn't declare war on the local fish population, they are too rational!".)
Quick note: this doesn't trigger the second clause of the algorithm because "I won't choose the other options" is conditioned on "I choose to swim". This is why the algorithm says unconditional conclusion.
That said, feel free to include slightly silly options. You need to punish your brain for failing to follow Gödel's Second Incompleteness Theorem somehow.
This is known as the chicken rule, or playing chicken with the universe.
Ah, makes sense this was discovered before. Thanks! I have added a link to your comment at the top of the post.
The chicken rule mitigated spurious counterfactuals for the original proof-based UDT back in 2011. But in a provability modal logic formulation the chicken rule is not needed (there is now an even nicer decision rule formulation in provability logic). And the result goes through without an oracle using a variant of Löb's theorem for provability with proofs of bounded length.
Oh, very nice!
I thought it was a bit "cheating" to give the programs access to an oracle that the formal system couldn't decide (but that thing with the finite number of options is quite elegant and satisfying).
That paper about who long you need to search is super interesting! I wasn't sure who long you would need to search if you disallowed infinite search.