An Informal Conjecture on Proof Length and Logical Counterfactuals

by Scott Garrabrant 5y15th May 20152 min readNo comments

6

Ω 5


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

During the May 4-6 MIRI workshop on decision theory, I made the following conjecture: If and are sentences such that the length of the shortest proof of is much shorter than the proof of , then is a legitimate counterfactual consequence of .

You may be thinking that this conjecture seems like it cannot be correct. This is the intuition that most of us shared until we started looking for (and failing to find) good counterexamples. After many failed attempts, we started referring to it as "The Trolljecture." Multiple people from the workshop are currently working on writing up posts related to it. However, the conjecture is not very formal. The purpose of this post is just to give some background for those posts and to discuss the why we are keeping the conjecture in this informal state.

There are a three ways in which this conjecture is not formal. I do not see a way to make any of these formal without disrupting the spirit of the conjecture.

  1. What do we mean by "legitimate counterfactual consequence"?

  2. What do we mean by "much shorter"?

  3. What do we mean by "length of the shortest proof"?

I will address these issues one at a time. The term "legitimate counterfactual consequence" does not have a formal definition. If it did, many open problems in decision theory and logical uncertainty would be solved. Instead, it is a pointer to a concept that humans seem to have some intuitions about. The best I can do is give some examples of legitimate counterfactual consequence and hope that we are talking about the same cluster of things.

Regardless of whether or not P=NP, "P=NP" is a legitimate counterfactual consequence of "3SATP," while "PNP" is NOT a legitimate counterfactual consequence of "3SATP."

In Newcomb's problem, "The box is empty" is a legitimate counterfactual consequence of "I two-box." This is true regardless of whether or not I am a two-boxer.

Note that just because , it does not necessarily follow that is a legitimate counterfactual consequence of .

In this post, the notation is used for " is a legitimate counterfactual consequence of ." They mean the same thing.

The term "much shorter" probably could have been defined but most ways to define it seem arbitrary, and the exact definition does not matter much. There are some ways to define it which feel less arbitrary. We could just say that there exists a computable function for which the conjecture is true if we define much shorter by " is much shorter than anything greater than ." We could even put restrictions on , like requiring it to be linear. The reason we do not try to formally define "much shorter" this way is that it would stop us from being able to in theory point at one pair of sentences to demonstrate a counterexample. For now, you can think of "much shorter" to mean "half," and we can reevaluate if a counterexample is found.

The term "length of the shortest proof" is not defined, because I have not specified a formal proof system. This is on purpose. I do not want to allow a counterexample where is not provable in PA, but has a really short proof in a stronger proof system, and that short proof is exactly why we believe is not a legitimate counterfactual consequence of .

It is worth pointing out that this conjecture is not a proposed definition of legitimate counterfactual consequences, it only provides sufficient conditions for something being a legitimate counterfactual consequence. Hopefully much more on this will be posted here by various people in the near future.

6

Ω 5