Counterfactual Induction

by Diffractor6 min read17th Dec 20197 comments

23

Ω 13

Counterfactuals
Frontpage
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

One of the first attempts at defining counterfactuals was the conjecture that is a counterfactual consequence of when there is a short proof of .

This post will present a variant of logical induction that implements these proof-length-based counterfactuals, so there are well-defined answers about what happens in impossible worlds, based on proof length. It required generalizing the notion of a probability distribution, so these generalized probability distributions, which I'll call "valuations", may be a subject of independent interest.

Now, to begin with, why would we want to use proof-length counterfactuals, instead of the existing notion of conditional probability? Well, the first issue is with provability induction. Due to the ability of logical inductors to learn that sequences of statements are false, and start preemptively assigning the statements extremely low probability even though the length of a disproof may rise greatly, it becomes extremely cheap for traders to mess with conditional probabilities, and as stated in the logical induction paper, we don't have good results for what happens to conditional probabilities on low-probability events. The second issue is that humans don't seem to think about what-ifs like "what's the probability of a decrease in the price of pencils along with a supply shock to lumber in Estonia, now what's the probability of the supply shock in Estonia, let me divide...". Humans instead go "in the imaginary world where there's a supply shock to lumber in Estonia, what happens to pencil prices". We seem to take conditional probabilities as primitive and compute the probability of conjunctions from those.

A Note On Booleans:

So, the original logical inductor algorithm just assigns values to statements as strings, while oracle induction and universal inductors operate by assigning an actual probability distribution to worlds (ie, functions from the set of atomic statements to , with the probability of booleans given by the summed probability over all worlds that fulfill that boolean.)

The original logical inductor algorithm snuck the worlds in through the back-door via the worst-case scoring, where you consider only worlds that are propositionally consistent (no disproofs are discoverable by boolean algebra alone), thus forcing the algorithm (in the limit) to behave like an actual probability distribution over worlds. That is, two different booleans corresponding to the same set of possible worlds would have different prices in general, but differences in price let you buy the underpriced one and sell the overpriced one, and all the worlds agree that this earns you money, so the differences get ironed out over time.

By contrast, for oracle induction and universal induction, it doesn't matter which of two equivalent boolean combinations you bet on, they'll both have the same price, because the market can be interpreted as an actual probability distribution over worlds.

Put another way, in the original formulation of logical induction, the prices of equivalent booleans could be different, but their true value in a world would be the same. In oracle/universal induction, both the prices and true value of equivalent booleans are the same.

In the case of proof-length counterfactuals, the specific way of writing down a batch of worlds does affect the value in reality (because you would have to account for the length of the proof that the two sentences are propositionally equivalent). But since we will be considering the market state as a (generalized) probability distribution over worlds to see how this affects the basics of probability theory, equivalent booleans should have the same price.

So in this case, there is a situation where the true value of equivalent booleans may differ, but the price of equivalent booleans is the same.

Attempts At Defining The Value Of A Sentence:

Let's consider a case where the upstream theorem prover iterates through all proofs, and reports all theorems of the form , where is a finite set of statements, along with the length of the proof, as well as results of the form . (ie, there's no proof of from in or fewer symbols). What should be our candidate definition for the "true" value of in the "world" where is true? It should also ideally recover the original formulation of the value of a statement for logical inductors.

The obvious candidate is . Ie, if there's a short disproof of relative to the shortest disproof of , then the value should be low. This becomes , as expected, when is consistent and there is a disproof of from A.

However, notice that this violates the basic property that . This can be seen for that are unrelated-enough to that adding or not-adding them doesn't make the shortest disproof any shorter. This breaks the property that the inductor converges to a particular price for in the limit, because if both of those values are high, there's no way to money-pump the prices fluctuating over time. Also, for the case of unrefutable statements, we get which is undefined.

Ok, time for the second attempt! This previous definition should be an upper bound. We also want the values to add up to 1, and to guarantee this, we also have to do our proof-length accounting s.t. . The cleanest way to do this is to do our proof-length accounting s.t. and imply . Also, this ensures that purchasing a share of is exactly equivalent to selling a share of , so we can just get rid of selling shares and only consider buying them, which will be convenient later. Admittedly, this property alone lets us just do the trivial Brouwer-fixed-point solution to get a market, but two propositionally equivalent statements may have prices that stay different forever since we don't have an analogue of propositional consistency yet, and we'd really like to be able to interpret our prices as something kinda like a probability distribution over impossible worlds, so we'll need something fancier to get closer to the probability theory setting.

With that motivating comment, there can be an issue where, for example, and are astronomically-long booleans that are propositionally equivalent to each other given , and the shortest disproof of isn't astronomically long. This means that the value of and are unconstrained, and since we aren't requiring equivalent statements to have equivalent value (since they may not have equivalent disproof lengths), it's possible that both and have low value due to the "add up to 1" constraint. However, we'll want our prices to be the same on propositionally equivalent statements to give a probability-theory-flavored interpretation of them. So, consider a trader that buys 1 share of " won't happen" and 1 share of " won't happen", and the prices of those are both low. The value of this purchase is . And again, we have a money-pump!

So we also need to impose a constraint that propositionally-identical statements must have equal value.

And now we get another thing going wrong. In particular, let's say that is, by a long proof, propositionally equivalent to when holds. It may be the case that is short, and is short, but is a long proof, because the shortest way to pull it off is by getting the two sub-proofs, and then going through a long chain of reasoning to show that is equivalent to . Then, assume we use the maximal value for and , we get .

Now, if we swapped out the denominator for , by our proof-length accounting, it'd work out. So this indicates that, instead of our denominator being , it should be where is a propositional tautology given .

Also, we will want the following two properties, unitarity and subadditivity. Specifically, , and . The motivation for the first should be obvious. The motivation for the second is, if we take the upper-bound view of corresponding to "length of shortest disproof", the shortest disproof of an or-statement should be upper-bounded by the sum of the shortest disproofs of the two component parts, because you can stick them together into a proof-by-cases disproof of the or-statement.

Now, let's put all the pieces together.

is an arbitrary function of type ( is the set of sentences)s.t. the following properties hold.

1: Proof-length upper bound: where is a propositional tautology given .

2: Propositional equivalence: when is propositionally equivalent to given .

3: Law of the excluded middle: .

4: Unitarity:

5: Subadditivity:

This now works sensibly with the case where is consistent, and also prevents money-pumping, as we will see in the next post.

I'm pretty sure that if is propositionally consistent, there is always an appropriate choice of that fulfills these properties. Or at least I can get that there is always a choice that fulfills properties 1-4, I'm just having trouble showing that 5 is consistent with these. Counterexamples in the comments would be welcome. Assumptions we have to impose on how we count the length of proofs in order to make it work would also be welcome.

Now, some exposition. First, we're lacking the crucial property that if propositionally implies , then must have a lower value than . From a proof-length standpoint, this makes sense, because you could have a very complex boolean that is a subset of a much simpler boolean, and it could be easier to rule out the simple boolean than to rule out the complex boolean due to the overhead of the propositional implication proof. Fortunately, we can construct our prices so they do have that property, even if reality doesn't.

Also, our prices cannot be a probability distribution, in general! Here's an explicit counterexample. Consider 4 mutually exclusive and exhaustive events. For all sets of 1, 2, or 3 events, assigns them a value of about 1/2. This fulfills properties 3, 4, and 5, now just assume it also fulfills 1 and 2. Now consider a trader that buys one share in all 4 of the statements of the form "one of the events from this list of three will happen". No matter what probability distribution is selected, the value of that bet is always going to be +1, because the sum of the probability masses over the four statements must be 3, and the sum of the valuations is 2. It's a situation that's unwinnable for the market if the market restricts itself to outputting a probability distribution.

So proof-length counterfactuals behave sorta like a probability distribution, except that the value assigned to "one of these mutually exclusive events will occur" can in general be less than the sum of the value assigned to the events occurring, with perfect equality corresponding to the shortest disproof of the or-statement being a proof-by-cases disproof.

Specifically, it's a valuation. Given some set and some set that is a magma (ie, there is a function , which in our practical applications will be or ), and also has some distinguished element (, , in our applications) and a function , (translate the specification of the counterfactual to a statement) a valuation over is a function s.t.

(unitarity)

(empty set)

(subadditivity)

These are the bare minimal requirements, but there are some other properties we can add to these, detailed later.





23

Ω 13