Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Mentioned in

Counterfactual Induction

1Gurkenglas

1Diffractor

1Gurkenglas

1Diffractor

1Gurkenglas

1Pattern

2Diffractor

New Comment

a magma [with] some distinguished element

A monoid?

min,ϕ(A,ϕ⊢⊥) where ϕ is a propositional tautology given A

Propositional tautology given A means A⊢ϕ, right? So ϕ=⊥ would make L small.

Yup, a monoid, because and , so it acts as an identitity element, and we don't care about the order. Nice catch.

You're also correct about what propositional tautology given A means.

Then that minimum does not make a good denominator because it's always extremely small. It will pick phi to be as powerful as possible to make L small, aka set phi to bottom. (If the denominator before that version is defined at all, bottom is a propositional tautology given A.)

Oh, I see what the issue is. Propositional tautology given means , not . So yeah, when A is a boolean that is equivalent to via boolean logic alone, we can't use that A for the exact reason you said, but if A isn't equivalent to via boolean logic alone (although it may be possible to infer by other means), then the denominator isn't necessarily small.

So the valuation of any propositional consequence of A is going to be at least 1, with equality reached when it does as much of the work of proving bottom as it is possible to do in propositional calculus. Letting valuations go above 1 doesn't seem like what you want?

One of the first attempts at defining counterfactuals was the conjecture that ϕ is a counterfactual consequence of ψ when there is a short proof of ψ→ϕ.

But what if there is both a proof that ψ→ϕ and a proof that ϕ→ψ?

Trying to understand the math:

A⊬≤L⊥. (ie, there's no proof of ⊥from A in L or fewer symbols).

A⊢ϕ

There is a proof of ϕ from A.

The obvious candidate is VA(ϕ):=minL(A,ϕ⊢L⊥)minL(A⊢L⊥).

So the value of ϕ is 1 when ϕ is the shortest proof of ⊥? (A different formulation is used later, so the answer to this question might not matter.)

VA(ϕ) + VA(¬ϕ)=1.

Also, we will want the following two properties, unitarity and subadditivity. Specifically, VA(A)=1, and VA(ϕ)+VA(ψ)≥VA(ϕ∨ψ).

If the value of equivalent statements are the same then VA(ϕ∨ψ)+VA(ψ) = VA(ϕ)+VA(ψ). (This also works if ϕ and are ψ switched.)

When would VA(ϕ)+VA(ψ)=VA(ϕ∨ψ)?

Let ψ be ¬ϕ. Then VA(ϕ)+VA(¬ϕ)≥VA(ϕ∨¬ϕ). So 1≥VA(ϕ∨¬ϕ). Perhaps VA(ϕ∨¬ϕ) is independent of A, and the same for all ϕ.

Let ψ=ϕ and Z. Then is VA(ϕ)+VA(ψ) = 2*VA(ϕ)+VA(Z)? And what does ϕ∨ψ mean when ϕ∨ψ = ϕ∨(ϕ and Z)? Intuitively, VA(ϕ and Z) > VA(ϕ) because (ϕ and Z) than ϕ has more information, but it's not clear what the value of a proof corresponds to.

If there's a short proof of from and a short proof of from and they both have relatively long disproofs, then counterfacting on , should have a high value, and counterfacting on , should have a high value.

The way to read is that the stuff on the left is your collection of axioms ( is a finite collection of axioms and just means we're using the stuff in as well as the statement as our axioms), and it proves some statement.

For the first formulation of the value of a statement, the value would be 1 if adding doesn't provide any help in deriving a contradiction from A. Or, put another way, the shortest way of proving , assuming A as your axioms, is to derive and use principle of explosion. It's "independent" of A, in a sense.

There's a technicality for "equivalent" statements. We're considering "equivalent" as "propositionally equivalent given A" (Ie, it's possible to prove an iff statement with only the statements in A and boolean algebra alone. For example, is a statement provable with only boolean algebra alone. If you can prove the iff but you can't do it with boolean algebra alone, it doesn't count as equivalent. Unless is propositionally equivalent to , then is not equivalent to , (because maybe is false and is true) which renders the equality you wrote wrong, as well as making the last paragraph incoherent.

In classical probability theory, holds iff is 0. Ie, if it's impossible for both things to happen, the probability of "one of the two things happen" is the same as the sum of the probabilities for event 1 and event 2.

In our thing, we only guarantee equality for when (assuming A). This is because (first two = by propositonally equivalent statements getting the same value, the third = by being propositionally equivalent to assuming , fourth = by being propositionally equivalent to , final = by unitarity. Equality may hold in some other cases, but you don't have a guarantee of such, even if the two events are disjoint, which is a major difference from standard probability theory.

The last paragraph is confused, as previously stated. Also, there's a law of boolean algebra that is the same as . Also, the intuition is wrong, should be less than , because "probability of event 1 happens" is greater than "probability that event 1 and event 2 happens".

Highlighting something and pressing ctrl-4 turns it to LaTeX.

One of the first attempts at defining counterfactuals was the conjecture that ϕ is a counterfactual consequence of ψ when there is a short proof of ψ→ϕ.

This post will present a variant of logical induction that implements these proof-length-based counterfactuals, so there are well-defined answers about what happens in impossible worlds, based on proof length. It required generalizing the notion of a probability distribution, so these generalized probability distributions, which I'll call "valuations", may be a subject of independent interest.

Now, to begin with, why would we want to use proof-length counterfactuals, instead of the existing notion of conditional probability? Well, the first issue is with provability induction. Due to the ability of logical inductors to learn that sequences of statements are false, and start preemptively assigning the statements extremely low probability even though the length of a disproof may rise greatly, it becomes extremely cheap for traders to mess with conditional probabilities, and as stated in the logical induction paper, we don't have good results for what happens to conditional probabilities on low-probability events. The second issue is that humans don't seem to think about what-ifs like "what's the probability of a decrease in the price of pencils along with a supply shock to lumber in Estonia, now what's the probability of the supply shock in Estonia, let me divide...". Humans instead go "in the imaginary world where there's a supply shock to lumber in Estonia, what happens to pencil prices". We seem to take conditional probabilities as primitive and compute the probability of conjunctions from those.

A Note On Booleans:So, the original logical inductor algorithm just assigns values to statements as strings, while oracle induction and universal inductors operate by assigning an actual probability distribution to worlds (ie, functions from the set of atomic statements to {0,1}, with the probability of booleans given by the summed probability over all worlds that fulfill that boolean.)

The original logical inductor algorithm snuck the worlds in through the back-door via the worst-case scoring, where you consider only worlds that are propositionally consistent (no disproofs are discoverable by boolean algebra alone), thus forcing the algorithm (in the limit) to behave like an actual probability distribution over worlds. That is, two different booleans corresponding to the same set of possible worlds would have different prices in general, but differences in price let you buy the underpriced one and sell the overpriced one, and all the worlds agree that this earns you money, so the differences get ironed out over time.

By contrast, for oracle induction and universal induction, it doesn't matter which of two equivalent boolean combinations you bet on, they'll both have the same price, because the market can be interpreted as an actual probability distribution over worlds.

Put another way, in the original formulation of logical induction, the prices of equivalent booleans could be different, but their true value in a world would be the same. In oracle/universal induction, both the prices and true value of equivalent booleans are the same.

In the case of proof-length counterfactuals, the specific way of writing down a batch of worlds does affect the value in reality (because you would have to account for the length of the proof that the two sentences are propositionally equivalent). But since we will be considering the market state as a (generalized) probability distribution over worlds to see how this affects the basics of probability theory, equivalent booleans should have the same price.

So in this case, there is a situation where the true value of equivalent booleans may differ, but the price of equivalent booleans is the same.

Attempts At Defining The Value Of A Sentence:Let's consider a case where the upstream theorem prover iterates through all proofs, and reports all theorems of the form A⊢ϕ, where A is a finite set of statements, along with the length of the proof, as well as results of the form A⊬≤L⊥. (ie, there's no proof of ⊥ from A in L or fewer symbols). What should be our candidate definition for the "true" value of ϕ in the "world" where A is true? It should also ideally recover the original formulation of the value of a statement for logical inductors.

The obvious candidate is VA(ϕ):=minL(A,ϕ⊢L⊥)minL(A⊢L⊥). Ie, if there's a short disproof of A,ϕ relative to the shortest disproof of A, then the value should be low. This becomes 0, as expected, when A is consistent and there is a disproof of ϕ from A.

However, notice that this violates the basic property that VA(ϕ)+VA(¬ϕ)=1. This can be seen for ϕ that are unrelated-enough to A that adding or not-adding them doesn't make the shortest disproof any shorter. This breaks the property that the inductor converges to a particular price for ϕ in the limit, because if both of those values are high, there's no way to money-pump the prices fluctuating over time. Also, for the case of unrefutable statements, we get ∞∞ which is undefined.

Ok, time for the second attempt! This previous definition should be an

upper bound.We also want the values to add up to 1, and to guarantee this, we also have to do our proof-length accounting s.t. minL(A,ϕ⊢L⊥)+minL(A,¬ϕ⊢L⊥)≥minL(A⊢L⊥). The cleanest way to do this is to do our proof-length accounting s.t. A,ϕ⊢L⊥ and A,ψ⊢L′⊥ imply A,ϕ∨ψ⊢L+L′⊥ . Also, this ensures that purchasing a share of ϕ is exactly equivalent to selling a share of ¬ϕ, so we can just get rid of selling shares and only consider buying them, which will be convenient later. Admittedly, this property alone lets us just do the trivial Brouwer-fixed-point solution to get a market, but two propositionally equivalent statements may have prices that stay different forever since we don't have an analogue of propositional consistency yet, and we'd really like to be able to interpret our prices as something kinda like a probability distribution over impossible worlds, so we'll need something fancier to get closer to the probability theory setting.With that motivating comment, there can be an issue where, for example, ϕ and ψ are astronomically-long booleans that are propositionally equivalent to each other given A, and the shortest disproof of A isn't astronomically long. This means that the value of ϕ and ψ are unconstrained, and since we aren't requiring equivalent statements to have equivalent value (since they may not have equivalent disproof lengths), it's possible that both ϕ and ¬ψ have low value due to the "add up to 1" constraint. However, we'll want our prices to be the same on propositionally equivalent statements to give a probability-theory-flavored interpretation of them. So, consider a trader that buys 1 share of "ϕ won't happen" and 1 share of "¬ψ won't happen", and the prices of those are both low. The value of this purchase is PA(ϕ)+PA(¬ψ)−VA(ϕ)−VA(¬ψ)=1−VA(ϕ)−VA(¬ψ)>0. And again, we have a money-pump!

So we also need to impose a constraint that propositionally-identical statements must have equal value.

And now we get

anotherthing going wrong. In particular, let's say that ¬ϕ is, by a long proof, propositionally equivalent to ψ when A holds. It may be the case that A,ϕ⊢⊥ is short, and A,ψ⊢⊥ is short, but A⊢⊥ is a long proof, because the shortest way to pull it off is by getting the two sub-proofs, and then going through a long chain of reasoning to show that ¬ϕ is equivalent to ψ. Then, assume we use the maximal value for ϕ and ψ, we get VA(ϕ)+VA(ψ)=minL(A,ϕ⊢L⊥)+minL′(A,ψ⊢L′⊥)minL(A⊢L⊥)<1.Now, if we swapped out the denominator for minL(A,ϕ∨ψ⊢L⊥), by our proof-length accounting, it'd work out. So this indicates that, instead of our denominator being minL(A⊢L⊥), it should be minL,ϕ(A,ϕ⊢L⊥) where ϕ is a propositional tautology given A.

Also, we will want the following two properties, unitarity and subadditivity. Specifically, VA(A)=1, and VA(ϕ)+VA(ψ)≥VA(ϕ∨ψ). The motivation for the first should be obvious. The motivation for the second is, if we take the upper-bound view of VA corresponding to "length of shortest disproof", the shortest disproof of an or-statement should be upper-bounded by the sum of the shortest disproofs of the two component parts, because you can stick them together into a proof-by-cases disproof of the or-statement.

Now, let's put all the pieces together.

VA is an arbitrary function of type S→[0,1] (S is the set of sentences)s.t. the following properties hold.

1: Proof-length upper bound: VA(ϕ)≤minL(A,ϕ⊢L⊥)minL,ψ(A,ψ⊢L⊥) where ψ is a propositional tautology given A.

2: Propositional equivalence: VA(ϕ)=VA(ϕ′) when ϕ is propositionally equivalent to ϕ′ given A.

3: Law of the excluded middle: VA(ϕ)+VA(¬ϕ)=1.

4: Unitarity: VA(A)=1

5: Subadditivity: VA(ϕ)+VA(ψ)≥VA(ϕ∨ψ)

This now works sensibly with the case where A is consistent, and also prevents money-pumping, as we will see in the next post.

I'm

pretty surethat if A is propositionally consistent, there is always an appropriate choice of VA that fulfills these properties. Or at least I can get that there is always a choice that fulfills properties 1-4, I'm just having trouble showing that 5 is consistent with these. Counterexamples in the comments would be welcome. Assumptions we have to impose on how we count the length of proofs in order to make it work would also be welcome.Now, some exposition. First, we're lacking the crucial property that if A,ϕ propositionally implies ψ, then ψ must have a lower value than ϕ. From a proof-length standpoint, this makes sense, because you could have a very complex boolean that is a subset of a much simpler boolean, and it could be easier to rule out the simple boolean than to rule out the complex boolean due to the overhead of the propositional implication proof. Fortunately, we can construct our prices so they

dohave that property, even if reality doesn't.Also, our prices

cannotbe a probability distribution, in general! Here's an explicit counterexample. Consider 4 mutually exclusive and exhaustive events. For all sets of 1, 2, or 3 events, V assigns them a value of about 1/2. This fulfills properties 3, 4, and 5, now just assume it also fulfills 1 and 2. Now consider a trader that buys one share in all 4 of the statements of the form "one of the events from this list of three will happen". No matterwhatprobability distribution is selected, the value of that bet is always going to be +1, because the sum of the probability masses over the four statements must be 3, and the sum of the valuations is 2. It's a situation that's unwinnable for the market if the market restricts itself to outputting a probability distribution.So proof-length counterfactuals behave sorta like a probability distribution, except that the value assigned to "one of these mutually exclusive events will occur" can in general be

lessthan the sum of the value assigned to the events occurring, with perfect equality corresponding to the shortest disproof of the or-statement being a proof-by-cases disproof.Specifically, it's a valuation. Given some set A and some set S that is a magma (ie, there is a function ∙:S×S→S, which in our practical applications will be ∨ or ∪), and also has some distinguished element (⊥, ∅, in our applications) and a function g:A→S, (translate the specification of the counterfactual to a statement) a valuation V over (A,S,g) is a function A×S→[0,1] s.t.

VA(g(A))=1 (unitarity)

VA(⊥)=0 (empty set)

VA(B∙C)≤VA(B)+VA(C) (subadditivity)

These are the bare minimal requirements, but there are some other properties we can add to these, detailed later.