Counterfactual Induction (Algorithm Sketch, Fixpoint proof)

Diffractor

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

So, to begin with, here's how the algorithm works.

The upstream algorithm iterates through all proofs and records the lengths of all proofs of the form "a finite collection of sentences implies $⊥$ ". The proof-length accounting is set up such that if $A, ϕ ⊢_{L} ⊥$ and $A, ψ ⊢_{L^{'}} ⊥$ , then $A, ϕ \lor ψ ⊢_{L + L^{'}} ⊥$ . As soon as it has checked all proofs of length $n$ or shorter from $A$ plus propositional tautologies given $A$ , with no contradictions found, it reports that $A$ is contradiction-free for at least $n$ steps. (This doesn't require searching an infinite set, because the number of propositional tautologies with length shorter than $n$ is finite.)

With $S$ being the set of all math sentences, and $P_{f i n, p c} (S)$ being the set of all finite subsets of sentences that are propositionally consistent, the market $P$ is a partial function of type $P_{f i n, p c} (S) \times S \to [0, 1]$ , which fulfills the following four axioms.

(note: $A$ appearing where a sentence would normally go refers to the collection of statements in $A$ expressed as one big boolean and-statement, and $ϕ ⊢_{p c} ψ$ means that given $ϕ$ , $ψ$ is provable using only the rules of inference for propositional calculus)

1: Unitarity. $\forall A : P_{A} (A) = 1$

2: Subadditivity. $\forall A, ϕ, ψ : P_{A} (ϕ) + P_{A} (ψ) \geq P_{A} (ϕ \lor ψ)$

3: Law of Excluded Middle. $\forall A, ϕ : P_{A} (ϕ) + P_{A} (\neg ϕ) = 1$

4: Propositional Monotonicity: $\forall A, ϕ, ψ : A, ϕ ⊢_{p c} ψ \to P_{A} (ψ) \geq P_{A} (ϕ)$

We will also consider a fifth axiom, propositional equivalence, which is implied by axiom 4.

5: Propositional Equivalence: $\forall A, ϕ, ψ : A ⊢_{p c} ϕ \leftrightarrow ψ \to P_{A} (ψ) = P_{A} (ϕ)$
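To spell out why axiom 4 gives axiom 5, here is a short derivation sketch. It only uses the fact that a propositional proof of $ϕ \leftrightarrow ψ$ from $A$ yields propositional proofs in both directions.

$A ⊢_{p c} ϕ \leftrightarrow ψ \implies (A, ϕ ⊢_{p c} ψ) \text{ and } (A, ψ ⊢_{p c} ϕ)$

$\implies P_{A} (ψ) \geq P_{A} (ϕ) \text{ and } P_{A} (ϕ) \geq P_{A} (ψ) \implies P_{A} (ψ) = P_{A} (ϕ)$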

The market fulfills axioms 1-4, the worlds we are defending against fulfill axioms 1-3 and 5. Axioms 1, 3, and 5 suffice to show the empty set property that $P_{A} (⊥) = 0$ , so that one comes for free and doesn't need to be specified.
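One way to see the empty set property, sketched under the reading that $A ⊢_{p c} A \leftrightarrow \neg ⊥$ (both sides hold given $A$ ):

$P_{A} (\neg ⊥) = P_{A} (A) = 1 \quad \text{(axioms 5 and 1)}$

$P_{A} (⊥) = 1 - P_{A} (\neg ⊥) = 0 \quad \text{(axiom 3)}$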

Traders are poly-time algorithms that output a continuous circuit that takes as input the pricing $P$ and outputs a nonnegative number for each pair of the form $(A, ϕ)$ . This should be interpreted as a bet against $ϕ$ in the $A$ counterfactual, and has a payoff of $P_{A} (ϕ) - V_{A} (ϕ)$ . Due to the law of excluded middle for both $V$ and $P$ , selling these statements is exactly equivalent to betting against $\neg ϕ$ .

The budgeting of the traders works the same way as usual, although dealing with finding the worst-case value for a trader's trades takes a bit more care since the space of allowable valuations isn't discrete. Also, the proof that the supertrader exploits if any component does goes through in the same way.

Let's show that finding the worst-case valuation is computable. We can split the trader's pile of shares into a bunch of subpiles, one for each $A$ , and minimize these individually. For each $A$ -pile, there's a finite collection of sentences $S$ that have been bet on. Then, define the following subset of $[0, 1]^{S}$ : the set of valuations that fulfill unitarity, subadditivity, known proof-length bounds, law of excluded middle, and propositional equivalence (assumed to exist by the previous post). Since the proof searcher reports the lengths of proofs it has found, as well as when it has verified that no disproofs exist below a certain length, this imposes rational number upper bounds on how much to value a sentence at. The set of valuations that fulfill all the constraints (again, assuming existence) is convex, compact, and defined by finitely many linear inequalities and equalities. The expression giving the wealth of the trader in a valuation, $\sum_{ϕ, n} c_{ϕ, n} (P_{n, A} (ϕ) - V_{A} (ϕ))$ , is linear in $V$ because the prices just turn into constants, so it's just a "minimize this linear function over this definable convex set" problem (a linear program), which there are well-known algorithms for. Do this for all $A$ and sum up the answers to get a worst-case value for the trader.
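As a rough illustration of the minimization, here is a one-dimensional toy case: a single counterfactual with bets against $ϕ$ and against $\neg ϕ$ only. The function name `worst_case_wealth` and the `upper_phi` / `upper_not_phi` parameters (standing in for the proof-length upper bounds) are hypothetical, and the general case would need a real linear-programming solver rather than this endpoint check.

```python
from fractions import Fraction


def worst_case_wealth(c_phi, c_not_phi, p_phi,
                      upper_phi=Fraction(1), upper_not_phi=Fraction(1)):
    """Minimum wealth over valuations V for one (phi, not-phi) pair.

    A share bet against phi pays P(phi) - V(phi). By the law of excluded
    middle, V(not phi) = 1 - V(phi) and P(not phi) = 1 - P(phi), so the
    feasible set is an interval for x = V(phi).
    """
    p_not_phi = 1 - p_phi
    # Upper bounds on V(phi) and V(not phi) are assumed inputs (toy stand-ins
    # for the bounds reported by the proof searcher).
    lo = max(Fraction(0), 1 - upper_not_phi)
    hi = min(Fraction(1), upper_phi)
    if lo > hi:
        raise ValueError("no valuation satisfies the bounds")

    def wealth(x):
        return c_phi * (p_phi - x) + c_not_phi * (p_not_phi - (1 - x))

    # A linear function on an interval attains its minimum at an endpoint.
    return min(wealth(lo), wealth(hi))


loss = worst_case_wealth(2, 0, Fraction(1, 2), upper_phi=Fraction(3, 4))
hedged = worst_case_wealth(1, 1, Fraction(1, 2))
```

In this toy, holding equal amounts of both bets at consistent prices leaves nothing at risk, while a one-sided bet is only as bad as the bound on $V_{A} (ϕ)$ allows.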

Now, finally, we need to show that given any bet from the supertrader, there exists a nonempty compact convex set of prices that fulfill all our axioms, make the bet unwinnable, and the function from prices to sets of prices has closed graph. Then we just need to apply the Kakutani fixed-point theorem, and argue that there is a neighborhood of points around the fixed point where it is verifiable that only a tiny amount of money could be earned, and we'll have our proof-length inductor and can start analyzing which nice properties it gets.

Definitions To State Fixpoint Proof:

Let $A$ be the set of all finite batches of sentences specified in the supertrader trade. This is basically "all the counterfactuals specified by a share someone purchased".

Let $S^{*}$ be the (infinite) set of sentences specified by: taking all the sentences specified in the supertrader trade (whether as part of some counterfactual $A$ , or as a sentence named in a bet), breaking them down into atomic sentences (see page 12 in the logical induction paper), and making all possible boolean combinations of them. So if a sentence is a boolean combination of atomic sentences that all show up as a subformula of some sentence that some trader bet on, or as a subformula of some sentence in $⋃ A$ , it's in $S^{*}$ .

Let $Ξ$ be the set of all functions of type $A \times S^{*} \to [0, 1]$ that fulfill the four axioms of unitarity, subadditivity, law of excluded middle, and propositional monotonicity. The market prices are in this set.

Let $Λ$ be the set of all functions $A \times S^{*} \to [0, 1]$ that fulfill the four axioms of unitarity, subadditivity, law of excluded middle, and propositional equivalence. The worlds we are defending against are the subset of this space which obeys the proof-length bounds that have been discovered so far.

$T$ is the continuous function that corresponds to the supertrader, a function of type $Ξ \to R_{\geq 0}^{A \times S^{*}}$ , which gives the number of shares purchased for a particular $A, ϕ$ pair, which is non-negative (because selling shares of $ϕ$ is exactly the same as buying shares of $\neg ϕ$ )

Now we can finally state our desired theorem.

Theorem 1:

$\exists P \in Ξ : \forall V \in Λ : \sum_{A, ϕ} T (P)_{A, ϕ} (P_{A} (ϕ) - V_{A} (ϕ)) \leq 0$

Or, in other words, there exists a set of prices which fulfills the four axioms, which ensures that in all possible worlds, the supertrader doesn't gain any money. Due to the continuity of $T$ , a tiny change in $P$ results in the sum being bounded above by $ϵ$ , so we can find such a $P$ by brute-force search.

Time for the proof.

Definitions For Proof Of Theorem 1:

Let $S$ be the set of all atomic sentences specified in the supertrader trade (whether as part of some counterfactual $A$ , or as a sentence named in a bet), produced by breaking them down into atomic sentences. So if an atomic sentence shows up as a subformula of some sentence that some trader bet on, or as a subformula of some sentence in $⋃ A$ , it's in $S$ . This set is finite. Or you can consider it as the set of all atomic sentences that appear in some sentence in $S^{*}$ . As a toy example, if there's just a single bet on $(A, ϕ)$ , it'd be all atomic sentences that appear as part of a sentence in $A$ , or as a part of $ϕ$ .

(yes, we used $S$ before to denote a different set while talking about worst-case world scoring. Sorry about that. We'll be consistent from this point on.)

$P$ is used to denote the powerset. $P (S)$ is the set of "worlds" (assignments of each statement under consideration to true or false). $P (P (S))$ is the set of all subsets of worlds.

$L$ is the finite lattice produced by ordering $P (P (S))$ by set inclusion. In this lattice, inf corresponds to set intersection, sup corresponds to set union.

$f$ is the surjective function of type $S^{*} \to L$ , given by mapping a boolean to the set of worlds where it's true.

$g$ is the function of type $A \to S^{*}$ given by turning the collection of sentences which specify a counterfactual into a single sentence via ordering the sentences somehow and boolean and-ing them all together. Composing $f$ and $g$ lets you turn a collection of sentences defining a counterfactual into a single node in the powerset lattice $L$ .
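The maps $f$ and $g$ can be sketched in a few lines. This is only a toy model: formulas are represented as nested tuples such as `("or", "p", "q")` with atoms as strings, a world is the set of atoms it makes true, and the helper names `worlds` and `holds` are made up for the illustration.

```python
from itertools import combinations


def worlds(atoms):
    """All truth assignments, each given as the frozenset of true atoms."""
    atoms = sorted(atoms)
    return [frozenset(c) for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)]


def holds(formula, world):
    """Evaluate a nested-tuple boolean formula in a world."""
    if isinstance(formula, str):
        return formula in world
    op = formula[0]
    if op == "not":
        return not holds(formula[1], world)
    if op == "and":
        return all(holds(sub, world) for sub in formula[1:])
    if op == "or":
        return any(holds(sub, world) for sub in formula[1:])
    raise ValueError(f"unknown connective: {op!r}")


def f(formula, atoms):
    """Map a boolean formula to the set of worlds where it is true."""
    return frozenset(w for w in worlds(atoms) if holds(formula, w))


def g(batch):
    """Turn a finite batch of sentences into one and-statement (fixed order)."""
    return ("and", *sorted(batch, key=repr))


atoms = {"p", "q"}
node = f(g(["p", ("or", "p", "q")]), atoms)  # the lattice node for A = {p, p or q}
```

Disjunction, conjunction, and negation on formulas then line up with union, intersection, and complement of world sets.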

$Ξ^{'}$ is the set of all functions $A \times L \to [0, 1]$ which fulfill the following four axioms, which are the analogue of the four defining axioms for $Ξ$ , but in the powerset lattice:

1: Unitarity: $\forall A \in A : V_{A} (f (g (A))) = 1$

2: Subadditivity: $\forall A \in A, E, F \in L : V_{A} (E) + V_{A} (F) \geq V_{A} (E \cup F)$

3: Law of excluded middle: $\forall A \in A, E \in L : V_{A} (E) + V_{A} (P (S) \setminus E) = 1$

4: Monotonicity: $\forall A \in A, E, F \in L : f (g (A)) \cap E \subseteq f (g (A)) \cap F \to V_{A} (E) \leq V_{A} (F)$

Let $t$ be the function $Ξ^{'} \to Ξ$ s.t. $t (V)_{A} (ϕ) = V_{A} (f (ϕ))$ . This translates a valuation over the powerset lattice to a valuation over sentences, because each sentence denotes a node in the powerset lattice via $f$ , and $f$ is surjective. The defining axioms for $Ξ^{'}$ carry over to imply the defining axioms for $Ξ$ .

Let $r$ be the function of type $R_{\geq 0}^{A \times S^{*}} \to Δ (A \times L)$ defined by $r (\vec{T}) (A, E) = \frac{\sum_{ϕ \in f^{- 1} (E)} \vec{T}_{A, ϕ}}{\sum_{A^{'}, ϕ} \vec{T}_{A^{'}, ϕ}}$ .

What this does is renormalize a trade of a supertrader into a probability distribution over $(A, E)$ pairs. Admittedly, the renormalization isn't well-defined in general, because the sum in the numerator may be infinite, or the denominator may be 0. For supertrader trades, the sum in the numerator won't be infinite because only finitely many sentences in $S^{*}$ can be bet on. And to prevent the denominator from being 0, we can just add a trader with a mass of $ϵ$ that purchases 1 share of $(⊤, ⊤)$ . This always results in 0 net value no matter what, by unitarity, so the trader preserves its mass and can do the same trick the next turn, ensuring that the renormalization is always well-defined.
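A minimal sketch of the renormalization, assuming the trade is given as a finite dictionary from $(A, ϕ)$ pairs to share counts and that `event_of` plays the role of $f$ . Both the dictionary encoding and the name `renormalize` are illustrative choices, not part of the construction above.

```python
from collections import defaultdict
from fractions import Fraction


def renormalize(trade, event_of):
    """Turn nonnegative share counts on (A, phi) into a distribution on (A, E).

    Shares on different sentences that denote the same event E are pooled,
    then everything is divided by the total number of shares.
    """
    total = sum(Fraction(s) for s in trade.values())
    if total <= 0:
        # The always-present trivial bet is what rules this case out.
        raise ValueError("total share count must be positive")
    dist = defaultdict(Fraction)
    for (A, phi), shares in trade.items():
        dist[(A, event_of(phi))] += Fraction(shares) / total
    return dict(dist)


# Toy data: two syntactically different sentences denoting the same event.
events = {"p or q": "E1", "q or p": "E1", "top": "T"}
trade = {("A", "p or q"): 1, ("A", "q or p"): 1, ("top", "top"): 2}
dist = renormalize(trade, events.get)
```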

Let $F$ be our function from $Ξ^{'} \to P (Ξ^{'})$ that we'll apply the Kakutani fixed-point theorem to, defined by:

$F (V) := {argmin}_{P^{'} \in Ξ^{'}} (E_{A, E \sim r (T (t (V)))} P_{A}^{'} (E))$

Or, in other words, it maps a potential pricing $V$ to the set of pricings which minimize the expected value. $V$ is converted to a pricing over sentences via $t$ , and then evaluated by the supertrader $T$ , and then $r$ is used to convert the supertrader trade to a probability distribution, which gives the weights for the sum.

Now, we have four lemmas, and we'll prove the first 3 now, and the fourth later.

Lemma 1: $Ξ^{'}$ is a nonempty, compact, convex subset of the finite-dimensional hypercube $[0, 1]^{A \times L}$

Lemma 2: For all $V \in Ξ^{'}, F (V)$ is a nonempty, compact, convex subset of $Ξ^{'}$ .

Lemma 3: $F$ has the closed-graph property.

Lemma 4: $\forall V \in Λ : \exists V^{'} \in Ξ^{'} : \forall A, ϕ \in A \times S^{*} : V^{'}_{A} (f (ϕ)) \leq V_{A} (ϕ)$

Proof of Theorem 1:

By Lemmas 1, 2, 3 and the Kakutani fixed-point theorem, $\exists P^{'} : P^{'} \in F (P^{'})$ . This is equivalent to $P^{'} \in {argmin}_{V \in Ξ^{'}} (E_{A, E \sim r (T (t (P^{'})))} V_{A} (E))$ . Therefore, for all $V \in Ξ^{'}$ ,

$E_{A, E \sim r (T (t (P^{'})))} P_{A}^{'} (E) \leq E_{A, E \sim r (T (t (P^{'})))} V_{A} (E)$

Then, by linearity of expectation, we can get that for all $V \in Ξ^{'}$

$E_{A, E \sim r (T (t (P^{'})))} (P_{A}^{'} (E) - V_{A} (E)) \leq 0$

By multiplying both sides by $\sum_{A, ϕ} T (t (P^{'}))_{A, ϕ}$ , and referring back to the definition of $r$ , we get that for all $V$

$\sum_{A, E} (\sum_{ϕ \in f^{- 1} (E)} T (t (P^{'}))_{A, ϕ}) (P_{A}^{'} (E) - V_{A} (E)) \leq 0$

And then by the definition of $t$ , this can be rewritten as:

$\sum_{A, ϕ} T (t (P^{'}))_{A, ϕ} (t (P^{'})_{A} (ϕ) - t (V)_{A} (ϕ)) \leq 0$

Now, by applying the definition of $t$ again, and defining $P := t (P^{'})$

$\exists P \in Ξ : \forall V \in Ξ^{'} : \sum_{A, ϕ} T (P)_{A, ϕ} (P_{A} (ϕ) - V_{A} (f (ϕ))) \leq 0$

And by applying Lemma 4, that any $V \in Λ$ has a $V^{'} \in Ξ^{'}$ that attains lower or equal value on every pair (so that swapping $V^{'}$ for $V$ can only increase the value of the sum), we get

$\exists P \in Ξ : \forall V \in Λ : \sum_{A, ϕ} T (P)_{A, ϕ} (P_{A} (ϕ) - V_{A} (ϕ)) \leq 0$

and Theorem 1 is proved. Time to take care of showing the lemmas.

Proof of Lemma 1:

Obviously, $Ξ^{'}$ is nonempty, because taking a probability distribution over $P (S)$ , conditioning on $A$ , and extending it to a valuation on $L$ fulfills all 4 defining properties. It's convex because any mixture of elements fulfilling the defining axioms of $Ξ^{'}$ also fulfills the defining axioms. All defining equations use $\geq$ or $=$ , so the resulting set is closed, and since it's bounded by being a subset of a finite-dimensional hypercube, it's compact as well.
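The nonemptiness claim can be checked by brute force on a small instance. The sketch below assumes two atoms, a uniform distribution over the four worlds, and a single counterfactual $A = \{p\}$ ; all of these are arbitrary toy choices.

```python
from fractions import Fraction
from itertools import combinations


def powerset(items):
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


atoms = ["p", "q"]
all_worlds = powerset(atoms)        # P(S): 4 worlds
lattice = powerset(all_worlds)      # L = P(P(S)): 16 sets of worlds
top = frozenset(all_worlds)

mu = {w: Fraction(1, 4) for w in all_worlds}             # assumed uniform prior
A_event = frozenset(w for w in all_worlds if "p" in w)   # f(g(A)) for A = {p}


def V(E):
    """Probability of E conditional on the counterfactual event."""
    return sum(mu[w] for w in E & A_event) / sum(mu[w] for w in A_event)


unitarity = V(A_event) == 1
subadditivity = all(V(E) + V(F) >= V(E | F) for E in lattice for F in lattice)
excluded_middle = all(V(E) + V(top - E) == 1 for E in lattice)
monotonicity = all(V(E) <= V(F) for E in lattice for F in lattice
                   if (A_event & E) <= (A_event & F))
```

All four flags come out true here, which is the finite check behind "conditioning a probability distribution on $A$ works".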

Proof of Lemma 2:

Continuous functions from compact spaces to the nonnegative reals have a closed and nonempty argmin set, and by the solution set being a subset of $Ξ^{'}$ which is compact, nonemptiness and compactness of $F (V)$ has been shown. As for convexity, observe that linearity of expectation implies that you can take a probabilistic mixture of any two points in the argmin set and it will preserve expected value, so the argmin set is convex.

Proof of Lemma 3:

For notational convenience, abbreviate $r (T (t (V)))$ as $h (V)$ . Equip $Ξ^{'}$ with the sup norm on the hypercube it is a subset of. A perturbation of $V$ of size $δ$ leads to a perturbation of the price features by at most $δ$ , which, by the continuity of the supertrader trade, induces at most a $δ^{'}$ change in the sum of the coefficients for the supertrader trade. Because said sum of coefficients is uniformly bounded away from 0 by that one trader that just makes trivial bets, renormalizing leads to at most a $\frac{ϵ}{3}$ change in the probability distribution $h (V)$ . So, by letting $δ$ be sufficiently small, if $V^{'}$ differs from $V$ by less than $δ$ , then $d_{t v} (h (V), h (V^{'})) < \frac{ϵ}{3}$ . Also, under this norm, if $P$ differs from $P^{'}$ by less than $\frac{ϵ}{3}$ , then for any probability distribution $μ \in Δ (A \times L)$ , $| E_{μ} P - E_{μ} P^{'} | < \frac{ϵ}{3}$ .

Fix two sequences of valuations $\{P_{n}\}$ , $\{V_{n}\}$ , which limit to $P_{\infty}$ and $V_{\infty}$ , where for all $n$ , $P_{n} \in F (V_{n})$ . To show closed graph (i.e., that $P_{\infty} \in F (V_{\infty})$ ), by the definition of $F$ , we need to show that there is no $P^{'}$ and $ϵ > 0$ where $E_{h (V_{\infty})} P_{\infty} - E_{h (V_{\infty})} P^{'} > ϵ$ .

Assume the opposite, that such a $P^{'}$ and $ϵ$ exist.

Because of the limiting sequences, there is some $n_{0}$ where, for all greater $n$ , the distance between $V_{\infty}$ and $V_{n}$ is below $δ$ , and the distance between $P_{\infty}$ and $P_{n}$ is below $\frac{ϵ}{3}$ .

$d_{t v} (h (V), h (V^{'})) < \frac{ϵ}{3}$ implies that for all $P$ , $| E_{h (V)} P - E_{h (V^{'})} P | < \frac{ϵ}{3}$ .
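For completeness, a sketch of why this holds, assuming the convention $d_{t v} (μ, ν) = \max_{B} | μ (B) - ν (B) |$ and using that $P$ takes values in $[0, 1]$ . Write $B^{+}$ for the set of points $x$ with $μ (x) > ν (x)$ :

$E_{μ} P - E_{ν} P = \sum_{x} (μ (x) - ν (x)) P (x) \leq \sum_{x \in B^{+}} (μ (x) - ν (x)) = μ (B^{+}) - ν (B^{+}) \leq d_{t v} (μ, ν)$

Swapping the roles of $μ$ and $ν$ gives the same bound for $E_{ν} P - E_{μ} P$ , hence for the absolute value.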

Since $V_{n_{0}}$ and $V_{\infty}$ differ by less than $δ$ , $d_{t v} (h (V_{n_{0}}), h (V_{\infty})) < \frac{ϵ}{3}$ .

So, $E_{h (V_{\infty})} P_{\infty} - E_{h (V_{\infty})} P^{'} > ϵ$ (by assumption) and $| E_{h (V_{n_{0}})} P_{\infty} - E_{h (V_{\infty})} P_{\infty} | < \frac{ϵ}{3}$ and $| E_{h (V_{n_{0}})} P^{'} - E_{h (V_{\infty})} P^{'} | < \frac{ϵ}{3}$ , which implies $E_{h (V_{n_{0}})} P_{\infty} - E_{h (V_{n_{0}})} P^{'} > \frac{ϵ}{3}$

Finally, because $P_{\infty}$ and $P_{n_{0}}$ differ by less than $\frac{ϵ}{3}$ , we have $| E_{h (V_{n_{0}})} P_{\infty} - E_{h (V_{n_{0}})} P_{n_{0}} | < \frac{ϵ}{3}$ , so we get $E_{h (V_{n_{0}})} P_{n_{0}} - E_{h (V_{n_{0}})} P^{'} > 0$ . This is impossible, because $P_{n_{0}} \in F (V_{n_{0}})$ means that $P_{n_{0}}$ has the minimal possible expected value.
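Written as one chain, with $E_{n_{0}}$ and $E_{\infty}$ as shorthand (introduced only here) for $E_{h (V_{n_{0}})}$ and $E_{h (V_{\infty})}$ , the three $\frac{ϵ}{3}$ bounds combine as:

$E_{n_{0}} P_{n_{0}} - E_{n_{0}} P^{'} > (E_{n_{0}} P_{\infty} - \frac{ϵ}{3}) - E_{n_{0}} P^{'} > (E_{\infty} P_{\infty} - \frac{2 ϵ}{3}) - (E_{\infty} P^{'} + \frac{ϵ}{3}) > ϵ - ϵ = 0$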

Thus, we have a contradiction and our original assumption was wrong, so $P_{\infty} \in F (V_{\infty})$ , and closed-graph for $F$ has been shown.

The proof of Lemma 4 will be deferred to the next post.

Pattern:

$P_{A} (A) = 1$

Should $P_{A} (X)$ be thought of as $P (X | A)$ ? (Note this is conditional pricing, not conditional probability.)

4: Propositional Monotonicity: $\forall A, ϕ, ψ : A, ϕ ⊢_{p c} ψ \to P_{A} (ψ) \geq P_{A} (ϕ)$

Given a list of statements, their length, and whether they prove each other, can their prices all be determined?

Diffractor:

Yup! The subscript is the counterfactual we're working in, so you can think of it as a sort of conditional pricing.

The prices aren't necessarily unique: we set them anew on each turn, and there may be multiple valid prices for each turn. Basically, the prices are just set so that the supertrader doesn't earn money in any of the "possible" worlds that we might be in. Monotonicity is just "the price of a set of possibilities is greater than or equal to the price of a subset of those possibilities".
