Counterfactual Induction (Algorithm Sketch, Fixpoint proof)

by Diffractor 7 min read17th Dec 20192 comments

5

Ω 2


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

So, to begin with, here's how the algorithm works.

The upstream algorithm iterates through all proofs, and records the lengths of all proofs of the form "a finite collection of sentences implies ". Also the proof length accounting is set up such that if and , then . Also, as soon as it checks all proofs of length or shorter from +propositional tautology given , with no contradictions found, it reports that is contradiction-free for at least steps. (this doesn't require searching an infinite set because the number of propositional tautologies with length shorter than is finite)

With being the set of all math sentences, and being the set of all finite subsets of sentences that are propositionally consistent, the market is a partial of type , which fulfills the following four axioms.

(note: appearing where a sentence would normally go refers to the collection of statements in expressed as one big boolean and-statement, and means that given , is provable using only the rules of inference for propositional calculus)

1: Unitarity.

2: Subadditivity.

3: Law of Excluded Middle.

4: Propositional Monotonicity:

We will also consider a fifth axiom, propositional equivalence, which is implied by axiom 4.

5: Propositional Equivalence:

The market fulfills axioms 1-4, the worlds we are defending against fulfill axioms 1-3 and 5. Axioms 1, 3, and 5 suffice to show the empty set property that , so that one comes for free and doesn't need to be specified.

Traders are poly-time algorithms that output a continuous circuit that takes as input the pricing and output a nonnegative number for each pair of the form . This should be interpreted as a bet against in the counterfactual, and has a payoff of . Due to law of excluded middle for both and , selling these statements is exactly equivalent to betting against .

The budgeting of the traders works the same way as usual, although dealing with finding the worst-case value for a trader's trades takes a bit more care since the space of allowable valuations isn't discrete. Also, the proof that the supertrader exploits if any component does goes through in the same way.

Let's show that finding the worst-case valuation is computable. We can split the trader's pile of shares into a bunch of subpiles, one for each , and minimize these individually. For each A-pile, there's a finite collection of sentences that have been bet on. Then, define the following subset of . The set of valuations that fulfill unitarity, subadditivity, known proof length bounds, law of excluded middle, and propositional equivalence (assumed to exist by previous post). Since the proof searcher reports the lengths of proofs it has found, as well as when it has verified that no disproofs exist below a certain length, this imposes rational number upper bounds on how much to value a sentence at. The set of valuations that fulfill all the constraints (again, assuming existence) is convex, compact, and defined by finitely many inequalities and equalities. The equation giving the wealth of the trader in a valuation, is linear because the prices just turn into constants, so it's just a "minimize this linear function over this definable convex set" problem, which there are well-known algorithms for. Do this for all and sum up the answers to get a worst-case value for the trader.

Now, finally, we need to show that given any bet from the supertrader, there exists a nonempty compact convex set of prices that fulfill all our axioms, make the bet unwinnable, and the function from prices to sets of prices has closed graph. Then we just need to apply the Kakutani fixed-point theorem, and argue that there is a neighborhood of points around the fixed point where it is verifiable that only a tiny amount of money could be earned, and we'll have our proof-length inductor and can start analyzing which nice properties it gets.

Definitions To State Fixpoint Proof:

Let be the set of all finite batches of sentences specified in the supertrader trade. This is basically "all the counterfactuals specified by a share someone purchased".

Let be the (infinite) set of sentences specified by: taking all the sentences specified in the supertrader trade (whether as part of some counterfactual , or as a sentence named in a bet), breaking them down into atomic sentences (see page 12 in the logical induction trade), and making all possible boolean combinations of them. So if a sentence is a boolean combination of atomic sentences that all show up as a subformula of some sentence that some trader bet on, or as a subformula of some sentence in , it's in .

Let be the set of all functions of type that fulfills the four axioms of unitarity, subadditivity, law of excluded middle, and propositional monotonicity. The market prices are in this set.

Let be the set of all functions that fulfill the four axioms of unitarity, subadditivity, law of excluded middle, and propositional equivalence. The worlds we are defending against are the subset of this space which obeys the proof-length bounds that have been discovered so far.

is the continuous function that corresponds to the supertrader, a function of type , which gives the number of shares purchased for a particular pair, which is non-negative (because selling shares of is exactly the same as buying shares of )

Now we can finally state our desired theorem.

Theorem 1:

Or, in other words, there exists a set of prices which fulfills the four axioms, which ensures that in all possible worlds, the supertrader doesn't gain any money. Due to the continuity of , a tiny change in results in the sum being bounded above by , so we can find such a by brute-force search.

Time for the proof.

Definitions For Proof Of Theorem 1:

Let be the set of all atomic sentences specified in the supertrader trade (whether as part of some counterfactual , or as a sentence named in a bet), produced by breaking them down into atomic sentences. So if an atomic sentence shows up as a subformula of some sentence that some trader bet on, or as a subformula of some sentence in , it's in . This set is finite. Or you can consider it as the set of all atomic sentences that appear in some sentence in . As a toy example, if there's just a single bet on , it'd be all atomic sentences that appear as part of a sentence in , or as a part of .

(yes, we used before to denote a different set while talking about worst-case world scoring. Sorry about that. We'll be consistent from this point on.)

is used to denote the powerset. is the set of "worlds" (assignments of each statement under consideration to true or false). is the set of all subsets of worlds.

is the finite lattice produced by ordering by set inclusion. In this lattice, inf corresponds to set intersection, sup corresponds to set union.

is the surjective function of type , given by mapping a boolean to the set of worlds where it's true.

is the function of type given by turning the collection of sentences which specify a counterfactual into a single sentence via ordering the sentences somehow and boolean and-ing them all together. Composing and lets you turn a collection of sentences defining a counterfactual into a single node in the powerset lattice .

is the set of all functions which fulfill the following four axioms, which are the analogue of the four defining axioms for , but in the powerset lattice:

1: Unitarity:

2: Subadditivity:

3: Law of excluded middle:

4: Monotonicity:

Let be the function s.t. . This translates a valuation over the powerset lattice to a valuation over sentences, because each sentence denotes a node in the powerset lattice via , and is surjective. The defining axioms for carry over to imply the defining axioms for .

Let be the function of type defined by .

What this does is renormalize a trade of a supertrader into a probability distribution over pairs. Admittedly, the renormalization isn't well-defined in general, because the sum in the numerator may be infinite, or the denominator may be 0. For supertrader trades, the sum in the numerator won't be infinite because only finitely many sentences in can be bet on. And to prevent the denominator from being 0, we can just add a trader with a mass of that purchases 1 share of . This always results in 0 net value no matter what, by unitarity, so the trader preserves its mass and can do the same trick the next turn, ensuring that the renormalization is always well-defined.

Let be our function from that we'll apply the Kakutani fixed-point theorem to, defined by:

Or, in other words, it maps a potential pricing to the set of pricings which minimize the expected value. is converted to a pricing over sentences via , and then evaluated by the supertrader , and then is used to convert the supertrader trade to a probability distribution, which gives the weights for the sum.

Now, we have four lemmas, and we'll prove the first 3 now, and the fourth later.

Lemma 1: is a nonempty, compact, convex subset of the finite-dimensional boolean hypercube

Lemma 2: For all is a nonempty, compact, convex subset of .

Lemma 3: has the closed-graph property.

Lemma 4:

Proof of Theorem 1:

By Lemmas 1, 2, 3 and the Kakutani fixed-point theorem, . This is equivalent to . Therefore, for all ,

Then, by linearity of expectation, we can get that for all

By multiplying both sides by , and referring back to the definition of , we get that for all

And then by the definition of , this can be rewritten as:

Now, by applying the definition of again, and defining

And by applying Lemma 4, that any has a that attains lower value (which can only increase the value of the sum), we get

and Theorem 1 is proved. Time to take care of showing the lemmas.

Proof of Lemma 1:

Obviously, is nonempty, because taking a probability distribution over , conditioning on , and extending it to a valuation on fulfills all 4 defining properties. It's convex because any mixture of elements fulfilling the defining axioms of also fulfills the defining axioms. All defining equations use or , so the resulting set is closed, and since it's bounded by being a subset of a finite-dimensional hypercube, it's compact as well.

Proof of Lemma 2:

Continuous functions from compact spaces to the nonnegative reals have a closed and nonempty argmin set, and by the solution set being a subset of which is compact, nonemptiness and compactness of has been shown. As for convexity, observe that linearity of expectation implies that you can take a probabilistic mixture of any two points in the argmin set and it will preserve expected value, so the argmin set is convex.

Proof of Lemma 3:

For notational convenience, abbreviate as . Equip with the sup norm on the hypercube it is a subset of. A perturbation on of size leads to a perturbation of the price features by at most , and by the continuity of the supertrader trade, induces at most a change in the sum of the coefficients for the supertrader trade, and because said sum of coefficients is uniformly bounded away from 0 by that one trader that just makes trivial bets, renormalizing leads to at most a change in the probability distribution . By letting be sufficiently small, this shows that if differs from by less than , then . Also, under this norm, if differs from by less than , then for any probability distribution , then .

Fix two sequences of valuations , , which limit to and , where for all , . To show closed graph (ie, that ), by the definition of , we need to show that there is no and where .

Assume the opposite, that such a and exist.

Because of the limiting sequences, there is some where, for all greater , the distance between and is below , and the distance between and is below .

implies that for all , .

Since and differ by less than , .

So, (by assumption) and and , which implies

Finally, because and differ by less than , , we get , which is impossible by , so has the minimal possible expected value.

Thus, we have a contradiction and our original assumption was wrong, so , and closed-graph for has been shown.

The proof of Lemma 4 will be deferred to the next post.



5

Ω 2