Bounded Oracle Induction

by Diffractor, 28th Nov 2018

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Introduction:

Logical induction relies on the Brouwer fixed-point theorem, and reflective oracles rely on the Kakutani fixed-point theorem, of which Brouwer's is a special case. So logical induction could plausibly have been derived for the first time from reflective oracles. Attempting this in the most obvious way doesn't produce any particular insights. That obvious way is to have traders output a circuit that takes a binary-search approximation of the market as input. However, if we redevelop the logical induction algorithm from scratch with the aid of a bounded reflective oracle, we arrive at a new way of looking at logical induction, with the following interesting features.

1: It collapses the distinction between the algorithm outputting the trading circuit, and the trading circuit itself.

2: All trades can be naturally interpreted as probability measures over bitstrings, with the reward given by a simple betting game. This replaces the view of trades as shares that pay off if some boolean combination is true. However, the betting game doesn't incentivize reporting true beliefs, only moving the market's probability distribution in the right direction. This turns out to be isomorphic to the original formulation of the value of a trade, subject to one extra restriction on worst-case trade value.

3: The market prices are also a probability measure over bitstrings/worlds, exactly like a universal inductor.

4: It does the reflective-Solomonoff thing of "draw a Turing machine with probability given by the universal distribution, and use it to predict future bits".

First, notation will be established and the algorithm will be discussed. Then the connection between OI-trader scoring and LI-trader scoring will be established, and we will discuss how many of the nice properties of LI carry over. We will finish with a basic discussion of what this result means. An appendix contains the definitions of auxiliary algorithms, and the next post will contain the proof that the algorithm is an Oracle Inductor.

As a quick refresher, a bounded reflective oracle takes as input a query of the form $(M, p)$, where $M$ is a (possibly randomized) algorithm and $p$ is a probability. It returns $1$ if three conditions hold: $M$ has bounded runtime, $M$ only makes oracle calls to algorithms with the same runtime bound, and $M$ outputs $1$ with probability greater than $p$. It returns $0$ if $M$ has bounded runtime, only makes oracle calls to algorithms with the same runtime bound, and outputs $1$ with probability less than $p$. If the probability of $M$ outputting $1$ is exactly $p$, or if $M$ exceeds the runtime bounds, the oracle is allowed to randomize arbitrarily.
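A toy Python model of a single query may help make these semantics concrete. It assumes the probability that $M$ outputs $1$ is simply handed to the function. A real bounded reflective oracle is never given this number, and the runtime and oracle-call restrictions are not modelled. The name `reflective_oracle_query` is hypothetical.

```python
import random


def reflective_oracle_query(prob_one: float, p: float, rng: random.Random) -> int:
    """Toy answer to the query (M, p).

    prob_one is the probability that M outputs 1. It is passed in directly
    here, which a real oracle would not have; runtime bounds and restrictions
    on nested oracle calls are not modelled.
    """
    if prob_one > p:
        return 1
    if prob_one < p:
        return 0
    # Exactly at the threshold, the oracle may answer either way.
    return rng.randint(0, 1)


rng = random.Random(0)
print(reflective_oracle_query(0.75, 0.5, rng))  # 1
print(reflective_oracle_query(0.25, 0.5, rng))  # 0
```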

Notation:

and are the sets of all bitstrings of length , and the set of all finite bitstrings, respectively. Elements of these sets are denoted by . refers to the empty string.

is the set of all satisfiable booleans with all variables having an index , and is the set of all satisfiable booleans. is the set of all booleans (including unsatisfiable ones), and is the set of all booleans with all variables having an index . Elements of these sets are denoted by . is the trivial boolean which is satisfied by all bitstrings. Note that a boolean induces a function .

is a function that is time-constructible, monotonically increasing, and . It will give the runtime bound for the traders.

is some function upper-bounded by , that is time-constructible, monotonically increasing, and . It gives the most distant bit the oracle inductor thinks about on turn .

is the function given by , giving the number of iterations of binary-search deployed on turn .

is the function given by . It gives the proportion of the inductor distribution that is made up by the uniform distribution, to ensure that all bitstrings have a probability high enough for binary-search to accurately approximate their probability.

The oracle call on $(M, p)$ consults the bounded reflective oracle about whether $M$ returns $1$ with probability greater than $p$.

The coin-flip operation on $p$ rounds $p$ down to $1$ if it is greater than $1$, and then flips a coin that returns $1$ with probability $p$. In our model of computation, this operation takes unit time.
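Here is a minimal Python stand-in for the coin-flip operation. The name `flip` is hypothetical, and Python's `random` module plays the role of the unit-time coin.

```python
import random


def flip(p: float, rng: random.Random) -> int:
    """Return 1 with probability p, after rounding p down to 1 if it is larger."""
    p = min(p, 1.0)
    # rng.random() is uniform on [0, 1), so this returns 1 with probability p
    # for 0 <= p <= 1 (and always returns 0 for p <= 0).
    return 1 if rng.random() < p else 0
```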

The OI Algorithm:

In short, a small portion of the probability mass in the distribution induced by the oracle induction algorithm comes from the uniform distribution. For the rest, the algorithm selects a Turing machine according to the universal distribution, along with a "budget" (each budget receiving a fixed share of the probability mass), much like the logical inductor algorithm. In this setting, every trade can be interpreted as a mixture of probability distributions of the form "condition the oracle induction distribution on some boolean being true". So the budgeted trader is consulted to get a boolean (note that the trader may be randomized!). The oracle is then used to approximately replicate the probability distribution produced by conditioning the oracle induction distribution on the resulting boolean.

This starts with the empty bitstring and repeatedly concatenates it with the output of the next-bit algorithm until a bitstring of the required length is produced. The remaining arguments are just passed on to the next-bit algorithm.
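The outer loop is simple enough to sketch directly. In this hypothetical Python version, `next_bit` stands in for the next-bit algorithm, and bits are represented as the characters '0' and '1'. Any extra arguments, such as the turn number, would be captured in the closure.

```python
from typing import Callable


def sample_bitstring(length: int, next_bit: Callable[[str], str]) -> str:
    """Build a bitstring one bit at a time, starting from the empty string."""
    x = ""
    while len(x) < length:
        x += next_bit(x)  # next_bit sees the prefix generated so far
    return x


# A deterministic next-bit rule, just to show the control flow.
print(sample_bitstring(5, lambda prefix: "1" if len(prefix) % 2 == 0 else "0"))  # 10101
```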

This defines a boolean constraint saying that the given boolean must be true and that the initial prefix of the bitstring must equal the bits generated so far. Then two more boolean constraints are generated. They are the same, except that they also specify that the next bit is a $0$ or a $1$, respectively. If only one of the two is satisfiable, the next bit is forced to be $0$ or $1$ accordingly. Otherwise, the algorithm runs binary search, for the turn's number of iterations, on the probability of the oracle inductor outputting a bitstring that satisfies the first or the second constraint. The result gives the probability of the next bit being a $1$.
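The following sketch computes the same conditioning logic exactly, under several assumptions:

- The inductor's distribution over length-$n$ bitstrings is given explicitly as a dictionary with full support (the uniform component guarantees this).
- Booleans are Python predicates.
- Satisfiability is checked by brute force.
- Exact arithmetic replaces the oracle's binary-search estimate.

All names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Dict

Boolean = Callable[[str], bool]


def satisfiable(phi: Boolean, prefix: str, n: int) -> bool:
    """Brute force: does some length-n extension of prefix satisfy phi?"""
    return any(phi(prefix + "".join(t)) for t in product("01", repeat=n - len(prefix)))


def mass(dist: Dict[str, Fraction], phi: Boolean, prefix: str) -> Fraction:
    """Probability that a bitstring starts with prefix and satisfies phi."""
    return sum((p for x, p in dist.items() if x.startswith(prefix) and phi(x)), Fraction(0))


def next_bit_prob(dist: Dict[str, Fraction], phi: Boolean, prefix: str, n: int) -> Fraction:
    """P(next bit = 1 | phi and the prefix); assumes phi is satisfiable with this prefix."""
    sat0 = satisfiable(phi, prefix + "0", n)
    sat1 = satisfiable(phi, prefix + "1", n)
    if sat1 and not sat0:
        return Fraction(1)  # the next bit is forced to be 1
    if sat0 and not sat1:
        return Fraction(0)  # the next bit is forced to be 0
    m0 = mass(dist, phi, prefix + "0")
    m1 = mass(dist, phi, prefix + "1")
    return m1 / (m0 + m1)


uniform = {"".join(t): Fraction(1, 4) for t in product("01", repeat=2)}
not_00 = lambda x: x != "00"
print(next_bit_prob(uniform, not_00, "", 2))   # 2/3
print(next_bit_prob(uniform, not_00, "0", 2))  # 1
```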

This uses the oracle to run binary search for a given number of rounds on some algorithm. It outputs a lower-bound, average, or upper-bound estimate of the probability of that algorithm outputting $1$.
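A sketch of the binary-search routine follows. It again queries a toy oracle that is handed the true probability, which is an assumption: the real routine only sees oracle answers. Each round halves an interval known to contain the probability, so after `rounds` rounds the lower and upper bounds are $2^{-\text{rounds}}$ apart. The names are hypothetical.

```python
import random
from typing import Tuple


def toy_oracle(prob_one: float, p: float, rng: random.Random) -> int:
    """1 if prob_one > p, 0 if prob_one < p, arbitrary at a tie."""
    if prob_one > p:
        return 1
    if prob_one < p:
        return 0
    return rng.randint(0, 1)


def binary_search(prob_one: float, rounds: int, rng: random.Random) -> Tuple[float, float, float]:
    """Return (lower bound, midpoint estimate, upper bound) on prob_one."""
    lo, hi = 0.0, 1.0
    for _ in range(rounds):
        mid = (lo + hi) / 2
        if toy_oracle(prob_one, mid, rng) == 1:
            lo = mid  # prob_one >= mid (an answer of 1 at a tie is also consistent)
        else:
            hi = mid  # prob_one <= mid
    return lo, (lo + hi) / 2, hi


print(binary_search(0.3, 3, random.Random(0)))  # (0.25, 0.3125, 0.375)
```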

This uses the oracle to test whether the trader is possibly over budget. If so, it returns the null boolean; otherwise, it returns the boolean that the trader returns. The budget-testing subroutine randomly selects a day and a world/bitstring. It returns $1$ if that world/bitstring hasn't been ruled out by that day and the trader is over budget relative to it. So the oracle call tests whether there is any combination of world and day where the trader is over budget.

This randomly picks a day and a bitstring. It returns $1$ if the bitstring is plausible (hasn't been ruled out by the deductive process) on that day and the trader might be over budget. Note that the precision of the approximation increases with the current turn, so on later days, more accurate retroactive estimates are used for the value of a trade on previous days. The mess of parentheses in the numerator essentially clips the bitstring to the appropriate length. It then uses binary search to find the probability that (an approximation of (the distribution produced by conditioning the OI distribution on (the boolean output by the trader))) assigns to the bitstring.
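A brute-force version of this over-budget test is sketched below. It rests on several simplifying assumptions:

- All worlds are bitstrings of one fixed length.
- Each day's trader measure $\psi_i$ and market measure $\mu_i$ are given exactly, with no retroactive approximation.
- The market gives every world positive probability.
- The deductive process is summarized as the set of worlds still plausible on each day.
- "Over budget" is taken to mean that the cumulative value $\sum_{j \le i}(\psi_j(x)/\mu_j(x) - 1)$ falls below $-b$.

Where the oracle asks whether *some* day and world trigger the test, this loop simply checks them all. Names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Set, Tuple

Measure = Dict[str, Fraction]  # world -> probability


def value_through_day(history: List[Tuple[Measure, Measure]], x: str, day: int) -> Fraction:
    """Cumulative value in world x of the trades made on days 1..day."""
    total = Fraction(0)
    for psi, mu in history[:day]:
        total += psi.get(x, Fraction(0)) / mu[x] - 1  # assumes mu[x] > 0
    return total


def possibly_over_budget(history: List[Tuple[Measure, Measure]],
                         plausible: List[Set[str]], budget: Fraction) -> bool:
    """True if, on some day, some still-plausible world puts the trader below -budget."""
    return any(
        value_through_day(history, x, day) < -budget
        for day in range(1, len(history) + 1)
        for x in plausible[day - 1]
    )


mu = {"0": Fraction(1, 2), "1": Fraction(1, 2)}
all_on_1 = {"1": Fraction(1)}  # the trader bets everything on world "1"
history = [(all_on_1, mu), (all_on_1, mu)]
print(possibly_over_budget(history, [{"0", "1"}, {"0", "1"}], Fraction(1)))  # True
print(possibly_over_budget(history, [{"0", "1"}, {"0", "1"}], Fraction(2)))  # False
```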

This takes a trader and returns the null constraint if the trader times out, makes an "out of bounds" oracle call, or outputs an unsatisfiable boolean. Otherwise, it takes the bitstring the trader outputs, interprets it as a boolean constraint, clips the constraint to the appropriate length, and outputs that constraint. Because algorithms can be randomized, this won't necessarily output the same boolean every time. The resulting distribution should therefore be thought of as a probabilistic mixture of conditional distributions.

To put all this together, the inductor selects a Turing machine and a budget with the appropriate probability. It then queries the Turing machine about which boolean combination it thinks the true sequence of bits fulfills. The Turing machine can only report one boolean combination, so mixtures of various booleans are implemented by the Turing machine randomizing. Sometimes the Turing machine's past history means it could go over budget on the next "trade", or it violates the conditions by running over time, making illegal oracle calls, or providing an impossible-to-fulfill boolean. In that case, the market just defaults to asking (an approximation of) itself about its own probabilities. Otherwise, the market outputs (an approximation of) its own probability distribution, conditioned on conforming with the trader's boolean. The bounded reflective oracle serves two purposes: it acts as an NP-oracle, and it narrows down the probability of the inductor itself outputting a specific bitstring.

Interpretation of a Trade:

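In the logical induction paper, a trade was interpreted as follows. You pay $\mu(\phi)$ dollars, the market's price (probability) for a boolean $\phi$, to acquire a share. That share is worth $1$ dollar in worlds where $\phi$ holds, and $0$ dollars in worlds where it doesn't.

In this setting, a trade is interpreted differently. The trader and the market each spread $1$ dollar among various bitstrings/worlds $x$ (this dollar will be lost), giving a probability measure. If the world is revealed to be $x$, the trader earns $\psi(x)/\mu(x)$ dollars in return. Here $\psi(x)$ is the probability the trader assigned to $x$, and $\mu(x)$ is the probability the market assigned to $x$. The value of a trader at time $n$ in world $x$ is then $\sum_{i \le n} \left( \psi_i(x)/\mu_i(x) - 1 \right)$, where $\psi_i$ and $\mu_i$ are the trader's and the market's measures on day $i$.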
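Here is a small worked example of this payoff rule in Python. It uses exact fractions, takes worlds to be short bitstrings, and uses the hypothetical name `oi_trade_value`. Each day's net value in world $x$ is $\psi(x)/\mu(x) - 1$, and summing these gives the trader's value.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Measure = Dict[str, Fraction]  # world -> probability


def oi_trade_value(trades: List[Tuple[Measure, Measure]], x: str) -> Fraction:
    """Sum over days of psi(x)/mu(x) - 1 for (trader psi, market mu) pairs."""
    return sum((psi.get(x, Fraction(0)) / mu[x] - 1 for psi, mu in trades), Fraction(0))


mu = {"0": Fraction(1, 4), "1": Fraction(3, 4)}
psi = {"0": Fraction(1, 2), "1": Fraction(1, 2)}
print(oi_trade_value([(psi, mu)], "0"))  # 1
print(oi_trade_value([(psi, mu)], "1"))  # -1/3
```

Under the market's own measure, the expected value of any such trade is $\sum_x \mu(x)(\psi(x)/\mu(x) - 1) = \sum_x \psi(x) - 1 = 0$, provided $\mu$ gives every world positive probability. So by the market's lights every trade is fair. In this example, $\tfrac14 \cdot 1 + \tfrac34 \cdot (-\tfrac13) = 0$.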

Surprisingly enough, these two interpretations are actually equivalent! We can take (some) LI trades, and convert them into an OI trade with the same value in all worlds, and also do the reverse.

First, suppose our OI trader spreads its dollar according to $\mu(\cdot \mid \phi)$, copying the market probability distribution conditional on the world fulfilling the constraint $\phi$. In every world that doesn't fulfill the boolean, it loses $1$ dollar, because it assigned measure $0$ to that $x$. In every $x$ that fulfills the boolean, $\psi(x) = \mu(x)/\mu(\phi)$, so it gets $1/\mu(\phi)$ dollars in return, for a net value of $1/\mu(\phi) - 1$. If we multiply these two values by $\mu(\phi)$, we get $-\mu(\phi)$ dollars when $\phi$ fails and $1 - \mu(\phi)$ dollars when it holds. This is exactly the value of a logical-induction trader buying one share in $\phi$. So the OI trader outputting the distribution $\mu(\cdot \mid \phi)$ has exactly the same value in all worlds as buying $1/\mu(\phi)$ shares of $\phi$, which is the same as spending $1$ dollar buying shares of $\phi$.
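The equivalence can be checked numerically. The Python sketch below assumes a small explicit market $\mu$ over 2-bit worlds, represents booleans as predicates, and uses hypothetical helper names. It compares the OI value of outputting $\mu(\cdot \mid \phi)$ with the LI value of buying $1/\mu(\phi)$ shares of $\phi$ at price $\mu(\phi)$, world by world.

```python
from fractions import Fraction
from typing import Callable, Dict

Measure = Dict[str, Fraction]
Boolean = Callable[[str], bool]


def prob(mu: Measure, phi: Boolean) -> Fraction:
    """Market probability (= LI price) of phi."""
    return sum((p for x, p in mu.items() if phi(x)), Fraction(0))


def condition(mu: Measure, phi: Boolean) -> Measure:
    """mu conditioned on phi (assumes prob(mu, phi) > 0)."""
    z = prob(mu, phi)
    return {x: p / z for x, p in mu.items() if phi(x)}


def oi_value(psi: Measure, mu: Measure, x: str) -> Fraction:
    return psi.get(x, Fraction(0)) / mu[x] - 1


def li_value(shares: Fraction, price: Fraction, holds: bool) -> Fraction:
    """Value of buying `shares` shares of phi at `price`, each paying 1 if phi holds."""
    return shares * (int(holds) - price)


mu = {"00": Fraction(1, 8), "01": Fraction(3, 8), "10": Fraction(1, 4), "11": Fraction(1, 4)}
phi = lambda x: x[1] == "1"  # "the second bit is 1"; its market price is 5/8
price = prob(mu, phi)
psi = condition(mu, phi)
for x in sorted(mu):
    print(x, oi_value(psi, mu, x), li_value(1 / price, price, phi(x)))
```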

We interpret a randomized choice of a boolean as the trader spending $q_\phi$ of its probability mass on the conditional distribution $\mu(\cdot \mid \phi)$. Here $q_\phi$ is the probability that the trader outputs $\phi$; it is an ordinary probability, not the market price of anything. Summing over booleans produces a probability measure, $\psi(x) = \sum_\phi q_\phi \, \mu(x \mid \phi)$.

Generalizing a bit, every OI trade is equivalent to an LI trade that spends $q_\phi$ dollars buying shares of each boolean $\phi$, and spends $1$ dollar in total.

Also, suppose the trader outputs the null boolean (the trivial boolean satisfied by every bitstring). Conditioning on it leaves $\mu$ unchanged, so the value of that trade is $0$ in all worlds, equivalent to not buying or selling anything. This can be used to have the equivalent LI trade spend less than $1$ dollar buying shares, if some portion of $\psi$ is composed of $\mu$.

Going in the reverse direction, from an LI trade to an OI trade, is more difficult, because LI trades involve both buying and selling shares. From before, buying $1$ share of $\phi$ is equivalent to the OI trader distributing $\mu(\phi)$ of its probability mass according to the $\mu(\cdot \mid \phi)$ distribution. Selling takes one more step to show. Selling $1$ share of $\phi$ at price $\mu(\phi)$ has the same payoff in every world as buying $1$ share of $\neg\phi$ at price $1 - \mu(\phi)$. So it has the same value in all worlds as distributing $1 - \mu(\phi)$ of the OI trader's mass on the $\mu(\cdot \mid \neg\phi)$ distribution.

Now, translating the LI trade into OI terms may leave less than all of the probability mass spent on conditional distributions. In that case, the rest of the mass can be spent on copying $\mu$, since that has value $0$ in all worlds. But what if the measure of the resulting trade sums to more than $1$? Then the extreme-worst-case value of the LI trade, where all purchased shares are worth $0$ and all sold shares are worth $1$, is below $-1$. This is because in that extreme case, each purchase or sale loses exactly the probability mass it was assigned.

Therefore, suppose there's an LI trader with a circuit that's evaluable in polynomial time (not just writable in polynomial time), and it never makes a trade with extreme-worst-case value below $-1$ (i.e., it doesn't blow too much money at any given time). Then there's a corresponding OI trader! This includes almost all the traders from the proofs in the logical induction paper. However, the traders from 4.7.2 (conditionals on theories) and 4.5.10 (affine unbiasedness from feedback) need to be adapted.

Conjecture 1: There is an LI trader with extreme-worst-case value of at least $-1$ on each turn that exploits the market if it violates Affine Unbiasedness from Feedback. There is also another such trader that exploits the market if the market conditional on a theory is exploitable by a trader with extreme-worst-case value of at least $-1$ on each turn.
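The translation from an LI trade to an OI measure can be sketched in Python, under the same toy assumptions as before: an explicit market $\mu$ with full support, booleans as predicates, and each traded boolean having positive market probability. The function name `li_to_oi` is hypothetical. Selling is handled by buying the negation, and leftover mass copies $\mu$. The function refuses when more than $1$ unit of mass is needed, which is exactly when the extreme-worst-case value is below $-1$.

```python
from fractions import Fraction
from typing import Callable, Dict, List, Tuple

Measure = Dict[str, Fraction]
Boolean = Callable[[str], bool]


def prob(mu: Measure, phi: Boolean) -> Fraction:
    return sum((p for x, p in mu.items() if phi(x)), Fraction(0))


def li_to_oi(mu: Measure, trades: List[Tuple[Boolean, Fraction]]) -> Measure:
    """Translate LI share trades into one OI measure psi.

    (phi, n) buys n shares of phi if n > 0 and sells -n shares if n < 0.
    Raises ValueError if more than 1 unit of mass would be needed.
    """
    psi: Measure = {x: Fraction(0) for x in mu}
    spent = Fraction(0)
    for phi, n in trades:
        if n < 0:
            # Selling |n| shares of phi pays off like buying |n| shares of not-phi.
            phi, n = (lambda x, f=phi: not f(x)), -n
        z = prob(mu, phi)  # assumed > 0
        weight = n * z  # buying n shares costs n * mu(phi) of probability mass
        for x, p in mu.items():
            if phi(x):
                psi[x] += weight * p / z  # spread the weight as mu(. | phi)
        spent += weight
    if spent > 1:
        raise ValueError("extreme-worst-case value below -1; no OI equivalent")
    for x, p in mu.items():
        psi[x] += (1 - spent) * p  # leftover mass copies the market: value 0
    return psi


mu = {"00": Fraction(1, 8), "01": Fraction(3, 8), "10": Fraction(1, 4), "11": Fraction(1, 4)}
first = lambda x: x[0] == "1"   # market price 1/2
second = lambda x: x[1] == "1"  # market price 5/8
psi = li_to_oi(mu, [(first, Fraction(1)), (second, Fraction(-1))])  # buy 1, sell 1
for x in sorted(mu):
    li = (int(first(x)) - Fraction(1, 2)) - (int(second(x)) - Fraction(5, 8))
    print(x, psi[x] / mu[x] - 1, li)  # OI value and LI value agree
```

Here the mass spent is $\tfrac12 + \tfrac38 = \tfrac78$, matching an extreme-worst-case value of $-\tfrac78$. Buying three shares of `first` instead would need $\tfrac32$ units of mass and raise the error.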

If this conjecture holds, oracle induction would inherit all the nice properties of logical induction.

Exploitation is defined in the standard way: the set of the trader's plausible values, according to worlds consistent with the deductive process at time $n$, taken over all $n$, is bounded below and unbounded above.

Note that we can consider the market as all the traders' probability distributions aggregated together. The payoff to everybody then corresponds to taking everyone's money and distributing it back to everyone according to the fraction of the probability mass in the winning world that was contributed by their distribution.

Also note that OI-traders can implement very long and complex combinations of distributions by randomizing. Because of this, OI-traders can make some trades that LI-traders can't: trades too long and complicated for an LI-trader to write down in polynomial time. An OI-trader, converted to an LI trade, may even have purchases in every single boolean, which LI-traders definitely can't replicate.
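The redistribution view can be checked with a toy Python example. It assumes the market is a weighted average of the traders' measures, with hypothetical weights $w_i$ standing for how much money each trader puts in. Splitting the pot $\sum_i w_i$ by each trader's share of the winning world's mass gives trader $i$ exactly $w_i \psi_i(x)/\mu(x)$. This is the OI payoff rule, scaled by the trader's stake. All names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List

Measure = Dict[str, Fraction]


def market(weights: List[Fraction], dists: List[Measure]) -> Measure:
    """Weighted average of the traders' measures."""
    total = sum(weights, Fraction(0))
    worlds = set().union(*dists)
    return {x: sum((w * d.get(x, Fraction(0)) for w, d in zip(weights, dists)), Fraction(0)) / total
            for x in worlds}


def payouts(weights: List[Fraction], dists: List[Measure], winner: str) -> List[Fraction]:
    """Split the whole pot by each trader's share of the mass on the winning world."""
    pot = sum(weights, Fraction(0))
    contributions = [w * d.get(winner, Fraction(0)) for w, d in zip(weights, dists)]
    winning_mass = sum(contributions, Fraction(0))
    return [pot * c / winning_mass for c in contributions]


weights = [Fraction(1), Fraction(1)]
dists = [{"0": Fraction(1)}, {"0": Fraction(1, 2), "1": Fraction(1, 2)}]
mu = market(weights, dists)
print(payouts(weights, dists, "0"))
print([w * d.get("0", Fraction(0)) / mu["0"] for w, d in zip(weights, dists)])  # same list
```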

Conjecture 2: There is an LI trader that runs in poly-time, and an Oracle Inductor that is inexploitable by poly-time adversaries, such that the trader exploits the Oracle Inductor, and there is an OI trader that runs in poly-time and a Logical Inductor that is inexploitable by poly-time adversaries, such that the trader exploits the Logical Inductor.

Facts About OI:

Just like with logical inductors, no trader that runs in less time than the traders' runtime bound can exploit the market. The following theorems establish this, analogously to the theorems showing that the logical induction algorithm is inexploitable.

Theorem 1: If a trader that can be simulated on a UTM in less time than the traders' runtime bound exploits the market, there is some finite budget such that the budgeted trader exploits the market.

Theorem 2: If a budgeted trader exploits the market, the supertrader exploits the market.

Theorem 3: The supertrader doesn't exploit the market.

The proofs will be deferred to the next post.

As for the strength of the bounded reflective oracle needed to guarantee that all oracle calls are well-defined, it is . Again, the proof will be deferred to the next post.

Future Directions:

The two conjectures would be interesting to settle, although only the first is truly critical to showing that this is as powerful as a logical inductor.

The interpretation of trades as probability measures over bitstrings, with payoff given by the proportion of probability mass in the winning string contributed by a trader is useful, although the lack of an incentive for accurate belief reporting is slightly worrying, and prevents us from truly attributing beliefs to individual traders.

The close parallel between this and Reflective-Oracle Solomonoff Induction may prove to be fruitful, and potentially lead to a variant of AIXI in this setting, which may be ported back to logical induction.

APPENDIX: Auxiliary Algorithms

In our model of computation, assume that this takes unit time.

This generates a random bitstring of the given length.

This randomly selects an integer in the given range, with equal probability for each integer.

This is used to ensure that the traders output boolean constraints that aren't about unreasonably distant digits.

This applies a boolean circuit to a bitstring, padding the bitstring with randomly selected bits if it is too short. Traders are allowed to call the oracle on this function applied to the oracle inductor. So this implicitly assigns 50% probability to each bit that is too distant for the oracle inductor to have thought about yet.

This converts a bitstring $x$ to a boolean which requires that the prefix of a bitstring equals $x$. The equality should be understood as a conjunction over each position $i$. The conjunct is the variable $v_i$ if $x_i = 1$, and its negation $\neg v_i$ if $x_i = 0$.

This uses the oracle as a SAT-solver, by randomly generating a bitstring of length $n$ and checking it against the boolean. If some bitstring fulfills the boolean, there is a probability of at least $2^{-n}$ of generating it. If none does, the probability is exactly $0$. So the oracle can perfectly discriminate between satisfiable and unsatisfiable booleans.
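A minimal Python sketch of evaluation with random padding follows. It assumes booleans are given as Python predicates on bitstrings, abstracting away the circuit representation. The name `apply_with_padding` is hypothetical. Any bit beyond the end of the given string behaves like a fair coin.

```python
import random
from typing import Callable

Boolean = Callable[[str], bool]


def apply_with_padding(phi: Boolean, x: str, length: int, rng: random.Random) -> bool:
    """Evaluate phi on x, padding x with uniformly random bits up to `length`."""
    padding = "".join(rng.choice("01") for _ in range(length - len(x)))
    return phi(x + padding)


rng = random.Random(0)
third_bit_is_1 = lambda s: s[2] == "1"
hits = sum(apply_with_padding(third_bit_is_1, "10", 3, rng) for _ in range(2000))
print(hits / 2000)  # close to 0.5: the unseen third bit is a fair coin
```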
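The prefix constraint can be represented as a list of literals. In this hypothetical Python encoding, a literal is a pair of a bit position and the value that bit must take. Positions are 0-indexed here, which may differ from the indexing convention used above.

```python
from typing import List, Tuple

Literal = Tuple[int, bool]  # (bit position, required value); 0-indexed positions


def prefix_constraint(x: str) -> List[Literal]:
    """Conjunction of literals: v_i if x_i == '1', not v_i if x_i == '0'."""
    return [(i, bit == "1") for i, bit in enumerate(x)]


def satisfies(literals: List[Literal], y: str) -> bool:
    """True if every literal holds in y (a missing position counts as a failure)."""
    return all(i < len(y) and (y[i] == "1") == value for i, value in literals)


c = prefix_constraint("10")
print(c)                     # [(0, True), (1, False)]
print(satisfies(c, "1011"))  # True
print(satisfies(c, "1111"))  # False
```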
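Here is a sketch of why the oracle discriminates perfectly. Booleans are assumed to be Python predicates, and the code computes exactly the probability the oracle would be asked about. That probability is always a multiple of $2^{-n}$. A threshold strictly between $0$ and $2^{-n}$ therefore never produces a tie, so the oracle's answer is deterministic; the threshold $2^{-(n+1)}$ used here is an assumed choice. Names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import Callable

Boolean = Callable[[str], bool]


def prob_random_string_satisfies(phi: Boolean, n: int) -> Fraction:
    """Probability that a uniformly random length-n bitstring satisfies phi."""
    hits = sum(1 for bits in product("01", repeat=n) if phi("".join(bits)))
    return Fraction(hits, 2 ** n)


def is_satisfiable(phi: Boolean, n: int) -> bool:
    threshold = Fraction(1, 2 ** (n + 1))
    # Oracle semantics: answer 1 iff the probability exceeds the threshold.
    # No tie is possible, because the probability is a multiple of 2**-n.
    return prob_random_string_satisfies(phi, n) > threshold


print(is_satisfiable(lambda s: s == "101", 3))                   # True
print(is_satisfiable(lambda s: s[0] == "1" and s[0] == "0", 3))  # False
```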

This just turns a bitstring into the boolean it encodes, given some efficient encoding scheme.

This is the deductive process, a black-box deterministic algorithm that, on day $n$, outputs a boolean $D_n$ of bounded index within bounded time. Each $D_n$ is satisfiable, and $D_{n+1}$ implies $D_n$ (i.e., the constraints on the true environment get more stringent over time but always stay satisfiable).
