# Ω 2

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I’d like to credit Daniel Demski for helpful discussion.

So, the Eisenstat prior mentioned earlier seemed like a rather fundamental advance on the state of the art in logical uncertainty. Given that it was essentially a modification of the Demski prior, the obvious next question was “what’s the analogue of the modified Demski prior in this setting?”

As a quick refresher, the Demski prior sampled sentences, while the modified Demski prior sampled turing machines that would spit out a (potentially infinite) list of sentences. The key advance of the modified Demski prior over the Demski prior is that the modified Demski prior enables assigning positive probability to certain computable infinite sequences of sentences. The prototypical example was assigning positive probability to first-order PA with its infinite axiom schema, but this property also permitted the result in the logical induction paper where a turing machine could be asymptotically trusted to only output true things, even if the proof length of the statements grows ever-further beyond our ability to check.

The Demski and modified Demski prior, in their true forms, involve calling a halting oracle to determine whether a contradiction is present when a sentence/infinite sequence of sentences is added to the “sentence pool”.

A rather simple and general (though not very efficient) process for approximating the probability of an event under some prior involves specifying some sort of “prior-generating process”, and then monte-carlo sampling it a whole lot of times and seeing how many runs generated the event. In the case of the Demski and modified Demski prior, the use of a halting oracle means the prior-generating process can’t be emulated. However, since a halting oracle is limit-computable, throwing a whole lot of computing resources at consistency checking yields a better and better ability to emulate what the true version of the Demski/modified Demski priors are doing.

The key property of the Eisenstat prior that makes it -approximable instead of just limit-computable is that there’s a particular time by which you’re nearly guaranteed to run into either or . Any sort of “modified Eisenstat prior” should preserve this property.

So, without further ado, I’ll present the algorithm, go into the philosophy of why it is the way it is, handwave at a few nice properties it probably has, and point out future directions.

DEFINITIONS:

is a probability measure over sentences, s.t. . In short, you’re as likely to run into as .

is a probability measure over turing machines.

If x is a string of sentences, is the string of sentences concantated with .

is the empty string.

means y is a prefix of x (a prefix in the sense of a substring of sentences, not a prefix of x interpreted as a bitstring).

is the runtime of turing machine on input .

is the bitstring that is the output of turing machine on input .

is an arbitrary function that bounds the runtime of a turing machine in terms of the length of the input sentence string as measured in bits (so a long sentence would increase more than a short one would).

In short, we sample a turing machine, check to make sure it obeys our runtime bounds and accepts the past history of sentences as valid and hasn’t been blacklisted yet, and throw it into the pool of “possibly valid constraints”. Then we generate a sentence , test all turing machines in the pool of “possibly valid constraints” against and and blacklist every TM that times out or rejects both extensions. Then we divide the remaining turing machines into “team A” and “team B”, which favor and as the extensions, use the ratio of measures of the teams to add either or as a sentence, and blacklist all members of the losing team. Then just iterate.

In pseudocode, this is:

FOR

. . Sample a turing machine from

. . IF

. . . . Failurebool:=0

. . . . . . FOR all

. . . . . . . . IF

. . . . . . . . . . Failurebool:=1, END

. . . . IF Failurebool=0

. . . . . .

. . . . . . ELSE

. . Sample a sentence from

. . FOR all

. . . . IF

. . . . . .

. . . . . .

. .

. .

. . FOR all

. . . . IF

. . . . . .

. . . . ELSE IF

. . . . . .

. . FLIP with probability

. . . . IF heads

. . . . . .

. . . . . .

. . . . ELSE

. . . . . .

. . . . . .

Ok, so why is this the way it is?

PHILOSOPHY:

Well, we want to have a certain time by which we are pretty much guaranteed to either run into or , as this property is what permits -approximation.

Consider the standard modified Demski prior, except that we just sample one turing machine and that’s it, and also it’ll magically stop cold as soon as it hits the first sentence that’s propositionally inconsistent with all those that came before. is the probability that x will be output by this process. Note that as described, . Even in this really simple case where you don’t have to worry about inconsistency testing or multiple turing machines, there are sufficiently small where it’s impossible to -approximate these probabilities within a known-ahead-of-time time bound, because of the (much greater than ) probability of sampling a turing machine that outputs a propositionally consistent list of sentences, and also outputs ahead of whatever time bound you named.

Thus, we seem to have a fully general problem with trusting in turing machines that enumerate infinite, or even very big, lists of sentences, because we have uncertainty about whether they’ll ever output some . Is this a fundamental obstacle to having a prior that assigns positive probability to some infinite list of sentences?

In my opinion, yes. Also, I don’t think the lack of this property is fatal, and if we look at what “positive probability to an infinite list of sentences” actually buys us, we can get something just about as good.

The “positive probability to an infinite list of sentences” property seems to cash out in practice as “if we verify a bunch of sentences spat out by a particular turing machine, we should boost our probability assigned to the rest of them”. It is what allows inductively learning that something in the environment has superior computing powers, and can be trusted.

Imagine the concrete case where you see a sentence get spat out by a trusted turing machine. You go from uncertainty in it to near-certainty that it’s true. You’ve just updated on the logical fact and it boosted your probability in . But you don’t know in advance what the turing machine will output on larger inputs of n. So it sure seems like it’s possible to believe that a turing machine outputs true things, but the prior assigns 0 probability to the whole infinite list of sentences because you don’t know in advance what it will output, so you’d have to update on infinitely many things in order to assign positive probability to the whole conjunction, ie, the sequence of observations of the form “turing machine outputted ”.

You don’t actually need positive probability to infinite conjunctions to have abstract trust in the output of things with long runtime! Or at least it sure seems that way from our thought experiment.

Ok, but what sort of thing could the constraint “trust in turing machine t” correspond to? Well, maybe, in addition to a propositional consistency checker, we also have another checker that throws out all strings of theorems that contain both and . This wouldn’t take much runtime at all to implement.

Wait, put that way, propositional consistency and trust in a turing machine both seem like the same sort of thing! The original Eisenstat prior was just the one where propositional consistency was the only constraint.

Also, if we’re using the “turing machine outputs a list of sentences” thing, we may be waiting arbitrarily long, but if we instead consider turing machines that take a string of sentences and accept or reject them, and operate within a known time (both constraints we came up with are of this form), we’re back to being able to generate strings of sentences and have a known time by which we are nearly guaranteed to hit or ! So -approximation is preserved if all your (finite) set of constraints have a known runtime bound.

We want to learn all constraints. So, the obvious next step is having a pool of constraints/turing machines that is added to by random sampling, and turing machines are kicked out of it for running over time, rejecting the past history of the string of sentences (this condition suggested by Daniel Demski), or rejecting both possible extensions of a string (this is to nearly guarantee hitting or after known time).

And, by doing Bayesian updates on sentences that are more likely to be present when certain constraints are present, it’s possible to inductively learn which constraints apply to the true environment!

There is a bit of weirdness here, because in full generality, these turing machines can care about order, unlike the P.C. and turing-machine-trust constraints. Also, since the list of constraints gets added to over time, this prior expects the future to be more ordered/obey more constraints. Also, since a “pseudoconstraint” that only times out or rejects both extensions in very rare circumstances can hang around for a long time before being blacklisted, this prior has parts that have a certain constraint apply for a long time, and then it lifts.

Still to do: What happens when constraints disagree? (Daniel Demski pointed out that my initial stab at this was wrong, and I generated this next stretch as a fix)

Well, the intended use case of this is to update on a list of theorems. Image that you know that either or is the next theorem coming in, and the constraints disagree. You can split them into three teams. Team A says that is the proper extension, Team B says that is the proper extension, and Team C accepts both as valid, and we can just ignore them.

Although it steps outside the intended use case, if you imagine generating constraints way faster than the theorems come to you, it’s generically expected that your constraints will disagree on the upcoming thing because you used enough search power to find some constraints that got the entire past history right, but disagree on the upcoming sentence. If you decide between the teams by flipping a coin, then you’d get half of all problems wrong, forever. So, deciding by the probability mass of the turing machines was used instead.

If we imagine updating on the true list of theorems, the true constraints that reality follows will stick around forever. Call that team A. And define team as the turing machines that disagree with team A on turn n, and team as the union of all of them. Each turn, everyone in team dies. So the total probability mass that team can muster limits to 0 as time goes on and more and more theorems get seen, while team A’s probability mass remains constant. Thus, in the limit, the constraints on team A will be trusted and used to generate extensions that follow the “rules of reality” for increasingly long times.

HANDWAVING:

It should be possible to -approximate the probability of a sentence in knowable time.

It should be possible to implement Bayesian updates on either sentences or constraints by the same path mentioned in the original post.

In the limit of updating on theorems, all “constraints” on sets of sentences that are computable in [insert time bound here] time that apply to reality should be learned by this prior.

I’m pretty dang sure you don’t need the full strength of propositional consistency to get a coherent distribution in the limit of updating on evidence. Every propositionally inconsistent world has a disproof of the form when is true, so a sufficient (though maybe not minimal) constraint that imposes a coherent distribution would be “the string cannot contain , and , and ... and and if it does, reject it” because that should take out every propositionally inconsistent world in the limit.

Note that the above permits logical uncertainty over very complicated Booleans that would take a long time to evaluate, like hard 3SAT instances. Full propositional consistency renders logical uncertainty over those sorts of things impossible, while it very plainly exists in reality.

NEXT STEPS:

Find a hole in this thing if one is present.

How can this sort of framework be made to interact sensibly with sequence prediction? Does it already interact sensibly with sequence prediction, or do important modifications need to be made?

Actually, in general, sequence prediction and “what is true?” prediction seem quite hard to marry in this framework, because a near-guarantee that you’ll hit either or after a known amount of time doesn’t interact well with “I believe that neither or will show up in my observed list of theorems, unless an astronomical amount of time has gone by”. Some sort of resolution must be found.

What should it be named if it does turn out to be important? I was considering the “Eisenstat-Appel Prior”, or EA Prior for short, but apparently it’s bad taste to name things after yourself and someone probably has a better idea.