Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

# An Untrollable Mathematician

SamEisenstat


I at first didn't understand your argument for claim (2), so I wrote an alternate proof that's a bit more obvious/careful. I now see why it works, but I'll give my version below for anyone interested. In any case, what you really mean is the probability of deciding a sentence outside of Φ *by having it announced by nature*; there may be a high probability of sentences being decided indirectly via sentences in Φ.

Instead of choosing Φ as you describe, pick Φ so that the probability of sampling a sentence that will never be decided but that is outside of Φ is less than ϵμ(ψ)/4. Then, at each stage, the probability of sampling ψ next is μ(ψ) divided by a normalization factor, while the probability of sampling a never-decided sentence outside of Φ next is at most ϵμ(ψ)/4 divided by the same factor, so the ratio between the two is at most ϵ/4. Hence, no matter what sentences have been decided already, the probability that repeatedly sampling from μ selects ψ before it selects any never-decided sentence outside of Φ is at least

1/(1+ϵ/4) ≥ 1−ϵ/4,

as desired.

Furthermore, this argument makes it clear that the probability distribution we converge to depends only on the set of sentences which the environment will eventually assert, not on their ordering!

Oh, I didn't notice that aspect of things. That's pretty cool.

Follow-up to All Mathematicians are Trollable.

It is relatively easy to see that no computable Bayesian prior on logic can converge to a single coherent probability distribution as we update it on logical statements. Furthermore, the non-convergence behavior is about as bad as could be: someone selecting the ordering of provable statements to update on can drive the Bayesian's beliefs arbitrarily up or down, arbitrarily many times, despite only saying true things. I called this wild non-convergence behavior "trollability". Previously, I showed that if the Bayesian updates on the *provability* of a sentence rather than updating on the sentence itself, it is still trollable. I left open the question of whether some other side information could save us. Sam Eisenstat has closed this question, providing a simple logical prior and a way of doing a Bayesian update on it which (1) cannot be trolled, and (2) converges to a coherent distribution. Major parts of this post were cribbed from Sam's notes, with some modifications.

## Construction

Set some prior on sampling individual sentences, μ(ϕ), which gives positive probability to every sentence. Now sample an infinite sequence of sentences, ϕ1,ϕ2,... by at each step rejection-sampling from μ with the requirement of propositional consistency with the sentences sampled so far. The prior probability of a sentence, P(ψ), is the probability that it appears anywhere in this sequence.
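As a quick illustration, here is a Monte Carlo sketch of this prior in a toy model. Everything concrete below is an illustrative assumption, not part of the construction: sentences are propositional literals encoded as signed integers (DIMACS-style), so propositional consistency reduces to the absence of a complementary pair; the base measure `MU` is made up; and the infinite sequence is truncated at a finite length.

```python
import random

# Toy model (illustrative assumptions): sentences are propositional
# literals encoded as signed integers, so a set of sentences is
# propositionally consistent iff it contains no complementary pair.
SENTENCES = [1, -1, 2, -2, 3, -3]
MU = {1: 0.3, -1: 0.1, 2: 0.2, -2: 0.1, 3: 0.2, -3: 0.1}  # positive everywhere

def sample_sequence(length, rng):
    """Sample phi_1, ..., phi_length by rejection-sampling from MU,
    requiring propositional consistency with the sentences so far."""
    seq = []
    while len(seq) < length:
        lit = rng.choices(SENTENCES, weights=[MU[s] for s in SENTENCES])[0]
        if -lit not in seq:  # reject anything inconsistent with seq
            seq.append(lit)
    return seq

def prior(sentence, trials=20000, length=6, seed=0):
    """Monte Carlo estimate of P(sentence): the probability that it
    appears anywhere in the (here truncated) sampled sequence."""
    rng = random.Random(seed)
    hits = sum(sentence in sample_sequence(length, rng) for _ in range(trials))
    return hits / trials
```

Since a literal and its negation can never both appear in one sequence, the estimates for 1 and −1 sum to at most 1, while each stays positive because each could always be sampled while its atom is undecided.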

This is nothing new -- it's just the old Demski prior with propositional consistency only. However, Sam's idea is to interpret the sequence of sentences as the sequence of things Nature tells us / the sequence of things which is proved. (Why didn't I think of that?) Thus, in addition to the raw probability distribution over sentences, we use the probability distribution over sequence locations. Start with n=1. When a sentence ψ is proved (or otherwise observed), we perform a Bayesian update on ψ=ϕn, and increment n.

This distribution can be computed to any desired accuracy ϵ in finite time. The probability of a sequence prefix ϕ1,ϕ2,...ϕn can be computed to within ϵ by computing the normalization factor for μ at each step in the sequence to within sufficient accuracy to ensure the accuracy of the whole, enumerating sentences and checking which are propositionally consistent with the sequence so far. The joint probability of any finite set of ordering assertions ψ=ϕn and unordered sentence assertions ψ can be computed to within ϵ by enumerating the sequences of sentence selections by which these can become jointly true and by which they can become false, and calculating their probabilities with increasing accuracy, until the probability mass not yet accounted for (either as a way of becoming true or as a way of becoming false) is certainly less than ϵ.
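The update can be sketched in a toy model (all concrete choices below are illustrative assumptions: sentences are propositional literals, so consistency is just the absence of a complementary pair, and the hypothetical base measure `MU` and truncation length are made up). The one structural point the sketch relies on is that the sequence is generated one entry at a time, so conditioning on the observed prefix ϕ1,...,ϕn amounts to continuing the sampler from that prefix.

```python
import random

# Illustrative toy model: sentences are propositional literals (signed
# ints); consistency = no complementary pair; MU is a hypothetical base
# measure with positive probability on every sentence.
SENTENCES = [1, -1, 2, -2, 3, -3]
MU = {1: 0.3, -1: 0.1, 2: 0.2, -2: 0.1, 3: 0.2, -3: 0.1}

def extend(prefix, length, rng):
    """Continue rejection-sampling from MU past the observed prefix,
    rejecting anything inconsistent with the sentences so far."""
    seq = list(prefix)
    while len(seq) < length:
        lit = rng.choices(SENTENCES, weights=[MU[s] for s in SENTENCES])[0]
        if -lit not in seq:
            seq.append(lit)
    return seq

def posterior(sentence, observed, trials=20000, length=6, seed=0):
    """Estimate P(sentence appears somewhere | phi_1..phi_n = observed).
    Because the sequence is built left to right, the conditional
    distribution of the rest of the sequence is just the sampler
    continued from the observed prefix."""
    rng = random.Random(seed)
    hits = sum(sentence in extend(observed, length, rng) for _ in range(trials))
    return hits / trials
```

Updating on a newly proved ψ then means appending it as entry n of the observed prefix and incrementing n.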

## Untrollability

So long as the probability of a sentence ψ is not yet 1 or 0, it has probability at least μ(ψ), since ψ could be sampled next. Similarly, its probability is at most 1−μ(¬ψ), since ¬ψ could be sampled next. Hence, it is not possible to drive the probability arbitrarily up or arbitrarily down, no matter what order we prove things in.
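The bound can be seen numerically in a toy model (illustrative assumptions throughout: sentences are propositional literals, so consistency is just the absence of a complementary pair, and the base measure `MU` is made up): whichever order a would-be troll announces sentences in, the estimated probability of a still-undecided ψ stays inside [μ(ψ), 1−μ(¬ψ)].

```python
import random
from itertools import permutations

# Illustrative toy model: literal sentences, hypothetical base measure.
SENTENCES = [1, -1, 2, -2, 3, -3]
MU = {1: 0.3, -1: 0.1, 2: 0.2, -2: 0.1, 3: 0.2, -3: 0.1}

def posterior(sentence, observed, trials=20000, length=6, seed=0):
    """Monte Carlo estimate of P(sentence appears | observed prefix)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seq = list(observed)
        while len(seq) < length:
            lit = rng.choices(SENTENCES, weights=[MU[s] for s in SENTENCES])[0]
            if -lit not in seq:
                seq.append(lit)
        hits += sentence in seq
    return hits / trials

# psi = 3 is left undecided by every prefix below; its posterior must
# stay within [MU[3], 1 - MU[-3]] = [0.2, 0.9] whatever the ordering.
lo, hi = MU[3], 1 - MU[-3]
for prefix in permutations([1, 2]):
    assert lo <= posterior(3, list(prefix)) <= hi
```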

## Convergence

Note that P expects nature to eventually decide every proposition, in which case convergence to a single distribution is trivial; beliefs converge to 0 or 1 on every proposition. However, the posterior probabilities also converge to a single distribution even if some sentences are never updated on -- as is the case when we restrict ourselves to updating on provable sentences.

To see this, take some sentence ψ and some ϵ>0. We want to show that for all sufficiently large n, we have |P(ψ|ϕ1,ϕ2,...ϕn)−P(ψ|ϕ1,ϕ2,...ϕm)|<ϵ for all m>n. If ψ is eventually decided, this is trivially true. Otherwise, let Φ be a large finite collection of sentences which will never be decided by the environment, chosen so that the probability of sampling a sentence that will never be decided but that is outside of Φ before sampling ψ or ¬ψ (as we continue adding to the sequence by sampling) is less than ϵ/4 no matter what other sentences have been decided already. Then, pick N large enough that (1) the sentences announced after time N do not assert any new logical relations between the sentences in Φ; and (2) the probability of deciding any new sentence not in Φ before deciding ψ is less than ϵ/2. (1) is possible since there are only 2^|Φ| joint assignments of truth values to sentences in Φ, so after finite time all joint assignments which will ever be ruled out have already been. (2) is possible since the probability of sampling a sentence that will *never* be decided but that is outside of Φ is already small enough, and the probability of deciding all the rest of the sentences outside of Φ only goes down as more sentences get decided, approaching zero, so that it is small enough for some N.

So, for m>N, the probability that ψ is decided before any sentences outside of Φ is at least 1−(ϵ/2), ensuring that any dependence is less than ϵ.

Furthermore, this argument makes it clear that the probability distribution we converge to depends only on the set of sentences which the environment will eventually assert, not on their ordering! For any ordering of assertions, we can find an N as specified, and the joint probability distribution on Φ (and ψ) will be the same to within ϵ.
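Order-independence is especially transparent in a toy model (illustrative assumptions: sentences are propositional literals, consistency is the absence of a complementary pair, and the base measure `MU` is made up): there, the conditional distribution over continuations depends on the announced prefix only through its set of sentences, so the posteriors agree exactly, whereas in general the argument above gives agreement only to within ϵ.

```python
import random

# Illustrative toy model: literal sentences (signed ints), hypothetical
# base measure with positive probability on every sentence.
SENTENCES = [1, -1, 2, -2, 3, -3]
MU = {1: 0.3, -1: 0.1, 2: 0.2, -2: 0.1, 3: 0.2, -3: 0.1}

def posterior(sentence, observed, trials=20000, length=6, seed=0):
    """Monte Carlo estimate of P(sentence appears | observed prefix)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seq = list(observed)
        while len(seq) < length:
            lit = rng.choices(SENTENCES, weights=[MU[s] for s in SENTENCES])[0]
            if -lit not in seq:  # rejection depends only on the SET announced
                seq.append(lit)
        hits += sentence in seq
    return hits / trials

# The environment asserts {1, 2}; both orderings give the same posterior
# for the undecided sentence 3 (exactly, in this toy model, because the
# consistency check is order-insensitive and the same seed is used).
assert posterior(3, [1, 2]) == posterior(3, [2, 1])
```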

We can also see that if the environment eventually asserts every provable sentence of PA, then the limit which we converge to must be a distribution on completions of PA: if PA⊢ψ, then there is some n such that ϕn=ψ, so the posterior probability is 1 then and beyond. Similarly, although we only require propositional coherence of P, the limit will be fully coherent so long as the environment eventually asserts all logical truths (whatever else it may assert).