Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is a problem which Scott Garrabrant and I were thinking about quite a bit last fall, but moved on from as other problems in logical uncertainty captured our interest. As far as I know, it remains unsolved!

Scott and I were working on the problem of probabilistic priors, or as we've started calling them now, random extensions. Suppose you have a set (possibly empty) of axioms in a logic, and you want a "reasonable" probability distribution over the rest of the sentences, which are neither proved nor disproved by your set of sentences. This probability assignment is required to be coherent in the sense given by Paul Christiano or me: they "respect the logic", namely (using Paul's version):

For any ϕ and ψ, B(ϕ)=B(ϕ∧ψ)+B(ϕ∧¬ψ).

For any tautology ϕ, B(ϕ)=1.

For any contradiction ϕ, B(ϕ)=0.

Paul proves that this is equivalent to requiring P to be derived from a probability distribution on models (so that the probability of a sentence is the probability that the sentence is true).

Denote the random extension starting with theory T as PT(ϕ). We additionally require that for ϕ∈T, B(ϕ)=1.

We also wanted these probability distributions to be approximable, among other desirable properties.

The two-update problem began as the observation that, for my prior, adding a sentence ϕ to the initial theory is quite different from performing a Bayesian update on ϕ. As our research progressed, we noticed that this seemed to be true of all the proposals we came up with. There were two distinct ways to "update" on a sentence: either do a Bayesian update to make its probability 1, or add it to the initial set of axioms which are true before the prior's construction.

Stated as a formal problem, this may look easy to "solve". For finite non-empty T, we can simply re-define PT(ϕ)=P∅(ϕ|T). (There is a complication if you try to do this for infinite T, which is explained here. ) However, this seems to miss the point. Despite re-defining things in this way, if you look at the actual specification of the priors we were considering, a kind of non-bayesian update was quite visible. The non-bayesian update doesn't go away if we take away the set of axioms (AKA the theory we're extending into a probability distribution). Rather, it is visible in how we update on proofs.

As we approximate P to increasing degree, we observe proofs of one thing or another and incorporate this into the distribution. This "incorporation" step was different depending on the prior P, but all of our options seemed to have it, and none of them looked like a Bayesian update. This "incorporation" step is a different way of understanding the meaning of the two-update problem, but harder to formalize exactly. It seems to us that this is really the same two-update problem, if you don't make the ad-hoc patch mentioned above.

To put it simply: updating on an observed proof does not look like a Bayesian update. This seems potentially concerning.

Paul Christiano hit upon the same idea in his paradox of ignorance (Section 6.1.5 of the non-omniscience paper). He presented the following scenario:

Suppose two agents are trying to determine the truth of ∀x:ϕ(x) for computable ϕ. These agents are both using the same definition of P, and the same set of axioms (we can suppose these are PA). However, one agent has a large amount of computing power with which to approximate P, and the other has relatively little. Both agents have been given a resource: a machine which computes ϕ(x) for any value of x. Unfortunately, this machine gets slower and slower the larger the value of x is asked for.

Suppose that ∀x:ϕ(x) is actually true, but that this fact is not provable in PA. Now, it naively seems that the agent with a large amount of computing power is at an advantage. However, it's not quite so clear. What that computing power does is allow P to be computed more accurately. In particular, the strong agent will have time to prove many more cases of ϕ(x) than the weak agent. This means that the strong agent cannot update on those cases of ϕ(x) when computed by the machine: the small examples already have probability 1, because the agent has proved them. To improve its knowledge, this agent must wait for large examples to slowly compute. The small agent, on the other hand, can look at many small values very quickly.

This is far from a formal proof of the paradox, and Paul suggests several possible ways out. It's possible that the approximation of P raises the probability of ∀x:ϕ(x) just as much when it computes a new case of ϕ(x) as it's raised if we do a Bayesian update on its provability (observing the machine compute it). However, this is harder than it may initially sound.

This is a problem which Scott Garrabrant and I were thinking about quite a bit last fall, but moved on from as other problems in logical uncertainty captured our interest. As far as I know, it remains unsolved!

Scott and I were working on the problem of probabilistic priors, or as we've started calling them now, random extensions. Suppose you have a set (possibly empty) of axioms in a logic, and you want a "reasonable" probability distribution over the rest of the sentences, which are neither proved nor disproved by your set of sentences. This probability assignment is required to be

coherentin the sense given by Paul Christiano or me: they "respect the logic", namely (using Paul's version):Paul proves that this is equivalent to requiring P to be derived from a probability distribution on models (so that the probability of a sentence is the probability that the sentence is true).

Denote the random extension starting with theory T as PT(ϕ). We additionally require that for ϕ∈T, B(ϕ)=1.

We also wanted these probability distributions to be approximable, among other desirable properties.

The two-update problem began as the observation that, for my prior, adding a sentence ϕ to the initial theory is quite different from performing a Bayesian update on ϕ. As our research progressed, we noticed that this seemed to be true of all the proposals we came up with. There were two distinct ways to "update" on a sentence: either do a Bayesian update to make its probability 1, or add it to the initial set of axioms which are true before the prior's construction.

Stated as a formal problem, this may look easy to "solve". For finite non-empty T, we can simply re-define PT(ϕ)=P∅(ϕ|T). (There is a complication if you try to do this for infinite T, which is explained here. ) However, this seems to miss the point. Despite re-defining things in this way, if you look at the actual

specificationof the priors we were considering, a kind of non-bayesian update was quite visible. The non-bayesian update doesn't go away if we take away the set of axioms (AKA the theory we're extending into a probability distribution). Rather, it is visible in how we update onproofs.As we approximate P to increasing degree, we observe proofs of one thing or another and incorporate this into the distribution. This "incorporation" step was different depending on the prior P, but all of our options seemed to have it, and none of them looked like a Bayesian update. This "incorporation" step is a different way of understanding the meaning of the two-update problem, but harder to formalize exactly. It seems to us that this is really the same two-update problem, if you don't make the ad-hoc patch mentioned above.

To put it simply: updating on an observed proof does not look like a Bayesian update. This seems potentially concerning.

Paul Christiano hit upon the same idea in his

paradox of ignorance(Section 6.1.5 of the non-omniscience paper). He presented the following scenario:Suppose two agents are trying to determine the truth of ∀x:ϕ(x) for computable ϕ. These agents are both using the same definition of P, and the same set of axioms (we can suppose these are PA). However, one agent has a large amount of computing power with which to approximate P, and the other has relatively little. Both agents have been given a resource: a machine which computes ϕ(x) for any value of x. Unfortunately, this machine gets slower and slower the larger the value of x is asked for.

Suppose that ∀x:ϕ(x) is actually true, but that this fact is not provable in PA. Now, it naively seems that the agent with a large amount of computing power is at an advantage. However, it's not quite so clear. What that computing power does is allow P to be computed more accurately. In particular, the strong agent will have time to prove many more cases of ϕ(x) than the weak agent. This means that the strong agent

cannotupdate on those cases of ϕ(x) when computed by the machine: the small examples already have probability 1, because the agent has proved them. To improve its knowledge, this agent must wait for large examples to slowly compute. The small agent, on the other hand, can look at many small values very quickly.This is far from a formal proof of the paradox, and Paul suggests several possible ways out. It's possible that the approximation of P raises the probability of ∀x:ϕ(x) just as much when it computes a new case of ϕ(x) as it's raised if we do a Bayesian update on its provability (observing the machine compute it). However, this is harder than it may initially sound.

Continued in part 2.