Does Probability Theory Require Deductive or Merely Boolean Omniscience?

bypotato4y3rd Aug 201510 comments


It is often said that a Bayesian agent has to assign probability 1 to all tautologies, and probability 0 to all contradictions. My question is... exactly what sort of tautologies are we talking about here? Does that include all mathematical theorems? Does that include assigning 1 to "Every bachelor is an unmarried male"?1 Perhaps the only tautologies that need to be assigned probability 1 are those that are Boolean theorems implied by atomic sentences that appear in the prior distribution, such as: "S or ~ S".

It seems that I do not need to assign probability 1 to Fermat's last conjecture in order to use probability theory when I play poker, or try to predict the color of the next ball to come from an urn. I must assign a probability of 1 to "The next ball will be white or it will not be white", but Fermat's last theorem seems to be quite irrelevant. Perhaps that's because these specialized puzzles do not require sufficiently general probability distributions; perhaps, when I try to build a general Bayesian reasoner, it will turn out that it must assign 1 to Fermat's last theorem. 

Imagine a (completely impractical, ideal, and esoteric) first order language, who's particular subjects were discrete point-like regions of space-time. There can be an arbitrarily large number of points, but it must be a finite number. This language also contains a long list of predicates like: is blue, is within the volume of a carbon atom, is within the volume of an elephant, etc. and generally any predicate type you'd like (including n place predicates).2 The atomic propositions in this language might look something like: "5, 0.487, -7098.6, 6000s is Blue" or "(1, 1, 1, 1s), (-1, -1, -1, 1s) contains an elephant." The first of these propositions says that a certain point in space-time is blue; the second says that there is an elephant between two points at one second after the universe starts. Presumably, at least the denotational content of most english propositions could be expressed in such a language (I think, mathematical claims aside).

Now imagine that we collect all of the atomic propositions in this language, and assign a joint distribution over them. Maybe we choose max entropy, doesn't matter. Would doing so really require us to assign 1 to every mathematical theorem? I can see why it would require us to assign 1 to every tautological Boolean combination of atomic propositions [for instance: "(1, 1, 1, 1s), (-1, -1, -1, 1s) contains an elephant OR ~((1, 1, 1, 1s), (-1, -1, -1, 1s) contains an elephant)], but that would follow naturally as a consequence of filling out the joint distribution. Similarly, all the Boolean contradictions would be assigned zero, just as a consequence of filling out the joint distribution table with a set of reals that sum to 1. 

A similar argument could be made using intuitions from algorithmic probability theory. Imagine that we know that some data was produced by a distribution which is output by a program of length n in a binary programming language. We want to figure out which distribution it is. So, we assign each binary string a prior probability of 2^-n. If the language allows for comments, then simpler distributions will be output by more programs, and we will add the probability of all programs that print that distribution.3 Sure, we might need an oracle to figure out if a given program outputs anything at all, but we would not need to assign a probability of 1 to Fermat's last theorem (or at least I can't figure out why we would). The data might be all of your sensory inputs, and n might be Graham's number; still, there's no reason such a distribution would need to assign 1 to every mathematical theorem. 


A Bayesian agent does not require mathematical omniscience, or logical (if that means anything more than Boolean) omniscience, but merely Boolean omniscience. All that Boolean omniscience means is that for whatever atomic propositions appear in the language (e.g., the language that forms the set of propositions that constitute the domain of the probability function) of the agent, any tautological Boolean combination of those propositions must be assigned a probability of 1, and any contradictory Boolean combination of those propositions must be assigned 0. As far as I can tell, the whole notion that Bayesian agents must assign 1 to tautologies and 0 to contradictions comes from the fact that when you fill out a table of joint distributions (or follow the Komolgorov axioms in some other way) all of the Boolean theorems get a probability of 1. This does not imply that you need to assign 1 to Fermat's last theorem, even if you are reasoning probabilistically in a language that is very expressive.4 

Some Ways To Prove This Wrong:

Show that a really expressive semantic language, like the one I gave above, implies PA if you allow Boolean operations on its atomic propositions. Alternatively, you could show that Solomonoff induction can express PA theorems as propositions with probabilities, and that it assigns them 1. This is what I tried to do, but I failed on both occasions, which is why I wrote this. 

[1] There are also interesting questions about the role of tautologies that rely on synonymy in probability theory, and whether they must be assigned a probability of 1, but I decided to keep it to mathematics for the sake of this post. 

[2] I think this language is ridiculous, and openly admit it has next to no real world application. I stole the idea for the language from Carnap.

[3] This is a sloppily presented approximation to Solomonoff induction as n goes to infinity. 

[4] The argument above is not a mathematical proof, and I am not sure that it is airtight. I am posting this to the discussion board instead of a full-blown post because I want feedback and criticism. !!!HOWEVER!!! if I am right, it does seem that folks on here, at MIRI, and in the Bayesian world at large, should start being more careful when they think or write about logical omniscience.