Foundations of Inference

by amcknight1 min read31st Oct 201117 comments

16

Personal Blog

I've recently been getting into all of this wonderful Information Theory stuff and have come across a paper (thanks to John Salvatier) that was written by Kevin H. Knuth:

Foundations of Inference

The paper sets up some intuitive minimal axioms for quantifying power sets and then (seems to) use them to derive Bayesian probability theory, information gain, and Shannon Entropy. The paper also claims to use less assumptions than both Cox and Kolmogorov when choosing axioms. This seems like a significant foundation/unification. I'd like to hear whether others agree and what parts of the paper you think are the significant contributions.

If a 14 page paper is too long for you, I recommend skipping to the conclusion (starting at the bottom of page 12) where there is a nice picture representation of the axioms and a quick summary of what they imply.

17 comments, sorted by Highlighting new comments since Today at 10:25 AM
New Comment

The paper is somewhat inconsistence in its assumptions.

First, we say that we have only a finite number of atoms without anything known about them. Then in Appendix A we say that we can have an arbitrary amount of atoms with precisely-equal valuation. This is already suspicious.

On closer look, Theorem from Appendix A is simply wrong. Trying to emulate polynomial multiplication inside real line pays off. Theorem from Appendix B lacks some additional requirements (maybe monotonicity? at least?) and without these requirements it is false (and counterexample is of the same preversion weight class as the counterexamples in Appendix A that show that all axioms are needed). I have not chacked the Theorem in Appendix C because I am lazy.

All-in-all, some of the "proven" formulas are actually pre-assumed in the paper. While it can be useful to some people to build a highly-connected mental map of the concepts related to inference and probability (which is always good), it doesn't prove things from the axioms.

I did downvote the post from +9 to +8, although I appreciate the caution of the post's author in not calling anything in the paper true (only "claimed"), because, well, it is just a link to one more subtly-false to its claims paper.

Well that sucks. The last thing I want to do is post a subtly wrong paper. Just to make sure it actually is subtly wrong:

On closer look, Theorem from Appendix A is simply wrong. Trying to emulate polynomial multiplication inside real line pays off.

I don't understand this. Can you elaborate a bit?

All-in-all, some of the "proven" formulas are actually pre-assumed in the paper.

This was done on purpose. From the beginning, the author is trying to find nice axioms that will prove the things he wants to. I'm not sure this is a fair criticism (if I'm understanding you correctly).

The last thing I want to do is post a subtly wrong paper.

With a correct disclaimer (a nice construction, but skip the proofs as they are wrong) it could be still useful.

On closer look, Theorem from Appendix A is simply wrong. Trying to emulate polynomial multiplication inside real line pays off. I don't understand this. Can you elaborate a bit?

"a + b = [a] + [b] + arctg(tg(0.5 π {a}) + tg(0.5 π {b}))/(0.5 π)"

"[a]" is the floor of a, or the integer part of a, i.e. maximum integer no more than a. "{a}" is the fractional part of a, i.e. "a-[a]".

The addition of valuations between 0 and 1 is isomorphic to normal addition, but occupies only part of the real line; the addition of integer parts is simple and honest addition.

This is not isomorphic to addition because any sum of repetitions of 0.1 is smaller than 1.

This addition is not continious, but the paper claims that it doesn't need addition to be continious.

All-in-all, some of the "proven" formulas are actually pre-assumed in the paper. This was done on purpose. From the beginning, the author is trying to find nice axioms that will prove the things he wants to. I'm not sure this is a fair criticism (if I'm understanding you correctly).

Of course we wanted axioms to prove our favourite theorems. The problem is that these axioms are not enough to prove them, a proof with unstated additional assumptions is given and then the paper proceeds as if the theorems it needs were actually proven from the axioms.

I didn't mean epistemiological sense, I meant logical sense - once we picked the axioms, we should stop assuming our theorems are true and recheck them.

a + b = [a] + [b] + arctg(tg(0.5 π {a}) + tg(0.5 π {b}))/(0.5 π)

This looks promising. At least, I don't yet see a general way to regrade it into normal addition. (I haven't digested Knuth and Skilling's purported general proof.) Can you make the counterexample more concrete? For example, what is a concrete set of images in R of atoms under the valuation such that this works as a counterexample? What is a minimal such set with respect to cardinality?

This is not isomorphic to addition because any sum of repetitions of 0.1 is smaller than 1.

It's not clear to me why this is a problem. You need to show that there is no regrading Θ satisfying Θ(ab) = Θ(a) + Θ(b) for all a, b in the image of the valuation. (Here I use ⊕ to denote your addition above.) It's true that the ⊕-sum of any number of 0.1s is smaller than 1, but the image of the ⊕-sum under the regrading might be larger than 1.

I haven't digested Knuth and Skilling's purported general proof.

One of the many subtle problems with their proof is that they don't understand what is wrong with saying "n times a can be declared equal to na, we will regrade later". The problem is, of course, covering all the line.

Can you make the counterexample more concrete? For example, what is a concrete set of images in R of atoms under the valuation such that this works as a counterexample? What is a minimal such set with respect to cardinality?

Well, their proof asks for lots of equally-valued atoms, so I gave an example that is optimized towards their best case.

For small number of atoms my example can be trivially regraded by using straight addition on fraction parts after dividing fraction parts by the tripled sum of all fraction parts of all atoms.

I want also to say that axioms are not even compatible, because 0+0=0, and not strictly more than 0.

You need to show that there is no regrading Θ satisfying Θ(a ⊕ b) = Θ(a) + Θ(b) for all a, b in the image of the valuation

If we consider my addiditon on R, there is no escape.

There are a,b such that sum of any number of copies of a is smaller than b. No monotonous regrading can break this property. For normal addition, it is nonsense.

Well, their proof asks for lots of equally-valued atoms, so I gave an example that is optimized towards their best case.

I admit that I still haven't fully digested their purported proof. But does the proof require lots of equally-valued atoms? Or does it just accommodate lots of equally-valued atoms? (The regrading Θ doesn't have to be unique.)

I want also to say that axioms are not even compatible, because 0+0=0, and not strictly more than 0.

Yes, I think that they forgot to say that axiom 1 only applies when y is non-null. Axiom 1 is based on equation (1): "x < xy". When they state equation (1), they do remember to require that y be non-null. So, let us charitably read axiom 1 as assuming that y is non-null.

Furthermore, in the sentence following axiom 3, they say "These equations are to hold for arbitrary values m(x), m(y), m(z) assigned to the disjoint x, y, z." A fair criticism is that they never defined "disjoint". I think that they mean for x and y to be disjoint if, when you write each as a join of atoms, no atom appears in both expressions. Strangely, they forgot to require that x and y be disjoint when they state equation (1). But it seems clear that they intended to require this, because they mention the disjointness condition explicitly when they give their "grounds" for equation (1).

The upshot is that I'd read both equation (1) and axiom 1 as assuming that y is non-null and disjoint from x.

You need to show that there is no regrading Θ satisfying Θ(a ⊕ b) = Θ(a) + Θ(b) for all a, b in the image of the valuation

If we consider my addiditon on R, there is no escape.

Yes, and I see now that they do claim to regrade ⊕ to be + "over positive reals". Suppose we charitably weaken their statement to the claim that ⊕ is + up to regrading when restricted to the image of the lattice under the valuation. That is, consider the claim that there is a regrading Θ satisfying Θ(xy) = Θ(x) + Θ(y) for all x, y in the image of the valuation (which is a finite set). Is this claim false? If I understand their goals correctly, this claim is all that they really need, anyway. Why should they care about real numbers that never appear as values of elements in the lattice?

Of course, if I saw an unfixable inconsistency, I would start my first comment with that. It is just a minor inconsistency.

As for amount of needed atoms - they use unlimited number of atoms to prove the existence of Θ. Maybe they don't really need it, but well... What they write is not a proof, what they prove is technically false.

Their claim may be accidentally true for a finite lattice. So far I fail to prove it or to find a counterexample.

Regardless of whether their axioms are sufficient they provide a very good motivation for a well-connected concept map. This is a good thing even if the proofs are all wrong.

I guess that fixing their proofs is a futile work; I am not even sure that the theorems are true in some reduced form still powerful enough for the rest of the article. I think that fixing the proof part would mean simply writing it from scratch.

As for amount of needed atoms - they use unlimited number of atoms to prove the existence of Θ.

In which line of the proof do they first require that assumption for their argument? Are they assuming an unlimited number, or just a "sufficiently large" number? If they really require an unlimited number, then it is contrary to the entire spirit of the paper, because they start out in the Introduction as committed finitists.

Sufficiently large to be larger than inverse of our precision requirements.

Are you referring to the line with "to arbitrary precision" on the bottom of page 17?

Although they don't express themselves as clearly as they could, I don't think that they mean anything like, "and hence we arrive at the exact regrading Θ in the limit by sending the number of atoms with the same valuation to infinity." Rather, I think that they mean that a larger number of atoms with the same valuations puts stronger constraints on the regrading Θ, but it is never so constrained that it can't exist.

In other words, their proof accommodates arbitrarily many atoms with the same valuation, but it doesn't require it.

The more closely I've read their proof, the more confident I've become that they prove the following:

Let L be a finite lattice satisfying equations (0)–(3), and let a valuation m: LR and a binary operation ⊕ on R satisfying axioms (0)–(3) be given. (Here, I take the equations and axioms to be corrected as described here).

Then there exists a strictly monotonically increasing function Θ: RR such that Θ(ab) = Θ(a) + Θ(b) for all a, bR such that a, b, and ab are in the range of m.

Well, they claim that the interleaving of a and b is linear. When they prove that if a:b>5/3 then a/b>3/2 (this implication is necessary to be consistent once we have 5 copies of a), they use 9 copies of a. It is easy to prove this particular case without appealing to more than 5 copies of a, but you need to do the things that the authors seem to specifically avoid.

The worst part is that they seem to use "to arbitrary precision" argument to prove how adding a and b would work.

Let me look up the paper once more..

OK. I give up.

Do they require commutativity or not? They say they don't need to preassume it.

a=1; b=1.01; a+b=2.02; b+a=2.03;

That's all. Two atoms. We have no other additions to check. There is not enough atoms to check associativity. Axiom 2 coincides with Axiom 1. It is impossible to regrade it with a strictly increasing Θ into commutative addition.

a=1; b=1.01; a+b=2.02; b+a=2.03;

Ha! Nice. I'd forgotten that they didn't require join to be commutative. But they very clearly and intentionally do not. I don't see any way to wriggle out of your counterexample if join isn't commutative. (If join is commutative, then, even if a+b and b+a were distinct, they wouldn't both be in the image of the lattice. My formulation of their theorem might still hold.)

I still have no idea whether your statement is true. It requires checking. But I hope now it is clear that no part of their proof can be trusted without some editing.

If you have enough interest to try to write a claim and a proof without references to the paper, I guess it would be nice to post it as a direct comment to the post.

btw, I mentioned the work you two have been doing here to the author and tried to get him to respond here but unfortunately he hasn't agreed to.

First, we say that we have only a finite number of atoms without anything known about them. Then in Appendix A we say that we can have an arbitrary amount of atoms with precisely-equal valuation. This is already suspicious.

Please elaborate. I don't see anything suspicious in your paraphrase. For example, it makes sense to me that, if we don't know anything about the atoms, then we have the same knowledge about all of them, which corresponds to assigning equal valuation to all of them.

It makes sense, but it is no obligation.

The statement of the theorem is "whatever our function does, if it is consistent with the axioms, it is addition". This is used in the context of finite and quite imaginable amount of atoms. We could ascribe all of them equal valuation, but we can have some knowledge why some are more probable than other ones. But the proof requires us to have a lot of atoms and to be able to find as many equally-valued atoms as we need. Proving some inequalities with existing amount of atoms may need more atoms than we initially considered. Also, it may be that we know enough to give every atom a distinct valuation, in which case the proof just stops being applicable.

I have a counterexample even if we grant the existence of arbitrarily many atoms with the same valuation (by the way, it means that sum exceeds 1); I will describe it in the answer to another comment - I hope it is correct.

Sets and quantities are deliberately finite because we have never encountered infinite quantity, infinite precision, or infinite information in our scientific endeavors, and neither do we expect to do so.

Has this been catching on in recent years? I remember both Wolpert and Jaynes had their own Declarations of Independence from Infinity, and it was novel to me back in the 90s, but I didn't really get out much.