Raven paradox settled to my satisfaction

by Manfred, 6th Aug 2014, 24 comments


Personal Blog

The raven paradox, originated by Carl Gustav Hempel, is an apparent absurdity of inductive reasoning. Consider the hypothesis:

H1: All ravens are black.

Inductively, one might expect that seeing many black ravens and no non-black ones is evidence for this hypothesis. As you see more black ravens, you may even find it more and more likely.

Logically, a statement is equivalent to its contrapositive (where you negate both things and flip the order). Thus if "if it is a raven, it is black" is true, so is:

H1': If it is not black, it is not a raven.

Take a moment to double-check this.

Inductively, just like with H1, one would expect that seeing many non-black non-ravens is evidence for this hypothesis. As you see more and more examples, you may even find it more and more likely. Thus a yellow banana is evidence for the hypothesis "all ravens are black."

Since this is silly, there is an apparent problem with induction.



Consider the following two possible states of the world:

Either (1) there are 100 black ravens, or (2) there are 99 black ravens and 1 yellow raven. The number of yellow bananas is the same in both.

Suppose that these are your two hypotheses, and you observe a yellow banana (drawing from some fixed distribution over things). Q: What does this tell you about one hypothesis versus another? A: It tells you bananas-all about the number of black ravens.
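A minimal sketch of this comparison, assuming (hypothetically) that each world also holds 100 yellow bananas and that an observation is a uniform draw from all objects. The banana count is invented for illustration; only the raven counts come from the post. The likelihood ratio for a yellow banana comes out as exactly 1, while a black raven favours hypothesis 1 slightly.

```python
from fractions import Fraction

# Hypothetical worlds: the 100 yellow bananas are an illustrative assumption.
WORLDS = {
    "1": {"black raven": 100, "yellow raven": 0, "yellow banana": 100},
    "2": {"black raven": 99, "yellow raven": 1, "yellow banana": 100},
}

def likelihood(world, kind):
    """P(observe `kind`) when drawing uniformly from all objects in `world`."""
    counts = WORLDS[world]
    return Fraction(counts[kind], sum(counts.values()))

def likelihood_ratio(kind, h="1", alt="2"):
    """How strongly observing `kind` favours hypothesis h over alt."""
    return likelihood(h, kind) / likelihood(alt, kind)

print(likelihood_ratio("yellow banana"))  # 1
print(likelihood_ratio("black raven"))    # 100/99
```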

One might contrast this with a third hypothesis, (3), in which there is one fewer yellow banana and one more yellow raven, produced by some sort of spontaneous generation.

Observations of both black ravens and yellow bananas now cause us to prefer hypothesis 1 over hypothesis 3!
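A sketch of why both observations now point the same way. It rests on two assumptions of ours: hypothesis 3 is read as hypothesis 2 with one more yellow banana turned into a yellow raven, and each world again starts from a made-up 100 yellow bananas. Because world 3 has fewer of both black ravens and yellow bananas in the same-sized population, each observation has a likelihood ratio above 1 in favour of hypothesis 1.

```python
from fractions import Fraction

# Hypothetical counts. Hypothesis 3 is read as hypothesis 2 with one more
# yellow banana turned into a yellow raven; the banana totals are invented.
WORLDS = {
    "1": {"black raven": 100, "yellow raven": 0, "yellow banana": 100},
    "3": {"black raven": 99, "yellow raven": 2, "yellow banana": 99},
}

def likelihood(world, kind):
    """P(observe `kind`) for a uniform draw from all objects in `world`."""
    counts = WORLDS[world]
    return Fraction(counts[kind], sum(counts.values()))

def ratio_1_over_3(kind):
    """Likelihood ratio P(kind | 1) / P(kind | 3); above 1 favours hypothesis 1."""
    return likelihood("1", kind) / likelihood("3", kind)

for kind in ("black raven", "yellow banana"):
    print(kind, ratio_1_over_3(kind))  # 100/99 for both
```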

The moral of the story is that the amount of evidence an observation provides is not just about whether it is consistent with the "active" hypothesis. It is about how much more likely the observation is when the hypothesis is true than when it is false, that is, the likelihood ratio.

This is a pretty straightforward moral; it's a widely known pillar of statistical reasoning. But seeing how the reasoning behind the raven paradox leaves it out takes a bit of effort. This is because we're using an implicit model of the problem (driven by some combination of outside knowledge and framing effects) where nonblack ravens replace black ravens, but don't replace bananas. The logical statements H1 and H1' are not by themselves enough to tell you how to update upon seeing new evidence. Or to put it another way, the version of induction that drives the raven paradox is in fact wrong, but probability theory implies a bigger version.


(Technical note: In the hypotheses above, the exact number of yellow bananas does not have to be the same for observing a yellow banana to provide no evidence - what has to be the same is the measure of yellow bananas in the probability distribution we're drawing from. Talking about "99 ravens" is more understandable, but what differentiates our hypotheses are really the likelihoods of observing different events [there's our moral again]. This becomes particularly important when extending the argument to infinite numbers of ravens - infinities or no infinities, when you make an observation you're still drawing from some distribution.)
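To illustrate the technical note, here is a sketch in which the two worlds contain different numbers of yellow bananas, yet a yellow banana is still no evidence either way because its probability mass is the same. The per-object sampling weights are a hypothetical device for this example, as are all the counts.

```python
from fractions import Fraction

# Each entry: kind -> (number of objects, sampling weight per object).
# All numbers are hypothetical.
WORLD_1 = {"black raven": (100, 1), "yellow banana": (100, 1)}
WORLD_2 = {"black raven": (99, 1), "yellow raven": (1, 1), "yellow banana": (50, 2)}

def measure(world, kind):
    """Probability mass of `kind` under weighted sampling from `world`."""
    total = sum(n * w for n, w in world.values())
    n, w = world.get(kind, (0, 0))
    return Fraction(n * w, total)

print(measure(WORLD_1, "yellow banana"), measure(WORLD_2, "yellow banana"))  # 1/2 1/2
```

Half as many bananas, each twice as likely to be drawn, gives the same measure, so the likelihood ratio for a yellow banana is still 1.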


Comments

If we were sampling random non-black objects and none of them were ravens, that really would be evidence that all ravens are black.

The reason it seems silly to take a yellow banana as evidence that all ravens are black is that 'sampling the space of nonblack things' is not an accurate description of what we're doing when we look at a banana. When we see a raven, we do implicitly think it's more or less randomly drawn from the (local) population of ravens.

If you had grown up super-goth and only ever seen black things, you would have no idea what things have nonblack versions. If you went outside one day and saw a bunch of nonblack things and none of them were ravens, you might indeed start to suspect that all ravens were black; the more nonblack things you saw, the stronger this suspicion would get.

I agree. In the first example, it's because if our probability distribution only encompasses two categories, any increase in one is a decrease in the other. In the second example, it's because the ex-super-goth's hypothesis space includes all sorts of relationships between number of black things and number of nonblack things - their preconceptions about the world are different, rather than you just stipulating that they sample non-black things.

Perhaps a Bayesian approach would be illuminating. There are four kinds of objects in the world: black ravens, nonblack ravens, black nonravens, and nonblack nonravens. Call these A, B, C, and D. Let the probability you assign to the next object that you encounter being in one of these classes be p, q, r, and s respectively. Rather than having two competing hypotheses about the blackness of ravens, there is a prior distribution of the parameters p, q, r, and s.

(Note that the way I've set this up removes any concept of blackness common to black ravens and black nonravens. The astute -- more astute than me, for whom this is the last paragraph written -- may guess at once that (rot13) P naq Q ner tbvat gb or rkpunatrnoyr va guvf sbezhyngvba, naq gurersber arvgure zber guna gur bgure pna or rivqrapr eryngvat gb gur inyhrf bs c naq d. I come back to this at the end.)

In a state of total ignorance, a reasonable prior for the distribution of (p,q,r,s) is that they are uniformly distributed over the tetrahedron in four-dimensional space defined by these numbers being in the range 0 to 1 and their sum being 1.

After observing numbers a, b, c, and d of the four categories, the posterior is (after a bit of mathematics) p^a q^b r^c s^d/K(a,b,c,d), where K(a,b,c,d) = a!b!c!d!/(N+3)!, where N = a+b+c+d. (The formula generalises to any number of categories, replacing 3 by the number of categories minus 1.)

The expectation value of p is K(a+1,b,c,d)/K(a,b,c,d) = (a+1)/(N+4), and similarly for q, r, and s. (Check: these add up to 1, as they should.)
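A small exact check of these formulas, using Python's `fractions` and `math.factorial`. The uniform prior on the tetrahedron is the Dirichlet(1,1,1,1) distribution, so after the counts a, b, c, d the posterior is Dirichlet(a+1, b+1, c+1, d+1). The counts in the usage line are arbitrary illustrative values.

```python
from fractions import Fraction
from math import factorial

def K(a, b, c, d):
    """Normalising constant: integral of p^a q^b r^c s^d over the simplex."""
    n = a + b + c + d
    return Fraction(factorial(a) * factorial(b) * factorial(c) * factorial(d),
                    factorial(n + 3))

def posterior_means(a, b, c, d):
    """E[p], E[q], E[r], E[s] after observing counts a, b, c, d."""
    base = K(a, b, c, d)
    return (K(a + 1, b, c, d) / base,
            K(a, b + 1, c, d) / base,
            K(a, b, c + 1, d) / base,
            K(a, b, c, d + 1) / base)

means = posterior_means(5, 0, 3, 7)  # N = 15, so each mean is (count+1)/19
print(sum(means))  # 1
```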

How does the expectation value of p change when you observe that the (N+1)th object you draw is an A, B, C, or D?

If it's an A, the ratio of the new expectation value to the old is (a+2)(N+4)/((a+1)(N+5)). This always exceeds 1, because (a+2)(N+4) - (a+1)(N+5) = N - a + 3 > 0, since a is at most N. For large N it is approximately 1 + 1/(a+1) - 1/(N+5).

If it's a B (and the cases of C and D are the same) then the ratio is (N+4)/(N+5) = 1 - 1/(N+5) < 1.

So observing an A increases your estimate of the proportion of the population that are A, and observing anything else decreases it, as one would expect. That was just another sanity check.
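The same sanity check as a sketch, computing the ratio of new to old E[p] directly from E[p] = (a+1)/(N+4). The particular values a = 10, N = 50 are only an illustration.

```python
from fractions import Fraction

def mean_p(a, n):
    """E[p] = (a+1)/(N+4) under the uniform prior described above."""
    return Fraction(a + 1, n + 4)

def ratio_after(kind, a, n):
    """New E[p] / old E[p] when the (N+1)th object is of class `kind`."""
    new_a = a + 1 if kind == "A" else a  # only an A raises the count a
    return mean_p(new_a, n + 1) / mean_p(a, n)

a, n = 10, 50
print(ratio_after("A", a, n))  # (a+2)(N+4)/((a+1)(N+5)) = 648/605 > 1
print(ratio_after("B", a, n))  # (N+4)/(N+5) = 54/55 < 1
```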

Now consider the ratio q/p, the ratio of non-black to black ravens. The expectation of this, assuming a>0 (you have seen at least one black raven), is K(a-1,b+1,c,d)/K(a,b,c,d) = (b+1)/a. This increases to (b+2)/a when you observe a nonblack raven, and decreases to (b+1)/(a+1) when you observe a black one. (I would have calculated the expectation of q/(q+p), the expected proportion of ravens that are nonblack, but that is more complicated.)

If you have seen a thousand black ravens and no nonblack ones, the increase is from 1/1000 to 2/1000, i.e. a doubling, but the decrease is from 1/1000 to 1/1001, a tiny amount. On the log-odds scale, the first is 1 bit, the second is about 0.0014 bits.

On this analysis, observations of nonravens, whether black or not, have no effect on the expectation of the proportion of nonblack ravens.
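The numbers in the last two paragraphs can be reproduced with a short sketch. Note that E[q/p] = (b+1)/a depends only on a and b, so observations of C or D (nonravens of either colour) never enter it.

```python
from fractions import Fraction
from math import log2

def mean_q_over_p(a, b):
    """E[q/p] = (b+1)/a; requires a >= 1. The counts c and d do not appear."""
    if a < 1:
        raise ValueError("need at least one black raven (a >= 1)")
    return Fraction(b + 1, a)

before = mean_q_over_p(1000, 0)          # 1/1000
after_nonblack = mean_q_over_p(1000, 1)  # 2/1000
after_black = mean_q_over_p(1001, 0)     # 1/1001

print(log2(float(after_nonblack / before)))  # 1.0 bit
print(log2(float(before / after_black)))     # about 0.0014 bits
```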

If we reformulate the original hypothesis that all ravens are black as "q/p < 0.000001", then observing the 1001st raven to be green will pretty much kill that hypothesis until we have seen on the order of a million black ravens in a row without a nonblack one. But the nonraven objects will continue to be irrelevant: C and D are exchangeable in this formulation of the problem.

Now reconsider the original paradox on its own terms. I will draw a connection with the grue paradox.

Suppose we accept the paradoxical argument that "All ravens are black" and "all nonblack things are nonravens" are logically equivalent, and therefore everything that is evidence for one is evidence for the other.

Let "X is bnonb" mean "X is a black raven or a nonblack nonraven." Consider the hypothesis that all ravens are bnonb, and its contrapositive, that all non-bnonb things are nonravens. In effect, we have exchanged C and D, but not A and B. Every argument that nonblack nonravens are evidence for all ravens being black is also an argument that non-bnonb nonravens are evidence for all ravens being bnonb. But substituting the definition of bnonb in the latter, it claims that black nonravens are evidence for the blackness of ravens. Hence both black and nonblack nonravens support the blackness of ravens.
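A quick mechanical check of the claim that reading "black" as "bnonb" swaps C and D while leaving A and B alone. The class labels follow the comment; the code itself is only an illustration.

```python
# Classes from the comment: A = black raven, B = nonblack raven,
# C = black nonraven, D = nonblack nonraven.
CLASSES = {
    "A": (True, True),    # (is_raven, is_black)
    "B": (True, False),
    "C": (False, True),
    "D": (False, False),
}

def is_bnonb(is_raven, is_black):
    """'Black raven or nonblack nonraven' (the grue-like predicate)."""
    return (is_raven and is_black) or (not is_raven and not is_black)

def relabel():
    """Map each class to the class it lands in when 'black' is read as 'bnonb'."""
    lookup = {props: name for name, props in CLASSES.items()}
    return {name: lookup[(raven, is_bnonb(raven, black))]
            for name, (raven, black) in CLASSES.items()}

print(relabel())  # {'A': 'A', 'B': 'B', 'C': 'D', 'D': 'C'}
```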

But there's more. Swapping black and nonblack in all of the above would imply that both black and nonblack nonravens are evidence for the nonblackness of ravens.

At this point we appear to have proved that all nonravens are evidence for every hypothesis about ravens. I don't think the original paradox can be saved by arguing that yes, nonblack nonravens are evidence, just an utterly insignificant amount, as some do.

A further elaboration then occurred to me. If non-ravens are, as the above argument claims, not evidential for the properties of ravens, then neither are non-European ravens evidential for the properties of European ravens, which does not seem plausible. This amount of confusion suggests that some essential idea is missing. I had thought of causality or mechanism, but the Google search suggested by that turned up this paper: "Infinitely many resolutions of Hempel's paradox" by Kevin Korb. It takes a purely Bayesian approach, which I think has something in common (in section 4.1) with the arguments of the original post. His conclusion:

We should well and truly forget about positive instance confirmation: it is an epiphenomenon of Bayesian confirmation. There is no qualitative theory of confirmation that can adequately approximate what likelihood ratios tell us about confirmation; nor can any qualitative theory lay claim to the success (real, if limited) of Bayesian confirmation theory in accounting for scientific methodology.

ETA: Another paper with a Bayesian analysis of the subject.

And then there is the Wason selection task, where you do have to examine both the raven and the non-black object to determine the truth of "all ravens are black". But with actual ravens and bananas, when you pick up a non-black object, you will already have seen whether it is a raven or not. Given that it is not a raven, examination of its colour tells you nothing more about ravens.

"A further elaboration then occurred to me. If non-ravens are, as the above argument claims, not evidential for the properties of ravens, then neither are non-European ravens evidential for the properties of European ravens, which does not seem plausible." - Wait so you're saying that the argument you just made in the post above is incorrect? Or that the argument in main is incorrect?

I am saying that I am confused.

Hempel gave an argument for a conclusion that seems absurd. I first elaborated a Bayesian argument for arriving at the opposite of the absurd conclusion, and because the conclusion (non-black non-ravens say nothing about the blackness of ravens) seems at first sight reasonable, one might think the argument reasonable (which is not reasonable, because there is nothing to stop a bad argument giving a correct conclusion).

Then I showed that combining Hempel's argument with the grue-like concept of bnonb yielded a Hempel-style argument for non-ravens of all colours being evidence for the blackness of ravens, and further extended it to show that all properties of non-ravens are evidence for all properties of ravens.

Then I took my original argument and observed that it still works after replacing "raven" and "non-raven" by "European raven" and "non-European raven".

At this point both arguments are producing absurd results. Hempel's has broadened to proving that everything is evidence for everything else, and mine to proving that nothing is evidence for anything else.

I shall have to work through the arguments of Korb and Gilboa to see what they yield when applied to bnonb ravens.

Meanwhile, the unanswered question is, when can an observation of one object tell you something about another object not yet observed?

Having now properly read Korb's paper, the basic problem he points out is that to do a Bayesian update regarding a hypothesis h in the presence of new evidence e, one must calculate the likelihood ratio P(e|h)/P(e|not-h). Not-h consists of the whole of the hypothesis space excluding h. What that hypothesis space is affects the likelihood ratio. The ratio can be made equal to anything at all, for some suitable choice of the hypothesis space, by constructions similar to those of the OP.
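A sketch of Korb's point, under our own simplifying assumption that not-h is a prior-weighted mixture of a few alternative hypotheses. Holding P(e|h) fixed at a hypothetical 1/2, the likelihood ratio for the same evidence comes out equal to, above, or below 1 depending only on which alternatives make up not-h.

```python
from fractions import Fraction

def likelihood_ratio(p_e_given_h, alternatives):
    """P(e|h) / P(e|not-h), with not-h a prior-weighted mixture.

    `alternatives` is a list of (prior_weight, P(e | that alternative)).
    """
    total_weight = sum(weight for weight, _ in alternatives)
    p_e_given_not_h = sum(weight * p for weight, p in alternatives) / total_weight
    return p_e_given_h / p_e_given_not_h

# e = "draw a yellow banana"; h gives it probability 1/2 (hypothetical numbers).
p_h = Fraction(1, 2)
spaces = {
    "alternative gives bananas the same mass": [(1, Fraction(1, 2))],
    "alternative gives bananas less mass": [(1, Fraction(99, 200))],
    "alternative gives bananas more mass": [(1, Fraction(101, 200))],
    "equal mixture of the last two": [(1, Fraction(99, 200)), (1, Fraction(101, 200))],
}
for name, alts in spaces.items():
    print(name, likelihood_ratio(p_h, alts))  # 1, 100/99, 100/101, 1
```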

It reaches the same negative conclusion when applied to bnonb ravens, or to European and non-European ravens.

Although this settles Hempel's paradox, it leaves unanswered a more fundamental question: how should you update in the face of new evidence? The Bayesian answer is on the face of it simple mathematics: P(e|h)/P(e|not-h). But where does the hypothesis space that defines not-h come from?

In "small world" examples of Bayesian reasoning, the hypothesis space is a parameterised family of distributions, and the prior is a probability distribution on the parameter space. New evidence will shift that distribution. If the truth is a member of that family, evidence is likely to converge on the correct parameters.
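A minimal "small world" sketch, assuming the parameterised family is "each raven is independently black with probability theta" and the prior on theta is uniform, i.e. Beta(1, 1). The true value theta = 0.9 and the seed are hypothetical choices for the simulation. Because the truth is in the family, the posterior mean drifts towards it.

```python
import random

def posterior_mean(black, ravens_seen):
    """Posterior mean of theta under a uniform Beta(1, 1) prior.

    After `black` black ravens out of `ravens_seen`, the posterior is
    Beta(1 + black, 1 + ravens_seen - black), whose mean is computed here.
    """
    return (black + 1) / (ravens_seen + 2)

def simulate(true_theta, n, seed=0):
    """Draw n ravens, each black with probability true_theta; track the mean."""
    rng = random.Random(seed)
    black = 0
    history = []
    for seen in range(1, n + 1):
        if rng.random() < true_theta:
            black += 1
        history.append(posterior_mean(black, seen))
    return history

history = simulate(true_theta=0.9, n=10_000)
print(history[9], history[99], history[-1])  # drifts towards 0.9
```

Nothing in this sketch addresses the "large world" case discussed next: if the truth were outside the Bernoulli family, the posterior would still confidently settle on some theta.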

I have never seen a convincing account of how to do "large world" Bayesian reasoning, where the hypothesis space is "all theories whatsoever, even yet-unimagined ones, describing this aspect of the world". Solomonoff induction is the least unconvincing, by virtue only of being precisely defined and having various theorems provable about it, but one of those theorems is that it is uncomputable. Until I see someone make some sort of Solomonoff-based method work to the extent of becoming a standard part of the statistician's toolkit, I shall continue to be sceptical of whether it has any practical numerical use. How should you navigate in a large-world hypothesis space, when you notice that P(e|h) is so absurdly low that the truth, whatever it is, must be elsewhere?

Given the existence of polar bears, arctic foxes, and snow leopards, I wondered if there might be any white-feathered ravens in the colder parts of the world. A Google search indicates that while ravens are found there, they are just as black as their temperate relatives. I guess you don't need camouflage to sneak up on corpses. Now that looks like good evidence for all ravens being black: looking in places where it is plausible that there could be white ravens, and finding ravens, but only black ones. The not-h hypothesis space has room for large numbers of white ravens in a certain type of remote place. That part of the space came from observing polar bears and the like, and imagining a similar mechanism, whatever it might be, in ravens. Finding that even there, all observed ravens are black, removes probability mass from that part of the space.

An excellent quote! If Stefan had found that one I should have been honor-bound to add it to the post :P

The "Raven paradox" was used as a starting point to the famous article "Natural Kinds" by W.V.O. Quine; it is one of the two articles by Quine that set the anthology Naturalizing Epistemology in motion, as mentioned in my article immediately previous to this one at http://lesswrong.com/r/discussion/lw/kp1/from_natural_or_naturalized_to_social_epistemology/

It seems to have motivated Quine's perhaps throwing up his hands on formal methods of epistemology, and suggesting we "settle for psychology" (not sure if he used that phrase -- if not, it's a commonly used characterization of his position).

At least part of the trouble seems to be that he proposes that non-black non-ravens aren't a natural kind. Non-ravens would seem to be all "things" that aren't ravens, but consider what an incoherent concept that is. Do "things" include every atom in the universe? For quite a lot of "things" (atoms included, I think) the quality of blackness makes no sense.

So maybe there are around 100,000,000 ravens in the world, and as I examine ravens and find N black ones and no non-black ones, I can say N down, 100,000,000-N to go, and that might seem like progress. Whereas when I pick one atom (does it have a color?), one H2O molecule, one green leaf, and one blue eye of newt, I have no meaningful concept of how many more "non-ravens" there are to sample.

Now if, very hypothetically, ravens belonged to a genus with just one other species, also having 100,000,000 members, and the whole universe of ravenoids was frozen in time instead of multiplying and dying as we tried to sample them, we might say upon selecting one non-black non-raven, "That's one bit of evidence that doesn't contradict my hypothesis, and when I've sampled the whole 200,000,000 in the ravenoid universe with no contradiction of the hypothesis and a number of ravens, all black, I can say the hypothesis is true." A black non-raven also doesn't contradict the hypothesis and is also "one more down", going towards the ultimately complete sampling of the 200,000,000 entities, during which we hope that every raven we find will be black.

I.e. our intuition, if we have one, that the equivalent logical proposition "all non-black things are non-ravens" really should have an analogous method for gathering evidence might be less ridiculous if only "non-black non-ravens" actually meant something coherent.

For what it's worth there is also a 48 page 2010 article "How Bayesian Confirmation Theory Handles the Paradox of the Ravens" by Branden Fitelson and James Hawthorne (fitelson.org/ravens.pdf -- actually it's only 29 pages in this PDF due to different layout I suppose.). I've been meaning to read it, but think I'll have to work my way up to it.

Ravens are a tighter cluster in thing-space than non-ravens are, so we'd expect a tighter correlation of color. Thus, it takes a lot more non-black non-ravens to convince me that all non-blacks are non-ravens than it does ravens to convince me that all ravens are black.


H1: All ravens are black.

H1': If it is not black, it is not a raven.

Inductively, just like with H1, one would expect that seeing many non-black non-ravens is evidence for this hypothesis. As you see more and more examples, you may even find it more and more likely. Thus a yellow banana is evidence for the hypothesis "all ravens are black."

Since this is silly, there is an apparent problem with induction.

Question: H1 and H1' appear to be logically equivalent to:

H1'': There do not exist any things which are both not black and a raven.

And this seems to have different implications in a finite universe and an infinite universe.

For instance, in a finite universe of 10,000 things, if you've found 99 yellow bananas and 1 black raven, there are 9,900 things which could potentially disprove H1''. If you then observe an additional 100 yellow bananas, there are now only 9,800 things that could potentially disprove H1'', so it would make sense that H1'' becomes a small amount more likely, since if all of the remaining untested things were yellow bananas, and you tested them all, at the point at which you tested the last thing you would be much more confident about H1'', and presumably that confidence grows as you get closer to testing the last thing as opposed to coming all at once at only the last thing.


In an infinite universe of infinitely many things, if you've found 99 yellow bananas and 1 black raven, there are infinitely many things which could potentially disprove H1''. If you then observe an additional 100 yellow bananas, there are still infinitely many things that could potentially disprove H1'', so H1 would not necessarily become a small amount more likely by the argument I just gave, since there is no 'last thing' to test.

When I looked at http://en.wikipedia.org/wiki/Raven_paradox , I'm not sure if anything I just said is any different from the Carnap approach, except that the Carnap approach described in the article does not appear to mention infinities, so I'm not sure if I'm making an error or not.

One very simple resolution: observing a white shoe (or yellow banana, or indeed anything which is not a raven) very slightly increases the probability of the hypothesis "There are no ravens left to observe: you've seen all of them". Under the assumption that all observed ravens were black, this "seen-em-all" hypothesis then clearly implies "All ravens are black". So non-ravens are very mild evidence for the universal blackness of ravens, and there is no paradox after all.
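A sketch of the "seen-em-all" effect under assumptions of our own: a finite universe of M objects examined in a uniformly random order without replacement, and a uniform prior on the total number of ravens R. M = 50 and the counts are made up, and colour plays no role in the counting. One more nonraven multiplies the weight of each R by (M - R - m)/(M - k - m), which is largest at R = k, so the probability that no ravens remain goes up slightly. Requires Python 3.8+ for `math.perm`.

```python
from fractions import Fraction
from math import perm

def posterior_over_ravens(M, k, m):
    """Posterior over R (total ravens) after seeing k ravens and m nonravens."""
    weights = {}
    for R in range(k, M - m + 1):
        # Probability of one particular observed order with k ravens, m nonravens.
        weights[R] = Fraction(perm(R, k) * perm(M - R, m), perm(M, k + m))
    total = sum(weights.values())
    return {R: w / total for R, w in weights.items()}

def p_seen_all(M, k, m):
    """Posterior probability that the k ravens already seen are all there are."""
    return posterior_over_ravens(M, k, m)[k]

before = p_seen_all(M=50, k=3, m=10)
after = p_seen_all(M=50, k=3, m=11)  # one more nonraven observed
print(float(before), float(after))
```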

I find this resolution quite intuitive.

It took me a bit to understand what you were saying. I think I'd have gotten it more clearly with some mathematical notation:

H1: The hypothesis that there exists at least one non-black raven.
H2: The hypothesis that there exist zero non-black ravens.
YB: Observing a yellow banana when randomly picking an object to observe.


So if we assume that our priors for the hypotheses H1 and H2 are the same, and we also assume the additional constraint that P(YB|H1) = P(YB|H2) (both hypotheses refer to possible worlds with the same number of yellow bananas), then P(H1|YB) = P(H2|YB), meaning that observing a yellow banana doesn't provide more evidence for one hypothesis than the other.

However, given possible worlds where e.g. the number of black ravens remains fixed but the number of yellow bananas is reduced, the argument that observing a yellow banana increases the possibility of the existence of black ravens becomes true.

Well, it's not really the number of yellow bananas that matters. It's their measure in the probability distribution we're drawing from. In fact, I was unclear about that in the post, let me go add a note.

I think it's just wrong that "H1': If it is not black, it is not a raven" predicts that you will observe non-black non-raven objects, under the assumption/prior that the color distributions within each type of object (chairs, ravens, bananas, etc.) are independent of each other.

The intuition comes from implicitly visualizing the observation of an unknown non-black object O; then, indeed, H1 predicts that O will turn out not to be a raven. The point is, even observing that O is non-black would decrease your credence in H1, and then seeing that O was not a raven would increase it again. Since H1 is only about ravens, by the independence assumption, H1 says nothing about non-ravens and whether you will see non-black ones. (I.e., its likelihood ratio for "observe a non-black non-raven object" is 1.)

This model of independence between shapes is what I'm calling the implicit model that people use to say that the conclusion of the raven paradox is absurd.

Right, I should have written, "I agree. Also, ...". I just wanted to find the source of the intuition that seeing non-black non-ravens is evidence for "non-black -> non-raven".

I'd prefer if you referred to some of the vast existing literature on this topic when you post on it.

Tell you what, if you find an example in the literature of someone clarifying the situation using the concept of probabilistic evidence, I'll add it to the post. Not that I doubt such a thing exists, finding it just sounds like no fun.

Here is the Wikipedia article on the raven paradox. It makes it clear how big the literature on the topic is. In my view, you should situate your proposal relative to at least some of those proposals when writing a post like this. It is hard to evaluate the value of your proposal (and whether it is at all worth reading) when you haven't done that.

If reading through the thoughts of people who don't know how to apply likelihood ratios is no fun to me, I don't want to inflict it on my readers either.

Aw, jeez, the Wikipedia list is worse than I thought. The Stanford Encyclopedia of Philosophy made the mainstream look more reasonable, if still bad at using probability.

Will you accept me situating my proposal as "the one that shows how basic probability theory implies that induction has more degrees of freedom than one might first think?"

I'm sorry but no, that is not enough. I want clear and reasonably detailed reasons for why the mainstream is wrong in posts like this. Some very smart people have worked on this problem and you need to at least comment on their views in order to be taken seriously by me.

Thank you for helping me understand where you're coming from, but since this is a simple application of probabilistic reasoning I think it stands fine on its own merits.