Related to: Where Recursive Justification Hits Bottom, Priors as Mathematical Objects, Probability is Subjectively Objective

Follow up to: A Proof of Occam's Razor

In my post on Occam’s Razor, I showed that a certain weak form of the Razor follows necessarily from standard mathematics and probability theory. Naturally, the Razor as used in practice is stronger and more concrete, and cannot be proven to be necessarily true. So rather than attempting to give a necessary proof, I pointed out that we learn by induction what concrete form the Razor should take.

But what justifies induction? Like the Razor, some aspects of it follow necessarily from standard probability theory, while other aspects do not.

Suppose we consider the statement S, “The sun will rise every day for the next 10,000 days,” assigning it a probability p, between 0 and 1. Then suppose we are given evidence E, namely that the sun rises tomorrow. What is our updated probability for S? According to Bayes’ theorem, our new probability will be:

P(S|E) = P(E|S)P(S)/P(E) = p/P(E),

because given that the sun will rise every day for the next 10,000 days, it will certainly rise tomorrow, so P(E|S) = 1. As long as we were not already certain that the sun would rise tomorrow, P(E) is less than 1, and so our new probability is greater than p.

So this seems to justify induction, showing it to work of necessity. But does it? In the same way we could argue that the probability that “every human being is less than 10 feet tall” must increase every time we see another human being less than 10 feet tall, since the probability of this evidence (“the next human being I see will be less than 10 feet tall”), given the hypothesis, is also 1. On the other hand, if we come upon a human being 9 feet 11 inches tall, our subjective probability that there is a 10 foot tall human being will increase, not decrease. So is there something wrong with the math here? Or with our intuitions?
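The sunrise update can be checked numerically. The sketch below is only an illustration: the prior of 1/100 for S and the value 9/10 for P(E) are made-up numbers, and the function name is hypothetical. All it relies on is that S implies E, so that P(E|S) = 1.

```python
from fractions import Fraction


def posterior_given_implied_evidence(prior_s, prob_e):
    """Return P(S|E) when S logically implies E, so that P(E|S) = 1."""
    if not 0 < prob_e <= 1:
        raise ValueError("P(E) must be in (0, 1]")
    if prior_s > prob_e:
        # S implies E, so S cannot be more probable than E.
        raise ValueError("P(S) cannot exceed P(E) when S implies E")
    return prior_s / prob_e


# Assumed numbers, chosen only for illustration.
prior_s = Fraction(1, 100)  # p: sun rises on each of the next 10,000 days
prob_e = Fraction(9, 10)    # P(E): sun rises tomorrow

posterior = posterior_given_implied_evidence(prior_s, prob_e)
print(posterior)            # 1/90
print(posterior > prior_s)  # True
```

With any P(E) below 1 the posterior exceeds the prior, and with P(E) equal to 1 it is unchanged, which matches the formula p/P(E).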

In fact, the problem is neither with the math nor with the intuition. Given that every human being is less than 10 feet tall, the probability that “the next human being I see will be less than 10 feet tall” is indeed 1, but the probability that “there is a human being 9 feet 11 inches tall” is definitely not 1. So the math updates on a single aspect of our evidence, while our intuition is taking more of the evidence into account.

But this math seems to work because we are trying to induce a universal which includes the evidence. Suppose instead we try to go from one particular to another: I see a black crow today. Does it become more probable that a crow I see tomorrow will also be black? We know from the above reasoning that it becomes more probable that all crows are black, and one might suppose that it therefore follows that it is more probable that the next crow I see will be black. But this does not follow. The probability of “I see a black crow today”, given that “I see a black crow tomorrow,” is certainly not 1, and so the probability of seeing a black crow tomorrow, given that I see one today, may increase or decrease depending on our prior – no necessary conclusion can be drawn. Eliezer points this out in the article Where Recursive Justification Hits Bottom.
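To see how the direction of the update can depend on the prior, here is a minimal sketch with two invented joint distributions over the colour of today's crow and tomorrow's crow. Both give the same probability of a black crow tomorrow before any observation. The names `sticky` and `alternating` and all the numbers are assumptions made for the example.

```python
from fractions import Fraction


def prob_black_tomorrow(joint, black_today=False):
    """P(black tomorrow), optionally conditioned on seeing a black crow today.

    `joint` maps (today, tomorrow) colour pairs to probabilities.
    """
    if black_today:
        joint = {pair: p for pair, p in joint.items() if pair[0] == "black"}
    total = sum(joint.values())
    black = sum(p for pair, p in joint.items() if pair[1] == "black")
    return black / total


# Prior 1 (assumed): crow colours tend to repeat from day to day.
sticky = {
    ("black", "black"): Fraction(4, 10),
    ("white", "white"): Fraction(4, 10),
    ("black", "white"): Fraction(1, 10),
    ("white", "black"): Fraction(1, 10),
}

# Prior 2 (assumed): crow colours tend to alternate.
alternating = {
    ("black", "black"): Fraction(1, 10),
    ("white", "white"): Fraction(1, 10),
    ("black", "white"): Fraction(4, 10),
    ("white", "black"): Fraction(4, 10),
}

for name, joint in [("sticky", sticky), ("alternating", alternating)]:
    before = prob_black_tomorrow(joint)
    after = prob_black_tomorrow(joint, black_today=True)
    print(name, before, after)
```

Under the first prior the probability of a black crow tomorrow rises from 1/2 to 4/5 after seeing a black crow today; under the second it falls from 1/2 to 1/5. Bayes' theorem is applied identically in both cases.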

On the other hand, we would not want to draw a conclusion of that sort: even in practice we don’t always update in the same direction in such cases. If we know there is only one white marble in a bucket, and many black ones, then when we draw the white marble, we become very sure the next draw will not be white. Note however that this depends on knowing something about the contents of the bucket, namely that there is only one white marble. If we are completely ignorant about the contents of the bucket, then we form universal hypotheses about the contents based on the draws we have seen. And such hypotheses do indeed increase in probability when they are confirmed, as was shown above.

 


The issue is too involved to give a full justification of induction here, but I will try to give a very general idea. (This was on my mind a while back as I got asked about it in an interview.)

Even if we don't assume that we can apply statistics in the sense of using past observations to tell us about future observations, or observations about some of the members of a group to tell us about other members of a group, I suggest we are justified in doing the following.

Given a reference class of possible worlds in which we could be, in the absence of any reason for thinking otherwise, we are justified in thinking that any world from the reference class is as likely as any other to be our world. (Now, this may seem an attempt to sneak statistics in - but, really, all I said was that if we have a list of possible worlds that we could be in, and we don't know which one is ours, then our views on probability merely indicate that we don't know.)

The next issue is how this reference class is constructed - more specifically, how each member of the reference class is constructed. It may seem to make sense to construct each world by "sticking bits of space-time together", but I suggest that this itself implies an assumption. After all, many things in a world can be abstract entities: How do we know what appear to be basic things aren't? Furthermore, why build the reference class like that? What is the justification? It also forces a particular view of physics onto us. What about views of physics where space-time may not be fundamental? They would be eliminated from the reference class.

The only justifiable way of building the reference class is to say that the world is an object, and that the reference class of worlds is "Every formal description of a world". Rather than make assumptions about what space is, what time is, etc, we should insist that the description merely describes the world, including its history as an object. Such a description is our situation at any time. At any time, we live in a world which has some description, and all I am saying is that the reference class is all possible descriptions. Now, it may seem that I am trying to sneak laws of nature and regular behavior in by the backdoor here, but I am not: If we can't demand that a world be formally describable we are being incoherent. If we can't demand that the reference class contains every such formal description, surely the most general idea we could have of building a reference class, we are imposing something more specific, with all kinds of ontological assumptions, on it.

Now, if we see regular patterns in a world, this justifies expecting those patterns to continue. For the description to produce a pattern by specifying each element individually takes a lot of information. Therefore, the description must be highly specific and only a small proportion of possible world-descriptions in the reference class will comply. On the other hand, if the pattern is made by a small amount of information in the world-description, which describes the entire pattern, this is much less specific and a greater proportion of possible worlds will comply: We are demanding less specific information content in a possible world for it to be ours. Therefore, if we see a regular pattern, it is much more likely that our world is one of the large proportion of worlds where that pattern results from a small amount of information in the description than one of the much smaller proportion of worlds where it results from a much greater amount of information in the description.
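The counting argument can be made concrete with a toy model. Everything in it is an assumption added for illustration: world-descriptions are taken to be binary strings of one fixed length, all equally weighted, and fixing k particular bits of a description leaves a fraction 1/2^k of descriptions compatible. The pattern size, the one bit per element, and the 20-bit rule are invented numbers.

```python
from fractions import Fraction


def fraction_of_descriptions(bits_fixed):
    """Fraction of equal-length binary descriptions that agree on
    `bits_fixed` specific bits (toy model: all descriptions equally weighted)."""
    return Fraction(1, 2 ** bits_fixed)


# Assumed numbers for the toy model.
pattern_elements = 100  # observed elements of the pattern
bits_per_element = 1    # cost of specifying one element individually
rule_bits = 20          # cost of a short rule that generates the whole pattern

elementwise = fraction_of_descriptions(pattern_elements * bits_per_element)
by_rule = fraction_of_descriptions(rule_bits)

# How many times more descriptions produce the pattern by rule
# than by listing each element separately.
odds_for_rule = by_rule / elementwise
print(odds_for_rule == 2 ** 80)  # True
```

In this toy model the rule-based worlds outnumber the element-by-element worlds by a factor of 2^80, and the factor grows with every further element of the pattern that is observed.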

A pattern which results from a small amount of information in the world description should be expected to be continued, because that is the very idea of a pattern generated by a small amount of information. For example, if you find yourself living in a world which looks like part of the Mandelbrot set, you should think it more likely that you live in a world where the Mandelbrot rule is part of the description of that world and expect to see more of the Mandelbrot pattern in other places.

Therefore, patterns should be expected to be continued.

I also suggest that Hume's problem of induction only appears in the first place because people have the misplaced idea that the reference class should be built up second by second, from the point of view of a being inside time, when it should ideally be built from the point of view of an observer not restricted in that way.

I also suggest that Hume's problem of induction only appears in the first place because people have the misplaced idea that the reference class should be built up second by second, from the point of view of a being inside time, when it should ideally be built from the point of view of an observer not restricted in that way.

That's a great observation! Thanks!

I will add something more to this.

Firstly, I should have made it clear that the reference class should only contain worlds which are not clearly inconsistent with ours - we remove the ones where the sun never rose before, for example.

Secondly, some people won't like how I built the reference class, but I maintain that this way makes the fewest assumptions. If you want to build the reference class "bit by bit", as if you are going through each world as if it were an image in a graphics program, adding a pixel at a time, you are actually imposing a very specific "construction algorithm" on the reference class. It is that that would need justifying, whereas simply saying a world has a formal description is claiming almost nothing.

Thirdly, just because a world has a formal description does not mean it behaves in a regular way. The description could describe a world which is a mess. None of this implies an assumption of order.

ETA 2: This title reads as if it is justifying induction as a principle, rather than addressing when induction is justified in particular cases. Insofar as it is not addressing the problem of induction, it is on-point. I just don't think that point is terribly interesting, as it basically says "what counts as evidence depends on what we want to prove." My original comment follows, and applies if this was supposed to be about the problem of induction.

I think you fundamentally misunderstand the problem of induction. The very concept of causation and updating on evidence rests on the assumption induction is possible. Without induction, there is no Bayes's theorem, because the very concept of evidence presupposes induction.

The problem of induction is, in short, how do we know that the future will be like the past? How do we know that our current observations are truly evidence of what the future will be like? The only evidence we have is that the future has always been like the past. Thus, the only evidence to justify induction presumes induction is possible.

This post does not appear to have anything to do with that. You can't use Bayesian evidence for induction, because the very concept of Bayesian evidence presupposes induction to be valid. ETA: stating that Bayes theorem presupposes induction may have been a bit strong. The point is that the thing that feeds into Bayes' presumes induction. Without induction, you cannot update on evidence, because "evidence" is a hollow concept. Bayes may not technically depend on induction; it's just that any actual application of it to the real world does.

Without induction, there is no Bayes's theorem, because the very concept of evidence presupposes induction.

I strongly disagree. Bayes theorem is a theorem of mathematics. It does not presuppose induction. See, for example, Jaynes, where Bayes's theorem is established in the first couple chapters and then used throughout the book. Induction, on the other hand, is something which Jaynes is a little puzzled by. He thinks the justification of induction is related to the justification of MAXENT priors, and he thinks that both can be rationally justified, but neither is iron-clad like Bayes theorem.

As Unknown's post points out, given some priors, your updating in response to evidence is induction-like, whereas given other priors, your updating may appear contrary to induction. But Bayes is applicable in both cases.

So how do we characterize this difference in priors? One thing we can say is that naive induction works (to some extent) whenever our prior regarding a population is such that a sample from the population provides information about the population.

When sampling without replacement from an urn which we know a priori contains 5 white and 5 red balls, we are in an anti-inductive situation. Sampling tells us nothing about the population - we already know (a priori) everything there is to know about the population. So if we draw a white ball, Bayes theorem tells us to reduce the probability that the next ball will be white.

But when sampling from an urn where our prior regarding the urn is something less well informed, sampling works better. When our prior is not well informed at all - when it is MAXENT - then induction works correctly. Each white ball drawn increases the probability that the next draw will be white.
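The two urn cases can be compared directly. In this sketch the uninformed prior is read as a uniform prior over the number of white balls among the 10, which is an assumption about what "MAXENT" means here; a prior that colours each ball independently by a fair coin flip would give no update at all. The function name is hypothetical.

```python
from fractions import Fraction


def prob_second_white_given_first_white(prior_over_white_counts, total_balls=10):
    """P(second draw white | first draw white), sampling without replacement.

    `prior_over_white_counts` maps a possible number of white balls in the
    urn to its prior probability.
    """
    p_first = Fraction(0)
    p_both = Fraction(0)
    for white, weight in prior_over_white_counts.items():
        p_w1 = Fraction(white, total_balls)
        # After removing one white ball, white - 1 remain among total_balls - 1.
        p_w2 = Fraction(max(white - 1, 0), total_balls - 1)
        p_first += weight * p_w1
        p_both += weight * p_w1 * p_w2
    return p_both / p_first


# Known composition: exactly 5 white and 5 red.
known = {5: Fraction(1)}

# Assumed uninformed prior: every count of white balls from 0 to 10 equally likely.
uniform = {white: Fraction(1, 11) for white in range(11)}

print(prob_second_white_given_first_white(known))    # 4/9
print(prob_second_white_given_first_white(uniform))  # 2/3
```

In both cases the probability that the first ball is white is 1/2. With the composition known, drawing a white ball lowers the probability of the next one being white to 4/9; with the uniform prior it raises it to 2/3.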

So, according to Jaynes, the justification of induction is equivalent to the justification of using MAXENT priors in cases of no information. Not quite as well-founded in reason as Bayes theorem, but still pretty reasonable.

It occurs to me that not only is Bayes theorem more obviously correct than induction, it is also more general than induction.

Bayes theorem applies to all cases of updating beliefs upon receipt of evidence.

Induction is limited to a subset of cases - specifically those cases in which we wish to update our beliefs about a population using evidence which consists of a sample drawn from that population.

Edit to reply to your edit: Yes, I think that it is true that for many problems Bayes theorem isn't useful, and that for all problems where induction works, it is the fact that induction does work that makes Bayes theorem useful. These are all cases of updating based on a sample from a population. But there clearly are also problems where Bayesian reasoning is useful, but induction just doesn't apply. Problems where there is no population and no sample, but you do have an informative prior. Problems like Jaynes's burglar alarm.

Response to ETA 2.

I agree with your point that the question of the validity of Bayesian inference and the question about whether the future will be "like" the past are two logically independent questions. I also agree that it is convenient to use "the problem of induction" as a label for questions related to the "similarity" of past and future. Hence, I agree that the posting does not really help to solve "the problem of induction".

What does help? Well one argument goes like this: "If you believe the future is not like the past, then you must believe that the present is very special; that it is a boundary point between a stretch of time which is one way, and a stretch of time which is quite different. What is your justification for your belief in the special quality of the present? Perhaps your only possible move is to claim that the present is not special; that all points in time are boundary points. But we have evidence that this is not the case; the first half of the past is visibly similar to the second half of the past, for example."

This, of course is not an airtight argument. But it does show that someone who denies the validity of induction is likely to be forced into much more convoluted explanations of the evidence than would someone who adopts the simple hypothesis that "all points in time are pretty much alike". So, in a sense, induction reduces to Occam's razor.

Bayes's theorem depends on degrees of belief, not on induction, nor on the belief that the future will be like the past. In order for updating not to be valid, you have to exclude the possibility of degrees of belief. In fact, if you read through a proof of Bayes' theorem, you will see that it nowhere assumes the future will be like the past. In the same way, I could write this whole post so that it is entirely in the past or entirely in the future; the similarity of past and future is not relevant.

Replying to your edit:

Without induction, you cannot update on evidence, because "evidence" is a hollow concept.

Still wrong. As I pointed out in the main post, "The sun will rise tomorrow" has a probability of 1.0 given that "The sun will rise for the next 10,000 days." This means that the sun rising tomorrow is evidence that the sun will rise for the next 10,000 days, without presupposing induction.

For example, I might originally be convinced that the probability of the sun rising tomorrow is one in a billion, and the probability of it rising for the next 10,000 days, one in a googol; i.e. I am convinced that induction is wrong and the future will not be like the past. Nonetheless Bayes' theorem inexorably forces me to update in favor of the sun rising for the next 10,000 days, if it rises tomorrow.

Wait, what? "The sun will rise tomorrow" also has a probability of 1.0 given that "The sun will rise tomorrow and then never rise again", so the sun rising tomorrow should make you update in favor of that hypothesis too. Why did you choose to focus on a hypothesis saying the future (starting from the day after tomorrow) will be like the past (tomorrow)? This is circular - the problem of induction all over again.

Yes, the probability of that will also increase after the first day-- it is perfectly consistent for the probability of both hypotheses to increase. But the day after that, the probability of the sun rising the next 10,000 days has increased even more, and the probability of your hypothesis has dropped to zero.
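A two-day version of this exchange can be run as a sketch. The priors and the likelihoods assigned to the catch-all hypothesis are invented for illustration, and the hypothesis labels are just names chosen for the example.

```python
from fractions import Fraction


def update(beliefs, likelihoods):
    """One Bayesian update: weight each hypothesis by the likelihood of the
    observation under it, then renormalise."""
    unnormalised = {h: beliefs[h] * likelihoods[h] for h in beliefs}
    total = sum(unnormalised.values())
    return {h: weight / total for h, weight in unnormalised.items()}


# Assumed priors over three mutually exclusive hypotheses.
prior = {
    "rises_10000_days": Fraction(1, 100),
    "rises_once_then_never": Fraction(1, 100),
    "anything_else": Fraction(98, 100),
}

# Probability of each day's sunrise under each hypothesis (given the earlier
# sunrises). The 1/2 values for the catch-all are assumptions.
sunrise_day_1 = {
    "rises_10000_days": Fraction(1),
    "rises_once_then_never": Fraction(1),
    "anything_else": Fraction(1, 2),
}
sunrise_day_2 = {
    "rises_10000_days": Fraction(1),
    "rises_once_then_never": Fraction(0),
    "anything_else": Fraction(1, 2),
}

after_day_1 = update(prior, sunrise_day_1)
after_day_2 = update(after_day_1, sunrise_day_2)

print(after_day_1["rises_10000_days"], after_day_1["rises_once_then_never"])  # 1/51 1/51
print(after_day_2["rises_10000_days"], after_day_2["rises_once_then_never"])  # 2/51 0
```

After the first sunrise both specific hypotheses rise from 1/100 to 1/51. After the second, the 10,000-day hypothesis rises again to 2/51 while the rival drops to zero.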

As I said in the post, people do in fact formulate universal hypotheses, namely ones which will suggest that the future is like the past. But you don't have to assume the future will be like the past to do this; as I said in the previous comment, you might even be assuming that the future will NOT be like the past. The Bayesian reasoning will work just the same.

But the day after that, the probability of the sun rising the next 10,000 days has increased even more, and the probability of your hypothesis has dropped to zero.

Who says this will happen in the first place? Even if you personally know that the sun will rise tomorrow like it always did, you're not allowed to use that fact while solving the problem of induction.

The only reason that I'm using it is because in real life we update more than once.

But if you want to focus on the single update, yes, both hypotheses become more probable.

Maybe an important insight into why the justification of induction remains so puzzling. We should be justifying induction as a policy, not as a magic-bullet formula which works in each and every instance.

Oh, and not to be nitpicky, but if you're relying on any kind of metric (e.g. your vision) to ascertain that the sun does rise tomorrow, you rely on induction. Without induction, there is simply no way of establishing that your observations correlate with anything meaningful. "The sun will rise tomorrow" cannot actually be confirmed without assuming induction; without the evidence confirming their reliability from past experience, our sensory data are meaningless. This is getting into a nightmarish level of abstraction for a relatively simple point, though.

Believing the sun will rise tomorrow with P=10^-9 is not failing to believe in induction. It's making a serious mistake, or being privy to some very interesting evidence. Without induction, no probability estimate is possible, because we simply have no idea what will happen.

I suspect this argument stems from different definitions of "induction."

If you define believing in induction as believing that the future will be like the past, it is possible to believe that the future will not be like the past, and one example of that would be believing that the sun will not rise tomorrow. Similarly, someone could suppose that everything that will happen tomorrow will be totally different from today, and he could still use Bayes' theorem, if he had any probability estimates at all.

You say, "Without induction, no probability estimate is possible, because we simply have no idea what will happen." Probability estimates are subjective degrees of belief, and it may be that there is some process like induction that generates them in a person's mind. But this doesn't mean that he believes, intellectually, that the future will be like the past, nor that he actually uses this claim in coming up with an estimate; as I just pointed out, some claims explicitly deny that induction will continue to work, and some people sometimes believe them (i.e. "The world will end tomorrow!")

In any case, it doesn't matter how a person comes up with his subjective estimates; a prior probability estimate doesn't need to be justified, it just needs to be used. This post was not intended to justify people's priors, but the process of induction as an explicit reasoning process-- which is not used in generating priors.

I suspect that the argument arises because, deep down, you don't yet accept that Bayes theorem is more fundamental than induction and that it shows us how to use evidence other than inductive evidence.

That said, you may well be correct in your "nitpick" to the effect that we wouldn't even be able to interpret sense data as ordinary everyday evidence without induction. That may well be, which would mean that we have to use induction and Bayes theorem at the sense data level before we can use Bayes at the ordinary everyday evidence level. But that does not make induction as fundamental as Bayes.

Since my original point was amended to indicate that my original point about Bayes was overstated, and that the true problem is that Bayes is quite useless without assuming induction is justified (i.e. any observation about the real world or prediction about the future presumes the principle of induction to be justified), I would hardly call this nitpicking. It is my point. Insofar as Bayes' theorem is purely mathematical, it is quite fundamental. I don't dispute that. You can't apply math to the real world without having a real world, and without assuming induction, you can't really have a concept of a real world.

It has occurred to me that the concept of "induction" upon which I rely may be different in nature from that being used by the people arguing with me. This is unsurprisingly causing problems. Induction, as I mean it, is not simply, "the future will be like the past," but, "the correlation between past observations and future observations is nonzero." That is so fundamental I do not think the human mind is capable of not essentially believing it.

If induction means "the correlation between past observations and future observations is nonzero," then not assuming induction could mean one of two things:

1) I might think there is some chance that the correlation is non-zero, and some chance that the correlation is zero. In this case Bayesian reasoning will still work, and confirms that the correlation is non-zero.

2) I might think the correlation is certainly zero. But in this case most people would not describe this as "not assuming induction", but as making a completely unjustified and false assumption instead. It is not negative (not assuming) but positive (assuming something.)

A universe in which every kind of past observation is uncorrelated with future observations of the same kind would be a world in which animals could not evolve. Hence, not the kind of universe I would care to (or be able to) contemplate. However, I am quite capable of believing that there are some kinds of observations which do not correlate with future instances of themselves. Random noise exists.

Assuming you have no objection to that, I suppose you can go on preaching that the key mystery of the universe and the basis of all epistemology is induction. I have no objection. But I do think you ought to read Jaynes. Who knows? You might find something there to change your mind or perhaps a clue to dissolving the mystery.

[Edit: Removed opening snark.]

I don't think induction is of particular importance. We can't function without assuming its validity. Thus, entertaining the idea that it is invalid is not constructive. I'd be very curious to see someone solve the problem of induction (which I briefly thought this was an attempt at), but it's hardly an urgent matter.

Picking up on animals not evolving makes about as much sense as picking up on the fact that, if it weren't for gravity, it would be tough to play badminton. This reinforces my suspicion that our concept of what we're arguing about is so vastly different that a productive resolution is impossible.

I suppose the origin of this whole digression could be summarized by saying I thought the post was about (the problem of) induction, and was a useless point about a moderately interesting topic. Instead, it's about (the practice of) induction, making it a decent but not terribly useful point about a rather uninteresting (or at least simple) topic. It is perhaps even less salient than the observation that, if we assume infinite sets of possibilities, then at some point Occam's razor must work by sheer force of the nature of finite sum infinite sets having to have some arbitrary point after which they decrease.

It is perhaps even less salient than the observation that, if we assume infinite sets of possibilities, then at some point Occam's razor must work by sheer force of the nature of finite sum infinite sets having to have some arbitrary point after which they decrease.

Ouch. Burn

"The sun will rise tomorrow" has a probability of 1.0 given that "The sun will rise for the next 10,000 days."

True. Not a counter example. Not an example, actually. There is no evidence in, "the sun will rise tomorrow if the sun will rise every day for the next 10,000 years." That statement derives its truth from the meaning of the words. It would be equally true if there were no sun and there were no tomorrow. Furthermore, without induction, you can never arrive at a probability estimate of the sun rising tomorrow in the first place. With no such thing as evidence, there's simply no way to construct a probability estimate. Your examples are assuming themselves past the exact problem I'm trying to point out.

A world where induction ceased to hold up is literally unimaginable. I don't think a sane (or even insane) person could conceive of a world in which induction does not work and the future does not have any relation to the past. That is likely why the problem is so difficult to understand.

I don't think you know how Bayes' theorem works... if I say "A & B will both be true," and it turns out that A is true, this is evidence for my original claim, despite the fact that it is implied by the meaning of the words... or rather in this case, because it is so implied.

Also, without induction, we can construct a probability estimate: take all the possibilities we know of, add the possibility "or something else", and then say that each of the possibilities is equally likely. Yes, this probability estimate isn't likely to be calibrated, but it will be the best we can do given the condition of not knowing anything about the relation of past and future.

and it turns out that A is true.

How can one confirm that this happens without induction?

That said, I'm becoming somewhat more convinced of your point; I'm just not sure it's of any practical value.

The issue is that any statement made about the real world requires induction. If we are speaking in the hypothetical, yes, if the sun rises tomorrow, then that increases the chance that the sun will rise for the next ten thousand days, in the same sense that if the next flozit we quivel is bzorgy, that increases the chance that the next 10,000 flozits we quivel will be bzorgy. It tells us nothing new or interesting about the actual world. Furthermore, we can never confirm that the conditional is fulfilled without sense data, the reliability of which presumes induction.

Just as you seem to think I underestimate the primacy of Bayes, I think you significantly underrate the depth of induction. A world without induction being valid is not so much one where the sun does not rise tomorrow as one that would make a drugged-out madman look like a Bayesian superintelligence. Without assuming the validity of induction, that there will be an earth or a sun or a tomorrow are all completely uncertain propositions. Without induction, far more is possible than we can ever hope to imagine. We can't construct any meaningful probability estimate, because without induction we would expect the world to look something like more colorful TV static.

The fact that your senses are reliable may "presume induction", just as it may also presume that you are not being deceived by Descartes' evil demon. But when you say "that grass is green," you don't consider that the only reason you know it isn't red is because there isn't any demon... instead you don't think of Descartes' demon at all, and likewise you don't think of induction at all. In any case, in whatever way "induction" might be involved in your senses, my article doesn't intend to consider this, but induction considered just as a way of reasoning.

My point is that a year ago, you could have had a 90% probability estimate that the world would look like TV static, and a 10% probability of something else. But the 10% chance has been confirmed, not the 90% chance. Or if you say that there was no basis for the 10% chance, then maybe you thought there was a 100% chance of static. But in this case you collapse in Bayesian explosion, since the static didn't happen.

In other words, "not assuming induction" does not mean being 100% certain that there will not be a future or that it will not be like the past; it means being uncertain, which means having degrees of belief.

Still wrong. As I pointed out in the main post, "The sun will rise tomorrow" has a probability of 1.0 given that "The sun will rise for the next 10,000 days." This means that the sun rising tomorrow is evidence that the sun will rise for the next 10,000 days, without presupposing induction.

Right. Induction only comes in when I infer, from the sun's rising tomorrow, that it will rise on the 9,999 days after tomorrow.

Thanks, I've added those links.


I think you fundamentally misunderstand the problem of induction. The very concept of causation and updating on evidence rests on the assumption induction is possible. Without induction, there is no Bayes's theorem, because the very concept of evidence presupposes induction.

Basically, the

Dangit, that's what I get for not reading thoroughly, my response was poorly relevant.


Exactly, this is basically what I went on to say (but you say it with more detail.)

Ummm, maybe I should undelete it?

This is why I should wake up and eat before posting...


It was correct but perhaps unnecessary.