# 15

Well, you can.  It's just oxymoronic, or at least ironic.  Because belief is contrary to the Bayesian paradigm.

You use Bayesian methods to choose an action.  You have a set of observations, and assign probabilities to possible outcomes, and choose an action.

Belief in an outcome N means that you set p(N) ≈ 1 if p(N) > some threshold.  It's a useful computational shortcut.  But when you use it, you're not treating N in a Bayesian manner.  When you categorize things into beliefs/nonbeliefs, and then act based on whether you believe N or not, you are throwing away the information contained in the probability judgement, in order to save computation time.  It is especially egregious if the threshold you use to categorize things into beliefs/nonbeliefs is relatively constant, rather than being a function of (expected value of N) / (expected value of not N).

If your neighbor took out fire insurance on his house, you wouldn't infer that he believed his house was going to burn down.  And if he took his umbrella to work, you wouldn't (I hope) infer that he believed it was going to rain.
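The umbrella case can be put in numbers. Here is a minimal sketch, assuming a simple two-outcome model in which a precaution has a fixed cost and fully averts a larger loss. The function names and all figures are made up for illustration, and real insurance decisions also depend on risk aversion, which this leaves out.

```python
def break_even_probability(precaution_cost: float, loss_if_unprepared: float) -> float:
    """Probability above which the precaution has the higher expected value.

    Assumed model: taking the precaution always costs `precaution_cost`;
    skipping it costs `loss_if_unprepared` only if the bad outcome occurs.
    Acting is worth it when p * loss_if_unprepared > precaution_cost.
    """
    if loss_if_unprepared <= 0:
        raise ValueError("loss_if_unprepared must be positive")
    return precaution_cost / loss_if_unprepared


def should_take_precaution(p: float, precaution_cost: float, loss_if_unprepared: float) -> bool:
    """Decide from the probability and the stakes, not from a fixed belief cutoff."""
    return p > break_even_probability(precaution_cost, loss_if_unprepared)


# Hypothetical umbrella numbers: carrying it costs 1 unit of annoyance,
# getting soaked costs 10.
umbrella_threshold = break_even_probability(1.0, 10.0)
takes_umbrella = should_take_precaution(0.3, 1.0, 10.0)
```

With these numbers the neighbor carries the umbrella at a 30% chance of rain, far below anything you would call belief, because the cutoff comes from the stakes rather than from a fixed notion of what counts as believing.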

Yet when it comes to decisions on a national scale, people cast things in terms of belief.  Do you believe North Korea will sell nuclear weapons to Syria?  That's the wrong question when you're dealing with a country that has, let's say, a 20% chance of building weapons that will be used to level at least ten major US cities.

Or flash back to the 1990s, before there was a scientific consensus that global warming was real.  People would often say, "I don't believe in global warming."  And interviews with scientists tried to discern whether they did or did not believe in global warming.

It's the wrong question.  The question is what steps are worth taking according to your assigned probabilities and expected-value computations.

A scientist doesn't have to believe in something to consider it worthy of study.  Do you believe an asteroid will hit the Earth this century?  Do you believe we can cure aging in your lifetime?  Do you believe we will have a hard-takeoff singularity?  If a low-probability outcome can have a high impact on expected utility, you've already gone wrong when you ask the question.
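To make the failure concrete, here is a sketch that compares acting on a thresholded belief with acting on expected utility. The payoff table, the 0.5 threshold, and the function names are assumptions chosen for illustration, not figures from any real analysis.

```python
# Hypothetical payoffs (utility units) for each (action, outcome) pair.
PAYOFFS = {
    "prepare": {"disaster": -10.0, "no_disaster": -10.0},
    "ignore": {"disaster": -1000.0, "no_disaster": 0.0},
}


def expected_utility(action: str, p_disaster: float) -> float:
    """Probability-weighted payoff of an action."""
    row = PAYOFFS[action]
    return p_disaster * row["disaster"] + (1.0 - p_disaster) * row["no_disaster"]


def choose_by_expected_utility(p_disaster: float) -> str:
    """Pick the action with the highest expected utility."""
    return max(PAYOFFS, key=lambda a: expected_utility(a, p_disaster))


def choose_by_belief(p_disaster: float, threshold: float = 0.5) -> str:
    """Collapse the probability to 0 or 1, then act as if that were certain."""
    believed_p = 1.0 if p_disaster > threshold else 0.0
    return choose_by_expected_utility(believed_p)


p = 0.2  # a low-probability, high-impact outcome
eu_choice = choose_by_expected_utility(p)
belief_choice = choose_by_belief(p)
```

Under these assumed payoffs, the expected-utility rule prepares at p = 0.2, while the belief rule rounds 0.2 down to "don't believe it" and ignores the risk.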

# 15


As a young rationalist, I considered "belief" to refer to certainty, and proudly proclaimed my lack of belief in anything. "I believe" became a term of mockery, pronounced with the intonation of a preacher: "aaaah believe...".

However, not everyone has adopted this meaning. Since it renders the term practically useless, this seems understandable.

As a young rationalist, I considered "belief" to refer to certainty

Yes, I remember thinking that as well. I grew out of it. What puzzles me is why I, or anyone, could ever have thought that, since the word is not actually used that way, nor defined so in any dictionary. In actual use, it means to take as true; in a religious context, to have faith, i.e. to take as true in despite of the absence of evidence. And to take as true is to be willing to act on the premise that it is true. Which in turn is very like the thresholding spoken of in the original post: when 1-p is an epsilon too small to be worth tracking.

Colloquially, "I believe" even expresses a certain positive degree of doubt, a step below saying "I know".

In my case, I think it came from religion - where belief and unquestioning faith are concepts which are freely intermingled - and doubt is the start of the path to damnation.

I avoid saying "I believe" to this day - since the connotations of faith still seem to be present.

People would often say, "I don't believe in global warming." And interviews with scientists tried to discern whether they did or did not believe in global warming.

I note the following:

1) You claim that there's a common term that makes its surrounding question wrong 2) BUT it would be corrected by a trivial substitution, 3) and the people using the term seem to understand the issues you raise (i.e. you needn't be certain global warming catastrophes will happen in order to justifiably take countermeasures)

Together, that suggests that you went wrong somewhere, and people were actually using the term differently than you thought. Remember, words are hidden inferences. They arise whenever people identify a usefully-clumped cluster of thingspace. In the above, "believe" grabs a set of things in thingspace, and one attribute that makes it fall into the above "believe" category is "we should act as if it will happen".

So it's true that people need to carefully distinguish the separate issues you raised, but that's not the same as saying it's a wrong question.

P.S. Oops, I guess I fell into the same trap I just accused you of. Using a word that doesn't naturally make the distinctions critical for the problem you're looking at, does make it a wrong question.

Any given word can't make the distinction critical to all problems. The word "tiger" is useless for that problem as well. A word itself is wrong if it's not useful enough or if it provokes abuse, that is, wrong usage. The usage of a word is wrong if it confuses human thinking.

Nobody uses the word "believe" to refer only to statements of certainty. In some situations it might be implied, but in others the opposite is implied, particularly when the statement is second-order, like "I believe it's possible that I'll die in a car accident". The distinction between first-order claims of certainty and second-order qualified claims of uncertainty is moot in natural language. Whenever the probability passes the threshold, the unqualified "believe", or even "certain", gets used, and whenever the probability is below the threshold, a different construction is used to express that.

I don't see how it's a source of confusion, something that needs to be fixed.

I don't know how I can re-explain it other than just repeating the examples in my post. People see that proposition X implies action A. They then try to decide whether they believe X. If they don't, they don't take action A. This is wrong.

Also, "I believe it's possible that I'll die in a car accident" is a statement of certainty. Parse it.

The solution to this isn't to reject the very useful concept of belief (which is already generally used to mean "probability 1 minus epsilon" by many people), but to

• get people to see the fatal error in preparing for only the most probable outcome each time, and
• convince them it's sometimes OK to be unsure about which branch of a disjunction holds.

Yes. Belief is still useful. It's mainly in situations where a low-probability outcome has a high cost or benefit that it causes problems.

It looks like I agree with you but disagree with your original post. What's the problem with saying we believe Bayes' Theorem, and clarifying if asked that we ascribe probability 1 minus epsilon to it?

The rest of your post is of value, but the "You can't believe in Bayes' Theorem" hook goes awry.

Fair enough.

loqi (15y)

Fantastically concise summary to a great post. I've tried to explain this to others a few times, and came nowhere near such a direct statement of the problem.

Phil, nothing is a statement of absolute certainty; natural language doesn't express anything precisely. It's wrong to read even "I'm absolutely certain that 2+2=4" as a statement of absolute certainty.

Um, how is that relevant? You're the one who introduced the word 'certainty'.

"I believe it's possible that I'll die in a car accident" is a statement of uncertainty in the event "I'll die in a car accident", so how is it relevant that the statement as a whole is a statement of certainty? I misjudged, trying to find the cause of you mentioning that, which now opens that question explicitly.

"I believe it's possible that I'll die in a car accident" is a statement of uncertainty in the event "I'll die in a car accident"

Nitpick: No, it's not. Things that are necessary are all also possible. For instance, it is possible that 2+2=4, because it is not impossible that 2+2=4. It's not as strong a statement as someone who believed that death by car accident was inevitable could make, but it's not an expression of uncertainty all by itself unless the speaker is doing something with tone of voice ("Sure, I guess I think it's possible that I could die in a car accident...")

[anonymous] (15y)

Actually, it's just as strong a statement of certainty; but it is expressing certainty that the proposition "it is possible that I will die in a car accident" is true, not that "I will die in a car accident" is true.

That's not what I was talking about. Interpreting "It's possible that X will happen" as "the event X is non-empty" is as wrong as interpreting "I believe X will happen" as "the negation of event X is empty". Uncertainty is just lack of certainty; "it's possible" expresses a probability lower than that of "it's probable", and way below "it's certain". See also the references from the Possibility article on the wiki.

No it's not. It's an assertion about someone's understanding and expectations. You're confusing the subject of the sentence with the subject of the subordinate clause.

"I believe it's possible that I'll die in a car accident" is a statement of uncertainty in the event "I'll die in a car accident"

No; it's a statement of certainty; but it is expressing certainty that the proposition "it is possible that I will die in a car accident" is true, not that "I will die in a car accident" is true.

natural language doesn't express anything precisely

Of course it can and does. People just don't care much about precision.

The problem lies in making the precision explicit. That's why various non-natural languages like the conventions of mathematics were generated.

Well, you can. It's just oxymoronic, or at least ironic. Because belief is contrary to the Bayesian paradigm. You use Bayesian methods to choose an action. You have a set of observations, and assign probabilities to possible outcomes, and choose an action.

If you're always using Bayesian methods to choose an action, it doesn't matter what value of P(Bayes' theorem) is set in your skull; it may as well be 1. If Bayes' theorem is built into your very thought processes, if it's false you're fucked.

You might be able to get around this by following Bayesian methods to choose an action as long as P(Bayes' theorem)>.5, and then scrapping your entire decision-making algorithm and building a new one from scratch when this stops being true. But how do you decide on a new decision-making algorithm once your old decision-making algorithm has failed you?

Black Belt Bayesian had a similar post, Assuming Is Not Believing.

That post and this one both remind me of Eliezer's accounts of how the most humble are those who actually prepare for their own worst mistakes, even if they don't really anticipate making them.

I think this is a very useful dichotomy, one that could use more attention: preparing for an event, and expecting an event. Different things.

Black Belt Bayesian had a similar post, Assuming Is Not Believing.

That's not what I usually mean by assume. (But I'm not a native English speaker, so maybe I just somehow picked up the wrong meaning.)

[anonymous] (15y)

Great post!

"believe x" approximately means "think it is probable/highly probable that x" to me. It would seem you don't share this definition, as it makes your post into gibberish; what does "believe x" mean?

Certain religions use this word to mean faith or certainty, whichever is convenient at the time. This makes the word somewhat meaningless in those contexts, but this would not appear to be one of those contexts, and neither would the political question you mention.

"believe x" approximately means "think it is probable/highly probable that x" to me. It would seem you don't share this definition, as it makes your post into gibberish; what does "believe x" mean?

That is approximately what "believe x" means, and the post would be gibberish if it did not mean that.

It's the wrong question. The question is what steps are worth taking according to your assigned probabilities and expected-value computations.

This appears to be a non sequitur to me; it seems entirely natural that whether you think X is probable AND what you ought to do in the case that you think X is probable are both reasonable (and, depending, necessary) questions.

Or maybe I completely don't understand what you wrote. Sorry if I came off brusquely.

The probability you assign to X is relevant. The point is that once you use the "belief" frame, you're throwing away that probability in favor of a "believe / don't believe" duality.

Of course you throw out the details when you choose a word. The same happens when you choose any other word. The same argument seems to chasten the use of the word "tiger" when describing a tiger, since that throws away an exact probability estimate of the apparition being a tiger.

That's a problem that's very difficult to avoid. But the more general case, which I am discussing here, is often easy to avoid.

Ah, that explains it.

In my head, when I hear people say they "believe" something, I take that to mean they think it is 55% to 85% probable (numbers not exact, obviously; outside of religious contexts, of course). It somehow didn't occur to me that that's probably a weird thing to do.

loqi (15y)

I don't think it's particularly weird. For trivial or everyday propositions, the word often seems to denote that kind of interval:

"I believe the show is on the 4th. Let me check."

"I believe it stars Philip Seymour Hoffman."

"Hold on. I believe my phone is ringing."

In statements like these, "believe" plays the role of a qualifier. To express (1-epsilon) certainty, we just omit all qualifiers: "It's on the 4th".

Well, PhilGoetz is claiming (if I am finally understanding him) that casting things in the light of believe/disbelieve loses information. To me--and to you also, it would seem--it gains information. It could be context dependent, but I can't think of a context* in which I would take it to mean something other than a statement about how probable something is, including the examples Phil gave in his post. We can't all be right...

In general I agree with the premise that things can be forced into bad terms by a less-than-helpful question, but I'm not at all convinced that this is a good example. However, I know that when I think to, I use the word "think" instead of "believe" because I think it's clearer, so on some level I must agree that "believe" leaves some sort of ambiguities.

*I'm completely excluding religious usages from consideration and will not mention this caveat again.

Well, PhilGoetz is claiming (if I am finally understanding him) that casting things in the light of believe/disbelieve loses information. To me--and to you also, it would seem--it gains information.

I agree. Compare this with computation of a factorial function. You start with knowing that the function is f(n)=if(n>1) n*f(n-1) else 1. Then you find out that f(1)=1, then that f(2)=2, etc. With each step, you are not taking new data out of environment, you are working from what you already have, simply juggling the numbers, but you gain new information.

For more on this view, see S. Abramsky (2008). "Information, processes and games". In P. Adriaans & J. van Benthem (eds.), Handbook of the Philosophy of Information. Elsevier Science Publishers.
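For reference, the factorial definition in the comment above can be written as runnable Python. This is a direct transcription of f(n) = if (n > 1) n*f(n-1) else 1; the name `f` is carried over from the comment, and the surrounding list is only illustrative.

```python
def f(n: int) -> int:
    """Factorial as defined in the comment: n * f(n - 1) for n > 1, else 1."""
    return n * f(n - 1) if n > 1 else 1


# Each value follows from the definition alone; no new observation is taken
# from the environment between one step and the next.
values = [f(n) for n in range(1, 6)]
```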

I agree. Compare this with computation of a factorial function. You start with knowing that the function is f(n)=if(n>1) n*f(n-1) else 1. Then you find out that f(1)=1, then that f(2)=2, etc. With each step, you are not taking new data out of environment, you are working from what you already have, simply juggling the numbers, but you gain new information.

That's an invalid comparison. That's a mathematical operation that doesn't involve information loss, and hence has nothing to do with this discussion.

The problem is when people decide that they believe / do not believe some proposition P, and then consider only the expected utility of the case where P is true / false.

Reducing a probability to a binary decision clearly loses information. You can't argue with that.

Reducing a probability to a binary decision clearly loses information. You can't argue with that.

No, I can't. But I can argue that no reduction occurs.

To be fair, I see your point in the case of politicians or people who are otherwise indisposed to changing their minds: once they say they believe something there are costs to subsequently saying they don't. That effectively makes it a binary distinction for them.

However, for people not in such situations, if I hear they believe X, that gives me new information about their internal state (namely, that they give X something like 55-85% chance of being the case). This doesn't lose information. I think this comprises most uses of believe/disbelieve.

So I would argue that it's not the believe/disbelieve distinction that is the problem; it's the feedback loop that results from us not letting people change their minds that causes issues to be forced into yes/no terms, combined with the need for politicians/public figures to get their thought to fit into a soundbite. I don't see how using other terms will ameliorate either of those problems.

The problem is when people decide that they believe / do not believe some proposition P, and then consider only the expected utility of the case where P is true / false.

Agree that this is widespread, and is faulty thinking. And my $.02, which you should feel free to ignore: your main post would be clearer, I think, if you focused more on the math of why this is so: find an example where different actions are appropriate based on the probability, and collapsing the probability into a 1 or 0 forces the choice of an inappropriate action; explain the example thoroughly; and only then name the concept with the labels believe/disbelieve. Hearing them right from the start put me on the wrong trail entirely.

I thought this was a post about language usage, but it's actually a post about how not to do math with probabilities.

Right. I'm not talking about the effect of saying "I believe X" vs. "X".

It probably would have been clearer to use an example.

loqi (15y)

Well, PhilGoetz is claiming (if I am finally understanding him) that casting things in the light of believe/disbelieve loses information. [...] We can't all be right...

I'm pretty sure both of us are right in this case. I agree that "casting things in the light of" believe/disbelieve can be unacceptably lossy. I was responding to you claiming it's a "weird thing to do" to infer a 55-85 interval based on common uses of the word "believe". Same word, but context seems to derive different concepts. AFAIK, people simply don't tend to use the word in the 55-85 sense when they're talking about "important" things (e.g., you don't often hear things in the tone of, "I believe global warming is a serious problem, let me get back to you on that").

However, I know that when I think to, I use the word "think" instead of "believe" because I think it's clearer, so on some level I must agree that "believe" leaves some sort of ambiguities.

In common usage, "think" and "believe" seem only to differ by degrees. For me, re-reading my above examples under s/believe/think/ seems to weaken the connoted confidence.

I was responding to you claiming it's a "weird thing to do" to infer a 55-85 interval based on common uses of the word "believe".

I thought I must be weird since I seem to have been the only one that completely didn't understand the post initially. But perhaps I just lack this other usage entirely, or perhaps I still don't agree. (See my response to Phil above: http://lesswrong.com/lw/10a/you_cant_believe_in_bayes/ssc)

In common usage, "think" and "believe" seem only to differ by degrees. For me, re-reading my above examples under s/believe/think/ seems to weaken the connoted confidence.

Agree. I don't like to give the impression that I'm more confident than I am.

Hmm, I think that lavalamp was probably thinking about the title, while I was thinking about the contents. The title is a rhetorical hook. You can believe in Bayes' theorem in the ordinary sense of the word 'believe'.

"If they exist, you don't have to believe in them." - Terry Pratchett, Small Gods

I understand your point: Setting a threshold of probability for saying that one believes P imposes a distinction that probably shouldn't make a difference in one's actions. Therefore, one plausibly shouldn't impose such thresholds at all. However, I don't understand this line

Belief in an outcome N means that you set p(N) = threshold(p(N)), so now p(N) is 0 or 1.

It would certainly be absurd to set thresholds if doing so had this consequence. But why does it?

Hmm, what I said was not quite right. I'll edit it.

Belief isn't merely a recognition / assertion that something is probable. It's a categorical difference from non-belief.

It can be used to emulate Bayesian reasoning, and Bayesian reasoning can be used to emulate it, so the two are ultimately compatible. But not from our point of view.

Where can the rook go that the knight cannot? Yet the two are not the same.

Belief isn't merely a recognition / assertion that something is probable. It's a categorical difference from non-belief.

And that's the problem with it. Basing actions on the outcomes of categorical decisions throws away information.

And that's the problem with it. Basing actions on the outcomes of categorical decisions throws away information.

It throws away data. From a certain perspective, all intelligence does is throw away data that is not useful, to find information that is.

loqi (15y)

There's a big difference between summarizing data with a probability distribution, and practically flushing it away by thresholding it. The former is an associative, composable operation: You can keep doing it without really committing to any particular application of the data. The latter hands you a terminal object which is useless in the face of further information.
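A small sketch of that difference, using Bayes' rule in odds form. The prior, the likelihood ratios, the 0.5 cutoff, and the function names are all assumptions for illustration.

```python
def update(p: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    if p in (0.0, 1.0):
        # A probability of exactly 0 or 1 is unchanged by any finite,
        # nonzero likelihood ratio.
        return p
    odds = p / (1.0 - p) * likelihood_ratio
    return odds / (1.0 + odds)


def threshold(p: float, cutoff: float = 0.5) -> float:
    """Collapse a probability into a categorical belief (1.0) or non-belief (0.0)."""
    return 1.0 if p > cutoff else 0.0


prior = 0.6
evidence = [0.5, 0.25]  # two hypothetical observations, each favoring not-N

# Keeping the probability: updates compose, and their order does not matter.
kept = prior
for lr in evidence:
    kept = update(kept, lr)

kept_reversed = prior
for lr in reversed(evidence):
    kept_reversed = update(kept_reversed, lr)

# Thresholding first: the belief stays at 1.0 whatever evidence arrives later.
stuck = threshold(prior)
for lr in evidence:
    stuck = update(stuck, lr)
```

In this toy run the kept probability falls from 0.6 to 3/19 as the evidence comes in, while the thresholded belief never moves.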

You don't flush away your data, you are just running a request to it. You are not exterminating your own brain, you are only forming a categorical judgment. Like with tigers.

Sure. As I noted in the original post and also in response to your comments, sometimes it's fine to do this. Sometimes it isn't.

You are throwing away information when you threshold, period, end of that discussion. If that information was important, you made a mistake. It is easy to identify these situations because they have aberrant expected values in low-probability outcomes. And it is easy to avoid them, unlike with tigers.

You are throwing away information when you threshold, period, end of that discussion.

If the 'information' was noise, then it's not information, it's just data. Whether it counts as information depends upon relevance, which is partially what the threshold is for.

Of course, this is a jargon use in information theory. But that seems like the relevant domain.

It's not useless, not by any means.

A threshold can be contradicted by a contrary-to-expectation observation. A statement of probability cannot, as long as it's not an absolute.

If you believe that there's only one-in-a-hundred chance of something happening, and it happens, were you right or wrong?

Neither, you just take a hit according to your scoring rule; but if you're properly calibrated, it'll be compensated for on average by 99 times that your 1-in-a-hundred chances don't happen.

If I repeatedly claim I have a 1-in-2 chance of rolling snake eyes, you'd probably see me take repeated blows by a logarithmic scoring rule, which would suffice for bystanders to put less trust in my probability estimates. Eventually I should admit I'm poorly calibrated.

Of course if you're talking about a probability 1 minus epsilon threshold, a miss on that claim is a huge penalty by a log scoring rule.
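The snake-eyes example can be worked through numerically. This is a sketch, assuming fair dice (so the true chance of snake eyes is 1/36) and a natural-log scoring rule; the function names and the epsilon of 1e-6 are illustrative.

```python
import math


def log_score(p_claimed: float, happened: bool) -> float:
    """Logarithmic score: log of the probability assigned to what actually happened."""
    return math.log(p_claimed if happened else 1.0 - p_claimed)


def expected_log_score(p_claimed: float, p_true: float) -> float:
    """Average score per trial if the event really occurs with probability p_true."""
    return (p_true * log_score(p_claimed, True)
            + (1.0 - p_true) * log_score(p_claimed, False))


p_true = 1.0 / 36.0  # two fair dice both showing 1

overconfident = expected_log_score(0.5, p_true)  # claims a 1-in-2 chance
calibrated = expected_log_score(p_true, p_true)  # claims a 1-in-36 chance

# A single miss on a "1 minus epsilon" claim, with a hypothetical epsilon of 1e-6.
near_certain_miss = log_score(1.0 - 1e-6, happened=False)
```

Scores closer to zero are better. The calibrated forecaster averages a better score than the 1-in-2 claimant, and one miss on the near-certain claim costs far more than any single roll costs either of them.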

The probability value, which is what is being thrown away, is what is needed for a Bayesian analysis. Saying that that data is not useful means asserting that Bayesian analysis is not useful.

Precise Bayesian analysis is often impossible, and what is impossible can't be useful. A notch down, take the difficulty of analysis as a counterbalance. Including that additional info is justified only if you can show that it lets you do better in a given situation, not in general.

It often isn't. Bayesianism isn't the be-all, end-all of logic. It's just another tool in the toolbox.