This is great. I hope other people aren't hesitating to make posts because they are too "elementary". Content on Less Wrong doesn't need to be advanced; it just needs to be Not Wrong.
In my defense, we've had other elementary posts before, and they've been found useful; plus, I'd really like this to be online somewhere, and it might as well be here.
It's quite interesting that people feel a need to defend themselves in advance when they think their post is elementary, but almost never feel the same obligation when the post is supposedly too hard, or off-topic, or inappropriate for other reason. More interesting given that we all have probably read about the illusion of transparency. Still, seems that inclusion of this sort of signalling is irresistible, although (as the author's own defense has stated) the experience tells us that such posts usually meet positive reception.
As for my part of signalling, this comment was not meant as a criticism. However I find it more useful if people defend themselves only after they are criticised or otherwise attacked.
A piece of advice I heard a long time ago, and which has sometimes greatly alleviated the boredom of being stuck in a conference session, is this: If you're not interested in what the lecturer is talking about, study the lecture as a demonstration of how to give a lecture.
By this method even an expert can learn from a skilful exposition of fundamentals.
I don't have a very advanced grounding in math, and I've been skipping over the technical aspects of the probability discussions on this blog. I've been reading lesswrong by mentally substituting "smart" for "Bayesian", "changing one's mind" for "updating", and having to vaguely trust and believe instead of rationally understanding.
Now I absolutely get it. I've got the key to the sequences. Thank you very very much!
Sigh. Of course I upvoted this, but...
The first part, the abstract part, was a joy to read. But the Monty Hall part started getting weaker, and the Two Aces part I didn't bother reading at all. What I'd have done differently if your awesome idea for a post came to me first: remove the jarring false tangent in Monty Hall, make all diagrams identical in style to the ones in the first part (colors, shapes, fonts, lack of borders), never mix percentages and fractions in the same diagram, use cancer screening as your first motivating example, Monty Hall as the second example, Two Aces as an exercise for the readers - it's essentially a variant of Monty Hall.
Also, indicate more clearly in the Monty Hall problem statement that whenever the host can open two doors, it chooses each of them with probability 50%, rather than (say) always opening the lower-numbered one. Without this assumption the answer could be different.
Sorry for the criticisms. It's just my envy and frustration talking. Your post had the potential to be so completely awesome, way better than Eliezer's explanation, but the tiny details broke it.
this does seem like the type of article that should be a community effort.. perhaps a wiki entry?
This is a fantastic explanation (which I like better than the 'simple' explanation retired urologist links to below), and I'll tell you why.
You've transformed the theorem into a spatial representation, which is always great - since I rarely use Bayes Theorem I have to essentially 'reconstruct' how to apply it every time I want to think about it, and I can do that much easier (and with many fewer steps) with a picture like this than with an example like breast cancer (which is what I would do previously).
Critically, you've represented the WHOLE problem visually - all I have to do is picture it in my head and I can 'read' directly off of it, I don't have to think about any other concepts or remember what certain symbols mean. Another plus, you've included the actual numbers used for maximum transparency into what transformations are actually taking place. It's a very well done series of diagrams.
If I had one (minor) quibble, it would be that you should represent the probabilities for various hypotheses occuring visually as well - perhaps using line weights, or split lines like in this diagram.
But very well done, thank you.
(edit: I'd also agree with cousin_it that the first half of the post is the stronger part. The diagrams are what make this so great, so stick with them!)
I don't get it really. I mean, I get the method, but not the formula. Is this useful for anything though?
Also, a simpler method of explaining the Monty Hall problem is to think of it if there were more doors. Lets say there were a million (thats alot ["a lot" grammar nazis] of goats.) You pick one and the host elliminates every other door except one. The probability you picked the right door is one in a million, but he had to make sure that the door he left unopened was the one that had the car in it, unless you picked the one with a car in it, which is a one in a million chance.
Wonderful. Are you aware of the Tuesday Boy problem? I think it could have been a more impressive second example.
"I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?"
(The intended interpretation is that I have two children, and at least one of them is a boy-born-on-a-Tuesday.)
I found it here: Magic numbers: A meeting of mathemagical tricksters
I always much prefer these stated as questions - you stop someone and say "Do you have exactly two children? Is at least one of them a boy born on a Tuesday?" and they say "yes". Otherwise you get into wondering what the probability they'd say such a strange thing given various family setups might be, which isn't precisely defined enough...
When I tried to explain Bayes' to some fellow software engineers at work, I came up with http://moshez.wordpress.com/2011/02/06/bayes-theorem-for-programmers/
Thank you Komponisto,
I have read many explanations of Bayesian theory, and like you, if I concentrated hard enough I could follow the reasoning , but I could never reason it out for myself. Now I can. Your explanation was perfect for me. It not only enabled me to "grok" the Monty Hall problem, but Bayesian calculations in general, while being able to retain the theory.
Thank you again, Ben
Thank you very much for this. Until you put it this way, I could not grasp the Monty Hall problem; I persisted in believing that there would be a 50/50 chance once the door was opened. Thank you for changing my mind.
The Monty Hall problem seems like it can be simplified: Once you've picked the door, you can switch to instead selecting the two alternate doors. You know that one of the alternate doors contains a goat (since there's only one car), which is equivalent to having Monty open one of those two doors.
The trick is simply in assuming that Monty is actually introducing any new information.
Not sure if it's helpful to anyone else, but it just sort of clicked reading it this time :)
This is really brilliant. Thanks for making it all seem so easy: for some reason I never saw the connection between an update and rescaling like that, but now it seems obvious.
I'd like to see this kind of diagram for the Sleeping Beauty problem.
I tend to think out Monty Hall like this: The probability you have chosen the door hiding the car is 1/3. Once one of the other two doors is shown to hide a goat, the probablity of the third door hiding the car must be 2/3. Therefore you double your chances to win the car by switching.
Wow, that was great! I already had a fairly good understanding of the Theorem, but this helped cement it further and helped me compute a bit faster.
It also gave me a good dose of learning-tingles, for which I thank you.
In figure 10 above, should the second blob have a value of 72.9% ? I noticed that the total of all the percents are only adding up to 97% with the current values. I calculated as follows: New 100% value: 10% + 35% + 3% = 48% H1 : 10% / 48% = 20.833% H2: 35% / 48% = 72.9166% H3: 3% / 48% = 6.25% Total: 99.99%
Also, I found this easy to understand visually. Thanks.
A presentation critique: psychologically, we tend to compare the relative areas of shapes. Your ovals in Figure 1 are scaled so that their linear dimensions (width, for example) are in the ratio 2:5:3; however, what we see are ovals whose areas are in ratio 4:25:9, which isn't what you're trying to convey. I think this happens for later shapes as well, although I didn't check them all.
Great visualizations.
In fact, this (only without triangles, squares,...) is how I've been intuitively calculating Bayesian probabilities in "everyday" life problems since I was young. But you managed to make it even clearer for me. Good to see it applied to Monty Hall.
Thank you Komponisto! Apparently, my brain works similar to yours on this matter. Here is a video by Richard Carrier explaining Bayes' theorem that I also found helpful.
Perhaps a better title would be "Bayes' Theorem Illustrated (My Ways)"
In the first example you use shapes with colors of various sizes to illustrate the ideas visually. In the second example, you using plain rectangles of approximately the same size. If I was a visual learner, I don't know if your post would help me much.
I think you're on the right track in example one. You might want to use shapes that are easier to estimate the relative areas. It's hard to tell if one triangle is twice as big as another (as measured by area), but it's easie...
I find that Monty Hall is easier to understand with N doors, N > 2.
N doors, one hides a car. You pick a door at random yielding 1/N probability of getting car. Host now opens N-2 doors which are not your door, all containing goats. The probability the other door left has a car is not (N-1)/N.
Set N to 1000 and people generally agree that switching is good. Set N to 3 and they disagree.
The challenge with Bayes' illustrations is to simultaneously.show 1) relations 2) ratios. The suggested approach works well. I am suggesting to combine Venn diagrams and Pie charts:
http://oracleaide.wordpress.com/2012/12/26/a-venn-pie/
Happy New Year!
This problem is not so difficult to solve if we use a binomial tree to try to tackle it. Not only we will come to the same (correct) mathematical answer (which is brilliantly exposed in the first post in this thread) but logically is more palatable.
I will exposed the logically, semantically derived answer straight away and then I will jump in to the binomial tree for the proof-out.
The probability of the situation exposed here, which for the sake of being brief I’m going to put it as “contestant choose a door, a goat is revealed in a different door, then co...
Thanks for posting this. Your explanations are fascinating and helpful. That said, I do have one quibble. I was misled by the Two Aces problem because I didn't know that the two unknown cards (2C and 2D) were precluded from also being aces or aces of spaces. It might be better to edit the post to make that clear.
While on topic, GREAT demo of conditional probability.
http://www.cut-the-knot.org/Curriculum/Probability/ConditionalProbability.shtml
Why would you need more than plain English to intuitively grasp Monty-Hall-type problems?
Take the original Monty Hall 'Dilemma'. Just imagine there are two candidates, A and B. A and B both choose the same door. After the moderator picked one door A always stays with his first choice, B always changes his choice to the remaining third door. Now imagine you run this experiment 999 times. What will happen? Because A always stays with his initial choice, he will win 333 cars. But where are the remaining 666 cars? Of course B won them!
Or conduct the experiment...
I can testify that this isn't anywhere near as obvious to most people than it is to you. I, for one, had to have other people explain it to me the first time I ran into the problem, and even then it took a small while.
I have a small issue with the way you presented the Monty python problem. In my opinion, the setup could be a little clearer. The Bayesian model you presented holds true iff you make an assumption about the door you picked; either goat (better) or car (less wrong). If you pick a door at random with no presuppositions (I believe this is the state most people are in), then you have no basis to decide to switch or not switch, and have a truly 50% chance either way. If instead you introduce the assumption of goat, when the host opens up the other goat, you know you had a 2/3 chance to pick a goat. With both goats known or presumed, the last door must be the car with an error rate of 1/3.
(This post is elementary: it introduces a simple method of visualizing Bayesian calculations. In my defense, we've had other elementary posts before, and they've been found useful; plus, I'd really like this to be online somewhere, and it might as well be here.)
I'll admit, those Monty-Hall-type problems invariably trip me up. Or at least, they do if I'm not thinking very carefully -- doing quite a bit more work than other people seem to have to do.
What's more, people's explanations of how to get the right answer have almost never been satisfactory to me. If I concentrate hard enough, I can usually follow the reasoning, sort of; but I never quite "see it", and nor do I feel equipped to solve similar problems in the future: it's as if the solutions seem to work only in retrospect.
Minds work differently, illusion of transparency, and all that.
Fortunately, I eventually managed to identify the source of the problem, and I came up a way of thinking about -- visualizing -- such problems that suits my own intuition. Maybe there are others out there like me; this post is for them.
I've mentioned before that I like to think in very abstract terms. What this means in practice is that, if there's some simple, general, elegant point to be made, tell it to me right away. Don't start with some messy concrete example and attempt to "work upward", in the hope that difficult-to-grasp abstract concepts will be made more palatable by relating them to "real life". If you do that, I'm liable to get stuck in the trees and not see the forest. Chances are, I won't have much trouble understanding the abstract concepts; "real life", on the other hand...
...well, let's just say I prefer to start at the top and work downward, as a general rule. Tell me how the trees relate to the forest, rather than the other way around.
Many people have found Eliezer's Intuitive Explanation of Bayesian Reasoning to be an excellent introduction to Bayes' theorem, and so I don't usually hesitate to recommend it to others. But for me personally, if I didn't know Bayes' theorem and you were trying to explain it to me, pretty much the worst thing you could do would be to start with some detailed scenario involving breast-cancer screenings. (And not just because it tarnishes beautiful mathematics with images of sickness and death, either!)
So what's the right way to explain Bayes' theorem to me?
Like this:
We've got a bunch of hypotheses (states the world could be in) and we're trying to figure out which of them is true (that is, which state the world is actually in). As a concession to concreteness (and for ease of drawing the pictures), let's say we've got three (mutually exclusive and exhaustive) hypotheses -- possible world-states -- which we'll call H1, H2, and H3. We'll represent these as blobs in space:
Figure 0
Now, we have some prior notion of how probable each of these hypotheses is -- that is, each has some prior probability. If we don't know anything at all that would make one of them more probable than another, they would each have probability 1/3. To illustrate a more typical situation, however, let's assume we have more information than that. Specifically, let's suppose our prior probability distribution is as follows: P(H1) = 30%, P(H2)=50%, P(H3) = 20%. We'll represent this by resizing our blobs accordingly:
Figure 1
That's our prior knowledge. Next, we're going to collect some evidence and update our prior probability distribution to produce a posterior probability distribution. Specifically, we're going to run a test. The test we're going to run has three possible outcomes: Result A, Result B, and Result C. Now, since this test happens to have three possible results, it would be really nice if the test just flat-out told us which world we were living in -- that is, if (say) Result A meant that H1 was true, Result B meant that H2 was true, and Result 3 meant that H3 was true. Unfortunately, the real world is messy and complex, and things aren't that simple. Instead, we'll suppose that each result can occur under each hypothesis, but that the different hypotheses have different effects on how likely each result is to occur. We'll assume for instance that if Hypothesis H1 is true, we have a 1/2 chance of obtaining Result A, a 1/3 chance of obtaining Result B, and a 1/6 chance of obtaining Result C; which we'll write like this:
P(A|H1) = 50%, P(B|H1) = 33.33...%, P(C|H1) = 16.166...%
and illustrate like this:
Figure 2
(Result A being represented by a triangle, Result B by a square, and Result C by a pentagon.)
If Hypothesis H2 is true, we'll assume there's a 10% chance of Result A, a 70% chance of Result B, and a 20% chance of Result C:
Figure 3
(P(A|H2) = 10% , P(B|H2) = 70%, P(C|H2) = 20%)
Finally, we'll say that if Hypothesis H3 is true, there's a 5% chance of Result A, a 15% chance of Result B, and an 80% chance of Result C:
Figure 4
(P(A|H3) = 5%, P(B|H3) = 15% P(C|H3) = 80%)
Figure 5 below thus shows our knowledge prior to running the test:
Figure 5
Note that we have now carved up our hypothesis-space more finely; our possible world-states are now things like "Hypothesis H1 is true and Result A occurred", "Hypothesis H1 is true and Result B occurred", etc., as opposed to merely "Hypothesis H1 is true", etc. The numbers above the slanted line segments -- the likelihoods of the test results, assuming the particular hypothesis -- represent what proportion of the total probability mass assigned to the hypothesis Hn is assigned to the conjunction of Hypothesis Hn and Result X; thus, since P(H1) = 30%, and P(A|H1) = 50%, P(H1 & A) is therefore 50% of 30%, or, in other words, 15%.
(That's really all Bayes' theorem is, right there, but -- shh! -- don't tell anyone yet!)
Now, then, suppose we run the test, and we get...Result A.
What do we do? We cut off all the other branches:
Figure 6
So our updated probability distribution now looks like this:
Figure 7
...except for one thing: probabilities are supposed to add up to 100%, not 21%. Well, since we've conditioned on Result A, that means that the 21% probability mass assigned to Result A is now the entirety of our probability mass -- 21% is the new 100%, you might say. So we simply adjust the numbers in such a way that they add up to 100% and the proportions are the same:
Figure 8
There! We've just performed a Bayesian update. And that's what it looks like.
If, instead of Result A, we had gotten Result B,
Figure 9
then our updated probability distribution would have looked like this:
Figure 10
Similarly, for Result C:
Figure 11
Bayes' theorem is the formula that calculates these updated probabilities. Using H to stand for a hypothesis (such as H1, H2 or H3), and E a piece of evidence (such as Result A, Result B, or Result C), it says:
P(H|E) = P(H)*P(E|H)/P(E)
In words: to calculate the updated probability P(H|E), take the portion of the prior probability of H that is allocated to E (i.e. the quantity P(H)*P(E|H)), and calculate what fraction this is of the total prior probability of E (i.e. divide it by P(E)).
What I like about this way of visualizing Bayes' theorem is that it makes the importance of prior probabilities -- in particular, the difference between P(H|E) and P(E|H) -- visually obvious. Thus, in the above example, we easily see that even though P(C|H3) is high (80%), P(H3|C) is much less high (around 51%) -- and once you have assimilated this visualization method, it should be easy to see that even more extreme examples (e.g. with P(E|H) huge and P(H|E) tiny) could be constructed.
Now let's use this to examine two tricky probability puzzles, the infamous Monty Hall Problem and Eliezer's Drawing Two Aces, and see how it illustrates the correct answers, as well as how one might go wrong.
The Monty Hall Problem
The situation is this: you're a contestant on a game show seeking to win a car. Before you are three doors, one of which contains a car, and the other two of which contain goats. You will make an initial "guess" at which door contains the car -- that is, you will select one of the doors, without opening it. At that point, the host will open a goat-containing door from among the two that you did not select. You will then have to decide whether to stick with your original guess and open the door that you originally selected, or switch your guess to the remaining unopened door. The question is whether it is to your advantage to switch -- that is, whether the car is more likely to be behind the remaining unopened door than behind the door you originally guessed.
(If you haven't thought about this problem before, you may want to try to figure it out before continuing...)
The answer is that it is to your advantage to switch -- that, in fact, switching doubles the probability of winning the car.
People often find this counterintuitive when they first encounter it -- where "people" includes the author of this post. There are two possible doors that could contain the car; why should one of them be more likely to contain it than the other?
As it turns out, while constructing the diagrams for this post, I "rediscovered" the error that led me to incorrectly conclude that there is a 1/2 chance the car is behind the originally-guessed door and a 1/2 chance it is behind the remaining door the host didn't open. I'll present that error first, and then show how to correct it. Here, then, is the wrong solution:
We start out with a perfectly correct diagram showing the prior probabilities:
Figure 12
The possible hypotheses are Car in Door 1, Car in Door 2, and Car in Door 3; before the game starts, there is no reason to believe any of the three doors is more likely than the others to contain the car, and so each of these hypotheses has prior probability 1/3.
The game begins with our selection of a door. That itself isn't evidence about where the car is, of course -- we're assuming we have no particular information about that, other than that it's behind one of the doors (that's the whole point of the game!). Once we've done that, however, we will then have the opportunity to "run a test" to gain some "experimental data": the host will perform his task of opening a door that is guaranteed to contain a goat. We'll represent the result Host Opens Door 1 by a triangle, the result Host Opens Door 2 by a square, and the result Host Opens Door 3 by a pentagon -- thus carving up our hypothesis space more finely into possibilities such as "Car in Door 1 and Host Opens Door 2" , "Car in Door 1 and Host Opens Door 3", etc:
Figure 13
Before we've made our initial selection of a door, the host is equally likely to open either of the goat-containing doors. Thus, at the beginning of the game, the probability of each hypothesis of the form "Car in Door X and Host Opens Door Y" has a probability of 1/6, as shown. So far, so good; everything is still perfectly correct.
Now we select a door; say we choose Door 2. The host then opens either Door 1 or Door 3, to reveal a goat. Let's suppose he opens Door 1; our diagram now looks like this:
Figure 14
But this shows equal probabilities of the car being behind Door 2 and Door 3!
Figure 15
Did you catch the mistake?
Here's the correct version:
As soon as we selected Door 2, our diagram should have looked like this:
Figure 16
With Door 2 selected, the host no longer has the option of opening Door 2; if the car is in Door 1, he must open Door 3, and if the car is in Door 3, he must open Door 1. We thus see that if the car is behind Door 3, the host is twice as likely to open Door 1 (namely, 100%) as he is if the car is behind Door 2 (50%); his opening of Door 1 thus constitutes some evidence in favor of the hypothesis that the car is behind Door 3. So, when the host opens Door 1, our picture looks as follows:
Figure 17
which yields the correct updated probability distribution:
Figure 18
Drawing Two Aces
Here is the statement of the problem, from Eliezer's post:
(Once again, you may want to think about it, if you haven't already, before continuing...)
Here's how our picture method answers the question:
Since the person holding the cards has at least one ace, the "hypotheses" (possible card combinations) are the five shown below:
Figure 19
Each has a prior probability of 1/5, since there's no reason to suppose any of them is more likely than any other.
The "test" that will be run is selecting an ace at random from the person's hand, and seeing if it is the ace of spades. The possible results are:
Figure 20
Now we run the test, and get the answer "YES"; this puts us in the following situation:
Figure 21
The total prior probability of this situation (the YES answer) is (1/6)+(1/3)+(1/3) = 5/6; thus, since 1/6 is 1/5 of 5/6 (that is, (1/6)/(5/6) = 1/5), our updated probability is 1/5 -- which happens to be the same as the prior probability. (I won't bother displaying the final post-update picture here.)
What this means is that the test we ran did not provide any additional information about whether the person has both aces beyond simply knowing that they have at least one ace; we might in fact say that the result of the test is screened off by the answer to the first question ("Do you have an ace?").
On the other hand, if we had simply asked "Do you have the ace of spades?", the diagram would have looked like this:
Figure 22
which, upon receiving the answer YES, would have become:
Figure 23
The total probability mass allocated to YES is 3/5, and, within that, the specific situation of interest has probability 1/5; hence the updated probability would be 1/3.
So a YES answer in this experiment, unlike the other, would provide evidence that the hand contains both aces; for if the hand contains both aces, the probability of a YES answer is 100% -- twice as large as it is in the contrary case (50%), giving a likelihood ratio of 2:1. By contrast, in the other experiment, the probability of a YES answer is only 50% even in the case where the hand contains both aces.
This is what people who try to explain the difference by uttering the opaque phrase "a random selection was involved!" are actually talking about: the difference between
Figure 24
and
.
Figure 25
The method explained here is far from the only way of visualizing Bayesian updates, but I feel that it is among the most intuitive.
(I'd like to thank my sister, Vive-ut-Vivas, for help with some of the diagrams in this post.)