An example demonstrating how to deduce Bayes' Theorem

(This post is not an attempt to convey anything new, but is instead just an attempt to provide background context on how Bayes' theorem works by describing how it can be deduced. This is not meant to be a formal proof. There have been other elementary posts that have covered how to use Bayes’ theorem: here, here, here and here)

Consider the following example

Imagine that your friend has a bowl that contains cookies in two varieties: chocolate chip and white chip macadamia nut. You think to yourself: “Yum. I would really like a chocolate chip cookie”. So you reach for one, but before you can pull one out your friend lets you know that you can only pick one, that you cannot look into the bowl and that all the cookies are either fresh or stale. Your friend also tells you that there are 80 fresh cookies, 40 chocolate chip cookies, 15 stale white chip macadamia nut cookies and 100 cookies in total. What is the probability that you will pull out a fresh chocolate chip cookie?

To figure this out we will create a contingency table (a two-way table of counts). If we fill in the values that we do know, then we will end up with the below table. The cell that we want to find the value of is marked with a question mark.

	Chocolate Chip	White Chip Macadamia Nut	Total
Fresh	?		80
Stale		15
Total	40		100

If we look at the above table we can notice that, like in Sudoku, there are some values that we can fill in based on the information that we already know. They are:

The number of stale cookies. We know that 80 cookies are fresh and that there are 100 cookies in total, so this means that there must be 20 stale cookies.
The number of white chip macadamia nut cookies. We know that there are 40 chocolate chip cookies and 100 cookies in total, so this means that there must be 60 white chip macadamia nut cookies.

If we fill in both these values we end up with the below table:

	Chocolate Chip	White Chip Macadamia Nut	Total
Fresh			80
Stale		15	20
Total	40	60	100

If we look at the table now, we can see that there are two more values that can be filled in. They are:

The number of fresh white chip macadamia nut cookies. We know that there are 60 white chip macadamia nut cookies and that 15 of these are stale, so this means that there must be 45 fresh white chip macadamia nut cookies.
The number of stale chocolate chip cookies. We know that there are 20 stale cookies and that 15 of these are white chip macadamia nut, so this means that there must be 5 stale chocolate chip cookies.

If we fill in both these values we end up with the below table:

	Chocolate Chip	White Chip Macadamia Nut	Total
Fresh		45	80
Stale	5	15	20
Total	40	60	100

We can now find out the number of fresh chocolate chip cookies. It is important to note that there are two ways in which we can do this. These two ways are called the inverse of each other (this will be used later):

Using the filled in row values. We know that there are 80 fresh cookies and that 45 of these are white chip macadamia nut, so this means that there must be 35 fresh chocolate chip cookies.
Using the filled in column values. We know that there are 40 chocolate chip cookies and that 5 of these are stale, so this means that there must be 35 fresh chocolate chip cookies.

If we fill in the last value in the table we end up with the below table:

	Chocolate Chip	White Chip Macadamia Nut	Total
Fresh	35	45	80
Stale	5	15	20
Total	40	60	100

We can now find the probability of choosing a fresh chocolate chip cookie by dividing the number of fresh chocolate chip cookies (35) by the total number of cookies (100). This is 35 / 100, which is 35%.
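The table-filling steps above can be sketched in a few lines of Python. This is only an illustration and not part of the original example: the variable names are my own, and the numbers are the ones given by your friend.

```python
from fractions import Fraction

# Values given in the example.
total = 100
fresh = 80
chocolate_chip = 40
stale_macadamia = 15

# Fill in the remaining cells in the same order as the text.
stale = total - fresh                          # 20 stale cookies
macadamia = total - chocolate_chip             # 60 white chip macadamia nut cookies
fresh_macadamia = macadamia - stale_macadamia  # 45 fresh macadamia cookies
stale_chocolate = stale - stale_macadamia      # 5 stale chocolate chip cookies

# The last cell can be reached using the row values or the column values.
fresh_chocolate_by_row = fresh - fresh_macadamia
fresh_chocolate_by_column = chocolate_chip - stale_chocolate

# Probability of picking a fresh chocolate chip cookie.
p_fresh_chocolate = Fraction(fresh_chocolate_by_row, total)
print(fresh_chocolate_by_row, fresh_chocolate_by_column, float(p_fresh_chocolate))
# prints: 35 35 0.35
```

Both routes to the last cell agree, which is the "inverse" idea that is used later.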

To get to Bayes' theorem, I will need to write these terms in a shorter notation.

P(A) = the probability of finding some observation A. You can think of this as the probability of the picked cookie being chocolate chip.
P(B) = the probability of finding some observation B. You can think of this as the probability of the picked cookie being fresh. Please note that A is what we want to find given B. If desired, the roles could be swapped so that A is fresh and B is chocolate chip.
P(~A) = the probability of not finding observation A. You can think of this as the probability of the picked cookie not being chocolate chip, i.e. being white chip macadamia nut instead.
P(~B) = the probability of not finding observation B. You can think of this as the probability of the picked cookie not being fresh, i.e. being stale instead.
P(A∩B) = the probability of finding both A and B. You can think of this as the probability of the picked cookie being fresh and chocolate chip.
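To make the notation concrete, here is a small Python sketch that reads each of these probabilities off the completed table. The dictionary layout and the helper `prob` are hypothetical names that I have introduced for illustration only.

```python
from fractions import Fraction

# Counts from the completed table, keyed by (freshness, variety).
table = {
    ("fresh", "chocolate chip"): 35,
    ("fresh", "macadamia"): 45,
    ("stale", "chocolate chip"): 5,
    ("stale", "macadamia"): 15,
}
total = sum(table.values())

def prob(predicate):
    """Probability that a randomly picked cookie satisfies the predicate."""
    matching = sum(n for key, n in table.items() if predicate(*key))
    return Fraction(matching, total)

p_a = prob(lambda freshness, variety: variety == "chocolate chip")      # P(A)
p_b = prob(lambda freshness, variety: freshness == "fresh")             # P(B)
p_not_a = prob(lambda freshness, variety: variety != "chocolate chip")  # P(~A)
p_not_b = prob(lambda freshness, variety: freshness != "fresh")         # P(~B)
p_a_and_b = prob(
    lambda freshness, variety: freshness == "fresh" and variety == "chocolate chip"
)                                                                       # P(A∩B)

print(p_a, p_b, p_not_a, p_not_b, p_a_and_b)
# prints: 2/5 4/5 3/5 1/5 7/20
```

Note that P(A) + P(~A) and P(B) + P(~B) both come to 1, since every cookie is either chocolate chip or not, and either fresh or not.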

Now, we will start getting a bit more complicated as we start moving into the basis of Bayes’ Theorem. Let’s go through another example based on the original.

Let’s assume that before you pull out a cookie you notice that it is fresh. Can you then figure out the probability of it being chocolate chip before you pull it out? The answer is yes.

We will find this out using the table that we filled in previously. The important row is the Fresh row.

	Chocolate Chip	White Chip Macadamia Nut	Total
Fresh	35	45	80
Stale	5	15	20
Total	40	60	100

Since we already know that the cookie is fresh, we can say that the probability of it being a chocolate chip cookie is equal to the number of fresh chocolate chip cookies (35) divided by the total number of fresh cookies (80). This is 35 / 80, which is 43.75%.

In a simpler form this is:

P(A|B) - The probability of A given B. You can think of this as the probability of the picked cookie being chocolate chip if you already know that it is fresh.

If we look at the table again, we can see that there is some extra important information that we can find out about P(A|B): it is equal to P(A∩B) / P(B). You can think of this as follows: the probability of the picked cookie being chocolate chip if you know that it is fresh (35 / 80) is equal to the probability of the picked cookie being fresh and chocolate chip (35 / 100) divided by the probability of it being fresh (80 / 100). This is P(A|B) = (35 / 100) / (80 / 100), which becomes 0.35 / 0.8, which is the same as the answer we found above (43.75%). Take note of the fact that P(A|B) = P(A∩B) / P(B), as we will use this later.
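As a quick check, the two ways of reaching P(A|B) can be compared in Python. This is a minimal sketch using the counts from the table; the variable names are my own.

```python
from fractions import Fraction

total = 100
fresh = 80
fresh_chocolate = 35

# Way 1: count directly within the fresh row.
p_a_given_b_counts = Fraction(fresh_chocolate, fresh)

# Way 2: P(A|B) = P(A∩B) / P(B).
p_a_and_b = Fraction(fresh_chocolate, total)
p_b = Fraction(fresh, total)
p_a_given_b_ratio = p_a_and_b / p_b

print(p_a_given_b_counts, float(p_a_given_b_ratio))
# prints: 7/16 0.4375
```

Exact fractions are used here so that the two results can be compared without floating point rounding.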

Let’s now return to the inverse idea that was raised previously. If we want to know the probability of the picked cookie being fresh and chocolate chip, i.e. P(A∩B), then we can use the Fresh row and the Chocolate Chip column of the filled in table.

	Chocolate Chip	White Chip Macadamia Nut	Total
Fresh	35	45	80
Stale	5	15	20
Total	40	60	100

If we know that the cookie is fresh, like in the top row above, then we can find out that P(A∩B) = P(A|B) * P(B). This means that the probability of the picked cookie being fresh and chocolate chip (35 / 100) (remember that there were 100 cookies in total) is equal to the probability of it being chocolate chip given that you know that it is fresh (35 / 80) times the probability of it being fresh (80 / 100). So, we end up with P(A∩B) = (35 / 80) * (80 / 100), which becomes 35 / 100 or 35%, which we know is the right answer.

Alternatively, since we know that P(A|B) is equal to P(A∩B) / P(B) (we found this out previously), we can also check that P(A∩B) = P(A|B) * P(B) by substitution:

Start with the claim P(A∩B) = P(A|B) * P(B)
Convert P(A|B) to P(A∩B) / P(B) so we get P(A∩B) = (P(A∩B) * P(B)) / P(B).
Notice that P(B) is on both the top and bottom of the right-hand side, which means that it can be crossed out (as long as P(B) is not zero)
Cross out P(B) to give you P(A∩B) = P(A∩B), which is always true, so the claim holds

The inverse situation is when you know that the cookie is chocolate chip, like in the left column in the table above. Using the left column we can find out that P(A∩B) = P(B|A) * P(A). This means that the probability of the picked cookie being fresh and chocolate chip (35 / 100) is equal to the probability of it being fresh given that you know it is chocolate chip (35 / 40) times the probability of it being chocolate chip (40 / 100). This is: P(A∩B) = (35 / 40) * (40 / 100). This becomes 35%, which we know is the right answer.
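The row route and the column route to P(A∩B) can be checked side by side. The snippet below is a sketch with my own variable names, using the fractions from the table.

```python
from fractions import Fraction

p_a = Fraction(40, 100)         # P(A): chocolate chip
p_b = Fraction(80, 100)         # P(B): fresh
p_a_given_b = Fraction(35, 80)  # P(A|B): chocolate chip given fresh
p_b_given_a = Fraction(35, 40)  # P(B|A): fresh given chocolate chip

# Row route: P(A∩B) = P(A|B) * P(B)
via_row = p_a_given_b * p_b

# Column route: P(A∩B) = P(B|A) * P(A)
via_column = p_b_given_a * p_a

print(via_row, via_column, float(via_row))
# prints: 7/20 7/20 0.35
```

Both products give 35 / 100, which is why the two expressions for P(A∩B) can be swapped for one another.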

Now, we have enough information to deduce the simple form of Bayes’ Theorem.

Let’s first recap what we know:

P(A|B) = P(A∩B) / P(B)
P(A∩B) = P(B|A) * P(A)

By taking the first fact: P(A|B) = P(A∩B) / P(B) and using the second fact to convert P(A∩B) to P(B|A) * P(A) you end up with P(A|B) = (P(B|A) * P(A)) / P(B) which is Bayes' Theorem in its simple form.
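The simple form can be written as a one-line function. This is a minimal sketch: `bayes_simple` is a hypothetical name of my own, and the inputs are the cookie probabilities from the table.

```python
from fractions import Fraction

def bayes_simple(p_b_given_a, p_a, p_b):
    """Simple form of Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

p_a_given_b = bayes_simple(
    p_b_given_a=Fraction(35, 40),  # fresh given chocolate chip
    p_a=Fraction(40, 100),         # chocolate chip
    p_b=Fraction(80, 100),         # fresh
)
print(p_a_given_b, float(p_a_given_b))
# prints: 7/16 0.4375
```

The result matches the 43.75% found earlier by counting within the fresh row.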

From the simple form of Bayes' Theorem, there is one more conversion that we need to make to derive the explicit form of Bayes' Theorem which is the one we are trying to deduce.

To get to the explicit form we first need to show that P(B) = P(A) * P(B|A) + P(~A) * P(B|~A).

To do this let’s refer to the table again:

	Chocolate Chip	White Chip Macadamia Nut	Total
Fresh	35	45	80
Stale	5	15	20
Total	40	60	100

We can see that the probability that the picked cookie is fresh (80 / 100) is equal to the probability that it is fresh and chocolate chip (35 / 100) plus the probability that it is fresh and white chip macadamia nut (45 / 100). So, we can find out that P(B) (the probability that the cookie is fresh) is equal to 35 / 100 + 45 / 100, which is 0.8 or 80%, which we know is the right answer. This gives the formula: P(B) = P(A∩B) + P(~A∩B)

We know that P(A∩B) = P(B|A) * P(A) as we found this out earlier. Similarly we can find out that P(~A∩B) = P(~A) * P(B|~A). This means that the probability of the picked cookie being fresh and white chip macadamia nut (45 / 100) is equal to the probability of it being white chip macadamia nut (60 / 100) times the probability of it being fresh given that you know that it is white chip macadamia nut (45 / 60). This is: (60 / 100) * (45 / 60), which is 45%, which we know is the right answer.
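The way P(B) splits into two parts can be checked with a short Python sketch. The variable names are my own and the values come from the table.

```python
from fractions import Fraction

p_a = Fraction(40, 100)             # P(A): chocolate chip
p_not_a = Fraction(60, 100)         # P(~A): white chip macadamia nut
p_b_given_a = Fraction(35, 40)      # P(B|A): fresh given chocolate chip
p_b_given_not_a = Fraction(45, 60)  # P(B|~A): fresh given macadamia

# P(A∩B) and P(~A∩B), each written as a product.
p_a_and_b = p_a * p_b_given_a
p_not_a_and_b = p_not_a * p_b_given_not_a

# P(B) = P(A) * P(B|A) + P(~A) * P(B|~A)
p_b = p_a_and_b + p_not_a_and_b

print(p_a_and_b, p_not_a_and_b, p_b, float(p_b))
# prints: 7/20 9/20 4/5 0.8
```

The sum comes to 80 / 100, the same as the proportion of fresh cookies read directly from the Total column.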

Using this information, we can now get to the explicit form of Bayes' Theorem:

We know the simple form of Bayes' Theorem: P(A|B) = (P(B|A) * P(A)) / P(B)
We can convert P(B) to P(A∩B) + P(~A∩B) to get P(A|B) = (P(B|A) * P(A)) / (P(A∩B) + P(~A∩B))
We can convert P(A∩B) to P(A) * P(B|A) to get P(A|B) = (P(B|A) * P(A)) / (P(A) * P(B|A) + P(~A∩B))
We can convert P(~A∩B) to P(~A) * P(B|~A) to get P(A|B) = (P(B|A) * P(A)) / (P(A) * P(B|A) + P(~A) * P(B|~A))

Congratulations, we have now reached the explicit form of Bayes' Theorem:

P(A|B) = (P(B|A) * P(A)) / (P(A) * P(B|A) + P(~A) * P(B|~A))
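As a final check, here is a sketch of the explicit form as a Python function, applied to the cookie example. The name `bayes_explicit` is hypothetical, and P(~A) is computed as 1 - P(A), which assumes that A and ~A are the only two possibilities (as they are for the two cookie varieties).

```python
from fractions import Fraction

def bayes_explicit(p_a, p_b_given_a, p_b_given_not_a):
    """Explicit form: P(A|B) = P(B|A)P(A) / (P(A)P(B|A) + P(~A)P(B|~A))."""
    p_not_a = 1 - p_a  # assumes A and ~A cover every case
    numerator = p_b_given_a * p_a
    denominator = p_a * p_b_given_a + p_not_a * p_b_given_not_a
    return numerator / denominator

posterior = bayes_explicit(
    p_a=Fraction(40, 100),             # chocolate chip
    p_b_given_a=Fraction(35, 40),      # fresh given chocolate chip
    p_b_given_not_a=Fraction(45, 60),  # fresh given macadamia
)
print(posterior, float(posterior))
# prints: 7/16 0.4375
```

This form does not need P(B) as an input, and it gives the same 43.75% that we found by counting within the fresh row.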
