This is a linkpost for https://www.badprior.com/blog/a-visual-explanation-of-bayesian-updating/

a visual explanation of Bayesian updating


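I am well aware that nobody asked for this, but here is the proof that the posterior is Beta(α+z, β+1−z) for the Beta-Bernoulli model.

We start with Bayes' theorem:

$$p(\theta \mid z) = \frac{p(z \mid \theta)\, p(\theta)}{p(z)}$$

Then we plug in the definitions of the Bernoulli likelihood and the Beta prior:

$$p(\theta \mid z) = \frac{\theta^{z}(1-\theta)^{1-z} \cdot \dfrac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}}{p(z)}$$

Let's collect the powers in the numerator, and the things that do not depend on θ in the denominator:

$$p(\theta \mid z) = \frac{\theta^{\alpha+z-1}(1-\theta)^{\beta+1-z-1}}{B(\alpha,\beta)\, p(z)}$$

Here come the conjugation shenanigans. If you squint, the top of the fraction looks like the top of a Beta distribution:

$$\theta^{(\alpha+z)-1}(1-\theta)^{(\beta+1-z)-1}$$

Let's continue the shenanigans. Since the numerator looks like the numerator of a Beta distribution, we know that it would be a proper Beta distribution if we changed the denominator like this:

$$p(\theta \mid z) = \frac{\theta^{\alpha+z-1}(1-\theta)^{\beta+1-z-1}}{B(\alpha+z,\ \beta+1-z)} = \mathrm{Beta}(\theta \mid \alpha+z,\ \beta+1-z)$$

Because the denominator does not depend on θ and the posterior has to integrate to 1, this is the only value the denominator can take.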

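The order does not matter. You can see that by focusing on the product of the likelihood terms, which is the same in any order because multiplication is commutative. You can also see it from the conjugation rule, where you end with the same Beta distribution no matter the order.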
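A quick way to convince yourself is to run the conjugation rule over every ordering of the same shots. This is a minimal sketch: the helper names `update` and `posterior_params` are made up for illustration, and it assumes the flat Beta(1,1) starting prior from the post.

```python
from itertools import permutations

def update(alpha, beta, z):
    """One conjugate step: Beta(alpha, beta) -> Beta(alpha + z, beta + 1 - z)."""
    return alpha + z, beta + 1 - z

def posterior_params(shots, alpha=1, beta=1):
    """Apply the update rule once per shot, in the order given."""
    for z in shots:
        alpha, beta = update(alpha, beta, z)
    return alpha, beta

shots = [0, 1, 1]  # miss, hit, hit
# Collect the final parameters for every possible ordering of the shots.
results = {posterior_params(order) for order in permutations(shots)}
print(results)  # {(3, 2)}
```

Every ordering lands on the same pair of parameters, because each shot only adds 1 to one of the two counts.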

If you wanted the order to matter, you could down-weight earlier shots or widen the uncertainty between updates, so that the previous posterior becomes a slightly wider prior that captures the extra uncertainty from the passage of time.
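One possible way to do the widening, sketched under loud assumptions: before each update, pull the Beta pseudo-counts part of the way back toward the flat Beta(1,1) prior. The `keep` knob and the function names are hypothetical, and this is just one of many forgetting schemes, not a method taken from the post.

```python
def update_with_forgetting(alpha, beta, z, keep=0.5, prior=(1.0, 1.0)):
    """Shrink the pseudo-counts toward the prior, then apply the usual rule.

    `keep` is a hypothetical knob: 1.0 recovers the ordinary update,
    smaller values forget old shots faster.
    """
    a0, b0 = prior
    alpha = a0 + keep * (alpha - a0)  # widen: pull alpha back toward the prior
    beta = b0 + keep * (beta - b0)    # widen: pull beta back toward the prior
    return alpha + z, beta + 1 - z

def run(shots, keep):
    """Start from Beta(1, 1) and process the shots in order."""
    alpha, beta = 1.0, 1.0
    for z in shots:
        alpha, beta = update_with_forgetting(alpha, beta, z, keep)
    return alpha, beta

early_miss = run([0, 1, 1], keep=0.5)
late_miss = run([1, 1, 0], keep=0.5)
print(early_miss)  # (2.5, 1.25)
print(late_miss)   # (1.75, 2.0)
```

With forgetting switched on, a recent miss hurts more than an old one, so the same three shots give different posteriors depending on the order.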

As a teaser here is the visual version of Bayesian updating:

But in order to understand that figure we need to go through the prior and likelihood!

You find me standing on a basketball court, ready to shoot some hoops. What do you believe about my performance before I take a shot? There is no good null hypothesis here unless you happen to know a lot about average human basketball performance, and even so, why would you care whether I am significantly different from the average? You can fall back on the 'new statistics', which is almost as good as the Bayesian approach, but it does not answer what you should believe before I take a shot.

The Beta distribution is a popular prior for binary events. When its two parameters (α and β) are both equal to 1, it is uniform. Since you, my dear reader, know nothing about my basketball skills, you assume that θ comes from a Beta(1,1) distribution. Formally:

p(θ)∼Beta(1,1)

Here θ is my probability of scoring. The distribution looks like this:

Completely uniform, a great prior when you are totally oblivious.
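If you want to check the flatness numerically, here is a small sketch that evaluates the Beta density by hand using only the standard library. The function name `beta_pdf` is just an illustrative choice.

```python
import math

def beta_pdf(theta, alpha, beta):
    """Density of Beta(alpha, beta) at theta, for 0 < theta < 1."""
    # Normalizing constant B(alpha, beta), written with gamma functions.
    norm = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
    return theta ** (alpha - 1) * (1 - theta) ** (beta - 1) / norm

densities = [beta_pdf(theta, 1, 1) for theta in (0.1, 0.5, 0.9)]
print(densities)  # [1.0, 1.0, 1.0]
```

With α=β=1 both exponents are zero, so the density is the same at every θ.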

I take a shot and miss (z=0). The likelihood of a miss looks like this (if you are extra curious, you can brush up on the math behind all the binary distributions here):

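Notice that the likelihood of a miss is 1 at θ=0 and 0.5 at θ=0.5.

Notice also that these are likelihoods and not probabilities: they tell us how likely the data are for different values of θ. So it is twice as likely

$$\frac{p(z=0 \mid \theta=0)}{p(z=0 \mid \theta=0.5)} = \frac{1}{0.5} = 2$$

that the data z=0 was generated by θ=0 compared to θ=0.5.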
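The same ratio can be reproduced in a couple of lines. This sketch assumes the standard Bernoulli likelihood θ^z (1−θ)^(1−z); the function name `likelihood` is an illustrative choice.

```python
def likelihood(z, theta):
    """Bernoulli likelihood of one shot: theta if z == 1, (1 - theta) if z == 0."""
    return theta ** z * (1 - theta) ** (1 - z)

# How much better does theta = 0 explain a miss than theta = 0.5?
ratio = likelihood(0, 0.0) / likelihood(0, 0.5)
print(ratio)  # 2.0
```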

## Bayesian Updating Math

Here is Bayes' theorem for the Bernoulli distribution with a Beta prior, where the observation z is 1 when I score and 0 otherwise:

$$p(\theta \mid z) = \frac{p(z \mid \theta)\, p(\theta)}{p(z)}$$

For technical reasons, p(z), the probability of the data, is difficult to calculate. It is, however, 'just a normalization constant', because it does not depend on θ, my scoring probability. Thus we can simply drop it and get an unnormalized posterior:

p(θ|z)∝p(z∣θ)p(θ)

An unnormalized posterior is simply a density function that does not integrate to 1. When we plot it, it looks 'correct', except that we have screwed up the numbers on the y-axis.
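To make the 'only the y-axis is wrong' point concrete, here is a rough sketch that normalizes the posterior after one miss on a grid. The grid size and the midpoint rule are arbitrary choices for illustration.

```python
n = 1000
width = 1 / n
grid = [(i + 0.5) * width for i in range(n)]  # midpoints of n equal bins on [0, 1]

# One miss (z = 0) with a flat prior: likelihood (1 - theta) times prior 1.
unnormalized = [(1 - theta) * 1.0 for theta in grid]

area = sum(unnormalized) * width           # numerical stand-in for p(z)
normalized = [u / area for u in unnormalized]

print(round(area, 6))                      # 0.5
print(round(sum(normalized) * width, 6))   # 1.0
```

Dividing by the area rescales every point by the same factor, so the shape of the curve is untouched.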

## Visual Bayesian Updating

So now we have a 'square' prior p(θ)∼Beta(1,1) and a 'triangle' likelihood p(z=0∣θ). If we multiply them together we get the unnormalized posterior, so we do:

p(θ|z)∝p(z∣θ)p(θ)

Which intuitively can be thought of as: the square makes everything equally likely, so the likelihood will dominate the posterior. Or, in dodgy math:

posterior∝square×triangle∝triangle

Here is the Figure:

Try putting your finger on the figure and check that at θ=0.5 the square is 1 and the triangle is 0.5, so the unnormalized posterior is 1×0.5=0.5.

I shoot again and score! Now we use the previous posterior as the new prior, but because I scored we get an 'opposite triangle', which is the likelihood p(z=1∣θ):

Again we multiply the prior triangle by the likelihood triangle and get a blob centered on 0.5 as the posterior:

Notice how the posterior peaks at θ=0.5. This is because at the center the two triangles give an unnormalized posterior density of 0.5×0.5=0.25, whereas near the edges, such as θ=0.9, they give 0.9×0.1=0.09.

I shoot again and score! So now the previous blob posterior is again our new prior, which we multiply by the 'I scored' triangle, resulting in a blob with a mode above 0.5, which makes sense as I made 2 of 3 shots.
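The whole miss, hit, hit sequence can be sketched as pointwise multiplication on a grid of θ values. The grid resolution and variable names below are arbitrary choices for illustration.

```python
def likelihood(z, theta):
    """Bernoulli likelihood: theta for a hit (z = 1), 1 - theta for a miss (z = 0)."""
    return theta if z == 1 else 1 - theta

grid = [i / 100 for i in range(101)]  # theta = 0.00, 0.01, ..., 1.00
posterior = [1.0 for _ in grid]       # the 'square' Beta(1,1) prior

for z in [0, 1, 1]:                   # miss, hit, hit
    # The previous posterior is the new prior: multiply it pointwise by the triangle.
    posterior = [p * likelihood(z, theta) for p, theta in zip(posterior, grid)]

# The grid point with the highest unnormalized density.
mode = grid[posterior.index(max(posterior))]
print(mode)  # 0.67
```

The peak lands on the grid point closest to 2/3, matching the 2 of 3 shots made.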

While this may seem like a cute toy example, it is a totally valid way of computing a Bayesian posterior, and it is the way the most popular Bayesian books (Gelman^{[1]}, Kruschke^{[2]} and McElreath^{[3]}) introduce the concept!

## Bayesian Updating using Conjugation

In the case of Bernoulli events we can actually solve for the posterior easily, because the Beta is conjugate to the Bernoulli. Conjugacy is simply fancy statistics speak for the posterior having a simple mathematical form, and here that form is also a Beta distribution. Thus you can update the Beta distribution using this simple rule:

Beta(α+z,β+1−z)

So we started with a prior with α=β=1:

Beta(1,1)

Then we got a miss, z=0

Beta(1,2)

Then we got a hit, z=1

Beta(2,2)

Then we got a hit, z=1

Beta(3,2)
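The three steps above can be sketched in a few lines. The function name `update` is an illustrative choice, and the mean and mode formulas at the end are the standard Beta summaries, which the post itself does not state.

```python
def update(alpha, beta, z):
    """Conjugate rule from the text: Beta(alpha + z, beta + 1 - z)."""
    return alpha + z, beta + 1 - z

alpha, beta = 1, 1
history = [(alpha, beta)]
for z in [0, 1, 1]:  # miss, hit, hit
    alpha, beta = update(alpha, beta, z)
    history.append((alpha, beta))

print(history)  # [(1, 1), (1, 2), (2, 2), (3, 2)]

# Standard Beta summaries (the mode formula needs alpha > 1 and beta > 1).
mean = alpha / (alpha + beta)
mode = (alpha - 1) / (alpha + beta - 2)
print(mean, mode)
```

The mode of Beta(3,2) is 2/3, the same peak the visual updating produced.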

We can plot the Beta(3,2) posterior:

Notice how this posterior has the exact same shape as the one we got via updating; the only difference is the numbers on the y-axis.
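As a sanity check on the 'same shape, different y-axis' claim, this sketch compares the exact Beta(3,2) density with the unnormalized product of the three triangles at a few values of θ. The helper `beta_pdf` and the choice of grid points are illustrative assumptions.

```python
import math

def beta_pdf(theta, alpha, beta):
    """Density of Beta(alpha, beta) at theta, for 0 < theta < 1."""
    norm = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
    return theta ** (alpha - 1) * (1 - theta) ** (beta - 1) / norm

grid = [i / 10 for i in range(1, 10)]           # interior points 0.1 ... 0.9
unnormalized = [(1 - t) * t * t for t in grid]  # miss x hit x hit on a flat prior
exact = [beta_pdf(t, 3, 2) for t in grid]

# If the shapes match, the ratio is the same constant at every grid point.
ratios = [e / u for e, u in zip(exact, unnormalized)]
print([round(r, 6) for r in ratios])  # every entry is 12.0
```

The constant 12 is 1/B(3,2), which is exactly the rescaling of the y-axis.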

(Hi, if you made it this far, please comment if there was something that was not well explained. I care more about my statistics communication skills than my ego, so negative feedback is very welcome.)

[1] Gelman, Hill and Vehtari, "Regression and Other Stories"

[2] John Kruschke, "Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan", 2nd Edition

[3] Richard McElreath, "Statistical Rethinking"