Jan Christian Refsgaard

Data Scientist


Prediction and Calibration - Part 1

You may be disappointed: unless you make 40+ predictions per week, it will be hard to compare weekly drift. Bernoulli data is much noisier than the normally distributed data of an ordinary regression, so the uncertainty estimate of the calibration is correspondingly wide (high uncertainty of data -> high uncertainty of regression parameters). My post 3 will be a hierarchical model, which may suit your needs better, but it may be a month before I get around to making that model.
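To give a rough feel for why 40 predictions per week is on the low side, here is a minimal sketch. It uses the plain frequentist standard error of a proportion, not the Bayesian calibration model from the post, and the function name and the 60% hit rate are assumptions made purely for illustration.

```python
import math


def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of an observed hit rate from n Bernoulli(p) outcomes.

    Hypothetical helper: a simple frequentist approximation, not the
    regression model discussed in the post.
    """
    return math.sqrt(p * (1.0 - p) / n)


# Assumed true hit rate of 60%, for different numbers of predictions.
for n in (10, 40, 400):
    se = proportion_standard_error(0.6, n)
    print(f"n={n:4d}  standard error={se:.3f}  approx. 95% half-width={1.96 * se:.3f}")
```

Under these assumptions, 40 predictions at a 60% hit rate give a standard error of roughly 0.08, so the observed hit rate can easily land anywhere within about 0.15 of the true value, which makes week-to-week comparisons very noisy.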

If there are many people like you, then we may try to make a hackish model that down-weights older predictions, as they are less predictive of your current calibration than newer ones. But I will have to think long and hard to make that into a full Bayesian model, so I am making no promises.

What do the reported levels of protection offered by various vaccines mean?

It almost means 3. It means the Vaccine Efficacy is 95%

Vaccine efficacy ($VE$) is calculated this way:

$$VE = 1 - \frac{n_v}{n_c}$$

where $n_v$ is the number of sick people in the vaccine group and $n_c$ is the number of sick people in the control group.

So if 100 got sick in the control group and 5 in the vaccine group, then:

$$VE = 1 - \frac{5}{100} = 0.95$$

So it's a 95% reduction in your probability of getting COVID :)
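The arithmetic above can be sketched in a few lines of Python. The function and parameter names are hypothetical, and the sketch assumes the vaccine and control groups are the same size, so that raw case counts can be compared directly.

```python
def vaccine_efficacy(sick_vaccine: int, sick_control: int) -> float:
    """Point estimate of vaccine efficacy from raw case counts.

    Assumption of this sketch: both trial arms have the same number of
    participants, so the ratio of case counts equals the ratio of risks.
    """
    if sick_control <= 0:
        raise ValueError("sick_control must be positive")
    return 1.0 - sick_vaccine / sick_control


# The example from the comment: 100 sick in control, 5 sick in vaccine group.
efficacy = vaccine_efficacy(sick_vaccine=5, sick_control=100)
print(f"Vaccine efficacy: {efficacy:.0%}")
```

If the two arms differ in size, the case counts would first have to be divided by the number of participants in each arm.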

Note that the number reported is sometimes the mode and sometimes the mean of the distribution, but beta/binomial distributions are skewed, so the mean is often lower than the mode. I have written a blog post where I redo the Pfizer analysis.
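As a small illustration of the mean-versus-mode point, the sketch below compares the two for a left-skewed beta distribution. The Beta(96, 6) parameters are made up for illustration and are not taken from the Pfizer analysis; the helper names are also hypothetical.

```python
def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_mode(a: float, b: float) -> float:
    """Mode of a Beta(a, b) distribution (only defined this way for a, b > 1)."""
    if a <= 1 or b <= 1:
        raise ValueError("this formula for the mode requires a > 1 and b > 1")
    return (a - 1) / (a + b - 2)


# Hypothetical parameters: most of the mass is near 0.95, with a tail toward lower values.
a, b = 96, 6
print(f"mean = {beta_mean(a, b):.3f}, mode = {beta_mode(a, b):.3f}")
```

Because the long tail points toward lower values, the mean is pulled below the mode, which is why the two summaries can give slightly different headline numbers.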

Prediction and Calibration - Part 1

I have tried to add a paragraph about this, because I think it's a good point and it's unlikely that you were the only one who got confused about it. Next weekend I will finish part 2, where I make a model that can track calibration independently of prediction. In that model, the 60% 61/100 will have a better posterior of the calibration parameter than the 60% 100/100, though the likelihood of the 100/100 will of course still be the highest.
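The likelihood half of that claim is easy to check by hand. Here is a minimal sketch, assuming 100 independent predictions all made at 60%; the function name is hypothetical and this is only the Bernoulli likelihood, not the part 2 calibration model.

```python
import math


def bernoulli_log_likelihood(p: float, hits: int, n: int) -> float:
    """Log-likelihood of `hits` correct out of `n` predictions, each made with probability p."""
    return hits * math.log(p) + (n - hits) * math.log(1.0 - p)


ll_61 = bernoulli_log_likelihood(0.6, hits=61, n=100)    # roughly calibrated
ll_100 = bernoulli_log_likelihood(0.6, hits=100, n=100)  # badly underconfident

print(f"log-likelihood, 61/100 correct:  {ll_61:.2f}")
print(f"log-likelihood, 100/100 correct: {ll_100:.2f}")
```

Getting every prediction right gives the higher likelihood even though the 60% confidence was far too low, which is why calibration has to be tracked separately from the likelihood.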

Prediction and Calibration - Part 1

I have gotten 10 votes, the sum of which is 4. Could all of you who disliked the post please comment so I know why?

Prediction and Calibration - Part 1

You mean the N'th root of 2, right? That is what I called the null predictor and divided Scott's predictions by in the code:

random_predictor = 0.5 ** len(y)

which is equivalent to $\frac{1}{2^N}$, where $N$ is the total number of predictions.
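A minimal sketch of how such a null predictor can be used as a baseline is shown below. The outcomes, the predictions, and the `likelihood` helper are all made up for illustration; only the `random_predictor` line comes from the code quoted above.

```python
import math


def likelihood(predictions, outcomes):
    """Joint likelihood of binary outcomes under the stated probabilities (hypothetical helper)."""
    return math.prod(p if o else 1.0 - p for p, o in zip(predictions, outcomes))


# Hypothetical data: 1 means the predicted event happened, 0 means it did not.
y = [1, 1, 0, 1]
predictions = [0.8, 0.7, 0.3, 0.6]

# The null predictor says 50% for everything, so its likelihood is 0.5 ** N.
random_predictor = 0.5 ** len(y)

# A ratio above 1 means the predictions beat always answering 50%.
ratio = likelihood(predictions, y) / random_predictor
print(f"likelihood ratio vs. null predictor: {ratio:.2f}")
```

Predicting 50% for every question reproduces the null predictor exactly, so the ratio is 1 in that case.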

Prediction and Calibration - Part 1

You are absolutely right: any framework that punishes you for being right would be bad. My point is that improving your calibration helps a surprising amount and is much more achievable than the "just git good" that is required for improving prediction.

I will try to put your point into the draft when I am off work, thanks.

Prediction and Calibration - Part 1

Thanks, also thanks for pointing out that I had written a few places instead of $p(y\mid\theta)$; since everything is the Bernoulli distribution, I have changed everything to

Prediction and Calibration - Part 1

I have not been consistent with my probability notation: I sometimes use upper case P and sometimes lower case p. In future posts I will try to use the same notation as Andrew Gelman, which is $\Pr(\cdot)$ for things that are probabilities (numbers) and $p(\cdot)$ for distributions. However, since this is my first post, I am afraid that editing it will waste the moderators' time, as they will have to read it again to check for trolling. What is the proper course of action?