EDIT: I originally said you can do this for multiple choice questions, which is wrong. It only works for questions with two answers.

(In a comment, to keep top level post short.)

One cute way to do calibration for probabilities is to construct a spinner. If you have a true/false question, you can construct a spinner that is divided up according to your probability that each answer is the correct one.

If you were then to spin the spinner once and win if it comes up on the correct answer, this would not incentivize constructing the spinner to represent you...
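The comment is truncated here, so the two-answer method it goes on to describe is not reproduced. The claim that a single spin does not reward an honest spinner can still be checked. Below is a minimal sketch that assumes a payoff of 1 when the spin lands on the correct answer and 0 otherwise. The parameters `p` (the spinner's share for "true") and `q` (your actual belief), along with the function names, are hypothetical.

```python
def single_spin_expected_payoff(p: float, q: float) -> float:
    """Expected payoff of one spin on a true/false question.

    p: share of the spinner given to "true" (hypothetical parameter).
    q: your actual probability that "true" is correct.
    The spin pays 1 if it lands on the correct answer, else 0.
    """
    return p * q + (1 - p) * (1 - q)


def best_spinner(q: float, steps: int = 100) -> float:
    """Grid-search the spinner setting p in [0, 1] with the highest expected payoff."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: single_spin_expected_payoff(p, q))


if __name__ == "__main__":
    # With a 70% belief in "true", the best spinner is all-in on "true",
    # not a 70/30 split.
    print(best_spinner(0.7))
```

Because the expected payoff $pq + (1-p)(1-q)$ is linear in $p$, it is maximized at an endpoint ($p = 1$ whenever $q > 1/2$). The spinner therefore gets pushed to an extreme rather than set to your actual belief.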

Neil Fitzgerald: I think an algorithm for N outcomes is: spin twice, gain 1 every time you get the answer right but lose 1 if both guesses are the same. One can "see intuitively" why it works: when we increase the spinner-probability of outcome i by a small delta (imagining that all other probabilities stay fixed, and not worrying about the fact that our sum of probabilities is now 1 + delta) then the spinner-probability of getting the same outcome twice goes up by 2 x delta x p[i]. However, on each spin we get the right answer delta x q[i] more of the time, where q[i] is the true probability of outcome i. Since we're spinning twice we get the right answer 2 x delta x q[i] more often. These cancel out if and only if p[i] = q[i]. [Obviously some work would need to be done to turn that into a proof...]
gjm: Just to be clear: if you spin twice and both come up right, you're gaining 2 and then losing 1? (I.e., this is equivalent to what you wrote in an earlier version of the comment?)
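Here is a minimal sketch of the two-spin rule from this thread. It reads the rule literally, so two correct spins score +2 − 1 = +1 (the case gjm asks about). The function names are hypothetical. `p` is the spinner and `q` is the true outcome distribution. The exact expectation is $2\sum_i p_i q_i - \sum_i p_i^2 = \sum_i q_i^2 - \sum_i (p_i - q_i)^2$, which is maximized only at $p = q$. It equals one minus the expected Brier score, so it is the quadratic scoring rule shifted by a constant.

```python
import random
from typing import Sequence


def two_spin_score(p: Sequence[float], truth: int, rng: random.Random) -> int:
    """Play the game once: spin twice using spinner weights p.

    +1 for each spin that lands on the true outcome, and -1 if the two
    spins land on the same outcome (so two correct spins net +1).
    """
    first, second = rng.choices(range(len(p)), weights=p, k=2)
    return (first == truth) + (second == truth) - (first == second)


def expected_two_spin_score(p: Sequence[float], q: Sequence[float]) -> float:
    """Exact expected score when the true outcome is drawn from q.

    E = 2 * sum(p_i * q_i) - sum(p_i ** 2)
      = sum(q_i ** 2) - sum((p_i - q_i) ** 2), maximized only at p == q.
    """
    return 2 * sum(pi * qi for pi, qi in zip(p, q)) - sum(pi * pi for pi in p)


if __name__ == "__main__":
    q = [0.5, 0.3, 0.2]  # hypothetical true probabilities
    for report in ([0.5, 0.3, 0.2], [0.6, 0.3, 0.1], [1.0, 0.0, 0.0]):
        print(report, round(expected_two_spin_score(report, q), 4))
```

In the usage loop, the honest report `[0.5, 0.3, 0.2]` gets the highest expected score. Simulating many rounds with `two_spin_score` should average out to the same value.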

A Proper Scoring Rule for Confidence Intervals

by Scott Garrabrant, 13th Feb 2018


You probably already know that you can incentivize honest reporting of probabilities using a proper scoring rule like the log score, but did you know that you can also incentivize honest reporting of confidence intervals?

To incentivize reporting of a 90% confidence interval, take the score $-|I| - 20D$, where $|I|$ is the size of your confidence interval and $D$ is the distance between the true value and the interval. $D$ is $0$ whenever the true value is in the interval.

This incentivizes not only giving an interval that contains the true value 90% of the time, but also distributing the remaining 10% equally between overestimates and underestimates.
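Below is a minimal sketch of the score. It assumes the form $-|I| - 20D$ for a 90% interval, with $D$ measured on the raw scale. The names `interval_score` and `best_interval` are hypothetical, as is the belief distribution (100 equally likely values, 0 to 99). Grid-searching for the interval with the best average score lands on bounds near the 5th and 95th percentiles, which gives the equal split between overestimates and underestimates.

```python
from typing import Optional, Sequence, Tuple


def interval_score(lower: float, upper: float, truth: float, penalty: float = 20.0) -> float:
    """Score -|I| - penalty * D for the interval [lower, upper].

    |I| = upper - lower is the interval's size; D is the distance from the
    true value to the interval (0 when the true value lies inside it).
    penalty = 20 is the 90% case.
    """
    width = upper - lower
    distance = max(lower - truth, 0.0) + max(truth - upper, 0.0)
    return -width - penalty * distance


def average_score(lower: float, upper: float, samples: Sequence[float],
                  penalty: float = 20.0) -> float:
    """Mean score when the true value is equally likely to be any of `samples`."""
    return sum(interval_score(lower, upper, t, penalty) for t in samples) / len(samples)


def best_interval(samples: Sequence[float], candidates: Sequence[float],
                  penalty: float = 20.0) -> Tuple[float, float]:
    """Grid-search (lower, upper) over sorted `candidates` to maximize the mean score."""
    best: Optional[Tuple[float, float, float]] = None
    for i, lower in enumerate(candidates):
        for upper in candidates[i:]:
            score = average_score(lower, upper, samples, penalty)
            if best is None or score > best[0]:
                best = (score, lower, upper)
    assert best is not None, "candidates must be non-empty"
    return best[1], best[2]


if __name__ == "__main__":
    beliefs = list(range(100))  # hypothetical belief: 100 equally likely values, 0..99
    print(best_interval(beliefs, beliefs))
```

On this discrete belief, the score is tied between neighbouring bounds. The lower bound comes out as 4 or 5 and the upper as 94 or 95, leaving about 5% of the belief mass on each side of the interval.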

To keep the lower bound of the interval important, I recommend measuring $|I|$ and $D$ in log space. So if the true value is $T$ and the interval is $(L, U)$, then $|I|$ is $\log(U/L)$, and $D$ is $\log(T/U)$ for underestimates and $\log(L/T)$ for overestimates. Of course, you need questions with positive answers to do this.
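Here is a sketch of the log-space variant. It assumes the same penalty of 20 as the 90% rule. `log_interval_score` is a hypothetical name, and `lower`, `upper` and `truth` stand for $L$, $U$ and $T$. The variant is the raw-scale score applied to logarithms.

```python
import math


def log_interval_score(lower: float, upper: float, truth: float,
                       penalty: float = 20.0) -> float:
    """Score -|I| - penalty * D with |I| and D measured in log space.

    |I| = log(upper / lower); D = log(truth / upper) for an underestimate,
    log(lower / truth) for an overestimate, and 0 when truth is inside.
    """
    if min(lower, upper, truth) <= 0:
        raise ValueError("log-space scoring needs positive values")
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    width = math.log(upper / lower)
    if truth > upper:          # interval too low: an underestimate
        distance = math.log(truth / upper)
    elif truth < lower:        # interval too high: an overestimate
        distance = math.log(lower / truth)
    else:                      # true value inside the interval
        distance = 0.0
    return -width - penalty * distance


if __name__ == "__main__":
    # Interval (10, 100) misses a true value of 1000 by a factor of 10 on the low side.
    print(log_interval_score(10, 100, 1000))
```

Only ratios enter the score, so rescaling a question's units (say, metres to kilometres) leaves the score unchanged.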

To do a $P\%$ confidence interval, take the score $-|I| - \frac{200}{100 - P}D$.
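The following short derivation shows where the coefficient comes from. It assumes the true value $T$ has a continuous distribution under your beliefs, writes $c = \frac{200}{100-P}$ for the penalty on $D$, and uses $x^+ = \max(x, 0)$. Setting the derivatives of the expected score to zero pins each bound at a quantile:

$$
\begin{aligned}
\mathbb{E}[S] &= -(U - L) - c\,\mathbb{E}\big[(L - T)^+ + (T - U)^+\big] \\
\frac{\partial\,\mathbb{E}[S]}{\partial U} &= -1 + c\,\Pr(T > U) = 0 \;\Rightarrow\; \Pr(T > U) = \frac{1}{c} = \frac{100 - P}{200} \\
\frac{\partial\,\mathbb{E}[S]}{\partial L} &= 1 - c\,\Pr(T < L) = 0 \;\Rightarrow\; \Pr(T < L) = \frac{100 - P}{200} \\
\Pr(L \le T \le U) &= 1 - 2 \cdot \frac{100 - P}{200} = \frac{P}{100}
\end{aligned}
$$

With $P = 90$ this gives $c = 20$ and 5% in each tail. The same argument goes through in log space, since the logarithm is monotone and preserves quantiles.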

This can make calibration training, using something like Wits and Wagers cards, more fun. I also think it could be turned into an app, if one could get a large list of questions with numerical answers.