You probably already know that you can incentivise honest reporting of probabilities using a proper scoring rule like log score, but did you know that you can also incentivize honest reporting of confidence intervals?
To incentivize reporting of a 90% confidence interval, take the score $-S - 20 \cdot D$, where $S$ is the size of your confidence interval, and $D$ is the distance between the true value and the interval. $D$ is $0$ whenever the true value is in the interval.
This not only incentivizes giving an interval that contains the true value 90% of the time, but also distributes the remaining 10% equally between overestimates and underestimates.
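One way to see where the factor of 20 comes from is the following sketch. It assumes your belief about the true value $T$ is a continuous distribution with CDF $F$, and writes the reported interval as $(L,U)$; the symbol $F$ and the notation $(x)^+ = \max(x, 0)$ are introduced here for illustration only. The expected score is

$$
\mathbb{E}[\text{Score}] = -(U-L) - 20\,\mathbb{E}\big[(L-T)^+\big] - 20\,\mathbb{E}\big[(T-U)^+\big].
$$

Setting the derivative with respect to each endpoint to zero gives

$$
\frac{\partial\,\mathbb{E}[\text{Score}]}{\partial U} = -1 + 20\,\big(1-F(U)\big) = 0 \;\Rightarrow\; F(U) = 0.95,
\qquad
\frac{\partial\,\mathbb{E}[\text{Score}]}{\partial L} = 1 - 20\,F(L) = 0 \;\Rightarrow\; F(L) = 0.05.
$$

Under these assumptions the best report puts 5% of your probability mass below the interval and 5% above it, which is a 90% interval with equal tails.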
To keep the lower bound of the interval important, I recommend measuring $S$ and $D$ in log space. So if the true value is $T$ and the interval is $(L,U)$, then $S$ is $\log(U/L)$ and $D$ is $\log(T/U)$ for underestimates ($T > U$) and $\log(L/T)$ for overestimates ($T < L$). Of course, you need questions with positive answers to do this.
To do a $P\%$ confidence interval, take the score $-S - \frac{200}{100-P} \cdot D$.
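The rule can be written as a short function. This is a minimal sketch, not a reference implementation: the name `interval_score`, its parameters, and the choice of base-10 logarithms are assumptions made for illustration. The base only rescales the score and does not change which interval is best.

```python
import math


def interval_score(lower, upper, truth, percent=90.0, log_space=False):
    """Score a reported interval; scores are <= 0 and higher is better.

    percent   -- confidence level P of the interval, in percent (assumed 0 < P < 100)
    log_space -- if True, measure size and distance in log10 space
                 (requires positive bounds and a positive true value)
    """
    if not lower < upper:
        raise ValueError("lower must be strictly less than upper")
    if not 0.0 < percent < 100.0:
        raise ValueError("percent must be strictly between 0 and 100")
    if log_space:
        if lower <= 0 or truth <= 0:
            raise ValueError("log space needs positive values")
        lower, upper, truth = (math.log10(x) for x in (lower, upper, truth))

    size = upper - lower                               # S
    distance = max(lower - truth, truth - upper, 0.0)  # D, zero inside the interval
    penalty = 200.0 / (100.0 - percent)                # 20 for a 90% interval
    return -size - penalty * distance


print(interval_score(10, 20, 15))              # true value inside the interval
print(interval_score(10, 20, 23))              # true value 3 above the interval
print(interval_score(10, 20, 23, percent=50))  # gentler penalty for a 50% interval
```

The three calls show the same interval scored with the true value inside it, outside it, and outside it at a lower confidence level, where the miss penalty drops from 20 to 4.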
This can be used to make calibration training, using something like Wits and Wagers cards, more fun. I also think it could be turned into an app, if one could get a large list of questions with numerical values.
I need help figuring out how to use this scoring rule. Please consider the following application.
How much does it cost to mail a letter under 30g in Canada?[1]
I remember when I was a child buying 45c stamps, so it's likely to be larger than that. It's been over a decade or so, and assuming a 2% rise in cost per year, then we should be around $45 \times (1.02)^{10} \sim 60$c per stamp. However, we also had big budget cuts to our postal service that even I learned about despite not reading the news. Let's say that Canada Post increased their prices by 25% to accommodate some shortfall. My estimate is that stamps cost 75c.
What should be my confidence interval? Would I be surprised if a stamp cost a dollar? Not really, but it feels like an upper bound. Would I be surprised if a stamp cost less than 50c? Yes. 60c? Yes. 70c? Hmmm.... Assume that I'm well calibrated, so I'm reporting 90% confidence for an interval of stamps costing 70c to 100c.
Answer: Stamps in booklets cost 85c each, individual stamps are 100c each. Because I would always buy stamps in booklets, I will use the 85c figure.
$S$ is the size of my confidence interval, $S=100-70=30$. $D$ is the distance between the true value and the interval, but is $0$ in this case because the true value is in the interval.
$\text{Score}=-S-20 \cdot D=-30$
I'm not really sure what to do with this number, so let's move to the next paragraph of the post.
The true value is $T=85$ and the interval is $(L,U)=(70,100)$. Because the true value is contained in the interval, $D=0$.
$S=\log(U/L)=\log(100/70)=0.15$
$\text{Score}=-S-20 \cdot D=-0.15$
How does this incentivise honest reporting of confidence intervals?
Let's say that, when I intuited my confidence interval above, I was perturbed that it wasn't symmetric about my estimate of 75c, so I set it to $(L,U)=(50,100)$ for aesthetic reasons. In this case, my score would be $\text{Score}=-0.30$, which is worse than my previous score by a factor of 2.
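The same comparison can be made in expectation rather than on a single question. The sketch below assumes, purely for illustration, that my belief about the stamp price is a normal distribution with mean 75c and standard deviation 10c, and that sizes and distances are measured in cents rather than log space. It scores central intervals of several coverage levels against samples from that belief; the function names are hypothetical.

```python
import random


def score(lower, upper, truth, penalty=20.0):
    """Score = -S - penalty * D, with S and D measured in plain (linear) units."""
    distance = max(lower - truth, truth - upper, 0.0)
    return -(upper - lower) - penalty * distance


def mean_score_for_coverage(sorted_samples, coverage):
    """Average score of the central interval that covers `coverage` of the samples."""
    n = len(sorted_samples)
    tail = (1.0 - coverage) / 2.0
    lower = sorted_samples[round(tail * n)]
    upper = sorted_samples[round((1.0 - tail) * n) - 1]
    return sum(score(lower, upper, t) for t in sorted_samples) / n


random.seed(0)
# Assumed belief distribution: Normal(75, 10), in cents.
beliefs = sorted(random.gauss(75.0, 10.0) for _ in range(20000))

coverages = [0.50, 0.80, 0.90, 0.95, 0.99]
results = {c: mean_score_for_coverage(beliefs, c) for c in coverages}
best = max(results, key=results.get)

for c in coverages:
    print(f"{c:.0%} interval: mean score {results[c]:.2f}")
print("best coverage:", best)
```

With a penalty of 20, the interval covering the central 90% of the samples gets the highest average score, so both stretching the interval for aesthetic reasons and shrinking it out of overconfidence cost points on average.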
Let's say that, when I remembered the price of stamps in my childhood, I was way off and remembered 14c stamps. Then I would believe that stamps should cost around 22c now. (Here I have the feeling of "nothing costs less than a quarter!", so I would probably reject this estimate.) That would likely anchor me, so that I would set a high confidence on the price being within $(L,U)=(20,24)$.
$S=0.08$, $D=\log(L/T)=\log(20/85)=-0.63$
$\text{Score}=-S-20 \cdot D=12.52$
Am I trying to maximize this score?
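For comparison, here is a minimal sketch of the three intervals above, scored under the assumption that $D$ is meant as a non-negative distance in log space. Under that reading, an interval lying entirely below the true value uses $\log(T/U)$ rather than $\log(L/T)$, the score can never be positive, and a higher score (closer to zero) is better. The function name and the use of base-10 logarithms are assumptions chosen to match the numbers above.

```python
import math


def log_interval_score(lower, upper, truth, penalty=20.0):
    """Score = -S - penalty * D, with S and D measured in log10 space."""
    size = math.log10(upper / lower)           # S = log(U/L)
    if truth < lower:
        distance = math.log10(lower / truth)   # interval sits above the true value
    elif truth > upper:
        distance = math.log10(truth / upper)   # interval sits below the true value
    else:
        distance = 0.0                         # true value inside the interval
    return -size - penalty * distance


truth = 85  # booklet price in cents
for interval in [(70, 100), (50, 100), (20, 24)]:
    print(interval, round(log_interval_score(*interval, truth), 2))
```

On this reading the anchored interval $(20,24)$ scores about $-11.06$, far worse than either interval that contains the true value.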
[1] I looked up the answer, and the lowest cost standard delivery is for letters under 30g.
Thanks for this reply. The technique of asking what each term of your equation represents is one I have not practiced in some time.
This answer very much helped me to understand the model.