Michal

Comments

What is the semantics of assigning probabilities to future events?

I like the idea of defining a betting game 'forecasters vs cosmic bookie'. Then saying 'the probability that people will land on Mars by 2040 is 42%' translates into the statement 'I am willing to buy, for any price Y<42 cents, an option that would be worth $1 if we land on Mars by 2040 and $0 otherwise'.

To compare several forecasters, we can consider a game in which each player is offered some options of this kind. Suppose that for each x in {1, \dots, 99} each player is allowed to buy one option for x cents. If one believes that the probability of an event is 30%, then it is profitable to buy the 29 cheapest options and nothing more (the 30-cent option has zero expected profit, so it does not matter whether one buys it or not).
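The discrete game above can be checked with a few lines of Python. This is only a sketch: the function name `expected_profit_cents` is made up for illustration, and it assumes that the player's belief (30%) is also the true probability.

```python
def expected_profit_cents(k, true_pct):
    """Expected profit, in cents, from buying the k cheapest options (prices 1..k cents)."""
    # An option pays 100 cents with probability true_pct/100,
    # so it is worth true_pct cents in expectation.
    return sum(true_pct - price for price in range(1, k + 1))

# Try every possible number of options, from 0 to 99 (max returns the first best k).
best_k = max(range(100), key=lambda k: expected_profit_cents(k, 30))
print(best_k, expected_profit_cents(best_k, 30))  # 29 435
print(expected_profit_cents(30, 30))              # 435, the 30-cent option adds nothing
```

Buying fewer than 29 or more than 30 options gives a strictly lower expected profit.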

To make the calculations simpler, we can make the prices continuous. One is then allowed to buy an option-interval [0,x] for some real x in [0,1]: by integration its price is x^2/2, and the pay-off is x if the event occurs. If the 'true' probability of the event is y, then the expected profit equals xy - x^2/2. One can easily see that if you know the value of y, then the optimal strategy sets x=y, and choosing any other x costs you (x-y)^2/2 in expected profit. The larger the mistake you make, the lower your expected profit. The value of the game is the sum of all the profits, and being a good forecaster means being able to design a strategy with high expected revenue.
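A quick numerical check of the continuous version, as a sketch. It assumes the price of the interval [0,x] is x^2/2 (the integral of t dt from 0 to x) and the pay-off is x with probability y; the function name and the 1% grid are illustrative choices.

```python
def expected_profit(x, y):
    """Expected profit from buying the option-interval [0, x] when the true probability is y."""
    # Pay-off x with probability y, minus the price x^2/2.
    return x * y - x ** 2 / 2

y = 0.42
grid = [i / 100 for i in range(101)]  # candidate values of x in steps of 1%
best_x = max(grid, key=lambda x: expected_profit(x, y))
print(best_x)  # 0.42

# The profit lost by choosing x instead of y is (x - y)^2 / 2.
for x in (0.30, 0.40, 0.50):
    lost = expected_profit(y, y) - expected_profit(x, y)
    print(x, round(lost, 6), round((x - y) ** 2 / 2, 6))
```

The search over the grid lands on x=y, and the loss grows quadratically with the size of the mistake.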

An important drawback of this approach is that when you correctly estimate the probability of a successful Mars landing to be 42%, the optimal strategy gives expected profit 0.42^2/2 = 0.0882. However, if the question were 'what is the probability that people will FAIL to land on Mars by 2040?', then the same knowledge gives you the answer 58%, and the expected profit is different: 0.58^2/2 = 0.1682. Hence, the bookie should also sell options that pay when the event does not occur or, equivalently, always consider each question together with its dual, i.e., the question about the event not happening. Now it begins to look like a proper mathematical formalization of forecasting.
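To see that adding the dual question removes the asymmetry, here is a small sketch. It assumes the same pricing as before (interval [0,x] costs x^2/2) and that a forecaster who buys [0,x] on the event also buys [0,1-x] on its complement; the names `one_sided` and `two_sided` are illustrative.

```python
def one_sided(x, y):
    """Expected profit from buying [0, x] on an event whose true probability is y."""
    return x * y - x ** 2 / 2

def two_sided(x, y):
    """Expected profit from betting on the event and, consistently, on its dual."""
    return one_sided(x, y) + one_sided(1 - x, 1 - y)

# One-sided profits depend on how the question is phrased...
print(round(one_sided(0.42, 0.42), 4), round(one_sided(0.58, 0.58), 4))  # 0.0882 0.1682
# ...while the two-sided profit is the same for both phrasings.
print(round(two_sided(0.42, 0.42), 4), round(two_sided(0.58, 0.58), 4))  # 0.2564 0.2564
```

With both sides included, the expected loss from choosing x when the truth is y becomes (x-y)^2, which no longer depends on which of the two phrasings is used.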

Still, the problem remains that the choice of available options is arbitrary. Here I assumed that the prices are distributed uniformly in the interval [0,1], but one could consider some other distribution. The choice of the distribution governs how much you lose when you are off by 1% or 2%. It also determines whether mistaking 50% for 51% costs the same as mistaking 70% for 71%. Tweaking the parameters of the distribution can change the result of any forecasting competition, but this should be fine as long as the parameters are known to the contestants.
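The effect of the price distribution can be illustrated numerically. The sketch below is one-sided (no dual question) and rests on assumptions: `density(t)` is a hypothetical function saying how many options are on sale at price t, the `linear` density 2t is an arbitrary example rather than a proposal, and the integral is approximated with a simple midpoint rule.

```python
def loss(chosen, truth, density, steps=10_000):
    """Expected profit lost by buying all options priced up to `chosen` instead of up to `truth`."""
    lo, hi = sorted((chosen, truth))
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * h  # midpoint of the i-th slice
        # An option priced t has expected profit (truth - t); wrongly buying it
        # (t > truth) or wrongly skipping it (t < truth) costs |truth - t|.
        total += abs(truth - t) * density(t) * h
    return total

uniform = lambda t: 1.0     # the distribution assumed in the comment
linear = lambda t: 2.0 * t  # hypothetical: more options at higher prices

for name, density in (("uniform", uniform), ("linear", linear)):
    print(name, loss(0.51, 0.50, density), loss(0.71, 0.70, density))
```

Under the uniform density, the two one-point mistakes cost the same; under the linear density, the mistake near 70% is more expensive than the one near 50%.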