Scoring Rules and Automated Market Makers

There are two very similar pages. This one and

The log score penalizes overconfidence (i.e. a forecast that is too certain) more strongly than underconfidence. While all proper scoring rules should incentivize forecasters to report their true beliefs, forecasters may feel enticed to err on the side of caution when scored using the log score.


I like this page!

Forecasting rules and their flaws

  • Average Brier 
    • Encourages forecasting only on questions where you expect to score better than your current average Brier
    • Hard to compare across forecasters, since others may have forecasted on easier questions
  • Community average Brier 
    • Encourages forecasting only on questions where you think you know more than the community
    • use this
  • Summed log score
  • Profit
    • Discourages forecasting on long-term questions
  • Profit + loans
    • Depends heavily on the percentage return of the loan
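
The selection incentive noted for the average Brier score above can be made concrete with a small sketch. It assumes a binary question, a forecaster whose reported belief equals the true probability, and the convention that lower Brier scores are better. The numbers (a current average of 0.10 over 20 questions) and the function names are hypothetical, chosen only for illustration.

```python
def expected_brier(belief: float) -> float:
    """Expected Brier score of reporting `belief` when it is also the true probability."""
    # With probability `belief` the event occurs and the score is (belief - 1)^2;
    # otherwise the score is belief^2. The sum simplifies to belief * (1 - belief).
    return belief * (1 - belief) ** 2 + (1 - belief) * belief ** 2


def expected_new_average(current_avg: float, n_questions: int, belief: float) -> float:
    """Expected average Brier score after adding one more honestly forecast question."""
    return (current_avg * n_questions + expected_brier(belief)) / (n_questions + 1)


# Hypothetical track record: average Brier of 0.10 over 20 resolved questions.
current_avg = 0.10
n_questions = 20

uncertain = expected_new_average(current_avg, n_questions, 0.50)  # a genuinely uncertain question
confident = expected_new_average(current_avg, n_questions, 0.95)  # a near-certain question

print(f"after a 50% question: {uncertain:.4f}")
print(f"after a 95% question: {confident:.4f}")
```

Under these assumptions, an honest forecast on the uncertain question is expected to worsen the average, while the near-certain question is expected to improve it, which is the incentive to avoid hard questions.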

Log Score

The log score (sometimes called surprisal) is a strictly proper scoring rule[1] used to evaluate the quality of forecasts. A forecaster scored by the log score will, in expectation, obtain the best score by providing a predictive distribution that is equal to the data-generating distribution. The log score therefore incentivizes forecasters to report their true belief about the future.
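
As a rough numerical check of this property, the sketch below searches a grid of reported probabilities for a binary event and finds the one with the best expected score. It assumes the negatively oriented log score with the natural logarithm (smaller is better) and a hypothetical true probability of 0.7; the function name is illustrative.

```python
import math


def expected_log_score(true_p: float, reported_q: float) -> float:
    """Expected negatively oriented log score when the event has probability true_p."""
    return -(true_p * math.log(reported_q) + (1 - true_p) * math.log(1 - reported_q))


true_p = 0.7  # hypothetical data-generating probability

# Candidate reports 0.01, 0.02, ..., 0.99 (0 and 1 are excluded to avoid log(0)).
candidates = [i / 100 for i in range(1, 100)]
best_q = min(candidates, key=lambda q: expected_log_score(true_p, q))

print(best_q)
```

On this grid the minimizing report coincides with the true probability, as strict propriety predicts.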

All Metaculus scores are types of log score[2].


The log score is usually computed as the negative logarithm of the predictive density evaluated at the observed value $y$, $\text{log score}(y) = -\log f(y)$, where $f$ is the predicted probability density function. Usually, the natural logarithm is used, but the log score remains strictly proper for any base $> 1$ used for the logarithm.

In the formulation presented above, the score is negatively oriented, meaning that smaller values are better. Sometimes the sign of the log score is inverted and it is simply given as the log predictive density. If this is the case, then larger values are better.

The log score is applicable to binary outcomes as well as discrete or continuous outcomes. In the case of binary outcomes, the formula above simplifies to

$\text{log score}(y) = -\log P(y)$,

where $P(y)$ is the probability assigned to the binary outcome $y$. If a forecaster for example assigned 70% probability that team A would win a soccer match, then the resulting log score would be $-\log 0.7 \approx 0.36$ if team A wins and $-\log 0.3 \approx 1.20$ if team A doesn't win.
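
The soccer example can be reproduced with a few lines of Python. This is a minimal sketch assuming the negatively oriented score and the natural logarithm; the function name `binary_log_score` is illustrative and not taken from any library.

```python
import math


def binary_log_score(p_event: float, event_occurred: bool) -> float:
    """Negatively oriented log score for a binary forecast (smaller is better)."""
    # Only the probability assigned to the outcome that actually happened matters.
    p_outcome = p_event if event_occurred else 1.0 - p_event
    return -math.log(p_outcome)


score_if_win = binary_log_score(0.7, True)    # team A wins
score_if_loss = binary_log_score(0.7, False)  # team A does not win

print(round(score_if_win, 2))
print(round(score_if_loss, 2))
```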


Illustration of the difference between local and global scoring rules. Forecasters A and B both predicted the number of goals in a soccer match and assigned the same probability to the outcome that was later observed and therefore receive the same log score. Forecaster B, however, assigned a significant probability to outcomes far away from the observed outcome and therefore receives worse scores for the global scoring rules CRPS and DSS.

The log score is a local scoring rule, meaning that the score only depends on the probability (or probability density) assigned to the actually observed values. The score, therefore, does not depend on the probability (or probability density) assigned to values not observed. This is in contrast to so-called global proper scoring rules, which take the entire predictive distribution into account.
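
The difference between local and global rules can be sketched numerically. The example below uses two hypothetical forecasts for the number of goals (0 to 6) that assign the same probability to the observed outcome. As the global rule it uses the CRPS in its form for integer-valued outcomes (the sum of squared differences between the forecast CDF and the step function at the observation). The probabilities are invented for illustration and are not the ones behind the figure.

```python
import math
from itertools import accumulate


def log_score(pmf: list[float], observed: int) -> float:
    """Negatively oriented log score: depends only on the mass at the observed value."""
    return -math.log(pmf[observed])


def crps_discrete(pmf: list[float], observed: int) -> float:
    """CRPS for outcomes 0..len(pmf)-1: depends on the whole predictive distribution."""
    cdf = list(accumulate(pmf))
    return sum((f - (1.0 if k >= observed else 0.0)) ** 2 for k, f in enumerate(cdf))


observed_goals = 2
forecaster_a = [0.1, 0.3, 0.4, 0.2, 0.0, 0.0, 0.0]  # mass concentrated near the outcome
forecaster_b = [0.0, 0.1, 0.4, 0.1, 0.0, 0.1, 0.3]  # same mass on 2, but a heavy far tail

log_a = log_score(forecaster_a, observed_goals)
log_b = log_score(forecaster_b, observed_goals)
crps_a = crps_discrete(forecaster_a, observed_goals)
crps_b = crps_discrete(forecaster_b, observed_goals)

print(f"log score: A={log_a:.3f}, B={log_b:.3f}")
print(f"CRPS:      A={crps_a:.3f}, B={crps_b:.3f}")
```

Both forecasters receive the same log score, while the hypothetical forecaster B receives a worse (larger) CRPS because of the probability placed far from the observed value.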

Penalization of Over- and Underconfidence

The log score penalizes overconfidence (i.e. a forecast that is too certain) more strongly than underconfidence. While all proper...
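
One way to read the asymmetry is to compare the expected score lost by shifting a forecast the same distance toward certainty and toward 50%. The sketch below assumes a binary event with a hypothetical true probability of 0.7, the negatively oriented log score with the natural logarithm, and shifts of 0.2 in each direction; other definitions of over- and underconfidence are possible.

```python
import math


def expected_log_score(true_p: float, reported_q: float) -> float:
    """Expected negatively oriented log score when the event has probability true_p."""
    return -(true_p * math.log(reported_q) + (1 - true_p) * math.log(1 - reported_q))


true_p = 0.7  # hypothetical true probability
honest = expected_log_score(true_p, 0.7)

# Extra expected penalty relative to the honest forecast (larger is worse).
over_penalty = expected_log_score(true_p, 0.9) - honest   # overconfident: 0.7 -> 0.9
under_penalty = expected_log_score(true_p, 0.5) - honest  # underconfident: 0.7 -> 0.5

print(f"overconfident penalty:  {over_penalty:.3f}")
print(f"underconfident penalty: {under_penalty:.3f}")
```

Under these assumptions both deviations cost expected score, but the overconfident report costs more.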

Created by Nathan Young at