cross-posted from

Iqisa is a library for handling and comparing forecasting datasets from different platforms.

Iqisa: A Library For Handling Forecasting Datasets

The eventual success of my archives reinforced my view that public permission-less datasets are often a bottleneck to research: you cannot guarantee that people will use your dataset, but you can guarantee that they won’t use it.

Gwern Branwen, “2019 News”, 2019

Iqisa is a collection of forecasting datasets and a simple library for handling those datasets. Code and data are available here.

So far it contains data from:

for a total of ~4.2m forecasts. It also contains code for handling private Metaculus data (available to researchers on request to Metaculus), and I plan to add data from various other sources.

The documentation can be found here. As a simple example of using the library, we can check whether traders with more than 100 trades have a better (i.e. lower) Brier score than traders in general:

import numpy as np
import gjp
import iqisa as iqs

# load the GJP market forecasts (assumed loader; see the documentation)
market_fcasts=gjp.load_markets()

def brier_score(probabilities, outcomes):
	# mean squared difference between forecast probabilities and 0/1 outcomes
	return np.mean((probabilities-outcomes)**2)

# score every trader, then only the traders with more than 100 trades
trader_scores=iqs.score(market_fcasts, brier_score, on=['user_id'])
filtered_trader_scores=iqs.score(market_fcasts.groupby(['user_id']).filter(lambda x: len(x)>100), brier_score, on=['user_id'])

And we can see:

>>> np.mean(trader_scores)
score    0.159194
dtype: float64
>>> np.mean(filtered_trader_scores)
score    0.159018
dtype: float64

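So traders with more than 100 trades have only a very slightly lower (i.e. better) mean Brier score than traders in general: more experienced traders are only very slightly better at trading.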
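In the snippet above, `iqs.score` does the per-trader grouping. As a sketch of what I assume it computes for `on=['user_id']` (my reading, not Iqisa's actual implementation), here is a pandas-only version on hypothetical toy data. The column names `user_id`, `probability` and `outcome` and the helper `score_per_user` are assumptions, not necessarily Iqisa's schema or API:

```python
import numpy as np
import pandas as pd


def brier_score(probabilities, outcomes):
    # mean squared difference between forecast probabilities and 0/1 outcomes
    return np.mean((np.asarray(probabilities) - np.asarray(outcomes)) ** 2)


def score_per_user(forecasts, rule, min_forecasts=0):
    # keep only users with more than min_forecasts forecasts
    kept = forecasts.groupby("user_id").filter(lambda g: len(g) > min_forecasts)
    # apply the scoring rule separately to each user's forecasts;
    # the result is a Series of scores indexed by user_id
    return kept.groupby("user_id")[["probability", "outcome"]].apply(
        lambda g: rule(g["probability"], g["outcome"])
    )


# hypothetical toy data: three users, all questions resolved to 0 or 1
forecasts = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "probability": [0.9, 0.8, 0.3, 0.6, 0.2, 0.5],
    "outcome": [1, 1, 0, 0, 0, 1],
})

all_scores = score_per_user(forecasts, brier_score)
experienced_scores = score_per_user(forecasts, brier_score, min_forecasts=1)
print(all_scores.mean(), experienced_scores.mean())  # ≈ 0.166 and ≈ 0.123
```

Taking the mean of the per-user scores, as with `np.mean(trader_scores)` above, weights every trader equally regardless of how many forecasts they made.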


Possible Projects

  • Find questions on different platforms that are close to each other in sentence2vec embedding space (i.e. questions that likely ask about the same event), and check which platform made better predictions on them.
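A minimal sketch of the matching step in numpy, assuming each question has already been embedded with sentence2vec (or any other sentence embedding model) and already has a Brier score on its platform. The function names, the toy vectors and the 0.9 similarity threshold are all hypothetical:

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    # normalise rows to unit length, so dot products are cosine similarities
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def match_questions(emb_a, emb_b, threshold=0.9):
    # for each question on platform A, take the most similar question on
    # platform B, and keep the pair only if the similarity reaches the threshold
    sims = cosine_similarity_matrix(emb_a, emb_b)
    best = sims.argmax(axis=1)
    return [(i, int(j)) for i, j in enumerate(best) if sims[i, j] >= threshold]


def compare_platforms(pairs, scores_a, scores_b):
    # count matched pairs where each platform had the lower (better) Brier score
    a_wins = sum(scores_a[i] < scores_b[j] for i, j in pairs)
    b_wins = sum(scores_b[j] < scores_a[i] for i, j in pairs)
    return a_wins, b_wins


# hypothetical toy embeddings: 2 questions on platform A, 3 on platform B
emb_a = np.array([[1.0, 0.0], [0.0, 1.0]])
emb_b = np.array([[0.0, 2.0], [1.0, 0.1], [-1.0, 0.0]])
pairs = match_questions(emb_a, emb_b)
print(pairs)  # [(0, 1), (1, 0)]
print(compare_platforms(pairs, [0.10, 0.30], [0.20, 0.05, 0.50]))  # (0, 2)
```

Matching greedily on the single most similar question keeps the sketch short, but it allows one question on platform B to be paired with several questions on platform A; a real analysis would probably want one-to-one matching and a manual check of the matched pairs.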

Known Bugs

Since this is a project I'm now doing in my free time, it might not be as polished as it should be. Sorry :-/

If you decide to work with this library, feel free to contact me.

  • Issues with the time fields
    • The native pandas datetime type can only represent a limited range of dates (roughly the years 1677 to 2262), so some values in these datasets fall outside it and might be set to NaT.
    • Not all time-related fields have timezone information attached to them.
  • Some predictions in the dataset have occurred after question resolution. There should be a way to filter those out programmatically.
  • The columns of the datasets are not sorted the same way for question DataFrames and forecast DataFrames.
  • I fear that, despite my best efforts, not all of the GJP data has been transferred.
  • Some fields in the Metaculus & PredictionBook data are filled with default values where they should be NA.
  • The documentation is still slightly spotty, and tests are mostly nonexistent.
  • Some variables shouldn't be exposed, but are.
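The time-related issues above (NaT values, missing timezones, forecasts made after resolution) interact when filtering. A minimal sketch of such a filter in pandas, assuming hypothetical column names (`question_id` and `time` in the forecasts, `question_id` and `resolve_time` in the questions) and assuming that timestamps without a timezone are in UTC; neither assumption is guaranteed to match the actual Iqisa DataFrames:

```python
import pandas as pd


def to_utc(series):
    # parse to datetimes; missing or unparseable values become NaT
    times = pd.to_datetime(series, errors="coerce")
    # ASSUMPTION: timestamps without timezone information are in UTC
    if times.dt.tz is None:
        return times.dt.tz_localize("UTC")
    return times.dt.tz_convert("UTC")


def drop_post_resolution(forecasts, questions):
    # attach each question's (hypothetical) resolve_time column to its forecasts
    merged = forecasts.merge(
        questions[["question_id", "resolve_time"]], on="question_id", how="left"
    )
    made = to_utc(merged["time"])
    resolved = to_utc(merged["resolve_time"])
    # comparisons involving NaT are False, so rows with a missing time are
    # kept explicitly: we cannot tell whether they came after resolution
    keep = (made <= resolved) | made.isna() | resolved.isna()
    return merged.loc[keep, forecasts.columns]


# hypothetical toy data
questions = pd.DataFrame({
    "question_id": [1, 2],
    "resolve_time": ["2020-06-01", "2021-01-01"],
})
forecasts = pd.DataFrame({
    "question_id": [1, 1, 2, 2],
    "time": ["2020-05-01", "2020-07-01", "2020-12-31", None],
    "probability": [0.7, 0.9, 0.4, 0.5],
})
print(drop_post_resolution(forecasts, questions))  # drops the 2020-07-01 forecast
```

Whether rows with a NaT timestamp should be kept or dropped is a judgment call: keeping them avoids silently losing data, at the cost of possibly keeping a few post-resolution forecasts.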

Feature Wishlist

  • Create a pip package
  • Add data from more platforms

Potential Additional Sources for Forecasting Data


Credit goes to Arb Research for funding the first 85% of this work, and to Misha Yagudin in particular for guidance and mentorship.