PredictionBook.com - Track your calibration

Eliezer Yudkowsky

14th Oct 2009

Our hosts at Tricycle Developments have created PredictionBook.com, which lets you make predictions and then track your calibration - see whether things you assigned a 70% probability happen 7 times out of 10.
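
A rough sketch of what "tracking your calibration" amounts to: group predictions by the confidence you stated, then compare that number with the fraction that actually came true. The function name and the sample record are made up for illustration; this is not PredictionBook's code.

```python
from collections import defaultdict


def calibration_table(predictions):
    """Group (confidence_percent, came_true) pairs by stated confidence.

    Returns {confidence_percent: (n_true, n_total, observed_fraction)}.
    """
    buckets = defaultdict(list)
    for confidence, came_true in predictions:
        buckets[confidence].append(came_true)

    table = {}
    for confidence, outcomes in sorted(buckets.items()):
        n_true = sum(outcomes)  # True counts as 1, False as 0
        table[confidence] = (n_true, len(outcomes), n_true / len(outcomes))
    return table


# Hypothetical record: ten predictions made at 70%, seven of which came true.
record = [(70, True)] * 7 + [(70, False)] * 3
table = calibration_table(record)
```

A well-calibrated forecaster would see the observed fraction in each bucket land near the stated confidence, given enough judged predictions.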

The major challenge with a tool like this is (a) coming up with good short-term predictions to track (b) maintaining your will to keep on tracking yourself even if the results are discouraging, as they probably will be.

I think the main motivation to actually use it, would be rationalists challenging each other to put a prediction on the record and track the results - I'm going to try to remember to do this the next time Michael Vassar says "X%" and I assign a different probability. (Vassar would have won quite a few points for his superior predictions of Singularity Summit 2009 attendance - I was pessimistic, Vassar was accurate.)

[-]Cyan15y150

maintaining your will to keep on tracking yourself even if the results are discouraging, as they probably will be...

I predict with probability 0.95 that my 95% intervals will contain the quantity I'm estimating around 50% of the time.

[-]gwern15y50

I see what you did there.

[-]LauraABJ15y00

Hilarious.

[-]MBlume15y90

I've created an account to represent predictions made by the Intrade markets. If anyone would like to help me update with regular quotes, PM me for the password.

[-]gwern14y00

I don't think I have time for regular quotes, but I could help you out monthly by judging predictions, and adding the prediction of markets added since the last month (if there is any easy way to get such a list).

[-]Jack15y80

Eliezer, you ought to be ashamed of yourself!

[-]CannibalSmith15y70

The sun will rise tomorrow morning
( 80% confidence; 5 wagers; 1 comment )

O_o

[-]Jayson_Virissimo12y10

David Hume is fence sitting 236 years ago.

[-]gwern12y20

Very funny, but to be fair, he would've given it a higher probability*, even if he might not have reinvented Laplace's law of succession and been able to give the sun a 1/1826250 chance of not rising.

* Remember, the empiricism vs rationalism debate could be summed up as 'is infinite certainty possible?'
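
For reference, Laplace's rule of succession estimates the chance of another success after s successes in n trials as (s + 1) / (n + 2). The sketch below assumes the day count comes from 5000 years of 365.25 days each, which is a guess about where the figure originates; under that assumption the formula gives a denominator slightly larger than the rounded 1/1826250 quoted above.

```python
from fractions import Fraction


def rule_of_succession(successes, trials):
    """Laplace's estimate of P(next trial succeeds) given the record so far."""
    return Fraction(successes + 1, trials + 2)


# Assumption: 5000 years of recorded sunrises at 365.25 days per year.
days = 5000 * 365 + 1250  # 1,826,250 days

p_rise = rule_of_succession(days, days)  # every observed day was a success
p_no_rise = 1 - p_rise
```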

[-]Jack15y70

Everyone here should go and make at least one prediction. It's rationalist homework time.

[-]Jonathan_Graehl15y60

Calibration may be achievable by a general procedure of making and testing (banded) predictions, but I wouldn't trust anyone's calibration in a particular domain on evidence of calibration in another.

In other words, people will have studied the accuracy of only some of their maps.

[-]gwern13y20

Do you have any evidence for this? I don't remember any strongly domain-specific results in Tetlock's study, the book I read about calibration in business, or any studies. Nor does Wikipedia mention anything except domain experts being overconfident (as opposed to people being random outside their domain even when supposedly calibrated, as you imply), which is fixable with calibration training.

And this is what I would expect given that the question is not about accuracy (one would hope experts would win in a particular domain) but about calibration - why can't one accurately assess, in general, one's ignorance?

(I have >1100 predictions registered on PB.com and >=240 judged so far; I can't say I've noticed any especial domain-related correlations.)

[-]Jonathan_Graehl13y00

p.s. that's a lot of predictions :)

[-]lessdazed13y00

How many would you have thought gwern had?

[-]Jonathan_Graehl13y10

I found this question puzzling, and difficult to answer (I'm sleep deprived). Funny joke if you were sneakily trying to get me to make a prediction.

Unfortunately I'm pretty well anchored now.

I'd expect LW-haunters who decide to make predictions at PB.com to make 15 on the first day and 10 in the next year (with a mode of 0).

[-]Jonathan_Graehl13y00

Your point regarding the overconfidence of most domain experts is a strong one. I've updated :) This is not quite antipodal to the incompetent most overestimating their percentile competence - D-K.

I was merely imagining, without evidence, that some of the calibration training would be general and some would be domain specific. Certainly you'd learn to calibrate, in general. You just wouldn't automatically be calibrated in all domains. Obviously, if you've optimized on your expertise in a domain (or worse: on getting credit for a single bold overconfident guess), then I don't expect you to have optimized your calibration for that domain. In fact, I have only a weak opinion about whether domain experts should be better or worse calibrated on average in their natural state. I'm guessing they'll overly signal confidence (to their professional+status benefit) more so than that they're really more overconfident (when it comes to betting their own money).

[-]gwern13y00

Fortunately, Dunning-Kruger does not seem to be universal (not that anyone who would understand or care about calibration would also be in the stupid-enough quartiles in the first place).

Certainly you'd learn to calibrate, in general. You just wouldn't automatically be calibrated in all domains.

Again, I don't see why I couldn't. All I need is a good understanding of what I know, and then anytime I run into predictions on things I don't know about, I should be able to estimate my ignorance and adjust my predictions closer to 50% as appropriate. If I am mistaken, well, in some areas I will be underconfident and in some overconfident, and they balance out.
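
One possible reading of "adjust my predictions closer to 50%" is linear shrinkage toward 0.5. This is only an assumed formula for illustration: the `trust` weight is hypothetical, and nothing in the thread says how much to shrink.

```python
def shrink_toward_half(probability, trust):
    """Pull a probability toward 0.5.

    trust = 1.0 keeps the estimate unchanged (you know the domain well);
    trust = 0.0 collapses it to 0.5 (you know nothing about the domain).
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must be between 0 and 1")
    return 0.5 + trust * (probability - 0.5)


# Hypothetical case: a gut feeling of 90% in an unfamiliar domain, half trusted.
adjusted = shrink_toward_half(0.9, 0.5)
```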

[-]Jonathan_Graehl13y10

If there's a single thing mainly responsible for making people poor estimators of their numerical certainty (judged against reality), then you're probably right. For example, it makes sense for me to be overconfident in my pronouncements if I want people to listen to me, and there's little chance of me being caught in my overconfidence. This motivation is strong and universal. But I can learn to realize that I'm effectively lying (everyone does it, so maybe I should persist in most arenas), and report more honestly and accurately, if only to myself, after just a little practice in the skill of soliciting the right numbers for my level of information about the proposition I'm judging.

I have no data, so I'll disengage until I have some.

[-]JoshuaZ13y00

(I have >1100 predictions registered on PB.com and >=240 judged so far; I can't say I've noticed any especial domain-related correlations.)

Note that there are some large classes of predictions which by nature will strongly cluster and won't show up until a fair bit in the future. For example there are various AI-related predictions going about 100 years out. You've placed bets on 12 of them by my count. They strongly correlate with each other (for example general AI by 2018 and general AI by 2030). For that sort of issue it is very hard to notice domain-related correlation when almost nothing in the domain has reached its judgement date yet. There are other issues with this sort of thing as well, such as a variety of the long-term computational complexity predictions (I'm ignoring here the Dick Lipton short-term statements, which everyone seems to think are just extremely optimistic). Have there been enough different domains that have had a lot of questions that one could notice domain-specific predictions?

[-]gwern13y00

All that is true - and why it was the last and least of my points, and in parentheses even.

[-][anonymous]12y50

Hi. Who do I go to to request that PredictionBook have an option not to see other people's estimates before adding your estimate to someone else's prediction? I'm anchoring on other people's estimates and it's preventing me from using the site to calibrate myself without generating a lot of my own predictions.

[-]gwern12y30

You'd go to the PredictionBook GitHub repo to open a bug report; but PB is mostly in maintenance mode, so unless you're a Ruby programmer...

[-][anonymous]12y00

Nuts. Generating my own predictions it is.

[-]rwallace15y20

+1 Interesting! I've put in a prediction... and also pressed the wrong button on somebody else's prediction (for which the time hasn't elapsed yet) and marked it judged right, hopefully clicking Unknown undoes that...

The advantage of a site like this having been brought to the attention of geeks is that there are at least a few predictions listed to which my answer isn't "how the heck would I know?" :)

[-]rwallace15y40

Seems like a few other people have been doing the pressing the wrong button thing, if I'm now understanding the user interface correctly? I've tried setting some of those still in the future predictions to unknown, hopefully that's the right thing to do. If so, would it be possible to change the user interface to avoid this error?

[-]Emile15y70

Same here - once I entered a percentage, I wasn't sure which button to press, I hesitated between "right" (meaning the percentage I was giving was my confidence that it was right) and "my 2 cents" (which I thought only applied to when you entered a comment). I selected "right", which was wrong.

The interface needs a bit of polishing.

[-]ektimo15y00

Me too. The interface for that was confusing enough that I ended up not submitting at all.

[-]gwern14y10

I have spent some time extracting bets and predictions from Long Bets under my account.

So far I have all open bets with fixed dates imported, and roughly a third of the predictions.

It would be nice if a bunch of LWers could go in and put down their own probabilities. It's true that most of them don't come due any time soon - looking at upcoming predictions I see the first Long Bet item coming up in 5 months, followed by 2 or 3 within half a year after that. But the more who use it, the more short-term predictions and the more useful. The rich get richer, etc.

[-]gwern14y00

I've finished importing all the sensible bets and predictions from LB. I suppose the next target is Wrong Tomorrow, and then I'll turn to Intrade.

(I see next to no contributions from LWers. This disappoints me; do we all think we are well-calibrated or what?)

[-]gwern14y00

Wrong Tomorrow turns out to be something like half or more expired predictions which haven't been judged (and only the moderators/admins can do that, so...). Imported the outstanding predictions faster than I expected, so that's done.

Next is Intrade and our 2010 predictions, unless anyone has other suggestions.

[-]kess3r15y10

Also, there need to be more explanations of how things work and the interface needs to be tweaked for better user friendliness. Also, please add more bandwidth. Otherwise, awesome idea.

[-]matt15y00

Yeh - sorry about the slow. Speed optimizations are one of the things we left out. If enough of you keep using it, we'll make it faster.

[-]kess3r15y10

This is pure awesome. Finally something has been done! This is akin to the mythbusters going on TV and doing science instead of just talking about how awesome science is.

Apologies for my little rant above.

As for the site itself, other than being awesome, it needs a few tweaks. There is no place to discuss the site itself and possible improvements to it. Also, I wish there was a feature to hide the result until after I vote.

[-]matt15y00

See the Feedback tab floating on the right.

[-]DanArmak15y10

This is quite interesting & exciting.

Are they planning on adding features relevant to a prediction market (apart from betting money)? E.g., tracking bettor reputation/score based on success or transitive trust; or tracking the overall predicted value of a prediction with many bets, weighted by the success/reputation/... of the bettors.

[-]Eliezer Yudkowsky15y20

Whether they add features will depend on whether people seem interested in using it, they say.

[-]matt15y70

Official answer: Eliezer's right. If we see traffic growing we'll invest in further development.

We can think of many things we could do to make the site better… but those users who currently use it don't use it enough, and if they tell their friends about it their friends don't become regular users (often enough).

Hosting the current code is very cheap and easy, so the site's in little danger of being shut down, but we won't be developing it further unless you guys and gals (and your friends, and their friends) pile on the love.

[-]kess3r15y30

Just out of curiosity, are you a startup, a non profit or a guy doing a side project?

I predict the site's userbase will not explode overnight but will escalate in the shape of a hockey stick. That's how these things usually happen. You will have to keep improving it even while the userbase is still low, otherwise people will think the site is dying and they will stop showing up. Interesting things need to already be happening on the site before a larger audience will keep coming back to it, not vice versa.

Also, you need to add documentation no matter how simple and intuitive you think the site's features are. They don't seem as intuitive from the outside. By 'documentation' I mean a short and EXPLICIT description of what each feature does. I like the 'help' button near the timeframe for the prediction. You could add help buttons next to everything. Also an FAQ would be nice.

Overall I think the site has great potential. Keep up the good work.

[-]matt15y30

Just out of curiosity, are you a startup, a non profit or a guy doing a side project?

We're Investling, which is a handful of startups and an IT consultancy. We're for-profit, with some non-profit projects on the side (in part because we'll make more profits if we can help save the world from surprise conversion to paperclips). The majority of our non-profit work is SIAI related.

I predict the site's userbase will not explode overnight but will escalate in the shape of a hockey stick. […]

Some projects follow that pattern. Some projects never hockey-stick. How can you tell which curve you're riding?

We have many projects running: some have maintained exponential growth since we became involved; some are too young to judge; and some are on the low end of a curve that may be a hockey stick and may just be a project that doesn't have any legs. I very much hope that the LW crowd will latch on to PBook (keep coming back, tell your friends, etc.). If you do (we do - several of us are very keen LWers) and we see traffic growing, we'll flood more resources into the project. If it languishes we'll continue to host it and may even open source it, but it seems more sensible to flood our resources into projects that are winning. I really don't want to see PBook die, but I'm trying to count warm fuzzies consciously.

Also, you need to add documentation […]

We know the documentation is sparse (or, more precisely, the user interface isn't intuitive - documentation is evidence of a UI failure and good design is self-documenting). If you guys are still around in 14 days we should talk about more dev resources.

[-]thomblake15y20

good design is self-documenting

Yes yes yes. Four times yes.

and we see traffic growing

Right now the UI is so slow / bad that I couldn't see myself using it.

[-]anonym15y30

Agreed on the UI being incredibly confusing (and slow).

In terms of usability, if they just moved the judgment buttons down below, added text like "Render final judgment on this prediction" to make it obvious what judgment does, and changed "My 2 cents" to "Submit Estimate" or something like that, it would be a huge improvement over the current. These sorts of very minor cosmetic UI changes would be trivial to make.

[-]gwern14y00

I just signed up and did a bunch of predictions. Here are my initial impressions:

The majority of our non-profit work is SIAI related.

A tool like PB is like spaced repetition flash card programs or writing Wikipedia articles - a long-term tool. Some benefits appear quickly, yes, but the bulk of the benefits arrive over years or decades. (PB is somewhat like Long Bets.)

As the saying goes, "In the long run, the utility of all non-Free software approaches zero. All non-Free software is a dead end." If I invest time in PB, what guarantee do I have that I will be able to get my data out of PB when* it dies, especially for topics I didn't write? Are you guys going to license the content under a CC license?

(You should do it early, while there still isn't too much content - once Wikipedia got large, it took years and years and a unique one-time exemption by the FSF to liberate its content from the GFDL into a CC license.)

* And it will die eventually. Every site either dies or evolves out of recognition.

** My data is vastly more important to me than the website software. If I had to, I could run a personal PB in just a flat text file, after all.

2) comments are ridiculously constrained. I dunno if you guys were trying for some sort of auto-Twitter compatibility, but it's really annoying. If you need to dump comments on Twitter and they're too long, then just truncate them.

3) I just judged a Michael Jackson-related prediction wrong, with a citation that the predicted event happened in the wrong year. But in the history section, my comment never appeared!
My current workaround is to make a 0 or 50% prediction (wrong/right), explain my reasoning as best as I can in so short a space, and then separately mark it wrong/right. This is unfair to my score, since obviously I can choose 0 or 100% and always be right.

4) The black boxes on prediction pages (eg. "Join this prediction") are horrible. I was convinced for the longest time that they were buttons to push, and that they were disabled by some JavaScript pokery until I went and read the page source.

5) Newlines in comments do not get translated to a space or two in the comment; they get translated to nothing whatsoever.

6) No apparent way to edit 'due dates' for predictions; many unjudged predictions can't be judged at all because they seem to have been created expired.

7) On userpages, the most recent prediction/action gets split in half by the statistics graph in my Firefox; screenshot.

8) Years get interpreted badly. '2029' becomes - somehow - 2 hours from right now, as opposed to 19 years. screenshot

9) The in-browser JS date checker seems to be quite inaccurate. I've been entering all my dates as '1 January 2024' and the like, which it has never validated - but which turn into the right date when actually submitted.

10) The site is slow. And it seems to be on the server itself. I'm the only user right now, and yet predictions can take as much as 10 seconds to enter. I don't understand how it can be so slow, given that a prediction is a 4-tuple of (date,prediction,owner, user-confidence) which probably adds up to less than a kilobyte of data.
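
For a sense of scale, here is a hedged sketch of the 4-tuple from point 10 serialized as JSON. The field names and sample values are invented for illustration; the real PredictionBook schema is not shown anywhere in this thread.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Prediction:
    # Hypothetical field names mirroring (date, prediction, owner, user-confidence).
    due_date: str
    statement: str
    owner: str
    confidence: int  # percent, 0-100


example = Prediction(
    due_date="2024-01-01",
    statement="The sun will rise tomorrow morning",
    owner="gwern",
    confidence=80,
)

payload = json.dumps(asdict(example)).encode("utf-8")
size_in_bytes = len(payload)  # well under a kilobyte for a record like this
```

A record this small suggests the delay is unlikely to come from the amount of data stored per prediction, though the sketch says nothing about what the server actually does on each request.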

[-]matt14y10

It's our intention to Open Source the PredictionBook code… and has been for at least six months, but we keep not quite getting around to it. It's also my intention to write a top level post about why I think PBook isn't getting much traffic (it being slow is only one reason).

Anyone with a reputation on this site that wants access to the code before we get it open sourced is welcome to contact me directly. The code's on github.com and is written in Ruby on Rails.

(gwern, if you want access send me your promise that you'll behave responsibly and your github username.)

[-]gwern14y10

but we keep not quite getting around to it.

I know the feeling.

It's also my intention to write a top level post about why I think PBook isn't getting much traffic (it being slow is only one reason).

I have my own theories (mostly that people aren't very interested in truth-seeking, pace Hanson, and that the benefits are too long-term, cf. SRS flashcards), but that's just my perspective as a user.

gwern, if you want access send me your promise that you'll behave responsibly and your github username.

Do you mean access to the data? As I said, I'd like to edit the dates on some of the predictions...

I've signed up at http://github.com/gwern

[-]matt14y00

Do you mean access to the data?

No. People have private predictions in there, so I don't think I can in clear conscience give you access to anyone's predictions but your own (and giving you access to only your own is about half as much work as properly open sourcing the project). I mean the code… and you didn't quite send me your promise that you'll behave responsibly yet.

[-]gwern14y00

Well, alright. I see I didn't specifically say 'public data'. That's what I want.

and you didn't quite send me your promise that you'll behave responsibly yet.

I think it's kind of silly to ask for such a promise, but for what it's worth, you have it. (What irresponsible things could I do with just the codebase? I'm no cracker to find security holes and exploit them on the live site.)

[-][anonymous]13y00

Is there anywhere to report issues with PredictionBook? I can't seem to find one.

Edit: Huh, I can edit comments I've retracted? I had no idea.

[This comment is no longer endorsed by its author]

[-]Jack15y00

Can someone explain to me what is going on with this prediction given this prediction? I'm not going crazy, right? People are confused.

[-]thomblake15y00

Interesting - by the time I checked this, it looks like there aren't any inconsistent estimates.

[-]Jack15y00

Right now the later prediction has 9 points higher probability than the sooner prediction. I counted two or three cases of individual users posting higher probabilities for the later prediction. Unless they're really confident the first cryonic revival takes place during that decade they're making a huge mistake. My best explanation is that they just saw a farther off date and assumed a higher probability of everything...

[-]ektimo15y00

Some of my predictions are of the sort "the stock market will fall 50% tomorrow with 20% odds" (not a real prediction!). If it did happen I should get huge credit, but would it show up as negative credit since I predicted there was only a 20% chance it would happen? Is there some way it would be possible to do this kind of prediction with PredictionBook?

I predict this comment will get less than 4 points by Oct. 19 with 75% odds.

[-]gwern15y00

If it did happen I should get huge credit, but would it show up as negative credit since I predicted there was only a 20% chance it would happen? Is there some way it would be possible to do this kind of prediction with PredictionBook?

It seems to me like you're asking about 2 different issues: the first is not desiring to be penalized for making low-probability bets; but that should be handled already by low confidences - if you figure it at 1 in 5, then after only a few failed bets things should and ought to start looking bad for you, but if at 1 in thousands, each failed prediction ought to affect your score very little.

Presumably PredictionBook is offering richer rewards for low-probability successes, just like a 5% share on a prediction market pays out (proportionately) much more than a 95% share would; on net you would do the same.

The second issue is that you seem to think that certain predictions are simply harder to predict better than chance, and that you should be rewarded for going out on a limb? (20% odds on a big market bet tomorrow is much more detailed than the default 1-in-thousands-chance-per-day prediction.)

I don't know what the fair reward here is. If few people are making that prediction at all, then it should be easy to do better than them. In prediction markets, one expects that unpopular markets will be easier to arbitrage and beat - the thicker the market, the more efficient; standard economics. So in a sense, unpopular predictions are their own reward.

But this doesn't prevent making obscure predictions ('will I remember to change my underwear tomorrow?'). Nor would it seem to adequately cover 'big questions' like open scientific puzzles or predictions about technological development (think the union of Longbets & Intrade). Maybe there could be a bonus for having predictions pay out with confidence levels higher than the average? This would attract well-calibrated people to predictions where others are not informed or are too pessimistic.
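
To make the first issue concrete, here is a sketch using two standard proper scoring rules, the logarithmic score and the Brier score. Whether PredictionBook uses either of them is not stated in this thread, so treat the choice of rule as an assumption. Under the log score, a miss at 1-in-5 costs noticeably more than a miss at 1-in-1000, and a hit at 20% scores far better than a hit by someone who gave the event 0.1%.

```python
import math


def log_score(probability, happened):
    """Log of the probability assigned to what actually happened (0 is best)."""
    p_outcome = probability if happened else 1.0 - probability
    return math.log(p_outcome)


def brier_score(probability, happened):
    """Squared error between the forecast and the outcome (0 is best, 1 is worst)."""
    outcome = 1.0 if happened else 0.0
    return (probability - outcome) ** 2


# The event does not happen: the bolder forecast loses more.
miss_at_1_in_5 = log_score(0.20, False)
miss_at_1_in_1000 = log_score(0.001, False)

# The event happens: the 20% forecaster beats the 0.1% forecaster by a wide margin.
hit_at_1_in_5 = log_score(0.20, True)
hit_at_1_in_1000 = log_score(0.001, True)
```

The "huge credit" for a correct long shot shows up as a comparison: every forecaster who assigned the event a lower probability takes a much larger penalty.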

[-]UnholySmoke15y00

Cracking idea, like it a lot. Hofstadter would jump for joy, and in his honour:

http://predictionbook.com/predictions/532
