Open Thread for February 11 - 17

Thanks Emile,

Is there anything you'd like to see added?

For example, I was thinking of running it on Node.js and logging players' scores, so you could see how you compare. (I don't have a way to host this right now, though.)

Another possibility is to add diagnostics: for example, were you systematically setting your guess too high, or was it fluctuating more than the data would justify (under some model for the prior/posterior, say)?

Also, I'd be happy to have pointers to your calibration apps or others you've found useful.

Alternative to Bayesian Score

Here's the "normalized" version: f(x)=1+log2(x), g(x)=1+log2(1-x) (i.e. scale f and g by 1/log(2) and add 1).

Now f(1)=1, f(.5)=0, f(0)=-Inf ; g(1)=-Inf, g(.5)=0, g(0)=1.
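In case it helps to check those values, here's a minimal sketch in JavaScript. The function names f and g are taken from the formulas above; everything else is just for illustration.

```javascript
// Normalized log score in bits, following the definitions above.
function f(x) { return 1 + Math.log2(x); }      // payoff if the question is true
function g(x) { return 1 + Math.log2(1 - x); }  // payoff if the question is false

// Print both payoffs at the three points mentioned above.
for (const x of [1, 0.5, 0]) {
  console.log(`x=${x}: f=${f(x)}, g=${g(x)}`);
}
```

Note that Math.log2(0) is -Infinity in JavaScript, so the certain-and-wrong cases come out as -Infinity rather than throwing.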

Ok?

Open Thread for February 11 - 17

I've written a game (also on GitHub) that tests your ability to assign probabilities to yes/no events accurately using a logarithmic scoring rule (apparently called a Bayes score on LW).

For example, in the subgame "Coins from Urn Anise," you'll be told: "I have a mysterious urn labelled 'Anise' full of coins, each with possibly different probabilities. I'm picking a fresh coin from the urn. I'm about to flip the coin. Will I get heads? [Trial 1 of 10; Session 1]". You can then adjust a slider to select a number a in [0,1]. As you adjust a, you adjust the payoffs that you'll receive if the outcome of the coin flip is heads or tails. Specifically, you'll receive 1+log2(a) points if the result is heads and 1+log2(1-a) points if the result is tails.

This is a proper scoring rule in the sense that you maximize your expected return by choosing a equal to the posterior probability that, given what you know, this coin will come up heads. The payouts are harshly negative if you have false certainty. E.g. if you choose a=0.995, you'd only stand to gain 0.993 if heads happens but would lose 6.644 if tails happens.

At the moment, you don't know much about the coin, but as the game goes on you can refine your guess. After 10 flips the game chooses a new coin from the urn, so you're back to knowing little about that particular coin, but try to take account of what you do know: it's drawn from the same urn Anise as the last coin (iid). If you try this, tell me what your average score is on play 100, say.
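To make the "proper scoring rule" claim concrete, here's a rough sketch (not the game's actual code). It assumes a coin with a known heads probability p and does a coarse grid search over the slider value a; the names payoffHeads, payoffTails, expectedPayoff and bestGuess are made up for this illustration.

```javascript
// Payoffs as described above (in bits, shifted by 1).
function payoffHeads(a) { return 1 + Math.log2(a); }
function payoffTails(a) { return 1 + Math.log2(1 - a); }

// Expected payoff of choosing slider value a when heads has probability p.
function expectedPayoff(a, p) {
  return p * payoffHeads(a) + (1 - p) * payoffTails(a);
}

// Hypothetical helper: grid search over a = 0.01, 0.02, ..., 0.99.
function bestGuess(p) {
  let bestA = null;
  let bestScore = -Infinity;
  for (let i = 1; i <= 99; i++) {
    const a = i / 100;
    const s = expectedPayoff(a, p);
    if (s > bestScore) { bestScore = s; bestA = a; }
  }
  return bestA;
}

console.log(bestGuess(0.3));                // the maximizing a on the grid
console.log(payoffHeads(0.995).toFixed(3)); // gain if heads at a=0.995
console.log(payoffTails(0.995).toFixed(3)); // loss if tails at a=0.995
```

On this grid the best a should land on p itself, and the last two lines reproduce the a=0.995 payoffs quoted above.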

There are a couple of other random processes to guess in the game, and also a quiz. The questions are intended to force you to guess at least some of the time. If you have suggestions for other quiz questions, send them to me by PM in the format:

{q:"1+1=2. True?", a:1} // source: my calculator

where a:1 is for true and a:0 is for false.
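As an illustration of how entries in that format could be scored, here's a small sketch. It assumes the quiz uses the same 1+log2 payoffs as the coin game; the helpers scoreAnswer and averageScore and the second sample question are hypothetical, not taken from the game's source.

```javascript
// Hypothetical helper: score one probability guess against a quiz entry {q, a}.
// Assumes the same payoffs as the coin game: 1+log2(guess) if true, 1+log2(1-guess) if false.
function scoreAnswer(entry, guess) {
  return entry.a === 1 ? 1 + Math.log2(guess) : 1 + Math.log2(1 - guess);
}

// Hypothetical helper: average score over a list of entries and matching guesses.
function averageScore(entries, guesses) {
  let total = 0;
  entries.forEach((entry, i) => { total += scoreAnswer(entry, guesses[i]); });
  return total / entries.length;
}

const sampleQuiz = [
  {q: "1+1=2. True?", a: 1}, // the example entry from above
  {q: "2+2=5. True?", a: 0}, // made-up entry for illustration
];

// Guessing 0.5 on everything scores zero on average under this rule.
console.log(averageScore(sampleQuiz, [0.5, 0.5]));
// Leaning the right way on both questions scores above zero.
console.log(averageScore(sampleQuiz, [0.9, 0.1]));
```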

Other discussion: probability calibration quizzes

Papers: Some Comparisons among Quadratic, Spherical, and Logarithmic Scoring Rules; Bickel

Alternative to Bayesian Score

There's no math error.

Why is it consistent that assigning a probability of 99% to one half of a binary proposition that turns out false is much better than assigning a probability of 1% to the opposite half that turns out true?

I think there's some confusion. Coscott stated these three facts:

Let f(x) be the output if the question is true, and let g(x) be the output if the question is false.

f(x)=g(1-x)

f(x)=log(x)

In consequence, g(x)=log(1-x). So if x=0.99 and the question is false, the output is g(x)=log(1-x)=log(0.01). Or if x=0.01 and the question is true, the output is f(x)=log(x)=log(0.01). So the symmetry that you desire is true.
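If you want to check the symmetry numerically, here's a quick sketch. It uses the natural log for f (the base doesn't matter for the symmetry), and the names fLog and gLog are just labels for this example.

```javascript
// f is the output if the question is true; g is defined by the symmetry f(x) = g(1-x).
function fLog(x) { return Math.log(x); }
function gLog(x) { return fLog(1 - x); }

const falseAt99 = gLog(0.99); // said 99%, question turned out false
const trueAt01 = fLog(0.01);  // said 1%, question turned out true

// The two agree up to floating-point rounding (1 - 0.99 is not exactly 0.01 in binary).
console.log(falseAt99, trueAt01, Math.abs(falseAt99 - trueAt01) < 1e-12);
```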

It's certainly in the right spirit. He's reasoning backwards in the same way Bayesian reasoning does: here's what I see; here's what I know about possible mechanisms for how that could be observed and their prior probabilities; so here's what I think is most likely to be really going on.