
Open Thread, Jul. 27 - Aug 02, 2015

Suppose that I am given a calibration question about a racehorse and I guess "Secretariat" (since that's the only horse I remember) and give a 30% probability (since I figure it's a somewhat plausible answer). If it turns out that Secretariat is the correct answer, then I'll look really underconfident.

But that's just a sample size of one. Giving one question to one LWer is a bad method for testing whether LWers are overconfident or underconfident (or appropriately confident). So, what if we give that same question to 1000 LWers?

That actually does...
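One way to see why a single question is uninformative is a quick simulation (a sketch, not anyone's actual survey method): a perfectly calibrated guesser who states 30% will, on any one question, look either underconfident (correct despite the low probability) or overconfident (wrong); only the hit rate over many questions converges to the stated confidence.

```python
import random

random.seed(1)

# A perfectly calibrated guesser: states 30% and is right exactly 30% of the time.
def guess_once():
    return random.random() < 0.30

# Any single outcome makes the guesser look mis-calibrated in one direction
# or the other; over many questions the hit rate approaches the stated 30%.
trials = [guess_once() for _ in range(10_000)]
hit_rate = sum(trials) / len(trials)
print(round(hit_rate, 2))
```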

That is why I looked at all 10 questions in aggregate.

Well, you did not look at calibration; you looked at overconfidence, which I don't think is a terribly useful metric -- it ignores the actual calibration (the match between the stated confidence and the answers) and just smushes everything into two averages.
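To make the distinction concrete, here is a minimal sketch (hypothetical data, not the survey's actual method) of a calibration check: bucket answers by stated confidence and compare each bucket's empirical hit rate to the confidence itself, rather than averaging everything together.

```python
from collections import defaultdict

def calibration_table(responses):
    """responses: list of (stated_confidence, was_correct) pairs.
    Returns {confidence_bucket: empirical hit rate}."""
    buckets = defaultdict(list)
    for conf, correct in responses:
        buckets[round(conf, 1)].append(correct)
    return {conf: sum(hits) / len(hits) for conf, hits in sorted(buckets.items())}

# Hypothetical data: 30%-confidence answers right half the time (underconfident),
# 90%-confidence answers right 60% of the time (overconfident).
data = [(0.3, True)] * 5 + [(0.3, False)] * 5 + [(0.9, True)] * 6 + [(0.9, False)] * 4
print(calibration_table(data))  # {0.3: 0.5, 0.9: 0.6}
```

A single overconfidence average would blend these two opposite failures; the per-bucket table keeps them visible.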

It reminds me of an old joke about a guy who went hunting with his friend the statistician. They found a deer, the hunter aimed, fired -- and missed. The bullet went six feet to the left of the deer. Amazingly, the deer ignored the shot, so the hunter aime...
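The joke's point can be made numerically: averaging signed errors can report "on target" even when every shot missed, which is exactly the information a mean-absolute-error view preserves.

```python
# Two shots: six feet left and six feet right of the deer.
shots = [-6.0, +6.0]

# The signed average cancels to zero -- "on average, we got him!"
mean_error = sum(shots) / len(shots)            # 0.0

# The average miss distance tells the real story.
mean_abs_error = sum(abs(s) for s in shots) / len(shots)  # 6.0
print(mean_error, mean_abs_error)  # 0.0 6.0
```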


If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

Notes for future OT posters: