Lesson Plan: Biases in Quantity Estimation

LessWrong

Pedagogic status: I’ve never executed this exact lesson plan, but I ran a workshop similar to this for a dozen friendly but non-technical adults a few years ago, and it seemed to go well. The content won’t be new to the average LessWronger, but I do present some novel ways to teach, practice, and think about these concepts.

Good afternoon, students. Today, we’ll be talking about biases relevant to forecasting, and some ways you can get around them.

Well, we’ll be talking. There are some weird voyeurs watching us from the past and/or future, for whom this is more of a reading/writing kind of deal: try to ignore their eerie gazes.

We’re going to be using a quiz to evaluate how over- or under-confident you are. This is a little different to the quizzes you’re used to, so I’ll take you through an example question before handing them out.

I could break the ice and keep up engagement by asking one of you to supply the question, but there’s a chicken-and-egg problem: I can’t easily-reliably-quickly get you to give me a question in the right shape without sharing an example first. So annoyingly, for most audiences, it’s probably best to just use a canned example.

The example question looks like this:

I’m 80% confident there are at least ____ species of Pokemon.

I’m 95% confident there are at least ____ species of Pokemon.

I’m 99% confident there are at least ____ species of Pokemon.

(By the way, the word ‘confident’ is being used here in a technical sense: if you’re something% confident, it means that you think there’s a something% chance your guess is right. In other words, if you made a hundred 99% confidence guesses, you’d expect – on average – that about one of them would turn out false.)

Why am I using these kinds of questions, instead of doing the typical ‘estimate-the-probability-of-this-binary-outcome’ thing? Partly it’s because I suspect that approach trains subtly flawed skills, in ways that make my mental model of Taleb recoil in horror: the Credence Calibration Test punishes you for assigning 99% weight to guesses, while in the real world there are plenty of situations where you need that level of certainty. But mostly it’s because, while I don’t expect a possibly non-technical audience to understand Brier scores, I do expect them to understand “if you got 20% of your 99%-confidence predictions wrong (or 100% of your 80%-confidence predictions right), you messed up somewhere”.
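For readers who haven’t met them: a Brier score is the mean squared difference between the probability you assigned to a binary outcome and what actually happened (1 if it happened, 0 if it didn’t), so lower is better. Here is a minimal Python sketch of the general idea. The function name and the example forecasts are made up for illustration, and this isn’t a claim about exactly how the Credence Calibration Test scores you.

```python
def brier_score(forecasts: list[float], outcomes: list[bool]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("need exactly one outcome per forecast")
    return sum((p - float(o)) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical single forecasts, each made at 99% confidence.
print(brier_score([0.99], [True]))   # a confident hit
print(brier_score([0.99], [False]))  # a confident miss
```

A confident hit scores about 0.0001, while a confident miss scores about 0.98, close to the worst possible score of 1.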

I’m a dirty cheater because I’ve looked up the answer, but pretending I’m not: for the 80% line, I vaguely remember hearing there were about 700 Pokemon, and that was a few years back, so there are probably more by now. I’m about 80% confident there are at least 700 Pokemon.

For the 95% line, I’m going to be a bit more cautious. I’m 95% sure there are at least 400 Pokemon.

For the 99% line, I’m going to be very careful. I know there were about 250 when I stopped playing, and I know they added some more since then, so I can feel very confident there are at least 300.

The answer, according to Wikipedia: there are 905 species of Pokemon. So, in that example, I got all my predictions ‘right’.

. . . except in the sense that the right answer is ‘0, because Pokemon are fake’.

However, that isn’t necessarily a good thing: if you get 100% of your 80%-confidence predictions ‘right’ across a whole quiz, that’s a sign you’re underconfident and need to be more daring with your guesses.
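How strong a sign is a perfect score? Here is a minimal Python sketch, assuming each guess is independent and that a perfectly calibrated forecaster is right on each 80%-confidence guess with exactly 80% probability (both simplifying assumptions; the function name is hypothetical). It computes the chance of a clean sweep on a twenty-question worksheet like the one in this lesson.

```python
def prob_all_right(confidence: float, n_questions: int) -> float:
    """Chance a perfectly calibrated forecaster gets every one of n independent guesses right."""
    return confidence ** n_questions

# Twenty 80%-confidence guesses, all of them 'right'.
p = prob_all_right(0.80, 20)
print(f"{p:.4f}")
```

That comes out at a little over 1%, so one lucky 80% guess like the Pokemon one proves very little, but a clean sweep across twenty would be surprising for a well-calibrated forecaster.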

Okay, does everyone understand the idea?

Realistically, I’m expecting most of you to stay quiet even if you’re hopelessly lost, but I’m hoping there’ll be at least one aggressively confused person who gives me an excuse to clarify for the whole class.

Great!

I hope I’m successfully obscuring how much your meekness disappoints me.

I’ll hand out the worksheet now: there are twenty questions, I’m giving you ten minutes total, don’t overthink anything. I’ll also be walking around the class while you’re taking the test, so feel free to call me over if you have any questions.

And for the voyeurs, I have a fancy JavaScript doodad that lets them take the same test via a web browser, then automatically checks and analyses the output.

. . . okay, it looks like you’re all about finished. Trade sheets with the person next to you, and score which of their predictions are ‘right’ vs ‘wrong’ as I read the answers out.

Technically it makes more sense to mark down only the ones they got ‘wrong’, but handling it that way seems cruel somehow.

Why am I getting you to mark each other’s work instead of your own? Answer: this is a sneaky way for me to get you to see someone else’s results before I explain what they mean, so you’re less likely to feel alone in screwing up.
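For anyone marking along at home, the ‘right’ vs ‘wrong’ rule for an ‘at least ____’ question is simple enough to write down: a guess is right when the true answer is greater than or equal to the number in the blank. This Python sketch uses hypothetical names and assumes each answer sheet is a list of (confidence, guess, true answer) triples; it checks the worked Pokemon example and tallies wrong answers per confidence level.

```python
from collections import Counter

def mark(guess: float, true_answer: float) -> bool:
    """An 'at least ____' guess is right when the true answer meets or exceeds it."""
    return true_answer >= guess

def tally_wrong(sheet: list[tuple[float, float, float]]) -> Counter:
    """Count wrong answers per confidence level from (confidence, guess, true_answer) triples."""
    wrong = Counter()
    for confidence, guess, true_answer in sheet:
        if not mark(guess, true_answer):
            wrong[confidence] += 1
    return wrong

# The worked Pokemon example: guesses of 700, 400 and 300 against the answer of 905.
pokemon_sheet = [(0.80, 700, 905), (0.95, 400, 905), (0.99, 300, 905)]
print(tally_wrong(pokemon_sheet))  # Counter()
```

An empty tally means every guess on the sheet was ‘right’.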

*20 answers later*

Okay, now add up the final scores and hand the quizzes back. Let’s go through what these scores mean.

There were twenty questions. 80% of 20 is 16. So, if you were calibrated perfectly, you should have gotten about 16 of the 80% questions ‘right’, and about 4 of them ‘wrong’.

95% of 20 is 19. So if you were calibrated perfectly, you should have gotten about 19 of the 95% questions ‘right’, and about 1 of them ‘wrong’.

99% of 20 is 19.8. So if you were calibrated perfectly, you should have gotten all of the 99% questions ‘right’, or maybe gotten one wrong if you were unlucky.
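The ‘about 16’, ‘about 19’ and ‘all of them, or maybe one’ figures above come from treating each confidence level as twenty independent guesses, each right with probability equal to the stated confidence (a simplifying assumption). Under that model, the number of wrong answers follows a binomial distribution. This small Python sketch (the function name is hypothetical) prints the expected number wrong at each level and how likely two or more wrong answers would be for a perfectly calibrated forecaster.

```python
from math import comb

def prob_exactly_wrong(k: int, n: int, confidence: float) -> float:
    """Binomial probability of exactly k wrong answers out of n guesses at the given confidence."""
    miss = 1 - confidence
    return comb(n, k) * miss**k * confidence ** (n - k)

n = 20
for confidence in (0.80, 0.95, 0.99):
    expected_wrong = n * (1 - confidence)
    # P(2 or more wrong) = 1 - P(0 wrong) - P(1 wrong)
    at_most_one = prob_exactly_wrong(0, n, confidence) + prob_exactly_wrong(1, n, confidence)
    print(f"{confidence:.0%}: expect {expected_wrong:.1f} wrong; "
          f"P(2+ wrong) = {1 - at_most_one:.3f}")
```

For the 99% level, two or more wrong answers should happen to a perfectly calibrated forecaster less than 2% of the time under this model.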

I’m guessing that for most of you, that’s not what happened; walking around the room as you were taking the test, I saw a lot of tests with two or more wrong 99%-confidence answers.

Unless I didn’t, in which case I’ll replace this part with awkward insistences that most people fail that way, and heartfelt congratulations on your apparent transcendence of mortal limits.

This is because people making predictions under uncertainty tend to be overconfident and overprecise: we feel too sure that our guesses are correct, and pick confidence intervals that are too narrow.

How can we work around our natural overconfidence? There are a few methods:

The first, best, and easiest way to correct for a bias is just knowing you have it, and doing your best to adjust for it.
You can also check with other people. If you know your high-confidence guesses are, on average, not extreme enough, you can compensate by asking someone else what they got, then taking whichever of your two answers is more extreme.
Finally, you can let yourself seem crazy. If you want your 99% answers to be ‘right’ 99% of the time, you need embarrassingly wide error bars, especially for topics you don’t know much about.
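The check-with-other-people method can be written as a tiny rule. For the ‘at least ____’ questions used in this lesson, the more extreme (more cautious) answer is the lower number; for a two-sided range, it would be the lower of the two low ends and the higher of the two high ends. The Python sketch below assumes that reading of ‘more extreme’, and its function names and example answers are made up for illustration.

```python
def combine_lower_bounds(mine: float, theirs: float) -> float:
    """For an 'at least ____' guess, keep whichever lower bound is more cautious (smaller)."""
    return min(mine, theirs)

def combine_intervals(mine: tuple[float, float], theirs: tuple[float, float]) -> tuple[float, float]:
    """For a two-sided range, keep the widest span covered by either answer."""
    return (min(mine[0], theirs[0]), max(mine[1], theirs[1]))

# Hypothetical 99%-confidence answers to the Pokemon question from two students.
print(combine_lower_bounds(300, 450))                  # 300
print(combine_intervals((300, 1200), (450, 2000)))     # (300, 2000)
```

Either way, the combined answer errs toward the wider, ‘crazier’ side, which is the direction the lesson suggests correcting in.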

Now, let’s put theory into practice! I have another worksheet*. This one has ten questions**. You’re allowed to talk things over with the person next to you, but you still can’t look things up on your phones. You have five minutes***: get to it!

*Voyeur version here.

**Because I worry you’ll get bored if I give you another test as long as the last one.

***Unless you finished the first test faster/slower than I expected, in which case I’ll adjust the time limit here.

*One test-taking and marking later*

Okay, hopefully you did a bit better on that one. Let’s sum up. Can anyone tell me what we’ve learned about human nature today?

Great, and who can tell me some ways of dealing with that?

Okay, lovely! That’s the end of the lesson: go forth and make slightly less biased predictions! Class dismissed.

Actually, before you go . . . I would like to sell you a rock. Here is the rock. It is a magic rock that keeps tigers away.

You might object that you’re not at risk of tiger attacks, that magic isn’t real, and that this rock is just a rock. To this I say: are you 99% sure of that, even after accounting for overconfidence? Being attacked by a tiger would be very bad; you should probably buy my rock just to err on the side of caution.

Ahem. In case it isn’t obvious, I’m messing with you. The point I’m obliquely trying to make here is that correcting for (real!) biases can also make it easier for unscrupulous people to scam, indoctrinate, or otherwise manipulate you. So these techniques are best applied only when you’re sure no-one could directly stand to gain from you being underconfident.

(How sure, you might ask? Oh, about 95%.)

Unfortunately, while it seems reckless to teach you about biases without teaching you about how correcting for them can hurt you, dragging too many concepts into a single ~half-hour session is a big no-no in the teaching profession, so I’m probably not saying this part. But on the off-chance I am: class actually dismissed, for realsies this time.

. . . huh. It looks like there are still some voyeurs hanging around, even after you all left. Well, I do have one last quiz (cw: mortality, trolling) for them, if they’re interested.

Pleasant dreams!