Calibrate words, not just probabilities

MikkW

18th Jul 2020

In Phil Tetlock's book Superforecasting, much emphasis is put on making sure that a forecaster's predictions are well calibrated: if a forecaster gives a 70% chance to 100 different events, we should expect about 70 of those events to happen and about 30 not to. If only 50 events actually happened when the forecaster said 70%, then the forecaster may want to improve their calibration (especially if they are the betting sort), and anybody listening to them would be well advised to take their poor calibration into account when hearing a prediction.
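As a rough illustration of what checking calibration means in practice, here is a minimal Python sketch. The function name `calibration_table` and the record of forecasts are hypothetical, made up to match the 70%-stated, 50-of-100-observed example; this is not a method taken from the book.

```python
from collections import defaultdict


def calibration_table(forecasts):
    """Group (stated_probability, happened) pairs by stated probability.

    Returns {stated_probability: (number_of_forecasts, observed_frequency)}.
    """
    buckets = defaultdict(list)
    for stated, happened in forecasts:
        buckets[stated].append(bool(happened))
    return {
        stated: (len(outcomes), sum(outcomes) / len(outcomes))
        for stated, outcomes in buckets.items()
    }


# Hypothetical record: 100 forecasts at 70%, of which only 50 came true.
record = [(0.7, True)] * 50 + [(0.7, False)] * 50

table = calibration_table(record)
n, observed = table[0.7]
print(f"stated 70%, observed {observed:.0%} over {n} forecasts")
# prints: stated 70%, observed 50% over 100 forecasts
```

A well-calibrated forecaster would show an observed frequency close to each stated probability; the gap here (70% stated, 50% observed) is what a listener would want to correct for.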

Tetlock laments that television pundits almost never give probabilities for their predictions, so their calibration basically can't be measured. Furthermore, the events they describe are so vague that people can't readily agree, even in retrospect, on whether their predictions were correct. All of this is a headache for anyone who actually wants to hold pundits accountable, or to have a meaningful conversation with people who are influenced by such pundits.

Now consider the phrase "I predict event X will happen by November 30th with 80% probability". The function of such an utterance is that, upon hearing these sounds, an idea forms in my mind: if we saved the state of the simulator we live in and ran the simulator from that point 100 times, I should expect to see event X happen by November 30th in 80 of these histories. When a pundit utters the phrase "Zombies will certainly roam the Earth if we implement policy Y", a similar idea is formed in my mind. But where the first statement allowed me to form an idea with a clear timeframe and a precise level of certainty, with the pundit's statement I have to infer these for myself.

To help me get the best possible understanding from the pundit's words, I can calibrate his words just as I can calibrate probabilities. After all, if Freddy Forecaster says "70% probability" for events that happen only 60% of the time, I know to correct Freddy's forecast in my mind: when he says 70%, I know to anticipate that it will actually happen only 60% of the time, and would bet accordingly. So if Peter Pundit says 100 times that something "certainly" will happen, and we see 55 of these events actually happen, then the next time he says something "certainly" will happen, I would be willing to bet as if his words suggested a 55% probability. I can likewise calibrate when he says he's "extremely confident", or that "X will happen" (without any qualifiers), or "it's probable", and understand that such words are correlated with certain underlying probabilities.
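The same bookkeeping works for words. Here is a minimal Python sketch, assuming we have somehow logged a pundit's past statements as (phrase, did it happen) pairs. The function name `calibrate_words` and the history are hypothetical, chosen to match the "certainly" example: 55 hits out of 100.

```python
from collections import defaultdict


def calibrate_words(history):
    """Map each confidence phrase to the observed frequency of its events.

    history: iterable of (phrase, happened) pairs from past statements.
    """
    counts = defaultdict(lambda: [0, 0])  # phrase -> [hits, total]
    for phrase, happened in history:
        counts[phrase][0] += int(happened)
        counts[phrase][1] += 1
    return {phrase: hits / total for phrase, (hits, total) in counts.items()}


# Hypothetical log for Peter Pundit: "certainly" used 100 times, 55 came true.
history = [("certainly", True)] * 55 + [("certainly", False)] * 45

lookup = calibrate_words(history)
print(lookup["certainly"])  # prints: 0.55
```

With only a handful of past statements per phrase, the observed frequency would be a noisy estimate, so a lookup like this is only as good as the record behind it.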

Likewise, even if a pundit fails to give a meaningful timeline for his predictions, we can still calibrate timelines based on previous predictions. If he previously said "This will cause the price of X to go up", and 21 months later the price does indeed go up, we don't have to concern ourselves with whether the price increase has anything to do with the event the predictor originally linked it to. We can simply observe that, typically, if Peter Pundit says something will happen, it happens 1.25 times sooner than the base rate would suggest. We can use this to translate a vague, non-time-bound prediction into something attached to a verifiable timeframe, which can then help our calibration for other types of vague wording.
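One way to make the timeline idea concrete is sketched below in Python. It assumes that "1.25 times sooner" means the base-rate waiting time divided by the actual waiting time is 1.25, and it uses the median ratio as the "typical" value; both are my assumptions. The function names and all numbers in `past` are hypothetical.

```python
from statistics import median


def speedup_factor(past):
    """Typical ratio of base-rate waiting time to actual waiting time.

    past: list of (base_rate_months, actual_months) for earlier predictions.
    """
    return median(base / actual for base, actual in past)


def implied_deadline(base_rate_months, factor):
    """Translate a vague prediction into an expected number of months."""
    return base_rate_months / factor


# Hypothetical history: each event arrived 1.25 times sooner than its base rate.
past = [(25, 20), (30, 24), (10, 8)]

factor = speedup_factor(past)
print(factor)                        # prints: 1.25
print(implied_deadline(40, factor))  # prints: 32.0
```

So if the base rate says an event like this usually takes 40 months, this pundit's track record would suggest checking back after about 32.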

Throughout this post, I've been talking about forecasters and pundits, but these principles can be used to analyze everyday discourse and conversation as well. Oftentimes we hear people say things without giving much thought to what falsifiable model of the world we should infer from their words, or how much we should be willing to trust it. It could be valuable to calibrate the statements of different people in a given social environment, and to use that calibration to inform our communication, decisions, and thinking based on other people's proclamations.

3 comments

Ericf, 4y

There is an unstated assumption here that the words chosen map to some internal level of confidence. I don't believe that is the case for most people. Saying something is "certain" vs "likely" vs "will happen" is driven more by immediate external factors (e.g., did the previous speaker just use the word "certain", even on a completely different topic?) than by any long-term, internally consistent reflection of internal confidence.

MikkW, 4y

I mostly agree with this comment. I do think there are broad categories we can put words / phrases (and probably body language and other paralinguistics) into, which can give us meaningful evidence of the other person's confidence.

Filipe Marchesini, 4y

Most babble that seems to be "predictions" is actually not prediction and, as pointed out by Ericf, does not reflect the internal confidence of the speaker. Sometimes I hear "I am completely sure my favorite team is going to win the championship", although it is clear that this is not a prediction made by the person; it is his way of saying "I would really like this outcome to happen, and that's my way of signaling this".

"He is not going to die" doesn't mean "I predict with 90% confidence that he is not going to die" but rather "I wouldn't like him to die, and even though the unknown real probability may be high, just accepting this may create this reality, so I will say he is not going to die and reality will follow my words, and that's the power of words, as God said in the Bible".

I really see a lot of people talking about "the power of words". They don't try to truly have accurate beliefs that accurately predict results at particular times; instead they fear that just uttering the words "may alter reality in a way that they don't like", so they pretend to be highly confident in some possible good outcome: "I am absolutely sure coronavirus will not be that bad", and then, "although it was very bad, I am sure everything is going to be fine". Hey, I am sure we will handle the situation and that there will still be some beds in the hospital for people. Why don't these fucking words work? Your partner says: don't give up, I am sure everything is going to be fine.

After all, if Freddy Forecaster says "70% probability" for events that happen only 60% of the time, I know to correct, in mind, Freddy's forecast- when he says 70%, I know to anticipate that it will actually happen only 60% of the time, and would bet accordingly. So if Peter Pundit says something "certainly" will happen 100 times, and we see 55 of these events actually happen, the next time he says something "certainly" will happen, I would be willing to bet based on his words suggesting a 55% probability.

I agree with you that we should try our best to give our best estimates, and also state our confidence in those estimates, while building a historical record of our predictions so that everyone can calibrate their confidence in our statements. But, for real, every time I see a new pundit, this will probably be the first and also the last time I hear about him. It is hard to have any history of his predictions. It will be very hard to find 100 predictions registered on a platform and count how many he got right. And even if such a platform existed with all historical predictions, it could also be gamed in a certain way: e.g., it is easy to predict that the sun will come up tomorrow, and I will win every time I bet on this. After winning 100/100, I try to predict the price of Tesla shares on the next day. Well, even if you used my history of random easy predictions to calibrate your confidence in my hard predictions, that wouldn't help. Idk, for me it is just ABSOLUTELY hard to calibrate my confidence in a pundit's statements even if he had put "70%" in the middle of the sentence. Probably he doesn't even know what he is talking about. And probably we won't ever have any opportunity to make him pay rent in anticipated experiences, nor to check any previous hard predictions.
