# 47

Heeding others' impressions often increases accuracy.  But "agreement"  and "majoritarianism" are not magic;  in a given circumstance, agreement is or isn't useful for *intelligible* reasons.

You and four other contestants are randomly selected for a game show.  The five of you walk into a room.  Each of you is handed a thermometer drawn at random from a box; each of you, also, is tasked with guessing the temperature of a bucket of water.  You’ll each write your guess at the temperature on a card; each person who is holding a card that is within 1° of the correct temperature will win \$1000.

The four others walk to the bucket, place their thermometers in the water, and wait while their thermometers equilibrate. You follow suit. You can all see all of the thermometers’ read-outs: they’re fairly similar, but a couple are a degree or two off from the rest. You can also watch, as each of your fellow-contestants stares fixedly at his or her own thermometer and copies its reading (only) onto his or her card.

Should you:

1. Write down the reading on your own thermometer, because it’s yours;
2. Write down an average thermometer reading, because probably the more accurate thermometer-readings will cluster;
3. Write down an average of the answers on others’ cards, because rationalists should try not to disagree;
4. Follow the procedure everyone else is following (and so stare only at your own thermometer) because rationalists should try not to disagree about procedures?

Choice 2, of course. Thermometers imperfectly indicate temperature; to have the best possible chance of winning the \$1000, you should consider all the information you have, from all the (randomly allocated, and so informationally symmetric) thermometers.  It doesn’t matter who was handed which thermometer.

Forming accurate beliefs is *normally* about this simple.  If you want the most accurate beliefs you can get, you’ll need to pay attention to the evidence.  All of the evidence.  Evenly.  Whether you find the evidence in your hand or mind, or in someone else’s.  And whether weighing all the evidence evenly leaves you with an apparently high-status social claim (“My thermometer is better than yours!”), or an apparently deferential social claim (“But look -- I’m trying to agree with all of you!”), or anywhere else.

I’ll try to spell out some of what this looks like, and to make it obvious why certain belief-forming methods give you more accurate beliefs.

Principle 1:  Truth is not person-dependent.

There’s a right haircut for me, and a different right haircut for you.  There’s a right way for me to eat cookies if I want to maximize my enjoyment, and a different right way for you to eat cookies, if you want to maximize your enjoyment.  But, in the context of the game-show, there isn’t a right temperature for me to put on my card, and a different right temperature for you to put on your card.  The game-show host hands \$1000 to cards with the right temperature -- he doesn’t care who is holding the card.  If a card with a certain answer will make you money, that same card and answer will make me money.  And if a certain answer won’t make me money, it won’t make you money either.

Truth, or accuracy, is like the game show in this sense.  “Correct prediction” or “incorrect prediction” applies to beliefs, not to people with beliefs.  Nature doesn’t care what your childhood influences were, or what kind of information you did or didn’t have to work with, when it deems your predictions “accurate!” or “inaccurate!”.  So, from the point of view of accuracy, it doesn’t make any sense to say “I think the temperature is 73°, but you, given the thermometer you were handed, should think it 74°”.  Nor “I think X, but given your intuitions you should think Y” in any other purely predictive context.

That is: while “is a good haircut” is a property of the (person, haircut) pair, “is an accurate belief” is a property of the belief only.

Principle 2:  Watch the mechanisms that create your beliefs.  Ask if they’re likely to lead to accurate beliefs.

It isn’t because of magic that you should use the median thermometer’s output.  It’s because, well, thermometers noisily reflect the temperature, and so the central cluster of the thermometers is more likely to be accurate.  You can see why this is the accuracy-producing method.
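One way to see why pooling the readings helps, as a sketch under assumptions the post does not spell out: suppose each of the $n$ thermometers reads $X_i = T + \varepsilon_i$, where $T$ is the true temperature and the errors $\varepsilon_i$ are independent, with mean zero and a common variance $\sigma^2$ (these symbols are introduced here purely for illustration). Then the average reading $\bar{X}$ satisfies

$$
\mathbb{E}[\bar{X}] = T, \qquad \operatorname{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(\varepsilon_i) = \frac{\sigma^2}{n}.
$$

With five thermometers the standard deviation of the error is $\sigma/\sqrt{5}$, a bit under half that of a single reading. If the errors share a common bias or have very heavy tails, this simple calculation does not apply as written.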

Sometimes you’ll produce better answers by taking an average over many people’s impressions, or by updating from other people’s beliefs, or by taking disagreement between yourself and someone else as a sign that you should debug your belief-forming process.  And sometimes (e.g., if the people around you are choosing their answers by astrology), you won’t.

But in any of these circumstances, if you actually ask yourself “What belief-forming process is really, actually likely to pull the most juice from the evidence?”, you’ll see what the answer is, and you’ll see why the answer is that.  It won’t be “agree with others, because agreement is a mysterious social ritual that rationalists aim for”, or “agree with others, because then others will socially reciprocate by agreeing with you”.  It won’t be routed through the primate social system at all.  It’ll be routed through seeing where evidence can be found (seeing what features of the world should look different if the world is in one state rather than another -- the way thermometer-readings should look different if the bucket is one temperature rather than another) and then seeing how to best and most thoroughly and evenly gather up all that evidence.

Principle 2b:  Ask if you are weighing all similarly truth-indicative mechanisms evenly.

Even when the processes that create our beliefs are truth-indicative, they generally aren’t fully, thoroughly, and evenly truth-indicative.  Let’s say I want to know whether it’s safe for my friend to bike to work.  My own memories are truth-indicative, but so are my friends’ and neighbors’ memories, and so are the memories of the folk in surveys I can find online.  The trouble is that my own memories arrive in my head with extreme salience, and move my automatic anticipations a lot; while my friend’s have less automatic impact, and those of the surveyed neighbors still less.  So if I just go with the impressions that land in my head, my predictions will overweight a few samples of evidence at the expense of all the others.

That is: our automatic cognition tends not to weigh the evidence evenly *at all*.  It takes conscious examination and compensation.

Principle 3:  Ask what an outside observer would say.

Since truth doesn’t depend on who is asking -- and since our feelings about the truth often do depend -- it can help to ask what an outside observer would say.  Instead of asking “Am I right in this dispute with my friend?” ask: “If I observed this from the outside, and saw someone with my track record and skillset, and someone else with my friend’s track record and skillset, disagreeing in this manner -- who would I think was probably right?”.

Common pitfall: Idolatry

We’re humans.  Give us a good idea, and we’ll turn it into an idol and worship its (perhaps increasingly distorted) image.  Tell us about the Aumann Agreement Theorem, and we’re liable to make up nonsense rituals about how one must always agree with the majority.

The solution is to remove the technical terms and ask *why* each belief-forming method works.  Where is the evidence?  What observations would you expect to see, if the universe were one way rather than another?  What method of aggregating the evidence most captures the relevant data?

That is: don’t memorize the idea that “agreement”, the “scientific method”, or any other procedure is “what rationalists do”.  Or, at least, don’t *just* memorize it.  Think it through every time.  Be able to see why it works.

Common pitfall: Primate social intuitions

Again: we’re humans.  Give us a belief-forming method, and we’ll make primate politics out of it.  We’ll say “I should agree with the majority, so that religious or political nuts will also agree with the majority via social precedent effects”.  Or: “I should believe some of my interlocutor’s points, so that my interlocutor will believe mine”.  And we’ll cite “rationality” while doing this.

But accurate beliefs have nothing to do with game theory.  Yes, in an argument, you may wish to cede a point in order to manipulate your interlocutor.  But that social manipulation has nothing to do with truth.  And social manipulation isn’t why you’ll get better predictions if you include others’ thermometers in your average, instead of just paying attention to your own thermometer.

Example problems:  To make things concrete, consider the following examples.  My take on the answers appears in the comments.  Please treat these as real examples; if you think real situations diverge from my idealization, say so.

Problem 1: Jelly-beans

You’re asked to estimate the number of jelly-beans in a jar.  You have a group of friends with you. Each friend privately writes down her estimate, then all of the estimates are revealed, and then each person has the option of changing her estimate.

How should you weigh: (a) your own initial, solitary estimate; (b) the initial estimates of each of your friends; (c) the estimates your friends write down on paper, after hearing some of the others’ answers?

Problem 2: Housework splitting

You get into a dispute with your roommate about what portion of the housework you’ve each been doing.  He says you’re being biased, and that you always get emotional about this sort of thing.  You can see in his eyes that he’s upset and biased; you feel strongly that you could never have such biases.  What to believe?

Problem 3:  Christianity vs. atheism

You get in a dispute with your roommate about religion.  He says you’re being biased, and that your “rationalism” is just another religion, and that according to his methodology, you get the right answer by feeling Jesus in your heart.  You can see in his eyes that he’s upset and biased; you feel strongly that you could never have such biases.  What to believe?

Problem 4:  Honest Bayesian wannabes

Two similarly rational people, Alfred and Betty, estimate the length of Lake L.  Alfred estimates “50 km”; Betty simultaneously estimates “10 km”.  Both realize that Betty knows more geography than Alfred.  Before exchanging any additional information, the two must again utter simultaneous estimates of the length of Lake L.  Is it true that if Alfred and Betty are estimating optimally, it is as likely that Betty’s answer will now be larger than Alfred’s as the other way round?  Is it true that if these rounds are repeated, Alfred and Betty will eventually stabilize on the same answer?  Why?

Anyone who hasn't already, check out Anna's OB post, Share likelihood ratios, not posterior beliefs.

(Anna: you write lots of great stuff; link it up!).

It was written about a year ago, but it's actually a good follow-up to this post. The point is that, ideally, people would share raw observations. But sometimes that's too slow, so instead we should share a form of summarized evidence. Sharing opinions is a noisy way to do that, because other people's prior beliefs get needlessly mixed in with the observations, and then with your opinion, just like the s...

Let me give an argument in favor of #4, doing what the others do, in the thermometer problem. Now we seem to have them behaving badly. I think in practice many people would in fact look at other thermometers too in making their guesses. So why aren't they doing it? Two possibilities: they're stupid; or they have a good reason to do it. An example good reason: some thermometers don't read properly from a side angle, so although you think you can see and read all of them, you might be wrong. (This could be solved by #3, writing down the average of the cards,...

“is an accurate belief” is a property of the belief only

Technically, it's a property of the (belief, what-the-belief-is-about) pair. Beliefs can't be accurate by themselves; there has to be an external referent to compare them with. (Only-)self-referencing beliefs degrade straightforwardly to tautologies or contradictions.

The thermometer answer is wrong: you're ignoring that you're on a game show. On a game show the producers try to organize things such that few people (or only one person) win a challenge. As such I would expect all but one thermometer to be in error. Furthermore, by watching old episodes of the show I could tell whether only one thermometer will be right or whether several contestants succeed at each challenge, and therefore pick either the small clump or the lone outlier.

7jimrandomh
This is a very good point. Since you might be being messed with, you should run every sanity check you can think of. In increasing order of difficulty and also increasing order of value: get the room temperature from all the thermometers; take your own temperature; ask for a drink of ice water and take its temperature. You should also consider the possibility that all of the other contestants are actors with fake thermometers.
0DanielLC
It's a metaphor, like the Monty Hall problem. The fact that that's not how game shows really work doesn't matter.

You’re asked to estimate the number of jelly-beans in a jar. You have a group of friends with you. Each friend privately writes down her estimate, then all of the estimates are revealed, and then each person has the option of changing her estimate.

How should you weigh: (a) your own initial, solitary estimate; (b) the initial estimates of each of your friends; (c) the estimates your friends write down on paper, after hearing some of the others’ answers?

I start by asking them how they made their initial estimates, and how they used others'.

This might see...

You should consider all the information you have, from all the (randomly allocated, and so informationally symmetric) thermometers. It doesn’t matter who was handed which thermometer. Forming accurate beliefs is normally about this simple. If you want the most accurate beliefs you can get, you’ll need to pay attention to the evidence. All of the evidence. Evenly.

This gives the impression that you think that normally one can just collect all the relevant evidence, after which you don't need to consider anyone else's opinion. I suppose it depends on what sort of world you live in, but that seems far from the normal situation to me.

5Paul Crowley
It's artificial in exactly the way a trolley problem is and with the same virtues, surely?
[-]Shae40

Let’s say I want to know whether it’s safe for my friend to bike to work. My own memories are truth indicative, but so are my friends’ and neighbors [and online surveys]... The trouble is my own memories arrive in my head with extreme salience, and move my automatic anticipations a lot; while my friend’s have less automatic impact, and those of the surveyed neighbors still less...our automatic cognition tends not to weigh the evidence evenly at all.

I sometimes wonder, though, if giving one's own experiences greater weight in situations like these (though not in the thermometer situation) is rational:

The relevant question, I believe, is how much weight you should give the evidence from different sources. You should not think that the amount of weight we intuitively give evidence from our own experience is optimal, and this permits a reversal test.

9AnnaSalamon
Re: Problem 4: Roughly speaking: yes. Ordinary disagreements persist after hearing others' estimates. A and B may start out asserting "50" and "10", and then argue their way to "25" and "12", then "23" and "17". But if you want each estimate to be as accurate as possible, this is silly behavior; if A can predict that his estimate will go down over time (as he integrates more of B's evidence), he can also predict that his current estimate is too high -- and so he can improve his accuracy by lowering his estimate right now. The two parties should be as likely to overshoot as to undershoot in their disagreements, e.g.: A: 50, B: 10; then A: 18, B: 22; then A: 21, B: 21. So next time you're in a dispute, try applying Principle 3: ask what an outside observer would say about the situation. If Alfred and Betty both apply this principle, they'll each ask: "What would an outside observer guess about Lake L, given that Betty has studied geography and said '10', while Alfred said '50'?" And, thus viewing the situation from the (same) outside, Betty and Alfred will both weigh Betty's evidence about equally. Alfred may underweight Betty's impression (e.g., because he doesn't realize she wrote her thesis on Lake L) -- but he may equally overweight Betty's opinion (e.g., because he doesn't realize that she's never heard of Lake L either). If he could predict that he was (over/under) weighting her opinion, he'd quit doing it. More precisely: if you and your interlocutor can predict your direction of disagreement, at least one of you is forming needlessly inaccurate estimates.
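The point about predictable updates can be stated compactly; this is a sketch in notation that is not in the comment. Suppose A's estimate at round $t$ is the conditional expectation $\hat{x}_t = \mathbb{E}[X \mid I_t]$ of the lake's length $X$ given his information $I_t$, and that information only accumulates, so $I_t \subseteq I_{t+1}$. The law of iterated expectations then gives

$$
\mathbb{E}[\hat{x}_{t+1} \mid I_t] = \mathbb{E}\big[\,\mathbb{E}[X \mid I_{t+1}] \mid I_t\,\big] = \mathbb{E}[X \mid I_t] = \hat{x}_t .
$$

So the expected change is zero: an estimator of this idealized kind cannot expect its own estimate to move in a known direction. Whether real disputants behave like this estimator is a separate question.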
3NancyLebovitz
Before I read your reply, I assume that Alfred will lower his estimate a lot, and Betty might raise her estimate a little. I expect Betty's estimate to still be lower than Alfred's, though the size of these effects would be dependent on how much more geography Betty knows than Alfred. After reading your reply, I think you're right about convergence, and definitely right about driving your answer towards what you think is correct as fast as possible rather than holding back for fear of seeming to give in.
1Jonathan_Graehl
It's an interesting problem, and you're not doing it justice. A and B have a prior based on certain evidence. Their first guess conveys only the mean of that prior. You also posit that they have a shared belief about the (expected) amount of evidence behind their prior. To update at each iteration, they need to infer what evidence about the world is behind the exchange of guesses so far. I don't agree with anything you've claimed about this scenario. I'll grant you any simplifying assumptions you need to prove it, but let's be clear about what those assumptions are.
0steven0461
If they're only similarly rational rather than perfectly rational, they'll probably both be biased toward their own estimates. It also depends on common knowledge assumptions. As far as I know two people can be perfectly rational, and both can think the other is irrational, or think the other is rational but thinks they're irrational and therefore won't update, and therefore not get to an equilibrium. So I would disagree with your statement that: In general, the insights needed to answer the questions at the end of the post go beyond what one can learn from the ultra-simple "everyone can see the same evidence" example at the start of the post, I think.
7AnnaSalamon
Re: Problem 2: Take an even probability distribution involving your feelings and your roommate’s feelings on housework (and on who’s emotionally biased). You have no reason to treat your and your roommate's feelings as asymmetrically indicative (unless unbiased indicators have told you that you're especially above- or below- average at this sort of thing). It’s like the thermometers, again. Re: Problem 3: Keep your belief in atheism. Your evidence against a Christian god is way stronger than any evidence provided by your roommate's assertion. Despite the superficial symmetry with Problem 2, the prior against the complex hypothesis of a Christian god is many orders of magnitude stronger than the prior against you being wilfully mistaken about the housework -- and these orders of magnitude matter. (Though note that this reasoning only works because such "extraordinary claims" are routinely made without extraordinary evidence; psychology and anthropology indicate that p( your roommate's assertion | no Christian god) is relatively large -- much larger than a simplicity prior would assign to p(Christian god), or p(flying spaghetti monster).)
5AlexMennen
No, problems 2 and 3 are symmetrical in a more than superficial way. In both cases, the proper course of action is to attempt to conduct an unbiased evaluation of the evidence and of the biases affecting each of you. The difference is, in problem 3, we have already encountered and evaluated numerous nearly identical situations, so it is easy to come to the proper decision, whereas in problem 2, the situation could be new and unique, and missing background information about the effects of bias on the two individuals and the accuracy of their predictions becomes important.
0MartinB
The description of both problem 2 and 3 indicates a possible biasing in both participants. It's therefore reasonable to cool down first, and then check the evidence. In problem 3 the roommate might point out valid criticisms about biases one might have, while still being wrong on the question itself. Either way it's not rational to argue when in heat.
0RobinZ
Before reading your answers: Problem 2: Given the stated conditions ("you feel strongly that you could never have such biases" is unlikely in my case, but taking it as fact), I would tentatively interpret my roommate's remarks as indicating his frustration rather than my disposition. However, I would take the probability of being mistaken as high enough that I would attempt to find some way to defuse the situation that would work either way - most likely, arbitration from a mutually trusted party. Problem 3: I would quickly review what I know about the debate, and conclude that I have received no additional evidence one way or the other. I would continue to be confident in my naturalist worldview. After reading your answers: Problem 2: I notice that you interpret "you feel strongly that you could never have such biases" differently to how I interpret it - I would not feel thus without an observed track record of myself supporting that conclusion. My actions are scarcely changed from those implied by your judgement, however.
-1NancyLebovitz
Problem 2: I'd work on finding out what criteria we were using. In general, I believe that I can tell when I'm going off balance. I'm not sure if I can test this, but I get the impression that most people have no clue at all about when they're going off balance. I will also note that even if I feel I'm going off balance, there may not be anything I can do about it in the short run. Problem 3: I'm an agnostic, not an atheist. That being said, I would notice that the Christian is using a circular system of proof, and not agree with them.
4AnnaSalamon
3NancyLebovitz
Since I know those people, I would weight their answers according to my best estimate of their skill at such tasks, and then average the whole group, including me.
4Peter_de_Blanc
Doing this correctly can get pretty complicated. Basically, the more people you have, the less you should weight the low-quality estimates compared to the high-quality estimates. For example, suppose that "good" thermometers are unbiased and "bad" thermometers are all biased in the same direction, but you don't know which direction. If you have one thermometer which you know is good, and one which you're 95% sure is good, then you should weight both measurements about the same. But if you have 10^6 thermometers which you know are good, and 10^6 which you're 95% sure are good, then you should pretty much ignore the possibly-bad ones.
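A rough way to check this claim numerically, as a sketch under simplifying assumptions that are not in the comment: every reading has independent zero-mean noise with standard deviation `sigma`; the possibly-bad thermometers are either all good or all shifted by the same `bias`; and `p_bad` (here 0.05) is the chance that they are bad. The function names are hypothetical. The code compares the mean squared error of averaging all readings equally with that of averaging only the known-good ones.

```python
def mse_equal_weight(n, sigma=1.0, bias=1.0, p_bad=0.05):
    """Expected squared error of the plain average of n known-good
    and n possibly-bad readings (assumed model, see the lead-in)."""
    noise_var = sigma ** 2 / (2 * n)  # noise averages out over 2n readings
    # If the doubtful group is bad, half the readings are shifted by `bias`,
    # so the overall average is shifted by bias / 2 (in either direction).
    bias_term = p_bad * (bias / 2) ** 2
    return noise_var + bias_term


def mse_good_only(n, sigma=1.0):
    """Expected squared error of averaging only the n known-good readings."""
    return sigma ** 2 / n


for n in (1, 10 ** 6):
    print(n, mse_equal_weight(n), mse_good_only(n))
```

With one thermometer of each kind, pooling both gives the lower error under these assumptions. With a million of each, the noise has already averaged away and the small chance of a shared bias dominates, so the known-good readings alone do better.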
0NancyLebovitz
Not that it matters tremendously, but I was thinking of the jelly bean problem.
2Jonathan_Graehl
What kind of weighted average?
2NancyLebovitz
My math isn't good enough to formalize it-- I'd do it by feel.
2Jonathan_Graehl
Drat - likewise.
1RobinZ
Before reading your answer: Human beings are bad at estimating volumes, as opposed to lengths. I would form my estimate by observing the apparent density of jellybeans in the jar (e.g. by examining a square centimeter cross-section), observing the dimensions, and multiplying. Then, on the second stage, I would discard estimates which are radically different from mine (cutoff to be chosen based on observed distribution), and take the mean of the remaining. I would allow myself to be influenced in my choice of data to include by those whose data I was already inclined to include in my average. After reading your answer: Should I notice an apparent and popular upweighting of certain responses such as you suggest, I would increase the weight of those in my average.
0cgm_E
I would look for response clusters. Each participant could have a different counting method rendering different results (e.g. - estimate volumes/ count radius & height/ estimate there's an empty cone at the top which you don't see), and some methods could be common pitfalls. Therefore, some results - those obtained by a wrong way of counting, should be discarded, otherwise the median result would lead away from the right result. In order to decide which is the right response cluster, trying to figure out each method/mistake and determining the correct one would be useful. Of course, your method is not necessarily the right one, just because it's yours.

Should you discount multiple opinions which are based on the same information source?

6jimrandomh
It matters how confident you are in the original information source, and how confident you are that it was relayed properly. Suppose the question is "Will it rain tomorrow?" In the first scenario, you ask some people, and each one pulls out their phone, looks at it, and says "weather.com says yes". In this case, the probability that it will rain is almost exactly equal to the accuracy of the original information source, and the additional opinions add nothing. In the second scenario, you ask some people, and each of them says "I checked weather.com's forecast this morning, and I think it said yes." In this case, your estimate of the probability that it will rain is a bit lower, because they might have mis-remembered the forecast; but as you ask more people, your estimate should increase asymptotically towards your estimate of the forecast's accuracy.
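The second scenario can be sketched with Bayes' rule. The numbers and the function name below are illustrative assumptions, not from the comment: the forecast is right with probability `forecast_accuracy` whichever way it points, each person independently recalls it correctly with probability `recall`, and before asking anyone you think "yes" and "no" forecasts are equally likely.

```python
def p_rain(k, forecast_accuracy=0.8, recall=0.9, prior_yes=0.5):
    """P(rain) after k people independently recall the forecast as 'yes'."""
    # Posterior probability that the forecast really said 'yes'.
    like_yes = recall ** k * prior_yes
    like_no = (1 - recall) ** k * (1 - prior_yes)
    p_forecast_yes = like_yes / (like_yes + like_no)
    # The forecast is right with probability `forecast_accuracy` either way.
    return (forecast_accuracy * p_forecast_yes
            + (1 - forecast_accuracy) * (1 - p_forecast_yes))


for k in (1, 2, 5, 20):
    print(k, round(p_rain(k), 4))
```

Under these assumptions the estimate rises with each additional report but never exceeds the forecast's own accuracy. Setting `recall=1.0` reproduces the first scenario, where extra people reading the same screen add nothing.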

Therein lies a tail... Even in the first, thermometer, problem, there is still a question about whether to average or to take the median. Roughly speaking, if one expects some form of independent random additive noise in each thermometer, the choice of what to do with outliers depends on what one's prior for the expected noise distribution looks like. If one expects a gaussian, the variance of the distribution is finite, and one does better by averaging the readings. If one expects a distribution with long tails, with an unbounded variance, then one wa...
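The mean-versus-median point can be explored with a small simulation. This is only a sketch: the choice of five thermometers, unit-scale noise, a standard Cauchy distribution as the long-tailed case, and all function names are assumptions made for illustration. Because the Cauchy case has no finite variance, the code reports the median absolute error of each summary, not a mean squared error.

```python
import math
import random
import statistics


def gaussian_noise(rng):
    return rng.gauss(0.0, 1.0)


def cauchy_noise(rng):
    # Standard Cauchy draw via the inverse CDF; its variance is unbounded.
    return math.tan(math.pi * (rng.random() - 0.5))


def typical_errors(noise, n_thermometers=5, trials=4000, seed=0):
    """Median absolute error of the mean and of the median of the readings."""
    rng = random.Random(seed)
    true_temp = 70.0  # arbitrary illustrative value
    mean_err, median_err = [], []
    for _ in range(trials):
        readings = [true_temp + noise(rng) for _ in range(n_thermometers)]
        mean_err.append(abs(statistics.mean(readings) - true_temp))
        median_err.append(abs(statistics.median(readings) - true_temp))
    return statistics.median(mean_err), statistics.median(median_err)


print("gaussian (mean, median):", typical_errors(gaussian_noise))
print("cauchy   (mean, median):", typical_errors(cauchy_noise))
```

In this setup the mean should come out ahead under Gaussian noise and the median under Cauchy noise, which matches the comment's description of how the choice depends on the expected noise distribution.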

Problem 1: Jelly-beans

I entered such a contest as a child. I mentally divided the jar into its different conic sections, took a string and measured each section in units of jelly beans, then computed its volume in jelly beans.

I came in second. There's always someone whose wild guess is better than your best calculations.

1DanielLC
So? You did far better than most of the people who made a wild guess.
1RobinZ
How do you know it was a wild guess? And how many (possibly-)wildly-guessing competitors were there?

Problem 1 is basically a noisy thermometers problem, except that the noise is gaussian in the estimate of the length/density along one dimension, not the number given. So I would take the cube root of each answer (including my own), then average them, then cube the result to make my estimate. If I thought one person was a particularly good or bad estimator, I would apply that as a weighting in the middle step.
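The procedure described above can be sketched as follows. The function name and the optional `weights` argument (standing in for "a weighting in the middle step") are hypothetical, and the code assumes every estimate is a positive number.

```python
def cube_root_average(estimates, weights=None):
    """Average the estimates in cube-root space, then cube the result."""
    if weights is None:
        weights = [1.0] * len(estimates)
    if len(weights) != len(estimates) or sum(weights) <= 0:
        raise ValueError("need one weight per estimate, with a positive total")
    roots = [e ** (1.0 / 3.0) for e in estimates]  # assumes positive estimates
    mean_root = sum(w * r for w, r in zip(weights, roots)) / sum(weights)
    return mean_root ** 3


print(cube_root_average([1000, 8000]))  # close to 3375, versus a plain mean of 4500
```

Averaging in cube-root space pulls the combined estimate below the plain arithmetic mean, because large overestimates count for less once they are converted back to a length-like scale.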

2gerg
I'm mathematically interested in this procedure; can you please provide a reference?
0jimrandomh
I don't have a reference because the procedure is not rigorous; I came up with it off the top of my head. The intuition is that each of the contestants would've estimated the linear density of the jelly-beans, which is the same on all axes, and then cubed it, so you invert that by taking the cube root to get their actual estimates. To make this rigorous, you'd also have to account for the fact that the jar isn't actually a cube, which I have not done. I'd start by reducing the volume calculation to a bounding box (a cuboid) and a constant multiplicative factor, and assuming that everyone knows the correct constant factor for a cylinder. The length being different between the three dimensions does make a difference. I suspect (but have not proven) that having the jar, say, twice as tall as its diameter, would cause my procedure to act as though the error distribution for the height was twice as large. If anyone knows of a source that handles this class of problem rigorously, please do post it. If not, perhaps it'd make a good exercise for someone looking for topics to write papers on.

For problem 2, the answer is that you should be able to test whether you are upset directly, using introspection (perhaps combined with a mental trick or two), and if you do it right, the result of this test should be much better evidence of your mental state than your roommate's observation would be. However, performing this test is a skill and the problem description doesn't mention having it. So if you've explicitly practiced inspecting your mental state, then you should mostly ignore your roommate, but if you haven't then you should listen to him.

(But ...

3NancyLebovitz
Fairness and housework may not be best handled as an enumeration problem. I know a family (two adults, one child) which started by listing the necessary housework, and then each listing which things they liked doing, which they disliked, and which they were neutral about, and came to a low-stress agreement. Admittedly, this takes good will, honesty, and no one in the group who's too compulsive about doing or not doing housework.
1bluej100
Steven Brams has devised some fair division algorithms that don't require good will: see his surplus procedure ( http://en.wikipedia.org/wiki/Surplus_procedure ) and his earlier adjusted winner procedure ( http://en.wikipedia.org/wiki/Adjusted_Winner_procedure ).

1. I look at the jar and estimate how many jellybabies wide and tall is the filled volume, do some mental arithmetic, and reduce the answer according to an estimate of the packing fraction, which I would expect to be the greatest source of error.

Anyone else can do the same thing, if they're smart enough to not just pull a figure out of the air. If I think the other contestants are unlikely to be, I ignore them. It's like the thermometer example, except that I have reason to think my the...

3gerg
What if everyone else's estimate is between 280 and 320? Do you discard your own estimate if it's an outlier? Does the answer depend on whether you can find an error in your reasoning?
5Richard_Kennaway
Maybe I've made an error no-one else made. Maybe everyone else made an error I didn't make. (I have personally experienced this. I knew what error everyone else was making and stuck to my answer, which in the end turned out to be right.) The thing to do is to find out why the discrepancy happened; then I will know what to do about it. In some situations this will not be possible. Then I will have to just make an optimal Bayesian calculation based on limited information, i.e. guess. But "optimal" no more implies "accurate" than "statistically significant" implies "important".

I've been thinking about something like this.

If there's two people arguing, there's a 50% chance a given one is right. There is generally no reason to believe the correct one happens to be you.

That only really adds a bit of evidence against you, which doesn't seem like much. That said, if the other person realizes this, and doesn't change their mind, their evidence was one bit stronger than you previously thought.

Furthermore, if they realize that you know that and haven't changed your mind, and they still don't change their mind, that adds another bit.

Etc.

If both people understand this, they will either immediately figure that the one who's more sure is probably right, or just not be sure.

What's stopping you from saying "Hey, why don't we average our thermometers?" You should at least see if they're updating on your suggestion for proper procedure before you update on their default procedure.