Navigating disagreement: How to keep your eye on the evidence

Anyone who hasn't already, check out Anna's OB post, Share likelihood ratios, not posterior beliefs.

(Anna: you write lots of great stuff; link it up!).

It was written about a year ago, but it's actually a good follow up to this post. The point is that, ideally, people would share raw observations. But sometimes that's too slow, so instead we should share a form of summarized evidence. Sharing opinions is a noisy way to do that, because other peoples' prior beliefs get needlessly mixed in with the observations, and then with your opinion, just like the s... (read more)

[-]HalFinney16y60

Let me give an argument in favor of #4, doing what the others do, in the thermometer problem. Now we seem to have them behaving badly. I think in practice many people would in fact look at other thermometers too in making their guesses. So why aren't they doing it? Two possibilities: they're stupid; or they have a good reason to do it. An example good reason: some thermometers don't read properly from a side angle, so although you think you can see and read all of them, you might be wrong. (This could be solved by #3, writing down the average of the cards,... (read more)

[-]Divide16y50

“is an accurate belief” is a property of the belief only

Technically, it's a property of the (belief, what-the-belief-is-about) pair. Beliefs can't be accurate by themselves; there has to be an external referent to compare them with. (Only-)self-referencing beliefs degrade straighforwardly to tautologies or contradictions.

[-]AnotherKevin16y50

The thermometer answer is wrong, you're ignoring that you're on a game show On a game show the producers try to organize things such that few people (or only one person) wins a challenge. As such I would expect all but one thermometer to be in error. Furthermore by watching old episodes of the show I could tell if only one thermometer will be right or if several contestants also succeed at each challenge and therefore either pick the small clump or the lone outlier.

7jimrandomh16y

This is a very good point. Since you might be being messed with, you should run every sanity check you can think of. In increasing order of difficulty and also increasing order of value: get the room temperature from all the thermometers; take your own temperature; ask for a drink of ice water and take its temperature. You should also consider the possibility that all of the other contestants are actors with fake thermometers.

0DanielLC16y

It's a metaphor, like the Monty Haul problem. The fact that that's not how game shows really work doesn't matter.

[-]Nick_Tarleton16y40

You’re asked to estimate the number of jelly-beans in a jar. You have a group of friends with you. Each friend privately writes down her estimate, then all of the estimates are revealed, and then each person has the option of changing her estimate.
How should you weigh: (a) your own initial, solitary estimate; (b) the initial estimates of each of your friends; (c) the estimates your friends write down on paper, after hearing some of the others’ answers?

I start by asking them how they made their initial estimates, and how they used others'.

This might see... (read more)

[-]RobinHanson16y40

You should consider all the information you have, from all the (randomly allocated, and so informationally symmetric) thermometers. It doesn’t matter who was handed which thermometer. Forming accurate beliefs is normally about this simple. If you want the most accurate beliefs you can get, you’ll need to pay attention to the evidence. All of the evidence. Evenly.

This gives the impression that you think that normally one can just collect all the relevant evidence, after which you don't need to consider anyone else's opinion. I suppose it depends on what sort of world you live in, but that seems far from the normal situation to me.

5Paul Crowley16y

It's artificial in exactly the way a trolley problem is and with the same virtues, surely?

[-]Shae16y40

Let’s say I want to know whether it’s safe for my friend to bike to work. My own memories are truth indicative, but so are my friends’ and neighbors [and online surveys]... The trouble is my own memories arrive in my head with extreme salience, and move my automatic anticipations a lot; while my friend’s have less automatic impact, and those of the surveyed neighbors still less...our automatic cognition tends not to weigh the evidence evenly at all. <

I sometimes wonder, though, if giving one's own experiences greater weight in situations like these ... (read more)

[-]thomblake16y100

I sometimes wonder, though, if giving one's own experiences greater weight in situations like these (though not in the thermometer situation) is rational:

The relevant question, I believe, is how much weight you should give the evidence from different sources. You should not think that the amount of weight we intuitively give evidence from our own experience is optimal, and this permits a reversal test.

[-]AnnaSalamon16y40

My take on how to get the best estimates, in separate comments for tidier discussion threads:

9AnnaSalamon16y

Re: Problem 4: Roughly speaking: yes. Ordinary disagreements persist after hearing others' estimates. A and B may start out asserting "50" and "10", and then argue their way to "25" and "12", then "23" and "17". But if you want each estimate to be as accurate as possible, this is silly behavior; if A can predict that his estimate will go down over time (as he integrates more of B's evidence), he can also predict that his current estimate is too high -- and so he can improve his accuracy by lowering his estimate right now. The two parties should be as likely to overshoot as to undershoot in their disagreements, e.g.: A: 50; B: 10 A: 18; B: 22 A: 21; B: 21. So next time you're in a dispute, try applying Principle 3: ask what an outside observer would say about the situation. If Alfred and Betty both apply this principle, they'll each ask: "What would an outside observer guess about Lake L, given that Betty has studied geography and said "10", while Alfred said "50"?" And, thus viewing the situation from the (same) outside, Betty and Alfred will both weigh Betty's evidence about equally. Alfred may underweight Betty's impression (e.g., because he doesn't realize she wrote her thesis on Lake L) -- but he may equally overweight Betty's opinion (e.g., because he doesn't realize that she's never heard of Lake L either). If he could predict that he was (over/under) weighting her opinion, he'd quit doing it. More precisely: if you and your interlocutor can predict your direction of disagreement, at least one of you is forming needlessly inaccurate estimates.

3NancyLebovitz16y

Before I read your reply, I assume that Alfred will lower his estimate a lot, and Betty might raise her estimate a little. I expect Betty's estimate to still be lower than Alfred's, though the size of these effects would be dependent on how much more geography Betty knows than Alfred. After reading your reply, I think you're right about convergence, and definitely right about driving your answer towards what you think is correct as fast as possible rather than holding back for fear of seeming to give in.

1Jonathan_Graehl16y

It's an interesting problem, and you're not doing it justice. A and B have a prior based on certain evidence. Their first guess conveys only the mean of that prior. You also posit that they have a shared belief about the (expected) amount of evidence behind their prior. To update at each iteration, they need to infer what evidence about the world is behind the exchange of guesses so far. I don't agree with anything you've claimed about this scenario. I'll grant you any simplifying assumptions you need to prove it, but let's be clear about what those assumptions are.

0steven046116y

If they're only similarly rational rather than perfectly rational, they'll probably both be biased toward their own estimates. It also depends on common knowledge assumptions. As far as I know two people can be perfectly rational, and both can think the other is irrational, or think the other is rational but thinks they're irrational and therefore won't update, and therefore not get to an equilibrium. So I would disagree with your statement that: In general, the insights needed to answer the questions at the end of the post go beyond what one can learn from the ultra-simple "everyone can see the same evidence" example at the start of the post, I think.

7AnnaSalamon16y

Re: Problem 2: Take an even probability distribution involving your feelings and your roommate’s feelings on housework (and on who’s emotionally biased). You have no reason to treat your and your roommate's feelings as asymmetrically indicative (unless unbiased indicators have told you that you're especially above- or below- average at this sort of thing). It’s like the thermometers, again. Re: Problem 3: Keep your belief in atheism. Your evidence against a Christian god is way stronger than any evidence provided by your roommate's assertion. Despite the superficial symmetry with Problem 2, the prior against the complex hypothesis of a Christian god is many orders of magnitude stronger than the prior against you being wilfully mistaken about the housework -- and these orders of magnitude matter. (Though note that this reasoning only works because such "extraordinary claims" are routinely made without extraordinary evidence; psychology and anthropology indicate that p( your roommate's assertion | no Christian god) is relatively large -- much larger than a simplicity prior would assign to p(Christian god), or p(flying spaghetti monster).

5AlexMennen16y

No, problems 2 and 3 are symmetrical in a more than superficial way. In both cases, the proper course of action is to attempt to conduct an unbiased evaluation of the evidence and of the biases affecting each of you. The difference is, in problem 3, we have already encountered and evaluated numerous nearly identical situations, so it is easy to come to the proper decision, whereas in problem 2, the situation could be new and unique, and missing background information about the effects of bias on the two individuals and the accuracy of their predictions becomes important.

0MartinB16y

The description of both problem 2 and 3 indicates a possible biasing in both participants. Its therefore reasonable to cool down first, and then check the evidence. In problem 3 roommate might point out valid criticisms about biases one might have, while still being wrong on the question itself. Either way its not rational to argue when in heat.

0RobinZ16y

Before reading your answers: Problem 2: Given the stated conditions ("you feel strongly that you could never have such biases" is unlikely in my case, but taking it as fact), I would tentatively interpret my roommates remarks as indicating his frustration rather than my disposition. However, I would take the probability of being mistaken as high enough that I would attempt to find some way to defuse the situation that would work either way - most likely, arbitration from a mutually trusted party. Problem 3: I would quickly review what I know about the debate, and conclude that I have received no additional evidence one way or the other. I would continue to be confident in my naturalist worldview. After reading your answers: Problem 2: I notice that you interpret "you feel strongly that you could never have such biases" differently to how I interpret it - I would not feel thus without an observed track record of myself supporting that conclusion. My actions are scarcely changed from those implied by your judgement, however.

-1NancyLebovitz16y

Problem 2: I'd work on finding out what criteria we were using. In general, I believe that I can tell when I'm going off balance. I'm not sure if I can test this, but I get the impression that most people have no clue at all about when they're going off balance. I will also note that even if I feel I'm going off balance, there may not be anything I can do about it in the short run. Problem 3: I'm an agnostic, not an atheist. That being said, I would notice that the Christian is using a circular system of proof, and not agree with them.

4AnnaSalamon16y

Re: problem 1: Jelly bean number estimates are just like thermometer readings, except that the reading is in someone’s head, rather than their hand. So the obvious answer is to average everyone’s initial, solitary impressions, absent reason to expect one individual or another is an above-average (or below-average) estimator. If your friends use lopsided weighting schemes in their second answers, should you re-update? This depends a lot on your friends. * Don't re-update from their answers if you think they don't understand the merits of averaging; you want to weight each person's raw impression evenly, not to overweight it based on how many others were randomly influenced by it (cf. information cascades: http://en.wikipedia.org/wiki/Information_cascade). * Do re-update if your friends understand the merits of averaging, such that their apparent over-weighting of a few peoples' datapoints suggests they know something you don't (e.g., perhaps your friend Julie has won past championships in jelly-bean estimation, and everyone but you knows it).

3NancyLebovitz16y

Since I know those people, I would weight their answers according to my best estimate of their skill at such tasks, and then average the whole group, including me.

4Peter_de_Blanc16y

Doing this correctly can get pretty complicated. Basically, the more people you have, the less you should weight the low-quality estimates compared to the high-quality estimates. For example, suppose that "good" thermometers are unbiased and "bad" thermometers are all biased in the same direction, but you don't know which direction. If you have one thermometer which you know is good, and one which you're 95% sure is good, then you should weight both measurements about the same. But if you have 10^6 thermometers which you know are good, and 10^6 which you're 95% sure are good, then you should pretty much ignore the possibly-bad ones.

0NancyLebovitz16y

Not that it matters tremendously, but I was thinking of the jelly bean problem.

2Jonathan_Graehl16y

What kind of weighted average?

2NancyLebovitz16y

My math isn't good enough to formalize it-- I'd do it by feel.

2Jonathan_Graehl16y

Drat - likewise.

1RobinZ16y

Before reading your answer: Human beings are bad at estimating volumes, as opposed to lengths. I would form my estimate by observing the apparent density of jellybean in the jar (e.g. by examining a square centimeter cross-section), observing the dimensions, and multiplying. Then, on the second stage, I would discard estimates which are radically different from mine (cutoff to be chosen based on observed distribution), and take the mean of the remaining. I would allow myself to be influenced in my choice of data to include by those whose data I was already inclined to include in my average. After reading your answer: Should I notice an apparent and popular upweighting of certain responses such as you suggest, I would increase the weight of those in my average.

0cgm_E16y

I would look for response clusters. Each participant could have a different counting method rendering different results (e.g. - estimate volumes/ count radius & height/ estimate there's an empty cone at the top which you don't see), and some methods could be common pitfalls. Therefore, some results - those obtained by a wrong way of counting, should be discarded, otherwise the median result would lead away from the right result. In order to decide which is the right response cluster, trying to figure out each method/mistake and determining the correct one would be useful. Of course, your method is not necessarily the right one, just because it's yours.

[-]NancyLebovitz16y30

Should you discount multiple opinions which are based on the same information source?

7Academian16y

Follow Anna's Why would you? advice. The point is simply to have a reliable computation performed on the observations, and you do whatever is equivalent to that. If the opinion involves a computation from the information source that is difficult enough that people might do it wrong, then count more sources as more evidence. After a math exam, when you poll your friends, * "Who answered pi for number 6?", it is rational to be more confident in "pi" as more of your computationally skilled friends answer "pi", even though it all came from the same information: the exam question. This is similar to the phenomenon that checking over your own workings should typically make you more confident, depending on how complex they are. Another sort of such a computation is memory itself. Some people fail to compute the correct present memories from their past exposure to stimuli. So if you want to know * "Was there a red card in the parking lot?", more witnesses should make you more convinced, even if they're all honest people... they might just have bad memories. But if you have 3 friends standing in your dining room right now, and you ask them * "Are there enough chairs in there for all 4 of us?", and someone says "yes", additional yesses should contribute very little marginal confidence. In summary, Extra opinions on the same information are redundant only insofar as computational error checking is redundant.

6jimrandomh16y

It matters how confident you are in the original information source, and how confident you are that it was relayed properly. Suppose the question you is "Will it rain tomorrow?" In the first scenario, you ask some people, and each one pulls out their phone, looks at it, and says "weather.com says yes". In this case, the probability that it will rain is almost exactly equal to the accuracy of the original information source, and the additional opinions add nothing. In the second scenario, you ask some people, and each of them says "I checked weather.com's forecast this morning, and I think it said yes." In this case, your estimate of the probability that it will rain is a bit lower, because they might have mis-remembered the forecast; but as you ask more people, your estimate should increase asymptotically towards your estimate of the forecast's accuracy.

[-]soreff16y20

Therein lies a tail... Even in the first, thermometer, problem, there is still a question about whether to average or to take the median. Roughly speaking, if one expects some form of independent random additive noise in each thermometer, the choice of what to do with outliers depends on what one's prior for the expected noise distribution looks like. If one expects a gaussian, the variance of the distribution is finite, and one does better by averaging the readings. If one expects a distribution with long tails, with an unbounded variance, then one wa... (read more)

[-]PhilGoetz16y10

Problem 1: Jelly-beans

I entered such a contest as a child. I mentally divided the jar into its different conic sections, took a string and measured each section in units of jelly beans, then computed its volume in jelly beans.

I came in second. There's always someone whose wild guess is better than your best calculations.

1DanielLC16y

So? You did far better than most of the people who made a wild guess.

1RobinZ16y

How do you know it was a wild guess? And how many (possibly-)wildly-guessing competitors were there?

[-]jimrandomh16y10

Problem 1 is basically a noisy thermometers problem, except that the noise is gaussian in the estimate of the length/density along one dimension, not the number given. So I would take the cube root of each answer (including my own), then average them, then cube the result to make my estimate. If I thought one person was a particularly good or bad estimator, I would apply that as a weighting in the middle step.

2gerg16y

I'm mathematically interested in this procedure; can you please provide a reference?

0jimrandomh16y

I don't have a reference because the procedure is not rigorous; I came up with it off the top of my head. The intuition is that each of the contestants would've estimated the linear density of the jelly-beans, which is the same on all axes, and then cubed it, so you invert that by taking the cube root to get their actual estimates. To make this rigorous, you'd also have to account for the fact that the jar isn't actually a cube, which I have not done. I'd start by reducing the volume calculation to a bounding box (a cuboid) and a constant multiplicative factor, and assuming that everyone knows the correct constant factor for a cylinder. The length being different between the three dimensions does make a difference. I suspect (but have not proven) that having the jar, say, twice as tall as its diameter, would cause my procedure to act as though the error distribution for the height was twice as large. If anyone knows of a source that handles this class of problem rigorously, please do post it. If not, perhaps it'd make a good exercise for someone looking for topics to write papers on.

[-]jimrandomh16y10

For problem 2, the answer is that you should be able to test whether you are upset directly, using introspection (perhaps combined with a mental trick or two), and if you do it right, the result of this test should be much better evidence of your mental state than your roommate's observation would be. However, performing this test is a skill and the problem description doesn't mention having it. So if you've explicitly practiced inspecting your mental state, then you should mostly ignore your roommate, but if you haven't then you should listen to him.

(But ... (read more)

3NancyLebovitz16y

Fairness and housework may not be best handled as an enumeration problem. I know a family (two adults, one child) which started by listing the necessary housework, and then each listing which things they liked doing, which they disliked, and which they were neutral about, and came to a low-stress agreement. Admittedly, this takes good will, honesty, and no one in the group who's too compulsive about doing or not doing housework.

1bluej10016y

Steven Brams has devised some fair division algorithms that don't require good will: see his surplus procedure ( http://en.wikipedia.org/wiki/Surplus_procedure ) and his earlier adjusted winner procedure ( http://en.wikipedia.org/wiki/Adjusted_Winner_procedure ).

[-]Richard_Kennaway16y10

(Written before and edited where marked after reading the comments.)

1. I look at the jar and estimate how many jellybabies wide and tall is the filled volume, do some mental arithmetic, and reduce the answer according to an estimate of the packing fraction, which I would expect to be the greatest source of error.

Anyone else can do the same thing, if they're smart enough to not just pull a figure out of the air. If I think the other contestants are unlikely to be, I ignore them. It's like the thermometer example, except that I have reason to think my the... (read more)

3gerg16y

What if everyone else's estimate is between 280 and 320? Do you discard your own estimate if it's an outlier? Does the answer depend on whether you can find an error in your reasoning?

5Richard_Kennaway16y

Maybe I've made an error no-one else made. Maybe everyone else made an error I didn't make. (I have personally experienced this. I knew what error everyone else was making and stuck to my answer, which in the end turned out to be right.) The thing to do is to find out why the discrepancy happened; then I will know what to do about it. In some situations this will not be possible. Then I will have to just make an optimal Bayesian calculation based on limited information, i.e. guess. But "optimal" no more implies "accurate" than "statistically significant" implies "important".

[-]DanielLC16y00

I've been thinking about something like this.

If there's two people arguing, there's a 50% chance a given one is right. There is generally no reason to believe the the correct one happens to be you.

That only really adds a bit of evidence against you, which doesn't seem like much. That said, if the other person realizes this, and doesn't change their mind, their evidence was one bit stronger than you previously thought.

Furthermore, if they realize that you know that and haven't changed your mind, and they still don't change their mind, that adds another bit.

Etc.

If both people understand this, they will either immediately figure that the one who's more sure is probably right, or just not be sure.

[-][anonymous]16y00

[-]JamesAndrix16y00

What's stopping you from saying "Hey, why don't we average our thermometers?" You should at least see if they're updating on your suggestion for proper procedure before you update on their default procedure.

[+]ZeroBlacktip16y-80

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

47

Navigating disagreement: How to keep your eye on the evidence

47

47