Techniques for probability estimates

4th Jan 2011


From Spetzler and Stael von Holstein (1975), there is a variation of Bet On It that doesn't require risk neutrality.

Say we are going to flip a thumbtack, and it can land heads (so you can see the head of the tack), or tails (so that the point sticks up like a tail). If we want to assess your probability of heads, we can construct two deals.

Deal 1: You win $10,000 if we flip a thumbtack and it comes up heads ($0 otherwise; you won't lose anything).

Deal 2: You win $10,000 if we spin a roulette-like wheel labeled with numbers 1, 2, 3, ..., 100, and the wheel comes up between 1 and 50 ($0 otherwise; you won't lose anything).

Which deal would you prefer? If you prefer deal 1, then you are assessing a probability of heads greater than 50%; otherwise, you are assessing a probability of heads less than 50%.

Then, ask the question many times, using a different number than 50 for deal 2. For example, if you first say you would prefer deal 2, then change it to winning on 1-25 instead, and see if you still prefer deal 2. Keep adjusting until you are indifferent between deal 1 and 2. If you are indifferent between the two deals when deal 2 wins from 1-37, then you have assessed a probability of 37%.

The above describes one procedure used by professional decision analysts; they usually use a physical wheel with a "winning area" that is adjustable continuously rather than using numbers like the above.
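The adjust-until-indifferent procedure can be sketched as a bisection search. This is only an illustration, not the analysts' actual protocol: `prefers_thumbtack` is a hypothetical stand-in for asking the person which deal they prefer, and the simulated respondent assumes a fixed hidden belief of 37%.

```python
def elicit_probability(prefers_thumbtack, lo=0.0, hi=1.0, tolerance=0.005):
    """Narrow down a subjective probability by repeated deal comparisons.

    prefers_thumbtack(p) is a hypothetical callback: it returns True if the
    person prefers Deal 1 (the thumbtack) over a wheel that wins with
    probability p.
    """
    while hi - lo > tolerance:
        wheel = (lo + hi) / 2
        if prefers_thumbtack(wheel):
            lo = wheel  # belief in heads is above the wheel's chance
        else:
            hi = wheel  # belief in heads is at or below the wheel's chance
    return (lo + hi) / 2


# Simulated respondent with a hidden belief of 37% (an assumption for the demo).
hidden_belief = 0.37
estimate = elicit_probability(lambda wheel: hidden_belief > wheel)
print(round(estimate, 2))
```

Each question halves the remaining interval, so eight comparisons are enough to get within half a percentage point; a real respondent will of course answer less consistently than the simulated one.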

Who are "professional decision analysts?" Where do they come from, and who are their clients/employers? Do they go by any other names? This sounds fascinating.

A major problem with these approaches is that for the majority of real-life questions, the circuits in your brain that are best capable of analyzing the situation and giving you an answer along with a vague feeling of certainty are altogether different from those that you can use to run these heuristics. This is why, in my opinion, attempts to assign numerical probabilities to common-sense judgments usually don't make sense.

If your brain has the ability to make a common-sense judgment about some real-world phenomenon, this ability will typically be implemented in the form of a black-box module that will output the answer along with some coarsely graded intuitive feeling of certainty. You cannot open this black box and analyze its algorithms in order to upgrade this vague feeling into a precise numerical probability estimate. If you instead use heuristics that yield numerical probabilities, such as finding reference classes, this means side-stepping your black-box module and using an altogether different algorithm instead -- and the probability estimate you'll arrive at this way won't be pertinent to your best analysis that uses the black-box intuition module.

Surely you can teach yourself to compare intuitive certainty to probabilities, though. I mean, if you come up with rough labels for levels of intuitive certainty, and record how often each label is right or wrong, you'd get a really rough corresponding probability already.

Edit: Oh, this is predictionbook's *raison d'être*.

Inspired by your final paragraph, I sought out a variety of test questions on the web -- both on Steven's blog and elsewhere. I was expecting systematic overconfidence, with a smaller chance of systematic underconfidence, throughout the probability spectrum.

Instead I found a very interesting pattern.

When I was 90% or 95% certain of a fact, I was slightly overconfident. My 90% estimates shook out at about 80%, and my 95% estimates shook out around 90%. When I was completely uncertain of a fact, I was also slightly overconfident, but within the realm of experimental error.

But when I was just 50% confident of a fact, I was *almost always* wrong. Far more often than anyone could achieve by random guessing: my wrongness was thorough and integrated and systematic.

Clearly, that feeling of slight concern which I've always interpreted as, "I think I remember X, but it could go either way," actually means something closer to, "X is not true; my beliefs are inconsistent."

If I'm sure I know something, I probably do. If I'm sure I'm clueless, I probably am. But if I *think I might* know something, then I almost certainly have it backwards.

Is this a common bias which I should have read about by now?

I think most of these have the same limitation. When the numbers are too big, such as 1000 comparable cases or a 1 in 1000 chance, the human brain cannot intuitively grasp what to do. We are really only optimized for things in a central range (and, obviously, not even that under many circumstances). Rarer events, at least ones that do not occur in aggregate, do not produce sufficient optimization pressures. At some point, all the hard parts must be reduced to purely mathematical questions. If you can actually think of 10 corresponding situations, or remember the average of 100 past situations, you can use that, but picturing yourself dealing with 10 000 of something does not feel very different from picturing 100 000.

That's definitely a problem; a million is a statistic. I think we can try to work around it in some cases, though. You mentioned the numbers 10,000 and 100,000; one might convert these into a car and a house, respectively, by estimating costs. By interpreting such large numbers in terms of these real concepts, we get a more concrete sense of the difference between such large numbers. You can then think of the issue in terms of how often you use the car vs. the house, or even how much time you're going to spend paying them off. That reduces the difference to something manageable. Obviously, this won't work in all cases, and the weight or cost of even a real concept can vary based on the person and their location (spatial and temporal), but it can be worth trying.

For another example, consider the way people sometimes talk about government budgets. Someone might be outraged at $100 million going to a certain area, out of the overall budget of $50 billion. "Million" and "billion" are usually processed by our brains as just "big," so we focus on the 100 and the 50, and 100 is bigger than 50, so... outrage! But if we divide by a million, we have $100 (a new cell phone) vs. $50,000 (a year of college tuition, or an expensive car). The difference is much clearer.

A technique I use to get around this problem is to think in terms of orders of magnitude. What you can do is ask yourself (for example) about being in ten corresponding situations, then ask yourself about that (i.e. the set of ten situations) happening ten times, then about *that* happening ten times. This is also, with a little practice, an effective way to develop a visceral (and accordingly mind-blowing) sense of cosmic/microscopic scales, long periods of time, and so forth - cf. the Powers of Ten video.

I just found this with Google. I spent much of 2005-2007 trying to get experts to assign a subjective probability to a severe (H5N1) pandemic with >100M deaths. This was a strange experience. Experts didn't want to give probabilities, but painted a rather dark picture in interviews. Economists ignored the problem in their models (mortality bond ratings). Among the few who gave estimates were Bob Gleeson and Michael Steele, with ~20% per year. The same problem occurs in other fields: ask your surgeon for the probability that you'll die, your lawyer for the probability of winning the case, your teacher for the probability that you'll pass the exam, the candidate for his probability of winning the election, or the president for his probability of a nuclear war or global recession. These would be useful information, even if only subjective and informal. Yet people usually won't give them. Make a better society with people giving probability estimates!

pandemic probabilities: http://www.setbb.com/fluwiki2/viewforum.php?f=10&sid=2d1caa0fad5093a8c4291f45e0d39b67&mforum=fluwiki2

Nice. Much of this has been covered implicitly in various comments, but having it all in one place is lovely.

Something that is perhaps obvious but seems worth saying explicitly is that these techniques aren't mutually exclusive, and using several of them can help contrast their relative weaknesses and strengths.

For example, if I want to put a number on my estimate that I will die tomorrow (D), I can prepare for revelation and observe that, if I imagine Omega appearing and offering to tell me whether D, I'd become very anxious. So I might conclude that my estimated probability of D is significant enough to worry about... on the order of 10%, say.

But I can also look at the reference class of people like me and see how many of them die on any given day. Ideally I'd look up my age and other factors on an actuarial table, but I don't feel like bothering, so instead I take the less reliable reference class of humans... ~150000 deaths per day out of 6.8 billion people is something on the order of 2e-5.

And I can also convert to a frequency... I've lived ~15,000 days without dying, which gets me ~7e-5.

And now I can compare these techniques. [EDIT: I mean, compare them in the specific context of this question.]

Converting to a frequency has the bizarre property that my dying becomes less and less likely as I get older, when common sense tells me the opposite is true. Reference class of humans probably has the same property, but to a lesser degree. Actuarial reference class *doesn't* have this property, which is a point in its favor (no surprise there). Preparing for revelation relies on some very unreliable intuitions. And so forth.
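The two back-of-the-envelope numbers above, and the odd age-dependence of the frequency method, can be checked with a few lines of arithmetic. This sketch uses only the figures quoted in this comment; the second `days_lived` value is a hypothetical added for comparison.

```python
# Reference class of all humans: daily deaths divided by world population.
deaths_per_day = 150_000
world_population = 6.8e9
reference_class_estimate = deaths_per_day / world_population

# Convert to a frequency: one hypothetical death per days lived so far.
def frequency_estimate(days_lived):
    return 1 / days_lived

print(f"{reference_class_estimate:.1e}")
print(f"{frequency_estimate(15_000):.1e}")
# Hypothetical older self: the estimate shrinks as more days are survived.
print(f"{frequency_estimate(30_000):.1e}")
```

The last line makes the "bizarre property" concrete: doubling the number of days survived halves the estimated chance of dying tomorrow.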

(I conclude, incidentally, that I have a poorly calibrated fear of dying. This is unsurprising; after my recent stroke it was a major post-traumatic problem. I've worked it down to a tolerable level, but it does not surprise me at all that it's still orders of magnitude higher than it should be.)

My largest problem with this post comes right at the beginning:

humans are not utility maximizers and often refuse to give precise numerical probabilities. Nevertheless, their actions reflect a "hidden" probability.

.

although a mind "hosts" a probability estimate, that mind does not arbitrarily determine the estimate, but rather calculates it according to mathematical laws from available evidence.

If our basic brain processes are irrational, there is no reason why there has to be a well-defined probability in there somewhere. You might extract a probability by some method or another, but you might also extract method-dependent gobbledygook.

Regarding betting, one way of handling the issue of diminishing marginal returns for money is to use bets that are small compared to your net worth. When the bet size is small, utility should be close to linear in money. This doesn't work perfectly, but it does help reduce the problem somewhat. Of course, this only works for issues where the expected probability is not extreme (by the time one gets to more than 10 to 1 this starts to break down).

This doesn't work great even if you deal with moderate probabilities, because you need high fractions of net worth to get people to stop signaling...if I am a Yankees fan who earns $50,000 a year, I will bet $10 at even odds that the Yankees will win even if my available data would only predict a 40% chance for the Yankees to win. The expected loss of $2 doesn't even come close to the expected loss of appearing not to love the pinstriped sluggers with all my heart.

Mass_Driver:

This doesn't work great even if you deal with moderate probabilities, because you need high fractions of net worth to get people to stop signaling...

Yes, and there's also the issue of transaction costs. Especially since transaction costs are basically the opportunity costs of the time spent arranging the bet and the payment, and for people with higher net worth, these opportunity costs are typically also higher.

Congrats, this is an excellent post. Why hasn't it been promoted?

I totally use #1 + #2 by imagining betting, and then imagining my reaction when the outcome is revealed. I try to think whether I would say:

- "Shit, I should have bet with lower confidence"

or

- "Wow, that is truly surprising"

I would just add a note that assigning really small probabilities is sometimes necessary but often fraught with danger. For example, I would not bet my life that the sun will rise tomorrow in exchange for $1, even though my life is not worth as much to me as 1 trillion times the utility of $1.

I would take that bet, but only because an event that caused the sun to not rise tomorrow would almost certainly kill me anyway.

I'm looking forward to using this kind of reasoning to profit off end-of-the-worlders in late 2012.

Well, that kind of reasoning and just my run-of-the-mill, "no I don't think the world is ending" reasoning.

Actually, it's not analogous, because you don't have any non-zero-sumness with the doomsday bettor beyond that which is always present when two parties have differing predictions.

Imagine two bettors who each try to maximize U_expected = P(e)U(e) + (1-P(e))U(~e)

Typically the bettors have the same U(e) and U(~e), and only disagree on P(e).

If you analyze the doomsday bet with e = "the world ends", then it's just a standard bet situation, because both bettors set U(the world ends) = 0.

If you analyze the doomsday bet with e = "whatever happens in the year 2013", then it's seemingly unusual in that both bettors set P(e) to the same value (1), but it's really not unusual because you can factor their respective probability of doomsday out of their U(e) values.

So why isn't my/WrongBot's bet analogous? Let's say Omega offered me $1 today in exchange for getting to kill me if the sun doesn't rise tomorrow. Let e = "sun doesn't rise tomorrow".

My bet with Omega has two properties that are not true about a typical zero-sum bet:

Since my U(e) is 0, Omega's U(e) must be positive for it to make that bet. Any time there's a contract that the parties enter into because of differing U(e) values, and the U(e) difference doesn't factor into a P(e_subevent) like in the 2012 doomsday bet, the contract is not so much a bet as a non-zero-sum trade.

I'd bet on ~e regardless of how high my P(e) is, because there's no P(e) that can make P(e)U(e) + (1-P(e))U(~e) < 0 for me. That's a general property of contracts which are guaranteed to make my life better than before, i.e. non-zero-sum trades.

I have to admit... I'm mostly confused by this comment. Not by the math, but by exactly what you're getting at/disagreeing with.

If you're just saying that the doomsday scenario isn't perfectly analogous to the Omega scenario, I accept this, and never meant to imply that it was. I was only pointing out that the "if I lose I'll be dead anyway" general type of reasoning could be applied to the other situation (and not necessarily through explicitly betting against the other party). If you're saying that it couldn't, then I confess that I still don't understand why from your comment.

I was only pointing out that the "if I lose I'll be dead anyway" general type of reasoning could be applied to the other situation (and not necessarily through explicitly betting against the other party).

My point is that actually, you don't get any extra expected value from the doomsayer's "if I lose I'll be dead anyway" reasoning. You get exactly as much expected value from them as you would get from anyone with any kind of prediction whose accuracy is lower than your own by the same amount.

In contrast, WrongBot did get to capitalize on a special "if I lose I'm dead" property of his bet, and my previous post details the important properties that make WrongBot's bet atypical (properties that your own bet does not have).

Ah, I see then where we miscommunicated. I meant that I, not he, would be applying that reasoning. I strongly anticipate not being dead, and for the purposes of this bet (and only for this bet) don't care if I'm wrong about it. He would strongly anticipate being dead, and might therefore neglect the possibility that he'll have to suffer the consequences of whatever we're doing. My losing the bet is "protected" (in a rather dreary way), his isn't.

Obviously, I haven't worked out the details, and probably won't actually go around taking advantage of these people, but it occurred to me the other day while I was pondering how one should almost always be able to turn better-calibrated expectations into utility.

Obviously, I haven't worked out the details, and probably won't actually go around taking advantage of these people

Hey, they'd be happy enough to still be alive, and you could donate the proceeds to eradicating polio. But unfortunately you'd also be encouraging people to take existential threats less seriously in general, which may be a bad idea. I can't decide.

Anyway, good luck finding a believer in any kind of woo who is prepared to make a cash wager on a testable outcome. Think how quickly we would have eradicated homeopathy and astrology by now! :)

I wonder if there's a reasonably straightforward way to find a function for yourself such that your subjective utility is roughly linear over f($). That'd make the betting approach a lot more widely applicable.

log( (net worth + payoff) / net worth) times some constant seems like a good start, but I already see some possible flaws.

I may be misunderstanding the multiple statements technique. I'd be willing to bet at upwards of a million to one odds that France is bigger than Italy. I would not be willing to make a million related sentences and expect to only be wrong once. The misunderstanding probably comes from the vagueness of "related" sentences. What does that mean?

In any case, the technique is only as good as the ability to judge which statements are equally difficult to a given statement

In order to do that you already have to know how sure you are of each sentence!! Isn't that cyclical?

I'd be more interested to know what LW thought of creating a Probability Distribution for a continuous outcome. This seems to be cumbersome with all of the above tools, which I'll admit are quite helpful for binary events; but when you're purchasing a new computer, it's months that the computer will last before breaking, not whether it breaks in the first two years, that is relevant.

If taken to inanity, one could construct a large number of binary outcomes and try to smash them together to get a probability distribution for a continuous variable. But that's pretty annoying - surely there are better ways.

For this, I would use the 'smash-together' method. "How many months have contained an experience of a computer breaking on me?" *over* "How many months have I owned computers?" will give me the probability of the computer breaking in any given month, and then the graph y=(1-pr(break))^x represents the continuous variable "My computer is not broken". This takes about five minutes: it's worth it for cars, computers, homes, smartphones, etc. But you're right, too annoying for smaller cases.
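As a rough sketch of this "smash-together" calculation: the function names and the ownership history below are made up for illustration, and the model assumes a constant, independent break probability each month, which real hardware does not strictly obey.

```python
def monthly_break_probability(months_with_breakage, months_owned):
    """Fraction of owned months in which a computer broke."""
    return months_with_breakage / months_owned


def survival_curve(p_break, months):
    """Probability the computer is still working after 0..months months,
    i.e. y = (1 - p_break) ** x for each month x."""
    return [(1 - p_break) ** x for x in range(months + 1)]


def expected_lifetime_months(p_break):
    """Mean of the geometric distribution implied by a constant monthly rate."""
    return 1 / p_break


# Hypothetical history: 3 months with a breakage out of 120 months owned.
p = monthly_break_probability(3, 120)
curve = survival_curve(p, 24)
print(f"P(still working after 2 years) = {curve[24]:.2f}")
print(f"expected lifetime = {expected_lifetime_months(p):.0f} months")
```

The whole curve, rather than a single yes/no answer about the first two years, is what addresses the continuous "how many months will it last" question.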

I think you could perform the dice rolling experiment without any need for security against tampering. To generate a random number from 0 to N-1, have every interested party generate their own number (roll their own die), then everybody reveals their numbers together and the group adds them all up and takes the remainder after dividing by N.

With that procedure everybody should be convinced that the result is at least as random as their own number.

It is if you use a commitment scheme. Such a thing allows you to commit to a value before revealing it. So you go in two steps -- everybody commits, then everybody reveals. Nobody can change their value after committing, so nobody can base their values on others' values.

But there's no paranoia involved. It's cryptographically quite simple. All you need is a hash function.

Contrast with all of the governments and all of their security agents and such and nobody really trusts that it's secure.

All you need is a hash function.

A hash function on a die roll is quite vulnerable to a dictionary attack. You could add a salt, but this makes hash collisions easier to take advantage of.

You wouldn't use a hash function that people could generate collisions with, any more than you would use ROT-13.

Of course a salt. Not sure why that would make hash collisions easier to take advantage of though. Presumably you use a good hash function.

The point is there are people who would not realize that you need a salt, or a hash function not vulnerable to collisions. Yes, there are existing solutions for this problem, but even choosing an existing solution from the space of security solutions to different problems is not trivial.
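The commit-then-reveal procedure discussed in this thread might look something like the sketch below. It is an illustration under stated assumptions, not a vetted protocol: SHA-256 with a random 32-byte salt is one reasonable choice, the function names are invented here, and both phases run in a single process rather than between mutually distrusting parties.

```python
import hashlib
import secrets


def commit(value):
    """Return (commitment, salt) for an integer die roll.

    The random salt keeps anyone from recovering the roll by hashing
    every possible face and comparing (the dictionary attack).
    """
    salt = secrets.token_bytes(32)
    digest = hashlib.sha256(salt + str(value).encode()).hexdigest()
    return digest, salt


def verify(commitment, value, salt):
    """Check that a revealed (value, salt) pair matches the commitment."""
    digest = hashlib.sha256(salt + str(value).encode()).hexdigest()
    return secrets.compare_digest(digest, commitment)


def shared_roll(n_sides, rolls):
    """Commit to every roll, then reveal, verify, and combine them mod n_sides."""
    commitments = [commit(roll) for roll in rolls]        # phase 1: everyone commits
    for (digest, salt), roll in zip(commitments, rolls):  # phase 2: everyone reveals
        if not verify(digest, roll, salt):
            raise ValueError("revealed value does not match commitment")
    return sum(rolls) % n_sides


print(shared_roll(6, [3, 5, 2]))
```

In a real run each participant would publish only the digest first, and reveal the value and salt after all digests are in.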

This provides an excellent demonstration of E.T. Jaynes's point that making something more random really means making it more complicated.

The point isn't to make it more random, the point is to make it more trustworthy. You can participate in the process and be confident that the result is random without having to put any trust in the other participants.

Regardless, your original language claiming it made it more random was correct, because it does make it more hard-to-predict-but-with-clear-symmetry, aka random.

Thanks, nice article! I think the calibration of our probability estimation organs is one of the items that deserve a lot more attention on LW.

This isn't hugely relevant to the post, but LessWrong doesn't really provide a means for a time-sensitive link dump, and it seems a shame to miss the opportunity to promote an excellent site for a slight lack of functionality.

For any cricket fans that have been enjoying the Ashes, here is a very readable description of Bayesian statistics applied to cricket batting averages.

That seems like a good thing to post in Discussion. Also, to the extent that it's about the math more than the particular matches, it isn't all that time sensitive.

let me add a link (communicating uncertainty) http://www.cmu.edu/dietrich/sds/docs/fischhoff/IST%20Communicating%20Uncertainty.pdf

and a discussion: http://psandman.com/gst2013.htm#numbers

I don't think experiments performed prior to the invention of video recording ought to count.

The god of the one true religion will refuse to intervene to punish its believers for cooperating with the members of all the other religions for the experiment.

Prediction: The die comes up 666.

Confidence: If N>665, slightly higher than 1/N due to ironic gods. If N<666, 0.

When you edit this comment, click the "Help" link to the lower right of the text box for more information on the Markdown syntax. It doesn't accept HTML, alas.

Utility maximization often requires determining a probability of a particular statement being true. But humans are not utility maximizers and often refuse to give precise numerical probabilities. Nevertheless, their actions reflect a "hidden" probability. For example, even someone who refused to give a precise probability for Barack Obama's re-election would probably jump at the chance to take a bet in which ey lost $5 if Obama wasn't re-elected but won $5 million if he was; such decisions demand that the decider covertly be working off of at least a vague probability.

When untrained people try to translate vague feelings like "It seems Obama will probably be re-elected" into a precise numerical probability, they commonly fall into certain traps and pitfalls that make their probability estimates inaccurate. Calling a probability estimate "inaccurate" causes philosophical problems, but these problems can be resolved by remembering that probability is "subjectively objective" - that although a mind "hosts" a probability estimate, that mind does not arbitrarily determine the estimate, but rather calculates it according to mathematical laws from available evidence. These calculations require too much computational power to use outside the simplest hypothetical examples, but they provide a standard by which to judge real probability estimates. They also suggest tests by which one can judge probabilities as well-calibrated or poorly-calibrated: for example, a person who constantly assigns 90% confidence to eir guesses but only guesses the right answer half the time is poorly calibrated. So calling a probability estimate "accurate" or "inaccurate" has a real philosophical grounding.
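The calibration test described above can be sketched in a few lines. The record of guesses here is made up to match the example of someone who claims 90% confidence but is right only half the time; `calibration_table` is an illustrative name, not an established tool.

```python
from collections import defaultdict


def calibration_table(predictions):
    """Group (stated_confidence, was_correct) pairs by stated confidence
    and report the observed hit rate for each group."""
    buckets = defaultdict(list)
    for confidence, was_correct in predictions:
        buckets[confidence].append(was_correct)
    return {
        confidence: sum(outcomes) / len(outcomes)
        for confidence, outcomes in sorted(buckets.items())
    }


# Made-up record: ten guesses at 90% confidence, five of them right.
record = [(0.9, True)] * 5 + [(0.9, False)] * 5
table = calibration_table(record)
for stated, observed in table.items():
    print(f"stated {stated:.0%}, observed {observed:.0%}")
```

A well-calibrated person's observed rates would sit close to their stated confidences in every row.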

There exist several techniques that help people translate vague feelings of probability into more accurate numerical estimates. Most of them translate probabilities from forms without immediate consequences (which the brain supposedly processes for signaling purposes) to forms with immediate consequences (which the brain supposedly processes while focusing on those consequences).

Prepare for Revelation

What would you expect if you believed the answer to your question were about to be revealed to you?

In Belief in Belief, a man acts as if there is a dragon in his garage, but every time his neighbor comes up with an idea to test it, he has a reason why the test wouldn't work. If he imagined Omega (the superintelligence who is always right) offered to reveal the answer to him, he might realize he was expecting Omega to reveal the answer "No, there's no dragon". At the very least, he might realize he was worried that Omega would reveal this, and so re-think exactly how certain he was about the dragon issue.

This is a simple technique and has relatively few pitfalls.

Bet on It

At what odds would you be willing to bet on a proposition?

Suppose someone offers you a bet at even odds that Obama will be re-elected. Would you take it? What about two-to-one odds? Ten-to-one? In theory, the knowledge that money is at stake should make you consider the problem in "near mode" and maximize your chances of winning.

The problem with this method is that it only works when utility is linear with respect to money and you're not risk-averse. In the simplest case I should be indifferent to an even-odds $100,000 bet that a fair coin will come up tails, but in fact I would refuse it; winning $100,000 would be moderately good, but losing $100,000 would put me deeply in debt and completely screw up my life. When these sorts of considerations become paramount, imagining wagers will tend to give inaccurate results.
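One way to see the effect is to assume a specific non-linear utility. The sketch below uses logarithmic utility of wealth, which is a common modeling assumption and not something this post commits to; the $120,000 net worth is hypothetical, and log utility cannot represent ending up in debt, so the stake must stay below total wealth.

```python
import math


def expected_log_utility_change(wealth, stake, p_win):
    """Expected change in log-wealth from an even-odds bet of `stake`.

    Assumes utility = log(wealth); requires stake < wealth.
    """
    win = math.log((wealth + stake) / wealth)
    lose = math.log((wealth - stake) / wealth)
    return p_win * win + (1 - p_win) * lose


def break_even_probability(wealth, stake):
    """Win probability at which the bettor is indifferent to the bet."""
    win = math.log((wealth + stake) / wealth)
    lose = math.log((wealth - stake) / wealth)
    return -lose / (win - lose)


# Hypothetical bettor with $120,000 facing the $100,000 coin flip.
big = expected_log_utility_change(120_000, 100_000, 0.5)
# The same bettor facing a $10 coin flip.
small = expected_log_utility_change(120_000, 10, 0.5)
print(f"big bet: {big:.3f}, small bet: {small:.2e}")
print(f"break-even win probability: {break_even_probability(120_000, 100_000):.2f}")
```

Under this assumption the fair coin flip is clearly a bad deal at $100,000 and almost exactly neutral at $10, so the odds someone accepts on a large bet say as much about their wealth as about their beliefs.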

Convert to a Frequency

How many situations would it take before you expected an event to occur?

Suppose you need to give a probability that the sun will rise tomorrow. "999,999 in a million" doesn't immediately sound wrong; the sun seems likely to rise, and a million is a very high number. But if tomorrow is an average day, then your probability will be linked to the number of days it will take before you expect that the sun will fail to rise on at least one. A million days is a little under three thousand years; the Earth has existed for far more than three thousand years without the sun failing to rise. Therefore, 999,999 in a million is too low a probability for this occurrence. If you think the sort of astronomical event that might prevent the sun from rising happens only once every three billion years, then you might consider a probability more like 999,999,999,999 in a trillion.
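The conversion between a daily probability and a waiting time is simple arithmetic, sketched below. The helper names are invented for illustration, and the calculation assumes every day is independent with the same chance of failure.

```python
DAYS_PER_YEAR = 365.25


def expected_years_until_failure(p_fail_per_day):
    """Expected wait, in years, before the first failure, assuming
    independent days that all share the same failure probability."""
    return 1 / p_fail_per_day / DAYS_PER_YEAR


def daily_failure_probability(years_between_failures):
    """Daily failure probability implied by one failure per that many years."""
    return 1 / (years_between_failures * DAYS_PER_YEAR)


# "999,999 in a million" means a one-in-a-million chance of failure per day.
print(f"{expected_years_until_failure(1e-6):,.0f} years")
# One sun-stopping event every three billion years.
print(f"{daily_failure_probability(3e9):.1e} per day")
```

The second figure comes out just under one in a trillion, which is where the "999,999,999,999 in a trillion" estimate comes from.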

In addition to converting to a frequency across time, you can also convert to a frequency across places or people. What's the probability that you will be murdered tomorrow? The best guess would be to check the murder rate for your area. What's the probability there will be a major fire in your city this year? Check how many cities per year have major fires.

This method fails if your case is not typical: for example, if your city is on the losing side of a war against an enemy known to use fire-bombing, the probability of a fire there has nothing to do with the average probability across cities. And if you think the reason the sun might not rise is a supervillain building a high-tech sun-destroying machine, then consistent sunrises over the past three thousand years of low technology will provide little consolation.

A special case of the above failure is converting to frequency across time when considering an event that is known to take place at a certain distance from the present. For example, if today is April 10th, then the probability that we hold a Christmas celebration tomorrow is much lower than the 1/365 you get by checking on what percentage of days we celebrate Christmas. In the same way, although we know that the sun will fail to rise in a few billion years when it burns out its nuclear fuel, this shouldn't affect its chance of rising tomorrow.

Find a Reference Class

How often have similar statements been true?

What is the probability that the latest crisis in Korea escalates to a full-blown war? If there have been twenty crisis-level standoffs on the Korean peninsula in the past 60 years, and only one of them has resulted in a major war, then P(war|crisis) = 0.05, so long as this crisis is equivalent to the twenty crises you're using as your reference class.

But finding the reference class is itself a hard problem. What is the probability Bigfoot exists? If you make a reference class by saying that the yeti doesn't exist, the Loch Ness monster doesn't exist, and so on, then the Bigfoot partisan might accuse you of assuming the conclusion - after all, the likelihood of these creatures existing is probably similar to, and correlated with, that of Bigfoot. The partisan might suggest asking how many creatures previously believed not to exist later turned out to exist - a list which includes real animals like the orangutan and platypus - but then you will have to debate whether to include creatures like dragons, orcs, and Pokemon on the list.

This works best when the reference class is more obvious, as in the Korea example.

Make Multiple Statements

How many statements could you make of about the same uncertainty as a given statement without being wrong once?

Suppose you believe France is larger than Italy. With what confidence should you believe it? If you made ten similar statements (Germany is larger than Austria, Britain is larger than Ireland, Spain is larger than Portugal, et cetera) how many times do you think you would be wrong? A hundred similar statements? If you think you'd be wrong only one time out of a hundred, you can give the statement 99% confidence.

This is the most controversial probability assessment technique; it tends to give lower levels of confidence than the others. For example, Eliezer wants to say there's a less than one in a million chance the LHC would destroy the world, but doubts he could make a million similar statements and only be wrong once. Komponisto thinks this is a failure of imagination: we imagine ourselves gradually growing tired and making mistakes, whereas this method only works if the accuracy of the millionth statement is exactly the same as the first.

In any case, the technique is only as good as the ability to judge which statements are equally difficult to a given statement. If I start saying things like "Russia is larger than Vatican City! Canada is larger than a speck of dust!" then I may get all the statements right, but it won't mean much for my Italy-France example - and if I get bogged down in difficult questions like "Burundi is larger than Equatorial Guinea" then I might end up underconfident. In cases where there is an obvious comparison ("Bob didn't cheat on his test", "Sue didn't cheat on her test", "Alice didn't cheat on her test") this problem disappears somewhat.

Imagine Hypothetical Evidence

How would your probabilities adjust given new evidence?

Suppose one day all the religious people and all the atheists get tired of arguing and decide to settle the matter by experiment once and for all. The plan is to roll an n-sided numbered die and have the faithful of all religions pray for the die to land on "1". The experiment will be done once, with great pomp and ceremony, and never repeated, lest the losers try for a better result. All the resources of the world's skeptics and security forces will be deployed to prevent any tampering with the die, and we assume their success is guaranteed.

If the experimenters used a twenty-sided die, and the die comes up 1, would this convince you that God probably did it, or would you dismiss the result as a coincidence? What about a hundred-sided die? Million-sided? If a successful result on a hundred-sided die wouldn't convince you, your probability of God's existence must be less than one in a hundred; if a million-sided die would convince you, it must be more than one in a million.
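The link between "which die would convince you" and your prior can be made explicit with Bayes' theorem. The sketch below rests on two simplifying assumptions that go beyond the text: if the hypothesis is true the die is certain to land on 1, and otherwise the die is fair. "Convinced" is read as a posterior above 50%.

```python
def posterior_after_success(prior, n_sides):
    """Posterior probability of the hypothesis after the die lands on 1.

    Assumes likelihood 1 if the hypothesis is true and 1/n_sides otherwise.
    """
    evidence = prior * 1.0 + (1 - prior) * (1 / n_sides)
    return prior / evidence


# Hypothetical prior of one in ten thousand.
prior = 1e-4
for n in (20, 100, 1_000_000):
    print(n, round(posterior_after_success(prior, n), 4))
```

With this prior, a hit on the hundred-sided die leaves the posterior near 1%, while a hit on the million-sided die pushes it to about 99%, bracketing the prior between one in a million and one in a hundred as the paragraph above describes.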

This technique has also been denounced as inaccurate, on the grounds that our coincidence detectors are overactive and therefore in no state to be calibrating anything else. It would feel very hard to dismiss a successful result on a thousand-sided die, no matter how low the probability of God is. It might also be difficult to visualize a hypothetical where the experiment can't possibly be rigged, and it may be unfair to force subjects to imagine a hypothetical that would practically never happen (like the million-sided die landing on one in a world where God doesn't exist).

These techniques should be experimentally testable; any disagreement over which do or do not work (at least for a specific individual) can be resolved by going through a list of difficult questions, declaring confidence levels, and scoring the results with log odds. Steven's blog has some good sets of test questions (which I deliberately do not link here so as to not contaminate a possible pool of test subjects); if many people are interested in participating and there's a general consensus that an experiment would be useful, we can try to design one.
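One possible scoring scheme for such an experiment is the logarithmic scoring rule, sketched below. Reading "scoring the results with log odds" as this rule is an assumption on my part, and the two answer records are made up.

```python
import math


def log_score(predictions):
    """Average log score (base 2) over (confidence, was_correct) pairs.

    Each prediction earns log2 of the probability it assigned to what
    actually happened, so scores are negative and closer to 0 is better.
    Confidences must lie strictly between 0 and 1.
    """
    total = 0.0
    for confidence, was_correct in predictions:
        p_actual = confidence if was_correct else 1 - confidence
        total += math.log2(p_actual)
    return total / len(predictions)


# Two made-up subjects who each get half of ten questions right.
overconfident = [(0.9, True)] * 5 + [(0.9, False)] * 5
calibrated = [(0.5, True)] * 5 + [(0.5, False)] * 5
print(round(log_score(overconfident), 3), log_score(calibrated))
```

Both subjects answer the same number of questions correctly, but the one who claimed 90% confidence scores worse, which is the property that lets the rule compare techniques by how well calibrated their outputs are.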