There once lived a great man named E.T. Jaynes. He knew that Bayesian inference is the only way to do statistics logically and consistently, standing on the shoulders of misunderstood giants Laplace and Gibbs. On numerous occasions he vanquished traditional "frequentist" statisticians with his superior math, demonstrating to anyone with half a brain how the Bayesian way gives faster and more correct results in each example. The weight of evidence falls so heavily on one side that it makes no sense to argue anymore. The fight is over. Bayes wins. The universe runs on Bayes-structure.

Or at least that's what you believe if you learned this stuff from Overcoming Bias.

Like I did until two days ago, when Cyan hit me over the head with something utterly incomprehensible. I suddenly had to go out and understand this stuff, not just believe it. (The original intention, if I remember it correctly, was to impress you all by pulling a Jaynes.) Now I've come back and intend to provoke a full-on flame war on the topic. Because if we can have thoughtful flame wars about gender but not math, we're a bad community. Bad, bad community.

If you're like me two days ago, you kinda "understand" what Bayesians do: assume a prior probability distribution over hypotheses, use evidence to morph it into a posterior distribution over same, and bless the resulting numbers as your "degrees of belief". But chances are that you have a very vague idea of what frequentists do, apart from deriving half-assed results with their ad hoc tools.

Well, here's the ultra-short version: frequentist statistics is *the art of drawing true conclusions about the real world* instead of assuming prior degrees of belief and coherently adjusting them to avoid Dutch books.

And here's an ultra-short example of what frequentists can do: estimate 100 independent unknown parameters from 100 different sample data sets and have 90 of the estimates turn out to be *true to fact* afterward. Like, fo'real. Always 90% in the long run, truly, irrevocably and forever. No Bayesian method known today can reliably do the same: the outcome will depend on the priors you assume for each parameter. I don't believe you're going to get lucky with all 100. And even if I believed you a priori (ahem) that don't make it true.
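To make the "90 out of 100" claim concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration and not something from the discussion itself: normally distributed data with a known standard deviation, the textbook 90% interval for the mean, and a conjugate normal prior for the Bayesian side. The function names (`freq_interval`, `bayes_interval`, `hit_rates`) are made up for this example.

```python
import random
from statistics import NormalDist, fmean

Z90 = NormalDist().inv_cdf(0.95)  # two-sided 90% interval uses the 95% quantile

def freq_interval(xbar, n, sigma):
    """Textbook 90% confidence interval for a normal mean, sigma known."""
    half = Z90 * sigma / n ** 0.5
    return xbar - half, xbar + half

def bayes_interval(xbar, n, sigma, prior_mean, prior_sd):
    """Central 90% credible interval under an assumed N(prior_mean, prior_sd^2) prior."""
    prior_prec = 1 / prior_sd ** 2
    data_prec = n / sigma ** 2
    post_var = 1 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * xbar)
    half = Z90 * post_var ** 0.5
    return post_mean - half, post_mean + half

def hit_rates(true_params, n=4, sigma=3.0, prior_mean=0.0, prior_sd=1.0, seed=0):
    """Fraction of parameters covered by each kind of interval, one data set per parameter."""
    rng = random.Random(seed)
    freq_hits = bayes_hits = 0
    for mu in true_params:
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        lo, hi = freq_interval(xbar, n, sigma)
        freq_hits += lo <= mu <= hi
        lo, hi = bayes_interval(xbar, n, sigma, prior_mean, prior_sd)
        bayes_hits += lo <= mu <= hi
    k = len(true_params)
    return freq_hits / k, bayes_hits / k

rng = random.Random(1)
spread_out = [rng.uniform(-10, 10) for _ in range(100)]  # parameters the N(0, 1) prior did not expect
print(hit_rates(spread_out))
```

In this toy setup the frequentist hit rate hovers around 90% wherever the true parameters happen to sit. The credible intervals only get there when the parameters really do look like draws from the assumed prior. A prior that matches reality would do fine, which is exactly the dependence on priors that the paragraph above complains about.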

(That's what Jaynes did to achieve his awesome victories: use trained intuition to pick good priors by hand on a per-sample basis. Maybe you can learn this skill somewhere, but not from the Intuitive Explanation.)

How in the world do you do inference without a prior? Well, the characterization of frequentist statistics as "trickery" is totally justified: it has no single coherent approach and the tricks often give conflicting results. Most everybody agrees that you can't do better than Bayes if you have a clear-cut prior; but if you don't, no one is going to kick you out. We sympathize with your predicament and will gladly sell you some twisted technology!

Confidence intervals: imagine you somehow process some sample data to get an interval. Further imagine that hypothetically, *for any given hidden parameter value*, this calculation algorithm applied to data sampled under that parameter value yields an interval that covers it with probability 90%. Believe it or not, this perverse trick works 90% of the time without requiring any prior distribution on parameter values.
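A small sketch of that trick, under an assumed model that is not from the post: the data are draws from Uniform(0, theta) and theta is the hidden parameter. Since P(max/theta <= x) = x^n, the interval from the sample maximum up to max / 0.1^(1/n) covers theta with probability 0.9 for every theta. The names `interval_90` and `coverage` are hypothetical.

```python
import random

def interval_90(sample):
    """90% confidence interval for theta given draws from Uniform(0, theta)."""
    n = len(sample)
    m = max(sample)
    # P(max/theta <= x) = x**n, so max/theta >= 0.1**(1/n) with probability 0.9.
    return m, m / 0.1 ** (1 / n)

def coverage(theta, n=10, trials=20000, seed=0):
    """Simulated fraction of intervals that contain the true theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.uniform(0, theta) for _ in range(n)]
        lo, hi = interval_90(sample)
        hits += lo <= theta <= hi
    return hits / trials

for theta in (0.01, 1.0, 12345.0):
    print(theta, coverage(theta))
```

Note that no distribution over theta appears anywhere in the code. The guarantee is about the procedure, evaluated separately at each fixed parameter value.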

Unbiased estimators: you process the sample data to get a number whose expectation magically coincides with the true parameter value.
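One standard instance, sketched as a simulation: the sample variance that divides by n - 1 is unbiased, while the version that divides by n is not. The normal data, the sample size, and the function name `average_estimates` are assumptions made only for this illustration.

```python
import random
from statistics import fmean

def average_estimates(true_sd=2.0, n=5, trials=20000, seed=0):
    """Average two variance estimators over many samples of size n."""
    rng = random.Random(seed)
    unbiased_total = biased_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(10.0, true_sd) for _ in range(n)]
        m = fmean(sample)
        ss = sum((x - m) ** 2 for x in sample)  # sum of squared deviations
        unbiased_total += ss / (n - 1)          # expectation equals the true variance
        biased_total += ss / n                  # expectation is (n - 1)/n times the true variance
    return unbiased_total / trials, biased_total / trials

print(average_estimates())  # the true variance here is true_sd**2 = 4.0
```

Any single estimate can still be far off. Unbiasedness only says the errors average out to zero over repeated sampling.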

Hypothesis testing: I give you a black-box random distribution and claim it obeys a specified formula. You sample some data from the box and inspect it. Frequentism allows you to call me a liar and reject truthful claims no more than 10% of the time, guaranteed, no prior in sight. (Thanks Eliezer for calling out the mistake, and conchis for the correction!)
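Here is a hedged sketch of such a test for one assumed black box: the claim is "this box is a coin with P(heads) = p0", you flip it n times, and you cry "liar" when the head count lands too far from n * p0. The symmetric rejection region and the names `rejection_threshold` and `false_accusation_rate` are choices made for this example, not anything prescribed above.

```python
import random
from math import comb

def rejection_threshold(n, p0=0.5, alpha=0.10):
    """Smallest distance c with P(|heads - n*p0| >= c) <= alpha when the claim is true."""
    pmf = [comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(n + 1)]
    center = n * p0
    for c in range(n + 2):
        tail = sum(p for k, p in enumerate(pmf) if abs(k - center) >= c)
        if tail <= alpha:
            return c, tail

def false_accusation_rate(n=100, p0=0.5, trials=20000, seed=0):
    """Simulate truthful boxes and count how often the test rejects the claim anyway."""
    c, _ = rejection_threshold(n, p0)
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        heads = sum(rng.random() < p0 for _ in range(n))
        rejections += abs(heads - n * p0) >= c
    return rejections / trials

print(rejection_threshold(100))
print(false_accusation_rate())
```

The guarantee concerns truthful claims only: among boxes that really obey the formula, at most 10% get rejected. It says nothing yet about how often a rejection is itself mistaken.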

But this is getting too academic. I ought to throw you dry wood, good flame material. This hilarious PDF from Andrew Gelman should do the trick. Choice quote:

Well, let me tell you something. The 50 states aren't exchangeable. I've lived in a few of them and visited nearly all the others, and calling them exchangeable is just silly. Calling it a hierarchical or multilevel model doesn't change things - it's an additional level of modeling that I'd rather not do. Call me old-fashioned, but I'd rather let the data speak without applying a probability distribution to something like the 50 states which are neither random nor a sample.

As a bonus, the bibliography to that article contains such marvelous titles as "Why Isn't Everyone a Bayesian?" And Larry Wasserman's followup is also quite disturbing.

Another stick for the fire is provided by Shalizi, who (among other things) makes the correct point that a good Bayesian must never be uncertain about the probability of any future event. That's why he calls Bayesians "Often Wrong, Never In Doubt":

The Bayesian, by definition, believes in a joint distribution of the random sequence X and of the hypothesis M. (Otherwise, Bayes's rule makes no sense.) This means that by integrating over M, we get an unconditional, marginal probability for f.

For my final quote it seems only fair to add one more polemical summary of Cyan's point that made me sit up and look around in a bewildered manner. Credit to Wasserman again:

Pennypacker: You see, physics has really advanced. All those quantities I estimated have now been measured to great precision. Of those thousands of 95 percent intervals, only 3 percent contained the true values! They concluded I was a fraud.

van Nostrand: Pennypacker you fool. I never said those intervals would contain the truth 95 percent of the time. I guaranteed coherence not coverage!

Pennypacker: A lot of good that did me. I should have gone to that objective Bayesian statistician. At least he cares about the frequentist properties of his procedures.

van Nostrand: Well I'm sorry you feel that way Pennypacker. But I can't be responsible for your incoherent colleagues. I've had enough now. Be on your way.

There's often good reason to advocate a correct theory over a wrong one. But all this evidence (ahem) shows that switching to Guardian of Truth mode was, at the very least, premature for me. Bayes isn't the correct theory to make conclusions about the world. *As of today, we have no coherent theory for making conclusions about the world.* Both perspectives have serious problems. So do yourself a favor and switch to truth-seeker mode.

Wrong. If all black boxes do obey their specified formulas, then every single time you call the other person a liar, you will be wrong. P(wrong|"false") ~ 1.
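The distinction being drawn is between P(reject | claim is true), which a test controls, and P(claim is true | reject), which it does not. A minimal sketch of the second quantity via Bayes' theorem follows. The fraction of honest boxes and the test's power against dishonest ones are assumed inputs that a frequentist test does not supply, and the function name is hypothetical.

```python
def share_of_accusations_that_are_wrong(honest_fraction, alpha=0.10, power=0.80):
    """P(claim was true | test said "liar").

    alpha: P(reject | claim true), the rate the test controls.
    power: P(reject | claim false), an assumed value for illustration.
    """
    wrong = honest_fraction * alpha        # honest boxes that get rejected
    right = (1 - honest_fraction) * power  # dishonest boxes that get rejected
    return wrong / (wrong + right)

for honest in (0.0, 0.5, 0.9, 1.0):
    print(honest, share_of_accusations_that_are_wrong(honest))
```

When every box is honest the function returns 1.0: every accusation is wrong, even though only 10% of honest boxes get accused.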

I'm thinking you still haven't quite understood here what frequentist statistics do.

It's not perfectly reliable. They assume they have perfect information about experimental setups and likelihood ratios. (Where does this perfect knowledge come from? Can Bayesians get their priors from the same source?)

A Bayesian who wants to report something at least as reliable as a frequentist statistic simply reports a likelihood ratio between two or more hypotheses from the evidence; and in that moment has told another Bayesian just what frequentists think they have perfect knowledge of, but *simply*, with far less confusion and error and mathematical chicanery and opportunity for distortion, and greater ability to combine the results of multiple experiments.

And more importantly, we understand what likelihood ratios *are*, and that they do not become posteriors without adding a prior somewhere.

Who? Whaa? Your probability *is* your uncertainty.

Well, yes. You have to bet at some odds. You're in some particular state of uncertainty and not a different one. I suppose the game is to make people think that being in some particular state of uncertainty corresponds to claiming to know too much about the problem? The *ignorance* is shown in the *instability* of the estimate - the way it reacts strongly to new evidence.

Can you give a detailed numerical example of some problem where the Bayesian and Frequentist give different answers, and you feel strongly that the Frequentist's answer is better somehow?

I think you've tried to do that, but I don't fully understand most of your examples. Perhaps if you used numbers and equations, that would help a lot of people understand your point. Maybe expand on your "And here's an ultra-short example of what frequentists can do" idea?

If your prior is screwed up enough, you'll also misunderstand the experimental setup and the likelihood ratios. Frequentism depends on prior knowledge just as much as Bayesianism; it just doesn't have a good formal way of treating it.

I didn't mean to rehabilitate frequentism! I only meant to point out that calibration is a frequentist optimality criterion, and one that Bayesian posterior intervals can be proved not to have in general. I view this as a bullet to be bitten, not dodged.

It's out of your hands now. Overcoming Bayes!

Can someone do something I've never seen anyone do - lay out a simple example in which the Bayesian and frequentist approaches give different answers?

I've had some training in Bayesian and Frequentist statistics and I think I know enough to say that it would be difficult to give a "simple" and satisfying example. The reason is that if one is dealing with finite-dimensional statistical models (that is, models with finitely many parameters) and one has chosen a prior for those parameters such that there is non-zero weight on the true values, then the Bernstein-von Mises theorem guarantees that the Bayesian posterior distribution and the maximum likelihood estimate converge to the same probability distribution (although you may need to use improper priors). This covers cases where we consider finite outcomes such as a toss of a coin or rolling a die.

I apologize if that's too much jargon, but for really simple models that are easy to specify you tend to get the same answer. Bayesian stats starts to behave differently from frequentist statistics in noticeable ways when you consider infinite outcome spaces. An example here might be where you are considering probability distributions over curves (this arises in my research on speech recognition). In this case even if you have a seemingly sensible prior you can end ...
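The convergence claimed for simple models can be checked by hand in the coin case. The sketch below assumes a Beta prior (deliberately lopsided), idealised data with exactly the expected number of heads, and a hypothetical helper named `compare`. It puts the maximum likelihood estimate and its standard error next to the posterior mean and standard deviation as n grows.

```python
from math import sqrt

def compare(n, true_p=0.3, prior_a=5.0, prior_b=1.0):
    """Return (MLE, MLE standard error, posterior mean, posterior sd) for n coin tosses."""
    k = round(true_p * n)  # idealised data: exactly the expected number of heads
    mle = k / n
    mle_se = sqrt(mle * (1 - mle) / n)
    a, b = prior_a + k, prior_b + n - k  # Beta(a, b) posterior from a Beta(prior_a, prior_b) prior
    post_mean = a / (a + b)
    post_sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mle, mle_se, post_mean, post_sd

for n in (10, 100, 10_000, 1_000_000):
    mle, se, pm, psd = compare(n)
    print(f"n={n:>9}  MLE={mle:.4f} +/- {se:.4f}  posterior={pm:.4f} +/- {psd:.4f}")
```

With ten tosses the prior visibly drags the posterior away from the MLE. With a million tosses the two summaries are practically indistinguishable, which is why a small parametric example rarely shows a dramatic disagreement.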

I had another thought on the subject. Consider flipping a coin; a Bayesian says that the 50% estimate of getting tails is just your own inability to predict with sufficient accuracy; a frequentist says that the 50% is a property of the coin - or to be less straw-making about it, a property of large sets of indistinguishable coin-flips. So, ok, in principle you could build a coin-predictor and remove the uncertainty. But now consider an electron passing through a beam splitter. Here there is no method *even in principle* of predicting which Everett branch you...

... as if applying the classical method doesn't require using trained intuition to use the "right" method for a particular kind of problem, which amounts to choosing a prior but doing it implicitly rather than explicitly ...

Since we're discussing (among other things) noninformative priors, I'd like to ask: does anyone know of a decent (noninformative) prior for the space of stationary, bidirectionally infinite sequences of 0s and 1s?

Of course in any practical inference problem it would be pointless to consider the infinite joint distribution, and you'd only need to consider what happens for a finite chunk of bits, i.e. a higher-order Markov process, described by a bunch of parameters (probabilities) which would need to satisfy some linear inequalities. So it's easy to find a ...

Perhaps we can try an experiment? We have here, apparently, both Bayesians and frequentists; or at a minimum, people knowledgeable enough to be able to apply both methods. Suppose I generate 25 data points from some distribution whose nature I do not disclose, and ask for estimates of the true mean and standard deviation, from a Bayesian and a frequentist? The underlying analysis would also be welcome. If necessary we could extend this to 100 sets of data points, ask for 95% confidence intervals, and see if the methods are well calibrated. (This does proba...
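As a rough idea of what the frequentist half of such an experiment might look like, here is a sketch for a single set of 25 points. The data generator is a stand-in (the proposed distribution is undisclosed, so normal data is assumed here), and the analysis is the usual Student-t interval for the mean. The name `estimate` and the hard-coded quantile constant are choices made for this illustration.

```python
import random
from statistics import fmean, stdev

T_975_DF24 = 2.0639  # 97.5% quantile of Student's t with 24 degrees of freedom

def estimate(sample):
    """Point estimates of mean and sd, plus a 95% t-interval for the mean (n = 25 only)."""
    n = len(sample)
    if n != 25:
        raise ValueError("the hard-coded t quantile is only valid for 25 points")
    mean = fmean(sample)
    sd = stdev(sample)  # divides by n - 1
    half = T_975_DF24 * sd / n ** 0.5
    return mean, sd, (mean - half, mean + half)

rng = random.Random(42)
data = [rng.gauss(3.0, 2.0) for _ in range(25)]  # stand-in for the undisclosed distribution
print(estimate(data))
```

For normally distributed data, a Bayesian using the standard reference prior proportional to 1/sigma obtains a Student-t posterior for the mean centred on the sample mean, so the 95% credible interval coincides numerically with this one. Any disagreement in the proposed experiment would therefore have to come from a different prior or from non-normal data.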

I think this was a great post for having both context *and* links and specifically (rather than generally) questioning assumptions the group hasn't visited in a while (if ever).

What does one read to become well versed in this stuff in two days; and how much skill with maths does it require?

I'm surprised that nobody has mentioned the Universal Prior yet. Eliezer also wrote a post on it.

... What is it that frequentists do, again? I'm a little out of touch.

Strong evidence can always defeat strong priors, and vice versa.

Is there anything more to the issue than this?
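In odds form this tug-of-war is one multiplication: posterior odds = prior odds × likelihood ratio. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def posterior_probability(prior_prob, likelihood_ratio):
    """Update a prior probability by a likelihood ratio using the odds form of Bayes' rule."""
    prior_odds = prior_prob / (1 - prior_prob)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Strong evidence against a strong prior: about 1000:1 against, evidence 1,000,000:1 in favour.
print(posterior_probability(0.001, 1e6))
# Weak evidence against the same prior: 10:1 in favour barely moves it.
print(posterior_probability(0.001, 10))
```

The likelihood ratio alone never yields a probability. The function has no way to return one without `prior_prob`, so whichever factor is more extreme ends up dominating the posterior.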

I didn't mean to rehabilitate frequentism! I only meant to point out that calibration is a frequentist optimality criterion, and that it's one that Bayesian posterior intervals can be proved not to have in general.

I'd like to take advantage of frequentism's return to respectability to ask if anyone knows where I can get a copy of "An Introduction to the Bootstrap" by Efron and Tibshirani.

It's on Google books, but I don't like reading things through Google books. It's for sale on-line, but it costs a lot and shipping takes a while. My university's library is supposed to have it, but the librarians can't find it. My local library hasn't heard of it.

I hardly know any statistics or probability; I've just been borrowing bits and pieces as I need them without...

Being a frequentist who hangs out on a Bayesian forum, I've thought about the difference between the two perspectives. I think the dichotomy is analogous to bottom-up versus top-down thinking; neither one is superior to the other but the usefulness of each waxes and wanes depending upon the current state of a scientific field. I think we need both to develop any field fully.

Possibly my understanding of the difference between a frequentist and Bayesian perspective is different than yours (I am a frequentist after all) so I will describe what I think the dif...