This article is an attempt to summarize basic material, and thus probably won't have anything new for the hard core posting crowd. It'd be interesting to know whether you think there's anything essential I missed, though.

You've probably seen the word 'Bayesian' used a lot on this site, but may be a bit uncertain of what exactly we mean by that. You may have read the intuitive explanation, but that only seems to explain a certain math formula. There's a wiki entry about "Bayesian", but that doesn't help much. And the LW usage seems different from just the "Bayesian and frequentist statistics" thing, too. As far as I can tell, there's no article explicitly defining what's meant by Bayesianism. The core ideas are sprinkled across a large number of posts, and 'Bayesian' has its own tag, but there's not a single post that explicitly comes out to make the connections and say "this is Bayesianism". So let me try to offer my definition, which boils Bayesianism down to three core tenets.

We'll start with a brief example, illustrating Bayes' theorem. Suppose you are a doctor, and a patient comes to you, complaining about a headache. Further suppose that there are two reasons why people get headaches: they might have a brain tumor, or they might have a cold. A brain tumor always causes a headache, but exceedingly few people have a brain tumor. In contrast, a headache is rarely a symptom of a cold, but most people manage to catch a cold every single year. Given no other information, do you think it more likely that the headache is caused by a tumor, or by a cold?

If you thought a cold was more likely, well, that was the answer I was after. Even if a brain tumor caused a headache every time, and a cold caused a headache only one per cent of the time (say), having a cold is so much more common that it's going to cause a lot more headaches than brain tumors do. Bayes' theorem, basically, says that if cause A might be the reason for symptom X, then we have to take into account both the probability that A caused X (found, roughly, by multiplying the frequency of A with the chance that A causes X) and the probability that anything else caused X. (For a thorough mathematical treatment of Bayes' theorem, see Eliezer's Intuitive Explanation.)
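To make the arithmetic concrete, here is a minimal sketch in Python. The numbers are illustrative assumptions, not real medical statistics: the example only says that a tumor always causes a headache, that a cold does so about one per cent of the time, that tumors are exceedingly rare and that colds are common. The sketch also assumes, as the example does, that these are the only two causes and that a patient has at most one of them. The function name `share_of_headaches` is made up for this illustration.

```python
def share_of_headaches(frequency, chance_of_headache):
    """For each cause, return the fraction of all headaches it accounts for.

    frequency[c]          -- how common cause c is (assumed figure)
    chance_of_headache[c] -- probability that c produces a headache
    """
    # Frequency of the cause times the chance that it causes the symptom.
    joint = {c: frequency[c] * chance_of_headache[c] for c in frequency}
    # Dividing by the total accounts for "anything else" that causes headaches.
    total = sum(joint.values())
    return {c: joint[c] / total for c in joint}


# Assumed, purely illustrative numbers.
frequency = {"tumor": 0.0001, "cold": 0.5}
chance_of_headache = {"tumor": 1.0, "cold": 0.01}

result = share_of_headaches(frequency, chance_of_headache)
for cause, probability in result.items():
    print(f"P({cause} | headache) = {probability:.4f}")
```

With these assumed figures the cold accounts for the large majority of headaches, even though a tumor is a hundred times more likely to produce one in any single patient.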

There should be nothing surprising about that, of course. Suppose you're outside, and you see a person running. They might be running for the sake of exercise, or they might be running because they're in a hurry somewhere, or they might even be running because it's cold and they want to stay warm. To figure out which one is the case, you'll try to consider which of the explanations is true most often, and fits the circumstances best.

Core tenet 1: Any given observation has many different possible causes.

Acknowledging this, however, leads to a somewhat less intuitive realization. For any given observation, how you should interpret it always depends on previous information. Simply seeing that the person was running wasn't enough to tell you that they were in a hurry, or that they were getting some exercise. Or suppose you had to choose between two competing scientific theories about the motion of planets: a theory about the laws of physics governing the motion of planets, devised by Sir Isaac Newton, or a theory simply stating that the Flying Spaghetti Monster pushes the planets forwards with His Noodly Appendage. If both of these theories made the same predictions, you'd have to depend on your prior knowledge - your prior, for short - to judge which one was more likely. And even if they didn't make the same predictions, you'd need some prior knowledge that told you which of the predictions were better, or that the predictions matter in the first place (as opposed to, say, theoretical elegance).

Or take the debate we had on 9/11 conspiracy theories. Some people thought that unexplained and otherwise suspicious things in the official account had to mean that it was a government conspiracy. Others considered their prior for "the government is ready to conduct massively risky operations that kill thousands of its own citizens as a publicity stunt", judged that to be overwhelmingly unlikely, and thought it far more probable that something else caused the suspicious things.

Again, this might seem obvious. But there are many well-known instances in which people forget to apply this information. Take supernatural phenomena: yes, if there were spirits or gods influencing our world, some of the things people experience would certainly be the kinds of things that supernatural beings cause. But then there are also countless mundane explanations, from coincidences to mental disorders to an overactive imagination, that could cause them to be perceived. Most of the time, postulating a supernatural explanation shouldn't even occur to you, because the mundane causes already have lots of evidence in their favor and supernatural causes have none.

Core tenet 2: How we interpret any event, and the new information we get from anything, depends on information we already had.

Sub-tenet 1: If you experience something that you think could only be caused by cause A, ask yourself "if this cause didn't exist, would I regardless expect to experience this with equal probability?" If the answer is "yes", then it probably wasn't cause A.

This realization, in turn, leads us to

Core tenet 3: We can use the concept of probability to measure our subjective belief in something. Furthermore, we can apply the mathematical laws regarding probability to choosing between different beliefs. If we want our beliefs to be correct, we must do so.

The fact that anything can be caused by an infinite number of things explains why Bayesians are so strict about the theories they'll endorse. It isn't enough that a theory explains a phenomenon; if it can explain too many things, it isn't a good theory. Remember that if you'd expect to experience something even when your supposed cause was untrue, then that's no evidence for your cause. Likewise, if a theory can explain anything you see - if the theory allowed any possible event - then nothing you see can be evidence for the theory.
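The point about evidence can be checked with a few lines of Python. This is only a sketch: the prior of 0.2 and the likelihoods are made-up numbers, and `update` is a hypothetical helper name. It applies Bayes' theorem to a single hypothesis and shows that an observation you would expect equally often whether or not the hypothesis is true leaves your belief exactly where it started.

```python
def update(prior, p_obs_if_true, p_obs_if_false):
    """Posterior probability of a hypothesis after one observation."""
    numerator = prior * p_obs_if_true
    return numerator / (numerator + (1 - prior) * p_obs_if_false)


prior = 0.2  # assumed starting belief in the hypothesis

# Observation is just as likely whether or not the hypothesis holds:
# the likelihood ratio is 1, so the posterior equals the prior.
no_evidence = update(prior, p_obs_if_true=0.7, p_obs_if_false=0.7)

# Observation is much more likely if the hypothesis holds:
# now it counts as evidence and the belief goes up.
some_evidence = update(prior, p_obs_if_true=0.7, p_obs_if_false=0.1)

print(f"equally expected either way: {no_evidence:.3f}")
print(f"expected mainly if true:     {some_evidence:.3f}")
```

A theory that "allows any possible event" is in the first situation for every observation, which is why nothing can count in its favor.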

At its heart, Bayesianism isn't anything more complex than this: a mindset that takes these three core tenets fully into account. Add a sprinkle of idealism: a perfect Bayesian is someone who processes all information perfectly, and always arrives at the best conclusions that can be drawn from the data. When we talk about Bayesianism, that's the ideal we aim for.

Fully internalized, that mindset does tend to color your thought in its own, peculiar way. Once you realize that all the beliefs you have today are based - in a mechanistic, lawful fashion - on the beliefs you had yesterday, which were based on the beliefs you had last year, which were based on the beliefs you had as a child, which were based on the assumptions about the world that were embedded in your brain while you were growing in your mother's womb... it does make you question your beliefs more. Wonder about whether all of those previous beliefs really corresponded maximally to reality.

And that's basically what this site is for: to help us become good Bayesians.

is there a simple explanation of the conflict between bayesianism and frequentialism? I have sort of a feel for it from reading background materials but a specific example where they yield different predictions would be awesome. has such already been posted before?

Eliezer's views as expressed in Blueberry's links touch on a key identifying characteristic of frequentism: the tendency to think of probabilities as inherent properties of objects. More concretely, a pure frequentist (a being as rare as a pure Bayesian) treats probabilities as proper only to outcomes of a repeatable random experiment. (The definition of such a thing is pretty tricky, of course.)

What does that mean for frequentist statistical inference? Well, it's forbidden to assign probabilities to anything that is deterministic in your model of reality. So you have estimators, which are functions of the random data and thus random themselves, and you assess how good they are for your purpose by looking at their sampling distributions. You have confidence interval procedures, the endpoints of which are random variables, and you assess the sampling probability that the interval contains the true value of the parameter (and the width of the interval, to avoid pathological intervals that have nothing to do with the data). You have statistical hypothesis testing, which categorizes a simple hypothesis as “rejected” or “not rejected” based on a procedure assessed in terms of the sampling probability of an error in the categorization. You have, basically, anything you can come up with, provided you justify it in terms of its sampling properties over infinitely repeated random experiments.

Here is a more general definition of "pure frequentism" (which includes frequentists such as Reichenbach):

Consider an assertion of probability of the form "This X has probability p of being a Y." A frequentist holds that this assertion is meaningful only if the following conditions are met:

1. The speaker has already specified a determinate set X of things that actually have or will exist, and this set contains "this X".

2. The speaker has already specified a determinate set Y containing all things that have been or will be Ys.

The assertion is true if the proportion of elements of X that are also in Y is precisely p.

A few remarks:

1. The assertion would mean something different if the speaker had specified different sets X and Y, even though X and Y aren't mentioned explicitly in the assertion.

2. If no such sets had been specified in the preceding discourse, the assertion by itself would be meaningless.

3. However, the speaker has complete freedom in what to take as the set X containing "this X", so long as X contains "this X". In particular, the other elements don't have to be exactly like "this X", or be generated by exactly the same repeatable procedure,

...
I'm sorry to see such wrongheaded views of frequentism here. Frequentists also assign probabilities to events where the probabilistic introduction is entirely based on limited information rather than a literal randomly generated phenomenon. If Fisher or Neyman was ever actually read by people purporting to understand frequentist/Bayesian issues, they'd have a radically different idea. Readers of this blog should take it upon themselves to check out some of the vast oversimplifications... And I'm sorry, but Reichenbach's frequentism has very little to do with frequentist statistics. Reichenbach, a philosopher, had an idea that propositions had frequentist probabilities. So scientific hypotheses--which would not be assigned probabilities by frequentist statisticians--could have frequentist probabilities for Reichenbach, even though he didn't think we knew enough yet to judge them. He thought that at some point we'd be able to judge, for a hypothesis of a given type, how frequently hypotheses like it would be true. I think it's a problematic idea, but my point was just to illustrate that some large items are being misrepresented here, and people sold a wrongheaded view. Just in case anyone cares. Sorry to interrupt the conversation (errorstatistics.com)
Do you intend to be replying to me or to Tyrrell McAllister?
Wait - Bayesians can assign probabilities to things that are deterministic? What does that mean? What would a Bayesian do instead of a T-test?
wnoise250

Wait - Bayesians can assign probabilities to things that are deterministic? What does that mean?

Absolutely!

The Bayesian philosophy is that probabilities are about states of knowledge. Probability is reasoning with incomplete information, not about whether an event is "deterministic", as probabilities do still make sense in a completely deterministic universe. In a poker game, there are almost surely no quantum events influencing how the deck is shuffled. Classical mechanics, which is deterministic, suffices to predict the ordering of cards. Even so, we have neither sufficient initial conditions (on all the particles in the dealer's body and brain, and any incoming signals), nor computational power to calculate the ordering of the cards. In this case, we can still use probability theory to figure out probabilities of various hand combinations that we can use to guide our betting. Incorporating knowledge of what cards I've been dealt, and what (if any) are public is straightforward. Incorporating player's actions and reactions is much harder, and not really well enough defined that there is a mathematically correct answer, but clearly we should use that knowledge ...
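As a rough illustration of the poker point, the sketch below computes the probability that an opponent holds two aces, first with no information and then after incorporating the cards I can see. It assumes a standard 52-card deck with four aces and treats every unseen card as equally likely to be in the opponent's hand; the function name is made up for this example. The shuffle is fixed and deterministic in both cases; only the state of knowledge changes.

```python
from math import comb


def prob_opponent_has_two_aces(aces_seen, cards_seen):
    """Probability that two cards drawn from the unseen cards are both aces.

    aces_seen  -- number of aces among the cards I can see
    cards_seen -- total number of cards I can see
    """
    unseen_cards = 52 - cards_seen
    unseen_aces = 4 - aces_seen
    return comb(unseen_aces, 2) / comb(unseen_cards, 2)


# Before looking at anything: 6 ace pairs out of 1326 possible hands.
before = prob_opponent_has_two_aces(aces_seen=0, cards_seen=0)

# After seeing my own two cards, one of which is an ace.
after = prob_opponent_has_two_aces(aces_seen=1, cards_seen=2)

print(f"no information:       {before:.5f}")
print(f"I hold one ace:       {after:.5f}")
```

Incorporating the players' actions and reactions, as the comment notes, is much harder and is not attempted here.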

1Cyan
Very nice! I'd only replace "useful" with "plausible". (Sure, it's hard to define plausibility, but usefulness is not really the right concept.)
5wnoise
"Usefulness" certainly isn't the orthodox Bayesian phrasing. I call myself a Bayesian because I recognize that Bayes's Rule is the right thing to use in these situations. Whether or not the probabilities assigned to hypotheses "actually are" probabilities (whatever that means), they should obey the same mathematical rules of calculation as probabilities. But precisely because only the manipulation rules matter, I'm not sure it is worth emphasizing that "to be a good Bayesian" you must accord these probabilities the same status as other probabilities. A hardcore Frequentist is not going to be comfortable doing that. Heck, I'm not sure I'm comfortable doing that. Data and event probabilities are things that can eventually be "resolved" to true or false, by looking after the fact. Probability as plausibility makes sense for these things. But for hypotheses and models, I ask myself "plausibility of what? Being true?" Almost certainly, the "real" model (when that even makes sense) isn't in our space of models. For example, a common, almost necessary, assumption is exchangeability: that any given permutation of the data is equally likely -- effectively that all data points are drawn from the same distribution. Data often doesn't behave like that, instead having a time drift. Coins being tossed develop wear, cards being shuffled and dealt get bent. I really do prefer to think of some models being more or less useful. Of course, following this path shades into decision theory: we might want to assign priors according to how "tractable" the models are, including both in specification (stupid models that just specify what the data will be take lots of specification, so should have lower initial probabilities). Models that take longer to compute data probabilities should similarly have a probability penalty, not simply because they're implausible, but because we don't want to use them unless the data force us to.

...shades into decision theory...Models that take longer to compute data probabilities should similarly have a probability penalty, not simply because they're implausible, but because we don't want to use them unless the data force us to.

Whoa! that sounds dangerous! Why not keep the beliefs and costs separate and only apply this penalty at the decision theory stage?

2wnoise
Well, I said shaded into the lines of decision theory... Yes, it absolutely is dangerous, and thinking about it more I agree it should not be done this way. Probability penalties do not scale correctly with the data collected: they're essentially just a fixed offset. Modified utility of using a particular method really is different. If a method is unusable, we shouldn't use it, and methods that trade off accuracy for manageability should be decided at that level, once we can judge the accuracy -- not earlier. EDIT: I suppose I was hoping for a valid way of justifying the fact that we throw out models that are too hard to use or analyze -- they never make it into our set of hypotheses in the first place. It's amazing how often conjugate priors "just happen" to be chosen...
4Cyan
Plausibility of being true given the prior information. Just as Aristotelian logic gives valid arguments (but not necessarily sound ones), Bayes's theorem gives valid but not necessarily sound plausibility assessments. That's pretty much why I wanted to make the distinction between plausibility and usefulness. One of the things I like about the Cox-Jaynes approach is that it cleanly splits inference and decision-making apart.
2wnoise
Okay, sure we can go back to the Bayesian mantra of "all probabilities are conditional probabilities". But our prior information effectively includes the statement that one of our models is the "true one". And that's never the actual case, so our arguments are never sound in this sense, because we are forced to work from prior information that isn't true. This isn't a huge problem, but it in some sense undermines the motivation for finding these probabilities and treating them seriously -- they're conditional probabilities being applied in a case where we know that what is being conditioned on is false. What is the grounding to our actual situation? I like to take the stance that in practice this is still useful -- as an approximation procedure -- sorting through models that are approximately right.
3Cyan
One does generally resort to non-Bayesian model checking methods. Andrew Gelman likes to include such checks under the rubric of "Bayesian data analysis"; he calls the computing of posterior probabilities and densities "Bayesian inference", a preceding subcomponent of Bayesian data analysis. This makes for sensible statistical practice, but the underpinnings aren't strong. One might consider it an attempt to approximate the Solomonoff prior.
0wnoise
Yes, in practice people resort to less motivated methods that work well. I'd really like to see some principled answer that has the same feel as Bayesianism though. As it stands, I have no problem using Bayesian methods for parameter estimation. This is natural because we really are getting pdf(parameters | data, model). But for model selection and evaluation (i.e. non-parametric Bayes) I always feel that I need an "escape hatch" to include new models that the Bayes formalism simply doesn't have any place for.
0Cyan
I feel the same way.
3wedrifid
I am much more comfortable leaving probability as it is but using a different term for usefulness.
1nazgulnarsil
the tendency to think of probabilities as inherent properties of objects. yeah, this was my intuitive reason for thinking frequentists are a little crazy.
5byrnema
On the other hand, it's evidence to me that we're talking about different types of minds. Have we identified whether this aspect of frequentism is a choice, or just the way their minds work? I'm a frequentist, I think, and when I interrogate my intuition about whether 50% heads / 50% tails is a property of a fair coin, it returns 'yes'. However, I understand that this property is an abstract one, and my intuition doesn't make any different empirical predictions about the coin than a Bayesian would. Thus, what difference does it make if I find it natural to assign this property? In other words, in what (empirically measurable!) sense could it be crazy?
7wnoise
http://comptop.stanford.edu/preprints/heads.pdf Well, the immediate objection is that if you hand the coin to a skilled tosser, the frequencies of heads and tails in the tosses can be markedly different than 50%. If you put this probability in the coin, then you really aren't modeling things in a manner that accords with results. You can, of course, talk instead about a procedure of coin-tossing, that naturally has to specify the coin as well. Of course, that merely pushes things back a level. If you completely specify the tossing procedure (people have built coin-tossing machines), then you can repeatedly get 100%/0% splits by careful tuning. If you don't know whether it is tuned to 100% heads or 100% tails, is it still useful to describe this situation probabilistically? A hard-core Frequentist "should" say no, as everything is deterministic. Most people are willing to allow that 50% probability is a reasonable description of the situation. To the extent that you do allow this, you are Bayesian. To the extent that you don't, you're missing an apparently valuable technique.
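The coin-tossing machine can be written out as a tiny model. This is a sketch under the assumptions in the comment: the machine is tuned either to always give heads or to always give tails, and before any toss we consider the two tunings equally plausible. The function names are invented for the illustration.

```python
def predictive_heads(p_heads_machine):
    """P(next toss is heads), given our belief that the machine is heads-tuned."""
    # A heads-tuned machine gives heads with probability 1, a tails-tuned one with 0.
    return p_heads_machine * 1.0 + (1 - p_heads_machine) * 0.0


def observe(p_heads_machine, toss):
    """Update the belief that the machine is heads-tuned after seeing one toss.

    Not defined for an observation that contradicts a belief already held
    with certainty (that would divide zero by zero).
    """
    like_if_heads_machine = 1.0 if toss == "H" else 0.0
    like_if_tails_machine = 1.0 if toss == "T" else 0.0
    numerator = p_heads_machine * like_if_heads_machine
    return numerator / (numerator + (1 - p_heads_machine) * like_if_tails_machine)


belief = 0.5                               # no idea which way it is tuned
before = predictive_heads(belief)          # 0.5, though nothing here is random
belief = observe(belief, "H")              # a single toss settles the question
after = predictive_heads(belief)           # 1.0

print(before, after)
```

The 50% describes what we know about the machine, and it disappears after one observation even though the machine itself never changed.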
2byrnema
The frequentist can account for the biased toss and determinism, in various ways. My preferred reply would be that the 50/50 is a property of the symmetry of the coin. (Of course, it's a property of an idealized coin. Heck, a real coin can land balanced on its edge.) If someone tosses the coin in a way that biases the coin, she has actually broken the symmetry in some way with her initial conditions. In particular, the tosser must begin with the knowledge of which way she is holding the coin -- if she doesn't know, she can't bias the outcome of the coin. I understand that Bayesians don't tend to abstract things to their idealized forms ... I wonder to what extent Frequentism does this necessarily. (What is the relationship between Frequentism and Platonism?)
7wnoise
Oh, absolutely. The typical way is choosing some reference class of idealized experiments that could be done. Of course, the right choice of reference class is just as arbitrary as the right choice of Bayesian prior. Whereas the Bayesian would argue that the 50/50 property is a symmetry about our knowledge of the coin -- even a coin that you know is biased, but that you have no evidence for which way it is biased. Well, I don't think Bayesians are particularly reluctant to look at idealized forms, it's just that when you can make your model more closely match the situation (without incurring horrendous calculational difficulties) there is a benefit to do so. And of course, the question is "which idealized form?" There are many ways to idealize almost any situation, and I think talking about "the" idealized form can be misleading. Talking about a "fair coin" is already a serious abstraction and idealization, but it's one that has, of course, proven quite useful. That's a very interesting question.
5Blueberry
To quote from Gelman's rejoinder that Phil Goetz mentioned, So, speaking very loosely, Bayesianism is to science, inductive logic, and Aristotelianism as frequentism is to math, deductive logic, and Platonism. That is, Bayesianism is synthesis; frequentism is analysis.
1byrnema
Interesting! That makes a lot of sense to me, because I had already made connections between science and Aristotelianism, pure math and Platonism.
7Blueberry
This and this might be the kind of thing you're looking for. Though the conflict really only applies in the artificial context of a math problem. Frequentialism is more like a special case of Bayesianism where you're making certain assumptions about your priors, sometimes specifically stated in the problem, for ease of calculation. For instance, in a Frequentialist analysis of coin flips, you might ignore all your prior information about coins, and assume the coin is fair.
2nazgulnarsil
thanks, that's what I was looking for. would it be correct to say that in the frequentist interpretation your confidence interval narrows as your trials approach infinity?
4wnoise
That is a highly desired property of Frequentist methods, but it's not guaranteed by any means.
6bill
If it helps, I think this is an example of a problem where they give different answers to the same problem. From Jaynes; see http://bayes.wustl.edu/etj/articles/confidence.pdf , page 22 for the details, and please let me know if I've erred or misinterpreted the example. Three identical components. You run them through a reliability test and they fail at times 12, 14, and 16 hours. You know that these components fail in a particular way: they last at least X hours, then have a lifetime that you assess as an exponential distribution with an average of 1 hour. What is the shortest 90% confidence interval / probability interval for X, the time of guaranteed safe operation? Frequentist 90% confidence interval: 12.1 hours - 13.8 hours Bayesian 90% probability interval: 11.2 hours - 12.0 hours Note: the frequentist interval has the strange property that we know for sure that the 90% confidence interval does not contain X (from the data we know that X <= 12). The Bayesian interval seems to match our common sense better.
8cupholder
Heh, that's a cheeky example. To explain why it's cheeky, I have to briefly run through it, which I'll do here (using Jaynes's symbols so whoever clicked through and has pages 22-24 open can directly compare my summary with Jaynes's exposition). Call N the sample size and θ the minimum possible widget lifetime (what bill calls X). Jaynes first builds a frequentist confidence interval around θ by defining the unbiased estimator θ∗, which is the observations' mean minus one. (Subtracting one accounts for the sample mean being >θ.) θ∗'s probability distribution turns out to be proportional to y^(N-1) exp(-Ny), where y = θ∗ - θ + 1. Note that y is essentially a measure of how far our estimator θ∗ is from the true θ, so Jaynes now has a pdf for that. Jaynes integrates that pdf to get y's cdf, which he calls F(y). He then makes the 90% CI by computing [y1, y2] such that F(y2) - F(y1) = 0.9. That gives [0.1736, 1.8529]. Substituting in N and θ∗ for the sample and a little algebra (to get a CI corresponding to θ rather than y) gives his θ CI of [12.1471, 13.8264]. For the Bayesian CI, Jaynes takes a constant prior, then jumps straight to the posterior being N exp(N(θ - x1)) for θ ≤ x1, where x1's the smallest lifetime in the sample (12 in this case). He then comes up with the smallest interval that encompasses 90% of the posterior probability, and it turns out to be [11.23, 12]. Jaynes rightly observes that the Bayesian CI accords with common sense, and the frequentist CI does not. This comparison is what feels cheeky to me. Why? Because Jaynes has used different estimators for the two methods [edit: I had previously written here that Jaynes implicitly used different estimators, but this is actually false; when he discusses the example subsequently (see p. 25 of the PDF) he fleshes out this point in terms of sufficient v. non-sufficient statistics.]. For the Bayesian CI, Jaynes effectively uses the minimum lifetime as his estimator for θ (by defining the likelihood to be solely a function of the
8wnoise
This example really is Bayesianism-done-straightforwardly. The point is that you really don't need to be sly to get reasonable results. A constant prior ends up using only the likelihoods. The jump straight to the posterior is a completely mechanical calculation, just products, and normalization. Each individual likelihood goes to zero for (x < θ). This means that product also does for the smallest (x1 < θ). You will get out the same PDF as Jaynes. CIs can be constructed many ways from PDFs, but constructing the smallest one will give you the same one as Jaynes. EDIT: for full effect, please do the calculation yourself.
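For anyone who wants to do the calculation, here is one way it might look in Python. It is a sketch, not Jaynes's own derivation. It uses the setup quoted in this thread: failure times 12, 14 and 16, a flat prior on θ, and a unit-mean exponential lifetime after θ. The Bayesian interval comes from the closed form of the posterior N exp(N(θ - x1)) for θ ≤ x1. For the frequentist side the code does not search for the interval; it takes y1 = 0.1736 and y2 = 1.8529 as given (values chosen to be consistent with the quoted interval [12.1471, 13.8264]) and checks their coverage under the sampling distribution of y, which is a Gamma distribution with shape N and rate N. All function names are invented for this illustration.

```python
import math


def bayes_interval(failure_times, mass=0.90):
    """Shortest interval holding `mass` of the posterior for theta (flat prior)."""
    n = len(failure_times)
    x1 = min(failure_times)
    # The posterior n * exp(n * (theta - x1)) rises up to x1 and is zero above it,
    # so the shortest interval ends at x1. Solve 1 - exp(n * (lower - x1)) = mass.
    lower = x1 + math.log(1.0 - mass) / n
    return lower, x1


def y_cdf(y, n):
    """CDF of y = (sample mean - theta), a Gamma(shape n, rate n) variable."""
    ny = n * y
    tail = math.exp(-ny) * sum(ny ** k / math.factorial(k) for k in range(n))
    return 1.0 - tail


def frequentist_interval(failure_times, y1, y2):
    """Turn an interval for y into one for theta, using theta = theta_star + 1 - y."""
    theta_star = sum(failure_times) / len(failure_times) - 1.0
    return theta_star + 1.0 - y2, theta_star + 1.0 - y1


times = [12.0, 14.0, 16.0]

bayes_lo, bayes_hi = bayes_interval(times)

y1, y2 = 0.1736, 1.8529  # taken as given, see the note above
coverage = y_cdf(y2, len(times)) - y_cdf(y1, len(times))
freq_lo, freq_hi = frequentist_interval(times, y1, y2)

print(f"Bayesian 90% interval:    [{bayes_lo:.2f}, {bayes_hi:.2f}]")
print(f"Frequentist interval:     [{freq_lo:.4f}, {freq_hi:.4f}]")
print(f"Sampling coverage of y:   {coverage:.3f}")
```

The frequentist interval has close to 90% coverage as a procedure, yet for this sample it lies entirely above the smallest observed failure time, which is the oddity bill pointed out.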
0Cyan
I stopped reading cupholder's comment before the last paragraph (to write my own reply) and completely missed this! D'oh!
1Cyan
Jaynes does go on to discuss everything you have pointed out here. He noted that confidence intervals had commonly been held not to require sufficient statistics, pointed out that some frequentist statisticians had been doubtful on that point, and remarked that if the frequentist estimator had been the sufficient statistic (the minimum lifetime) then the results would have agreed. I think the real point of the story is that he ran through the frequentist calculation for a group of people who did this sort of thing for a living and shocked them with it.
0cupholder
You got me: I didn't read the what-went-wrong subsection that follows the example. (In my defence, I did start reading it, but rolled my eyes and stopped when I got to the claim that "there must be a very basic fallacy in the reasoning underlying the principle of confidence intervals".) I suspect I'm not the only one, though, so hopefully my explanation will catch some of the eyeballs that didn't read Jaynes's own post-mortem. [Edit to add: you're almost certainly right about the real point of the story, but I think my reply was fair given the spirit in which it was presented here, i.e. as a frequentism-v.-Bayesian thing rather than an orthodox-statisticians-are-taught-badly thing.]
1Cyan
Independently reproducing Jaynes's analysis is excellent, but calling him "cheeky" for "implicitly us[ing] different estimators" is not fair given that he's explicit on this point. It's a frequentism-v.-Bayesian thing to the extent that correct coverage is considered a sufficient condition for good frequentist statistical inference. This is the fallacy that you rolled your eyes at; the room full of shocked frequentists shows that it wasn't a strawman at the time. [ETA: This isn't quite right. The "v.-Bayesian" part comes in when correct coverage is considered a necessary condition, not a sufficient condition.] ETA: This is a really good point, and it makes me happy that you wrote your explanation. For people for whom Jaynes's phrasing gets in the way, your phrasing bypasses the polemics and lets them see the math behind the example.
0cupholder
I was wrong to say that Jaynes implicitly used different estimators for the two methods. After the example he does mention it, a fact I missed due to skipping most of the post-mortem. I'll edit my post higher up to fix that error. (That said, at the risk of being pedantic, I did take care to avoid calling Jaynes-the-person cheeky. I called his example cheeky, as well as his comparison of the frequentist CI to the Bayesian CI, kinda.) When I read Jaynes's fallacy claim, I didn't interpret it as saying that treating coverage as necessary/sufficient was fallacious; I read it as arguing that the use of confidence intervals in general was fallacious. That was what made me roll my eyes. [Edit to clarify: that is, I was rolling my eyes at what I felt was a strawman, but a different one to the one you have in mind.] Having read his post-mortem fully and your reply, I think my initial, eye-roll-inducing interpretation was incorrect, though it was reasonable on first read-through given the context in which the "fallacy" statement appeared.
0Cyan
Fair point.
0nazgulnarsil
excellent paper, thanks for the link.
0Jordan
My intuition would be that the interval should be bounded above by 12 - epsilon, since the probability that we got one component that failed at the theoretically fastest rate seems unlikely (probability zero?).
2Cyan
You can treat the interval as open at 12.0 if you like; it makes no difference.
2JGWeissman
If by epsilon, you mean a specific number greater than 0, the only reason to shave off an interval of length epsilon from the high end of the confidence interval is if you can get the probability contained in that epsilon-length interval back from a smaller interval attached to the low end of the confidence interval. (I haven't worked through the math, and the pdf link is giving me "404 not found", but presumably this is not the case in this problem.)
2Cyan
The link's a 404 because it includes a comma by accident -- here's one that works: http://bayes.wustl.edu/etj/articles/confidence.pdf.
0Jordan
Thanks, that makes sense, although it still butts up closely against my intuition.
3PhilGoetz
Andrew Gelman wrote a parody of arguments against Bayesianism here. Note that he says that you don't have to choose Bayesianism or frequentism; you can mix and match. I'd be obliged if someone would explain this paragraph, from his response to his parody: • “Why should I believe your subjective prior? If I really believed it, then I could just feed you some data and ask you for your subjective posterior. That would save me a lot of effort!”: I agree that this criticism reveals a serious incoherence with the subjective Bayesian framework as well as with the classical utility theory of von Neumann and Morgenstern (1947), which simultaneously demands that an agent can rank all outcomes a priori and expects that he or she will make utility calculations to solve new problems. The resolution of this criticism is that Bayesian inference (and also utility theory) are ideals or aspirations as much as they are descriptions. If there is serious disagreement between your subjective beliefs and your calculated posterior, then this should send you back to re-evaluate your model.

Nice explanation. My only concern is the opening statement about "aiming low". It makes it difficult to send this article to people without them justifiably rejecting it out of hand as a patronizing act. When the intention to aim low is truly noble, perhaps it is just as accurately described as writing clearly, writing for non-experts, or maybe even just writing an "introduction".

5Kaj_Sotala
Good point. I changed "to aim low" to "to summarize basic material".
0[anonymous]
And besides, as a software developer with plenty of Bayesian theory behind me, I appreciate the simplicity of the article for the clarity it provides me. Thanks for "aiming low" ;-)

Great, great post. I like that it's more qualitative and philosophical than quantitative, which really makes it clear how to think like a Bayesian. Though I know the math is important, having this kind of intuitive, qualitative understanding is very useful for real life, when we don't have exact statistics for so many things.

Re: "Core tenet 1: For any given observation, there are lots of different reasons that may have caused it."

This seems badly phrased. It is normally previous events that cause observations. It is not clear what it means for a reason to cause something.

3Kaj_Sotala
Good point. That sentence structure was a carryover from Finnish, where you can say that reasons cause things. Would "Any given observation has many different possible causes" be better?
4Morthrod
Yes, that would be better.
2Kaj_Sotala
Changed.
I don't know if it belongs here or in a separate post but afaik there is no explanation of the Dutch book argument on Less Wrong. It seems like there should be. Just telling people that structuring your beliefs according to Bayes Theorem will make them accurate might not do the trick for some. The Dutch book argument makes it clear why you can't just use any old probability distribution.

8Kaj_Sotala
I thought about whether to include a Dutch Book discussion in this post, but felt it would have been too long and not as "deep core" as the other stuff. More like "supporting core". But yes, it would be good to have a discussion of that up on LW somewhere.
1wedrifid
3Jack
I'm on it.

Thanks Kaj,

As I stated in my last post, reading LW often gives me the feeling that I have read something very important, yet I often don't immediately know why what I just read should be important until I have some later context in which to place the prior content.

Your post just gave me the context in which to make better sense of all of the prior content on Bayes here on LW.

It doesn't hurt that I have finally dipped my toes in the Bayesian Waters of Academia in an official capacity with a Probability and Stats class (which seems to be a prerequisite for s...

Possible typo:

A theory about the laws of physics governing the motion of planets, devised by Sir Isaac Newton, or a theory simply stating that the Flying Spaghetti Monster pushes the planets forward>s< with His Noodly Appendage.

In the spirit of aiming low, I don't think you aimed nearly low enough. If I hadn't already read a small amount from the sequences I wouldn't have been able to pick up too much from this article. This reads as a great summary; I am not convinced it is a good explanation.

The rest of this comment is me saying the above in mo...

3Kaj_Sotala
This is excellent feedback; please, do go on. I did wonder if this was still too short and not aiming low enough. I chose to go on the side of briefness, partially because I was worried about ending up with a giant mammoth post and partially because I felt I'd just be repeating what Eliezer's said before. But yeah, looking at it now, I'm not at all convinced of how well I'd have gotten the message if my pre-OB self had read this. Interesting that you find the usage of "you" and "we" patronizing. I hadn't thought of it like that - I intended it as a way to make the post less formal and build a more comfortable atmosphere to the reader. Your rewording sounds good: not exactly the way I'd put it, but certainly something to build on. Hmm, what do people think - if we end up rewriting this, should I just edit this post? Or make an entirely new one? Perhaps keep this one as it is, but work the changes into a future one that's longer?
9MrHen
0Kaj_Sotala
Very interesting. Actually, I didn't seek to aim that low - I was targeting the average LW reader (or at least an average person who was comfortable with maths). However, I still find this to be very valuable, since I have played around with the idea of trying to write a book that'd attempt to sell (implicitly or explicitly) the idea of "maths / science, especially as applied to rationality / cognitive science is actually fun" to a lay audience. So I probably won't alter the original article as a reaction to this, but if you want to nevertheless help me in figuring out how to reach to that audience, do continue. :)
0MrHen
Haha, will do. I do realize that some of what I am bringing up is extremely petty, but I have watched some of my articles get completely derailed by what I would consider to be a completely irrelevant point. Even amongst the high quality discussions in the comments I find myself needing to back up and ask a Really Obvious Question. This is likely a fault in the way I communicate (which is accentuated online) and also a glitch where people are not willing/able to drop subjects that are bugging them. If I was fundamentally opposed to the claim that all brain tumors caused headaches I would feel compelled to point it out in the comments. (This compulsion is something I am trying to curb.) In any case, I am glad the comments are helpful and I will continue as I find the time. If you ever start drafting something like what you mentioned I am willing to proofread and comment.
7pjeby
3wnoise
Personally, I think if it's just minor stylistic changes in expressing the same material, editing the post is the way to go; if it's adding more material, or expressing it radically differently, then a new post is appropriate.
0h-H
it's fine the way it is I think, it covers enough without being too specific. great post.

A frequentist asks, "did you find enough evidence?" A Bayesian asks, "how much evidence did you find?"

Frequentists can be tricky, by saying that a very small amount of evidence is sufficient; and they can hide this claim behind lots of fancy calculations, so they usually get away with it. This makes for better press releases, because saying "we found 10dB of evidence that X" doesn't sound nearly as good as saying "we found that X".

1PhilGoetz
Since when do frequentists measure evidence in decibels?
2JGWeissman
jimrandomh claimed that frequentists don't report amounts of evidence. So you object that measuring in decibels is not how they don't report it? If they don't report amounts of evidence, then of course they don't report it in the precise way in the example.
1toto
Frequentists (or just about anybody involved in experimental work) report p-values, which are their main quantitative measure of evidence.
6JGWeissman
Evidence, as measured in log odds, has the nice property that evidence from independent sources can be combined by adding. Is there any way at all to combine p-values from independent sources? As I understand them, p-values are used to make a single binary decision to declare a theory supported or not, not to track cumulative strength of belief in a theory. They are not a measure of evidence.
Log odds of independent events do not add up, just as the odds of independent events do not multiply. The odds of flipping heads is 1:1, the odds of flipping heads twice is not 1:1 (you have to multiply odds by likelihood ratios, not odds by odds, and likewise you don't add log odds and log odds, but log odds and log likelihood-ratios). So calling log odds themselves "evidence" doesn't fit the way people use the word "evidence" as something that "adds up". This terminology may have originated here: http://causalityrelay.wordpress.com/2008/06/23/odds-and-intuitive-bayes/ I'm voting your comment up, because I think it's a great example of how terminology should be chosen and used carefully. If you decide to edit it, I think it would be most helpful if you left your original words as a warning to others :)
0JGWeissman
By "evidence", I refer to events that change an agent's strength of belief in a theory, and the measure of evidence is the measure of this change in belief, that is, the likelihood-ratio and log likelihood-ratio you refer to. I never meant for "evidence" to refer to the posterior strength of belief. "Log odds" was only meant to specify a particular measurement of strength in belief.
0Paul Crowley
Can you be clearer? Log likelihood ratios do add up, so long as the independence criterion is satisfied (ie so long as P(E_2|H_x) = P(E_2|E_1,H_x) for each H_x).
Sure, just edited in the clarification: "you have to multiply odds by likelihood ratios, not odds by odds, and likewise you don't add log odds and log odds, but log odds and log likelihood-ratios".
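To make the distinction in this sub-thread concrete, here is a minimal sketch (my own illustration, with made-up likelihood ratios; none of the numbers come from the comments above). It assumes two hypotheses and pieces of evidence that are conditionally independent given each hypothesis, which is the condition Paul Crowley states. Posterior odds come from multiplying prior odds by likelihood ratios, so posterior log odds come from adding log likelihood ratios to prior log odds.

```python
import math

def update_odds(prior_odds, likelihood_ratios):
    """Multiply prior odds by each likelihood ratio in turn."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

def update_log_odds(prior_log_odds, likelihood_ratios):
    """Add the log likelihood ratios to the prior log odds."""
    return prior_log_odds + sum(math.log(lr) for lr in likelihood_ratios)

def decibels(likelihood_ratio):
    """Express a likelihood ratio in decibels: 10 * log10(ratio)."""
    return 10 * math.log10(likelihood_ratio)

# Hypothetical numbers: 1:1 prior odds, three independent pieces of evidence.
prior_odds = 1.0
likelihood_ratios = [4.0, 0.5, 3.0]

posterior_odds = update_odds(prior_odds, likelihood_ratios)
posterior_log_odds = update_log_odds(math.log(prior_odds), likelihood_ratios)

# The two routes agree: exp(posterior log odds) equals the posterior odds.
assert math.isclose(math.exp(posterior_log_odds), posterior_odds)

# The coin example: the odds of heads are 1:1, but the odds of two heads
# in a row are 1:3, not (1:1) * (1:1). Odds of events do not multiply.
odds_one_head = 0.5 / (1 - 0.5)
odds_two_heads = 0.25 / (1 - 0.25)
assert not math.isclose(odds_two_heads, odds_one_head * odds_one_head)
```

With more than two hypotheses a single odds number no longer describes the state of belief, which is the limitation raised in the reply below.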
1Morendil
As long as there are only two H_x, mind you. They no longer add up when you have three hypotheses or more.
0Paul Crowley
Indeed - though I find it very hard to hang on to my intuitive grasp of this!
Here is the post on information theory I said I would write: http://lesswrong.com/lw/1y9/information_theory_and_the_symmetry_of_updating/ It explains "mutual information", i.e. "informational evidence", which can be added up over as many independent events as you like. Hopefully this will have restorative effects for your intuition!
Don't worry, I have an information theory post coming up that will fix all of this :)
1Cyan
There's lots of papers on combining p-values.
2JGWeissman
Well, just looking at the first result, it gives a formula for combining n p-values that as near as I can tell, lacks the property that C(p1,p2,p3) = C(C(p1,p2),p3). I suspect this is a result of unspoken assumptions that the combined p-values were obtained in a similar fashion (which I violate by trying to combine a p-value combined from two experiments with another obtained from a third experiment), which would be information not contained in the p-value itself. I am not sure of this because I did not completely follow the derivation. But is there a particular paper I should look at that gives a good answer?
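As an illustration of the property in question, here is a small sketch using Fisher's method. That choice is an assumption on my part: the paper from the search results is not identified in this thread, and Fisher's method is simply one well-known way of combining independent p-values. Under the null hypothesis, -2 times the sum of ln(p_i) follows a chi-square distribution with 2n degrees of freedom, and for even degrees of freedom the tail probability has a closed form, so no external library is needed.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Under the null, -2 * sum(ln p_i) is chi-square with 2n degrees of freedom.
    For even degrees of freedom the survival function has the closed form
    exp(-s) * sum_{i < n} s**i / i!, where s is half the test statistic.
    """
    n = len(p_values)
    s = -sum(math.log(p) for p in p_values)  # half the chi-square statistic
    return math.exp(-s) * sum(s**i / math.factorial(i) for i in range(n))

# Hypothetical inputs: three experiments, each with p = 0.05.
all_at_once = fisher_combine([0.05, 0.05, 0.05])
nested = fisher_combine([fisher_combine([0.05, 0.05]), 0.05])

# Combining all three at once differs from combining two, then the third.
assert not math.isclose(all_at_once, nested, rel_tol=1e-3)
```

The nested result comes out larger than the all-at-once result, because the intermediate combined p-value does not carry the information that it summarizes two experiments. Log likelihood ratios under independence do not have this problem, since addition is associative.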
0Cyan
I haven't actually read any of that literature -- Cox's theorem suggests it would not be a wise investment of time. I was just Googling it for you.
0JGWeissman
Fair enough, though it probably isn't worth my time either. Unless someone claims that they have a good general method for combining p-values, such that it does not matter where the p-values come from, or in what order they are combined, and can point me at one specific method that does all that.

I recently started working through this Applied Bayesian Statistics course material, which has done wonders for my understanding of Bayesianism vs. the bag-of-tricks statistics I learned in engineering school.

6Seth_Goldin
So I finally picked up a copy of Probability Theory: The Logic of Science, by E.T. Jaynes. It's pretty intimidating and technical, but I was surprised how much prose there is, which makes it surprisingly palatable. We should recommend this more here on Less Wrong.
2Erebus
Just remember that Jaynes was not a mathematician and many of his claims about pure mathematics (as opposed to computations and their applications) in the book are wrong. Especially, infinity is not mysterious.
0thomblake
It should be obvious that infinity (like all things) is not inherently mysterious, and equally obvious that it's mysterious (if not unknown) to most people.
0Erebus
Infinity is mysterious was intended as a paraphrase of Jaynes' chapter on "paradoxes" of probability theory, and I intended mysterious precisely in the sense of inherently mysterious. As far as I know, Jaynes didn't use the word mysterious himself. But he certainly claims that rules of reasoning about infinity (which he conveniently ignores) are not to be trusted and that they lead to paradoxes.

Bayesianism is more than just subjective probability; it is a complete decision theory.

A decent summary is provided by Sven Ove Hansson:

1. The Bayesian subject has a coherent set of probabilistic beliefs.
2. The Bayesian subject has a complete set of probabilistic beliefs.
3. When exposed to new evidence, the Bayesian subject changes his (her) beliefs in accordance with his (her) conditional probabilities.
4. Finally, Bayesianism states that the rational agent chooses the option with the highest expected utility.

3wnoise
What Bayescraft covers is a matter of tendentious definitions. I personally do not consider decision theory a necessary part of it, though it is certainly part of what we're trying to capture at LessWrong.
7Douglas_Knight
I agree. The line between belief and decision is the line between 3 and 4 in that list and it is such a clean line that the von Neumann-Morgenstern axioms can be (and usually are) presented about a frequentist world.

"A might be the reason for symptom X, then we have to take into account both the probability that X caused A"

I think you have accidentally swapped some variables there

0Kaj_Sotala
Thanks, fixed.

It seems there are a few meta-positions you have to hold before taking Bayesianism as talked about here; you need the concept of Winning first. Bayes is not sufficient for sanity, if you have, say, an anti-Occamian or anti-Laplacian prior.

What this site is for is to help us be good rationalists; to win. Bayesianism is the best candidate methodology for dealing with uncertainty. We even have theorems that show that in its domain it's uniquely good. My understanding of what we mean by Bayesianism is updating in the light of new evidence, and updating correctly within the constraints of sanity (cf. Dutch books).

3Seth_Goldin
We can discuss both epistemic and instrumental rationality.
3prase
You are right that Bayesianism isn't sufficient for sanity, but why should it prevent a post explaining what Bayesianism is? It's possible to be a Bayesian with wrong priors. It's also good to know what Bayesianism is, especially when the term is so heavily used. My understanding is that the OP is doing a good job keeping concepts of winning and Bayesianism separated. The contrary would conflate Bayesianism with rationality.
3Kevin
Jonathan's post doesn't seem like much of an argument but more of a criticism. There's lots more to write on this topic.

The penultimate paragraph about our beliefs isn't about Bayesianism so much as heuristics and biases. Unless you were a Bayesian from birth, for at least part of your life your beliefs evolved in a crazy fashion not entirely governed by Bayes' theorem. It is for this reason that we should be suspicious of the beliefs based on assumptions we've never scrutinized.

0Kaj_Sotala
Thanks! And interestingly, I find myself looking at my upvotes here and there and wondering what the appropriate "conversion rate" is for purposes of feeling good over a successful post. I've now gotten 31 upvotes there, but only 13 here.
2Kevin
By any standard you had a successful Hacker News post -- it was on the front page for most of the morning, which is good. The number of votes is not meaningful at all on Hacker News so there's no conversion rate. Also, I strongly suspect that many of the initial early votes on HN came from primary LW users following my link and then upvoting, possibly even people that didn't upvote it on LW.

The 'Intuitive Explanation' link has changed to http://yudkowsky.net/rational/bayes

Or take the debate we had on 9/11 conspiracy theories. Some people thought that unexplained and otherwise suspicious things in the official account had to mean that it was a government conspiracy. Others considered their prior for "the government is ready to conduct massively risky operations that kill thousands of its own citizens as a publicity stunt", judged that to be overwhelmingly unlikely, and thought it far more probable that something else caused the suspicious things.

Don't forget the prior: "The official account of big conflicts...

4fubarobfusco
"Governments in general, and the U.S. in specific, have a history of lying to justify war. I can think of several incidents where an official casus belli turned out to be either a lie, as in the second Gulf of Tonkin incident or the Iraqi WMD allegation; or at least significantly doubtful, such as the sinking of the Maine. In these cases, the 'conspiracy theorists' and peace activists were right; and I can't think of any where they were wrong. So they have more credibility than the official report."
2ChristianKl
Knowing that the official report contains information that's false, doesn't lead you to know what's true.

Others considered their prior for "the government is ready to conduct massively risky operations that kill thousands of its own citizens as a publicity stunt", judged that to be overwhelmingly unlikely,

Here I have to take objection: you framed it as a publicity stunt, but actually 9-11 has shaped everything in the USA: domestic policies, foreign policies, military spending, the identity of the nation as a whole (it's US vs. THEM), etc. So there is a lot at stake.

Btw, as far as the willingness of the government to kill its own citizens goes, more...

7Jack
The controlling feature for this prior isn't "willingness to kill own citizens" or "publicity stunt" but "massively risky". "Massively risky" is actually an incredible understatement. We're talking about people already at the top of the social hierarchy risking death and eternal shame for them and their families in hopes that the hundreds of people who are part of the conspiracy keep quiet and that no damning evidence of a remarkably complicated plot is left behind. The government's willingness to kill its own citizens, such as it is, less often carries over to civilians and even less often carries over to rich white people on Wall Street. And for something that has helped shape the country... well, remarkably little has changed in the direction that administration wanted things to go. Indeed, why, in all those years of waning popularity, wouldn't they try something like it again (maybe foil the attempt this time)? If they're so powerful, why not get someone else elected President?
6Alicorn
You know, I have little interest in 9/11 Truth, but I have no patience for the "but it would be so obvious" reply to Truthers. Here is how that conversation translates in my head: Truther: I think the towers came down due to a deliberate demolition by our government. I think this because thus and so. Non-Truther: But the government would never have done anything so easy to find out about, because it would carry massive risk. Everybody would know about it. Truther: Well, if people were paying attention to thus and so, they'd know - Non-Truther: BUT SINCE I DIDN'T ALREADY KNOW ABOUT THUS AND SO IT'S CLEARLY NOT SOMETHING EVERYBODY KNOWS ABOUT AND I CAN'T HEAR YOU NANANANANANANANA.
2Jack
Just to clarify: Do you think that is what I'm doing here?
3Alicorn
It was at least strongly reminiscent, enough that under your comment seemed like a good place to put mine, but I did not intend to attack you specifically.
1PeerInfinity
obligatory XKCD comic: http://xkcd.com/690/ (actually, that's not as relevant as I first thought, but I'll go ahead and post it here anyway)
0ata
A little bit more relevant: http://imgur.com/bx1th.png
1[anonymous]
I believe you were unfairly voted down. Your recasting shows that this is essentially an appeal to authority, with the authority being "everyone else".
-3roland
Well, there is a lot of evidence left behind and that has been cited over and over. AFAIK none of the people killed was exceptionally rich and/or powerful. Wait, what??? Someone else? What are you talking about, every President in the last decades has been a member of one of the same two parties. Obama has not significantly changed the foreign policy and is moving in the same direction.
0Jack
Well, we're talking about the prior. Obviously we can then update on the evidence, whatever that is. People will also disagree about what the evidence means, but the point is that this is a really unlikely event you guys are claiming took place. We can interpret the evidence, but strange coincidences or some video footage not being released is not close to sufficient for me to suddenly start believing 9/11 was an inside job. I don't know what exceptionally means here but, ya know, the WTC wasn't a homeless shelter. ... Look, I have no idea what your particular conspiracy is. So it is a little hard to examine the supposed motivations. My comments made sense given certain assumptions about what the motivations of such a conspiracy would be. Obviously they aren't your assumptions, so share yours.
-5roland
5Jonathan_Graehl
Well argued, but if you credit the U.S. government such brazen cruelty toward the citizens it nominally serves, then why would the government need a pretense at all? Why not invade with only forged documents and lies? No self-inflicted wound should be necessary; the U.S. military may not fear intervention by other nations' forces if they appear to only pick on a few small oil-rich nations.
3roland
Forged documents and lies are not enough to convince the public opinion or better to arouse strong emotions, something more salient is needed. You have to remember, at 9-11 basically the whole world stood still watching the events unfold. Wikipedia: http://en.wikipedia.org/wiki/September_11_attacks#cite_note-155 Btw article 5 allows the use of armed(military) force. This was the official NATO position even before there was any investigation as to who was supposedly behind the "attacks". Anyone arguing against military action can be and still is decried as unpatriotic, callous towards the families of those who died. You cannot achieve this with just a batch of documents.

I think this parenthetical statement should maybe be a footnote or something, because it makes the and part of the sentence too far away from the both part. Or maybe put it in the following sentence? I got a little lost.

Doesn't "Bayesianism" basically boil down to the idea that one can think of beliefs in terms of mathematical probabilities?

-1PhilGoetz
That's like saying that Sunni beliefs boil down to belief in Islam.
2brazil84
Following your analogy, what is the equivalent to Shia Islam? Put another way: Bayesianism as opposed to what?
2PhilGoetz
Frequentism, according to the posters here. Unless I misunderstand what you mean by thinking of a belief in terms of probabilities.
8wnoise
But the standard Frequentist stance is that probabilities are not degrees of belief, but solely long term frequencies in random experiments.
4PhilGoetz
Most "frequentists" aren't such sticklers about terminology. Most people who attach probabilities to beliefs in knowledge representations - say, AI systems - are more familiar with frequentist than Bayesian methodology.
4wnoise
Okay, so most people who use statistics don't know what they're talking about. I find that all too plausible.
-1brazil84
I looked up "Frequentism" on Wikipedia . . . .I don't understand your point. What concept am I omitting by characterizing "Bayesianism" the way I did?
4PhilGoetz
Google frequentist instead of frequentism. It's the usual way of doing statistics and working with probabilities.
0brazil84
I did and I still don't understand your point. Again my question: Exactly what concept am I omitting by characterizing "Bayesianism" the way I did?
0Cyan
I PM'ed you regarding this thread. (I mention it here because I seem to recall that you're subject to a bug that prevents you from getting message/reply notifications.)

Core tenet 3: We can use the concept of probability to measure our subjective belief in something. Furthermore, we can apply the mathematical laws regarding probability to choosing between different beliefs. If we want our beliefs to be correct, we must do so.

Frequently misunderstood. E.g. you have propositions A and B, you mistakenly consider that probably either one of them will happen, and you may give me money if you judge P(A)/P(B) > some threshold.

If both A and B happen to be unlikely, I can use that to make arguments which only prompt you to...

Sub-tenet 1: If you experience something that you think could only be caused by cause A, ask yourself "if this cause didn't exist, would I regardless expect to experience this with equal probability?" If the answer is "yes", then it probably wasn't cause A.

I don't understand this at all - if you experience something that you think could only be caused by A, then the question you're supposed to ask yourself makes no sense whatsoever: absent A, you would expect to never experience this thing, per the original condition! And if the a...

5JGWeissman
The point is that people can erroneously report, even to themselves, that they believe their experience could only be caused by cause A. Asking the question if you would still anticipate the experience if cause A did not exist is a way of checking that you really believe that your experience could only be caused by cause A. More generally, it is useful to examine beliefs you have expressed in high level language, to see if you still believe them after digging deeper into what that high level language means.
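A minimal sketch of the check described here, with made-up probabilities (every number is hypothetical). If you would expect the experience with equal probability whether or not cause A exists, the likelihood ratio is 1 and Bayes' theorem leaves your belief in A exactly where it started.

```python
import math

def posterior(prior, p_e_given_a, p_e_given_not_a):
    """P(A | E) via Bayes' theorem for a binary hypothesis A."""
    numerator = prior * p_e_given_a
    return numerator / (numerator + (1 - prior) * p_e_given_not_a)

prior = 0.2  # hypothetical prior belief in cause A

# Answer "yes": the experience is equally expected with or without A.
no_update = posterior(prior, p_e_given_a=0.9, p_e_given_not_a=0.9)

# Answer "no": the experience is much less expected without A.
real_update = posterior(prior, p_e_given_a=0.9, p_e_given_not_a=0.1)

assert math.isclose(no_update, prior)  # the observation is not evidence for A
assert real_update > prior             # the observation supports A
```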
0FAWS
I think that the inconsistency of such a position was the point. It would probably be better phrased as "... something that has to be caused by cause A" (or possibly just "proof of A"), which is effectively equivalent, but IMO something that someone who would answer yes to the following question could plausibly have claimed to believe (i.e. I wouldn't be very surprised by the existence of people who are that inconsistent in their beliefs).
Further suppose that there are two reasons for why people get headaches: they might have a brain tumor, or they might have a cold.

Or, if you're very unlucky, you could have a headache and a brain tumor.... :3

A brain tumor always causes a headache, but exceedingly few people have a brain tumor. In contrast, a headache is rarely a symptom for cold, but most people manage to catch a cold every single year. Given no other information, do you think it more likely that the headache is caused by a tumor, or by a cold?

Given no other information, we don't know which is more likely. We need numbers for "rarely", "most", and "exceedingly few". For example, if 10% of humans currently have a cold, and 1% of humans with a cold have a heada...

You're missing the point. This post is suitable for an audience whose eyes would glaze over if you threw in numbers, which is wonderful (I read the "Intuitive Explanation of Bayes' Theorem" and was ranting for days about how there was not one intuitive thing about it! it was all numbers! and graphs!). Adding numbers would make it more strictly accurate but would not improve anyone's understanding. Anyone who would understand better if numbers were provided has their needs adequately served by the "Intuitive" explanation.

[-]pjeby150

Agreed, I did not find the "Intuitive Explanation" to be particularly intuitive even after multiple readings. Understanding the math and principles is one thing, but this post actually made me sit up and go, "Oh, now I see what all the fuss is about," outside a relatively narrow range of issues like diagnosing cancer or identifying spam emails.

Now I get it well enough to summarize: "Even if A will always cause B, that doesn't mean A did cause B. If B would happen anyway, this tells you nothing about whether A caused B."

Which is both a "well duh" and an important idea at the same time, when you consider that our brains appear to be built to latch onto the first "A" that would cause B, and then stubbornly hang onto it until it can be conclusively disproven.

That's a "click" right there, that makes retroactively comprehensible many reams of Eliezer's math rants and Beisutsukai stories. (Well, not that I didn't comprehend them as such... more that I wasn't able to intuitively recreate all the implications that I now think he was expecting his readers to take away.)

So, yeah... this is way too important of an idea to have math associated with it in any way. ;-)

3PlatypusNinja
Personally it bothers me that the explanation asks a question which is numerically unanswerable, and then asserts that rationalists would answer it in a given way. Simple explanations are good, but not when they contain statements which are factually incorrect. But, looking at the karma scores it appears that you are correct that this is better for many people. ^_^;
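To show how much the answer depends on the numbers, here is a rough sketch with two sets of made-up figures (all of them hypothetical, and treating tumor and cold as the only two causes and as mutually exclusive, which is a simplification). The first set matches the spirit of the post and makes the cold the likelier cause; the second keeps every qualitative word true ("always", "rarely", "exceedingly few") and flips the answer.

```python
def p_tumor_given_headache(p_tumor, p_cold, p_headache_given_tumor, p_headache_given_cold):
    """Share of headaches attributable to a tumor, given two exclusive causes."""
    tumor_weight = p_tumor * p_headache_given_tumor
    cold_weight = p_cold * p_headache_given_cold
    return tumor_weight / (tumor_weight + cold_weight)

# Scenario 1 (hypothetical): a cold causes a headache 1% of the time.
scenario_1 = p_tumor_given_headache(
    p_tumor=0.0001, p_cold=0.1,
    p_headache_given_tumor=1.0, p_headache_given_cold=0.01,
)

# Scenario 2 (hypothetical): a cold causes a headache 1 time in 100,000.
scenario_2 = p_tumor_given_headache(
    p_tumor=0.0001, p_cold=0.1,
    p_headache_given_tumor=1.0, p_headache_given_cold=0.00001,
)

assert scenario_1 < 0.5  # the cold is the likelier cause
assert scenario_2 > 0.5  # the tumor is the likelier cause
```

So the post's conclusion holds for the kind of numbers it has in mind (it mentions one per cent for colds), but the verbal description alone does not pin it down.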
2SilasBarta
I thought Truly Part of you is an excellent introduction to rationalism/Bayesianism/Less Wrong philosophy that avoids much use of numbers, graphs, and technical language. So I think it's more appropriate for the average person, or for people that equations don't appeal to. Does anyone who meets that description agree? And could someone ask Alicorn if she prefers it?
2djcb
Hmmmm.... that's an interesting article too, but it focuses on a different question, the question of what knowledge really means, and uses AI concepts to discuss that (somewhat related to Searle's Chinese Room gedankenexperiment). However, I think the article discussed here is a bit more directly connected to Bayesianism. It's clear what Bayes' Theorem means, but what many people today mean by Bayesianism is somewhat of a loose extrapolation of that -- or even just a metaphor. I think the article does a good job at explaining the current use.
I guess this is the wrong place for this comment, but I don't know where else to put it, and after reading the extensive threads on 9/11 below I felt this was a valid point. If someone objects to this being here I'll move it to somewhere more appropriate. It looks like I'm a bit out of date with the discussion anyway.

