Bayes is Out-Dated, and You’re Doing it Wrong


This post, and many of @AnthonyRepetto's subsequent replies to comments on it, seem to be attacking a position that the named individuals don't hold, while stridently throwing out a bunch of weird accusations and deeply underspecified claims. "Bayes is persistently wrong" - about what, exactly?

Content like this should include specific, uncontroversial examples of all the claimed intellectual bankruptcy, and not include a bunch of random (and wrong) snipes.

I'm rate-limiting your ability to comment to once per day. You may consider this a warning; if the quality of your argumentation doesn't improve then you will no longer be welcome to post on the site.

Content like this should include specific, uncontroversial examples of all the claimed intellectual bankruptcy

That's not the problem here and this is a bad general rule.

That's definitely *one* of the problems with this post, and while rudeness is generally undesirable it's slightly more forgiveable when there's some evidence of the thing that "justifies" it.

"Content like this should include specific, uncontroversial examples of all the claimed intellectual bankruptcy, and not include a bunch of random (and wrong) snipes."

I did in fact include empirical metrics of Dirichlet's superiority and how Bayes' Theorem fails in contrast: industry uses it, after they did their own tests, which is empiricism at work. I also showed how Dirichlet Process allows you to compute Confidence Intervals, while Bayes' Theorem is incapable of computing Confidence Intervals. I also explained how, due to the median of the likelihood function being closer to an equal distribution than Bayes would expect, Bayes is persistently biased toward whichever extrema might be observed in the sample. Thus, Bayes' Theorem will consistently mis-estimate; it's persistently wrong, and Dirichlet was developed as the necessary adjustment. So, I did give explicit reasons why Bayes' Theorem is inadequate compared to the modern, standard approach which has empirical backing in industry.

It seems like you want to rate-limit me for an *unspecified* duration? What are the empirical metrics for that rate-limit being removed? And, the fact that you claim I "didn't provide specific, uncontroversial examples," when I just showed you those specifics again here, implies that you either weren't reading everything very carefully, or you want to mischaracterize me to silence any opposition of your preferred technique: Bayes'-Theorem-by-itself.

The missing examples are for claims of the form:

The Rationalists repeatedly rely upon sparse evidence, while claiming certainty

They have self-selected for a community of people who call Bayes the be-all-end-all, all of them agreeing they’re right, and they don’t know that they’re horribly wrong… because they don’t check!

...then you DON’T know the be-all-end-all statistical technique - and neither do Scott Alexander or Eliezer Yudkowsky, as much as they’d like you to believe otherwise.

I would not be surprised if some random "rationalist" you ran into somewhere was sloppy or imprecise with their usage of Bayes. I would also not be surprised if you misinterpreted some offhand comment as an unjustified claim to statistical rigor. Maybe it was some third, other thing.

As an aside, all the ways in which you claim that Bayes is wrong are... wrong? Applications of the theorem give you wrong results insofar as the inputs are wrong, which in real life is ~always, and yet the same is true of the techniques you mention (which, notably, rely on Bayes). There is always the question of what tool is best for a given job, and here we circle back to the question of where exactly this grievous misuse of Bayes is occurring.

It seems like you want to rate-limit me for an unspecified duration? What are the empirical metrics for that rate-limit being removed? And, the fact that you claim I "didn't provide specific, uncontroversial examples," when I just showed you those specifics again here, implies that you either weren't reading everything very carefully, or you want to mischaracterize me to silence any opposition of your preferred technique: Bayes'-Theorem-by-itself.

Deeply uncharitable interpretations of others' motives is not something we especially tolerate on LessWrong.

Ah, first: you DID claim that I "didn't provide specific, uncontroversial examples" and I HAD given such for why Bayes' Theorem is inadequate. Notice that you made your statement in this context:

<<"Bayes is persistently wrong" - about what, exactly?

Content like this should include specific, uncontroversial examples>>

In that context, where you precede "this" with my statement about Bayes, I naturally took "content like this" to be referring to my statement that "Bayes is persistently wrong." I hope you can see how easy it would be for me to conclude such a thing, considering "this" refers to... the prior statement?

You now move your goal-posts by insisting that my statement "Rationalists repeatedly rely upon sparse evidence, while claiming certainty" was ACTUALLY the argument I had to support with specifics... while if I were to give such specifics, I would have betrayed individual confidences, which is unethical. So, no, I'll continue to assert without specifics, for the sake of confidences, that "Rationalists repeatedly rely upon sparse evidence, while claiming certainty" because MULTIPLE rationalists over the past YEAR have done so, NOT an isolated incident or an off-hand joke, as you *assume*.

Your assumption that my "amalgam of rationalists I've met over the last year" was somehow a one-off or cursory remark is your OWN uncharitable interpretation; you are dismissing my repeated interactions with your community; such has been the *norm*. Similarly, in the EA Forum post "Doing EA Better" - a group of risk analysts had been spending a year trying to tell EA that "you're doing risk-assessment wrong; those techniques are out-dated," and EA members kept insisting their way was fine and right. Eventually, those nearly dozen folks sat down and scribed an essay to EA... and EA pointedly ignored the fact they mentioned! "EA dismisses experts when experts tell EA they're using out-dated techniques." I'm seeing a similar pattern across the Rationalist community, NOT a one-off event or a casual remark; they were using Bayes' Theorem improperly, as the substance of arguments made in response to me.

"As an aside, all the ways in which you claim that Bayes is wrong are... wrong?"

Bayesian Inference is a good and real thing. And, Bayes' Theorem is an old formula, used in Bayesian Inference. AND Bayes' Theorem cannot produce Confidence Intervals, nor will it allocate to minimize the cost of being wrong, nor does it make adjustments for samples' bias toward the extrema. Those are all specific ways where "I just plug it into Bayes' Theorem" is factually wrong. You keep claiming that my critique is wrong - but you only do so vaguely! You skip right past these failures of Bayes' Theorem, each time I mention them. Check the math books: there is NO "question of what tool is best for a given job," as you say - rather, Bayes' Theorem alone is NEVER the tool. You'll have to adjust in many ways, not just one. And if you don't do so, you are in fact using an obsolete technique during your Bayesian Inference.

Roughly speaking, we can divide Bayesianism into two, maybe three or more, separate but related meanings:

1. **Adherence to a form of Bayesian epistemology.** You think that knowledge comes in degrees of belief, and the correct way to update your beliefs on seeing new information is to use Bayes theorem. It's usually done informally.

2. **Adherence to Bayesian statistics.** You believe that frequentist inference is invalid and that frequentist measures of an estimator's quality should not be used. Instead, you prefer to use precisely defined priors and likelihoods, derive their posteriors, and report a quantity based solely on that. Moreover, you would often espouse some form of Bayesian decision theory - i.e., you have a loss function in addition to your prior and likelihood, and report (or act on) the optimal decision according to your framework. All of this is usually done formally.

Your comments about Dirichlet don't make sense. Are you thinking about the Dirichlet distribution? If so, it is more widely used in Bayesian statistics than frequentist statistics, as it is the conjugate prior to the multinomial distribution. Regarding your comments about the SAS institute, I can say this: Most of the members of this forum are deeply interested in deep learning. Is deep learning Bayesian? No. Not even Bayesian deep learning is properly Bayesian. Does that matter to you, as a Bayesian epistemologist? No, as deep learning has little to nothing to do with epistemology. Does it matter to you, as a Bayesian statistician? No, as deep learning is not about inference or decision theory, which is what Bayesian statisticians care about (for the most part).
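The conjugacy mentioned above (a Dirichlet prior paired with multinomial data) can be shown with a small sketch. The three categories, the flat prior, and the counts below are made-up illustration values, not figures from this thread.

```python
from typing import List, Sequence


def dirichlet_posterior(prior_alpha: Sequence[float], counts: Sequence[int]) -> List[float]:
    """Conjugate update: Dirichlet(alpha) prior + multinomial counts -> Dirichlet(alpha + counts)."""
    if len(prior_alpha) != len(counts):
        raise ValueError("prior and counts must have the same number of categories")
    return [a + c for a, c in zip(prior_alpha, counts)]


def dirichlet_mean(alpha: Sequence[float]) -> List[float]:
    """Mean of a Dirichlet distribution: each alpha divided by the sum of all alphas."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Hypothetical inputs: a flat prior over three categories and ten observations.
prior = [1.0, 1.0, 1.0]
counts = [6, 3, 1]

posterior = dirichlet_posterior(prior, counts)
posterior_mean = dirichlet_mean(posterior)
```

The update is just Bayes' theorem applied to a Dirichlet prior; because of conjugacy the posterior is again a Dirichlet, so no numerical integration is needed.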

By the way, Bayes theorem isn't a "statistical technique", it's just a theorem. Used by all statisticians without a second thought. It's when you use it to do inference you become a Bayesian statistician.

I haven't observed any rationalists here using Dirichlet, and no, I wasn't talking about Bayesian vs. Frequentist; Bayesians are correct. Using Bayes Theorem when you didn't consider the probability of each possible population producing your observed sample? That's definitely you doing it wrong. Instrumentation has variability; Dirichlet is how you include that, too.

Criticizing the use of Bayes Theorem because it's 260 years old is such a weird take.

The Pythagorean theorem is literally *thousands* of years old. But it's still useful, even though lots of progress has been made in trigonometry since then. Should we abandon $a^2 + b^2 = c^2$ as a result?

This does seem like a laughable conclusion. Imagine the implications for the world if this line of reasoning became the accepted paradigm!

Though I'm hesitant to outright dismiss anyone willing to put in effort into writing a post, the author here really needs to rewrite their post to remove all the self-imposed absurdities.

If you have a real argument that the prior is reliably best obtained via a Dirichlet process and no other method of coming up with a prior is ever more useful, then make the argument.

I see:

- argument from authority/prestige
- argument from age (as if math changes over time)
- straw/weakmanning ("These Rationalists pick the Prior that they *prefer*."; "The Rationalists repeatedly rely upon sparse evidence, while claiming certainty")

Dirichlet is used by industry, NOT Bayes. What is your rebuttal to that, to show that Bayes is in fact superior to Dirichlet?

The wiki article on the Dirichlet process includes:

In other words, a Dirichlet process is a probability distribution whose range is itself a set of probability distributions. It is often used in Bayesian inference to describe the prior knowledge about the distribution of random variables—how likely it is that the random variables are distributed according to one or another particular distribution.

I.e. it isn't an alternative to Bayes, but rather a way of coming up with a prior.
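To make "a distribution over distributions" concrete, one standard construction of a Dirichlet process draw is stick-breaking. The following is a minimal, truncated sketch; the concentration value, the truncation length, and the seed are arbitrary choices for illustration, and the truncation makes it an approximation.

```python
import random
from typing import List


def stick_breaking_weights(alpha: float, truncation: int, rng: random.Random) -> List[float]:
    """Approximate the weights of one Dirichlet process draw via truncated stick-breaking.

    Each step breaks off a Beta(1, alpha) fraction of the stick that remains;
    the final weight takes whatever is left so the weights sum to 1.
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)
    return weights


# Hypothetical settings: concentration 2.0, 20 components, fixed seed.
rng = random.Random(0)
weights = stick_breaking_weights(alpha=2.0, truncation=20, rng=rng)
```

Each call produces a different random set of weights, i.e. a different discrete distribution, which is the sense in which the process serves as a prior over distributions.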

And, I never claimed that priors are better obtained with Dirichlet than Bayes... I'm not sure what you were reading, could you quote the section where you thought I was making that claim?

I don't know if I'm missing something, but it sounds like you are arguing for a particular method of picking a prior within a Bayesian context, but you are not arguing against Bayes itself. If anything, it seems to me this is pro-Bayes, just using Dirichlet processes as a prior.

Well I don't know SAS at all but a quick search of the SAS documentation for Dirichlet calls it a "nonparametric Bayes approach"...

https://documentation.sas.com/doc/en/casactml/8.3/casactml_nonparametricbayes_details12.htm

SAS has developed their own trade-secret that outperforms all public methods; by definition, that MUST not be what YOU do when you apply Bayes to a few personal examples.

The downvotes are predictable - not only is it mis-stating a strawman of the group's position, it uses a lot of exclamation points to emphasize how stupid we all are.

However, it's also got some pretty good points, especially as some of the adjacent social groups are exploding, in part due to untenable extrapolation over unstable premises.

I can't tell if you're right, because no one has ever laid out Bayesianism as a set of definitions and instruction steps, explained what Bayesianism uniquely relevantly achieves, and explored the relevant consequences of making various tempting-from-some-perspective mutations on the instruction set; those are the steps required to elevate a person's grasp of Bayesianism to true understanding.

You have also not followed those steps with Dirichlet and SAS, and compared it to Bayesianism.

Still I have an intuition that your complaint about using personal experience is not virtuous. Everything you learn has to pass through personal experience. If your other ways of becoming informed had not been conceived, learning from personal experience would still be possible. Information is information no matter how seriously you take it, and I think personal experience is worth taking seriously, as a person who concerns themself with misleadingness, in a world full of people who attend only to the truth of what they hear and not to the misleadingness.

You claim of Bayes and Dirichlet that "no one has ever laid (them) out", and to prove your claim, you link to another post that YOU wrote, where you claim it again? Check math textbooks; I don't have to teach you what's already available in the public sphere.

It was not to prove my claim; the post I wrote elaborates more fully on what I believe is the correct teaching process. If you read the post, it would become clear to you that my teaching standards have never been met in textbooks, and can hardly even in principle be met through textbooks. My teaching standards are not arbitrary; if these standards are not met then I will not truly understand the subject.

Your difficulty understanding it is NOT equivalent to "no one has ever laid them out". Those are two wildly different statements. A dyslexic person would have similar difficulty reading a novel, yet that is NOT equal to "no one ever wrote a book."

Feeling like you understand is not the same as actual understanding. People who read the existing explanations and feel like they understand, when the explanations did not follow the process I described, do not truly understand. My complaint is not that when I read the explanations I don't feel like I understand them; my complaint is that the extent to which Bayesianism has ever been laid out is insufficient for creating true understanding upon first reading.

Astounding! Then my argument that "NOT including Dirichlet is wrong" must have been wrong? Or else, why are you mentioning that no one taught you to your own satisfaction?

Then my argument that "NOT including Dirichlet is wrong" must have been wrong?

It could be right, actually. The only objection I made was in response to your objection to using personal experience, and I only talked about my intuition rather than what must or must not be the case.

Or else, why are you mentioning that no one taught you to your own satisfaction?

You seem to want to proselytize better epistemic methods, and I am telling you what I need from you in order to adopt or reject your advised methods from an engineering angle (which I regard as superior); until then I can only follow clues of lesser quality (such as the correlation between caring about misleadingness and tendency to say things that impress me as insightful); the detective angle.

As I understand it, in the event that you are correct and Dirichlet is better, rational Rationalists *must switch* to the better algorithm. Because rationality is about systematized winning, and if you are correct, this is a measurably better algorithm to win.

Yes! And, ever since Dirichlet was published in 1973, it has ONLY ever been run on super-computers, using statistically significant sample sizes! You CANNOT do Dirichlet in your head, unless you are a Savant, and no math class will ask you to Dirichlet on a quiz. I'm not sure how ANYONE can claim Bayes is reliable, when NO ONE in industry touches it... your community has an immense blind-spot to real-world methods, yet you claim certainty and confidence - that's the Dunning-Krugers self-selecting into a pod that all agree they're right to use Bayes.

That's only one piece of rationality, and I think the general conclusion was "ask an artificial intelligence you can trust" would be the only *scalable* way for humans to be genuinely rational in their decision-making. It does not matter what algorithm that machine uses internally, merely that it is the best performing one from the class of "sufficiently trustworthy" choices.

Note this is a feasible thing to do, for example the activation function Swish was found this way.

A lot of the rest of it was dismissing *obviously wrong* individuals and institutions? You saw how you dismissed the idea of "start with a prior from the median of mainstream knowledge" and "update with each anecdote"?

The *thing is*, that method is arguably *better* than many institutions and individuals are. At least it *uses* information to make its decision.

One of the tenets of "what does $authority_figure claim to know and *how does he know it*" allows you to dismiss obviously wrong/misaligned authorities on subjects.

Such as the FDA or machine learning scientists setting 2060 as the date for AGI. (the FDA is misaligned, it serves its own interests not the interests of living Americans wanting to remain that way. the ML scientists did not account for an increase in investment or recursive improvement)

There are a lot of other ideas and societal practices that are simply based on bullshit, no actual thought or process was even followed to generate them, they are usually just parroting some past flawed idea. Like what you said regarding Bayes.

Then why does industry use Dirichlet, not Bayes? You keep pretending yours is better, when everyone who has to publish physics uses additional methods, from this century. None of you explain why industry would use Dirichlet, if Bayes is superior. Further, why would Dirichlet even be PUBLISHED unless it's an improvement? You completely disregard these blinding facts. More has happened in the last 260 years than just Bayes' Theorem, and your suspicion of the FDA doesn't change that fact.

<sharing it here, too, though I can already imagine the reaction...>

~

a community of Bayes-enthusiasts fumble statistical inference

~

TL;DR: Industry uses Dirichlet Process and SAS, NOT Bayes. Bayes is persistently *wrong* and lacks a great deal of important information. Supposed ‘rationalists’ cling to Bayes as the Ultimate Truth, without knowing enough Mathematics to know they’re wrong.

“Oh, well my Prior was <preferred assumption> but I guess I have to update with that one data-point that wandered into my life.” - multiple ‘Rationalists’ in my year of invading their gatherings

A weird thing is happening in the Bay Area, slowly creeping into the Zeitgeist: a group of non-mathematicians have decided they found the BEST statistical technique ever, and they want to use it to understand the whole world… but their technique is 260 YEARS OLD, and we’ve done a LOT better since then. It’s called Bayes’ Theorem, published in 1763 - literally 260 candles this year.

Let’s get a sense of just how out-dated and bizarre it is, to insist you have the One-True-Method when it’s 260 years old: back in 1763, when Bayes was published, there was another new-fangled invention sweeping Europe - the Dutch Plough. That’s the plough used today by the Amish. Literally, relying on Bayes to draw conclusions is like farming with an Amish plough; it’s hilariously inadequate, and completely dismissed by industry.

That quote at the top is an amalgam of multiple conversations with the Effective Altruists and Astral Codex Ten ‘Rationalists’ (they made that term up to describe themselves); it’s a persistent theme in their conversations. And, it’s not even the *correct* use of Bayes! Let’s see why:

In Bayes’ Theorem, you begin with a Prior. These Rationalists pick the Prior that they *prefer*. Neutral Bayesian Priors, however, are the average of all possible assumptions, NOT your preferred place to start. These folks’ first step is a disastrous error. Then, when they say “I guess I should update my Prior…” Wait! Why in the world would you ever feel confidence about a belief, when the ONLY thing you have is a Prior? A Prior is, by definition, the state of “no information” when one should have intellectual humility, not certainty!

Then, they are updating their Bayesian estimate using… a *few* examples? The Rationalists repeatedly rely upon sparse evidence, while claiming certainty, as if “Statistically Significant Sample Size” just isn’t a thing. Bayes doesn’t *need* statistical significance, apparently! Finally, those examples they use are culled from personal experience. I hope I don’t have to explain to anyone why we need to collect a random sample from representative sub-populations? The supposedly rational Bayes-fans fail on each possible count.

So, if they correct those mistakes, can they then rely on Bayes to find their precious truths?

Nope. Bayes is consistently wrong, reliably. That’s why industry doesn’t use it. They’d lose money. Dirichlet lets them make money, because it works better. That’s a stronger proof, empirically, than all the rationalizations of their community’s prominent Bayes-trumpeters: a fiction writer and a psych counselor, both of whom lack relevant experience with statistical analysis software and techniques.

In particular, the blog of that psych counselor, “Astral Codex Ten” has a tag-line: it quotes Bayes’ Theorem, and follows by saying “all else is commentary.” Everyone who reads his blog, and who then DOESN’T check what statistical techniques are used in the real world, stays there as part of the community. They have self-selected for a community of people who call Bayes the be-all-end-all, all of them agreeing they’re right, and they don’t know that they’re horribly wrong… because they don’t check!

Think about this for a moment: if you state Bayes’ Theorem, and then claim “all else is commentary” while recommending readers use Bayes, you are implicitly claiming “NO further improvements in statistical analysis have occurred in the 260 years since Bayes was published; Student-t Distributions, Lévy Distributions, they don’t even need to exist!” That’s the core tenet of the Bay Area Rationalists’ luminary, addicted to Bayes.

Wait, so why and how is Dirichlet such an improvement?

Let’s imagine you took a survey in some big city, and found (unsurprisingly) a majority Democrats - it was a 60/40 split, on the nose. That sample’s split is also the “maximum likelihood” for the potential Population. Said another way, “The real-world population which is most likely to give you a 60/40 sample is a 60/40 population.” But, does that make 60/40 your best guess for the real population? No.

Imagine each possible population, one at a time. There’s the 100% Democrat population, first - what is the *likelihood* of such a population producing a 60/40 sample? Zero. What about 99% Democrat? Well, then it’ll depend upon how *many* people you surveyed, but there is just a tiny chance the real population is 99% Democrat! Keep doing that, for every population, all the way to 99% Republican, then 100% Republican. Whew! Now, you have a *likelihood* distribution, the “likelihood of population X generating sample Y.”

When we look at this distribution, for data that falls in two buckets (D/R), then we’ll notice something: the *peak* likelihood is at 60/40, but there’s ALSO a bunch of probability-mass on the 50/50 side of the curve, creating a tilt to the over-all probability. While the ‘mode’ of the likelihood distribution is still the 60/40 estimate, the actual ‘mean’ of that distribution is closer to 50/50, every time! You *should* expect that the true population is closer to an *equal division* among buckets. When you collect more samples, you narrow that distribution of likelihoods, so you see less drift toward 50/50. That’s the reason you want a ‘statistically significant sample size’.
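The sweep over candidate populations described above can be sketched numerically. The sample sizes (6 of 10 and 60 of 100) are assumptions, since the post gives none, and the grid resolution is arbitrary. Note that normalizing the likelihood over an evenly spaced grid is the same arithmetic as a Bayesian posterior under a uniform prior.

```python
from math import comb
from typing import List, Tuple


def likelihood_curve(successes: int, trials: int, grid_size: int = 1001) -> Tuple[List[float], List[float]]:
    """Binomial likelihood of the observed sample for each candidate population share.

    Returns the grid of candidate shares and the likelihoods normalized to sum to 1.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    raw = [
        comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)
        for p in grid
    ]
    total = sum(raw)
    return grid, [r / total for r in raw]


def mode_and_mean(grid: List[float], weights: List[float]) -> Tuple[float, float]:
    """Peak of the curve and its probability-weighted average."""
    mode = grid[max(range(len(grid)), key=weights.__getitem__)]
    mean = sum(p * w for p, w in zip(grid, weights))
    return mode, mean


# Assumed samples: a 60/40 split among 10 respondents, then among 100.
mode_small, mean_small = mode_and_mean(*likelihood_curve(6, 10))
mode_large, mean_large = mode_and_mean(*likelihood_curve(60, 100))
```

Under these assumptions the mode stays at the observed split while the mean sits slightly toward 50/50, and the gap shrinks as the sample grows.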

Let’s look at that other aspect Dirichlet possesses, which Bayes wholly lacks: Confidence!

When you look at the likelihood of each population, the chance of it producing your observed sample, you can also ask: “How far AWAY from our best guess would we need to place boundaries, such that we include 95% of the possible populations’ likelihoods within our bounds?” That’s called your Confidence Interval! You may have only learned the trimmed-down simplicities and z-score tables in your Stat 101 class, but there’s a reason for why they can claim confidence: that interval of population-estimates contains 95% of the likelihood-distribution’s probability-mass!
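An interval holding 95% of the normalized likelihood mass, as described above, can be sketched as follows. The sample sizes are assumptions and the central (equal-tailed) interval is one choice among several. In standard terminology an interval built this way is usually called a credible interval under a uniform prior.

```python
from math import comb
from typing import Tuple


def central_interval(successes: int, trials: int, mass: float = 0.95, grid_size: int = 1001) -> Tuple[float, float]:
    """Equal-tailed interval containing `mass` of the normalized binomial likelihood."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    raw = [
        comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)
        for p in grid
    ]
    total = sum(raw)
    tail = (1.0 - mass) / 2.0

    lower, upper = grid[0], grid[-1]
    found_lower = False
    cumulative = 0.0
    for p, r in zip(grid, raw):
        cumulative += r / total
        if not found_lower and cumulative >= tail:
            lower, found_lower = p, True   # first share where the lower tail is used up
        if cumulative >= 1.0 - tail:
            upper = p                      # first share where only the upper tail remains
            break
    return lower, upper


# Assumed samples: 6 of 10, then 60 of 100.
low_small, high_small = central_interval(6, 10)
low_large, high_large = central_interval(60, 100)
```

With the larger assumed sample the interval narrows around the observed 60/40 split.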

Finally, let’s consider “the cost of being wrong”. Bayes doesn’t balance your prediction according to the cost of being wrong; Dirichlet’s distribution over potential populations can simply be *multiplied* by the cost of each error-distance, and then the mode of that distribution will “minimize the COST of being WRONG.” You can even multiply by costs which are discontinuous or ranges, producing high and low bounds and nuanced thresholds of risk. Definitely better than Bayes.

Now, Dirichlet isn’t even the be-all-end-all… it was published in 1973, 50 years old THIS year! SAS has trade secrets since the 70’s, and invests 2.5x more into R&D than the TECH-industry average! If you want to pass muster for pharmaceuticals in front of the FDA, you send all your data to SAS. It’s required, because they’re soooo damn GOOD! So, unless you work at SAS (which has the highest profits per employee hour of all companies on Earth, and has expanded consistently since 1976… consistently rated one of the best employers on the planet…) then you DON’T know the be-all-end-all statistical technique - and neither do Scott Alexander or Eliezer Yudkowsky, as much as they’d like you to believe otherwise. Just for reference, when “you think you’re right BECAUSE you don’t know enough to know you’re wrong,” that’s called the Dunning-Kruger Effect, dear Rationalists.
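The "cost of being wrong" step from a few paragraphs up can be sketched with the standard expected-loss formulation: for every candidate guess, weight the cost against each possible population by that population's normalized likelihood, then pick the guess with the lowest total. This is a swapped-in textbook formulation, not necessarily the exact procedure the post has in mind, and the sample (6 of 10) and both cost functions are hypothetical.

```python
from math import comb
from typing import Callable, List


def min_expected_cost_estimate(
    successes: int,
    trials: int,
    cost: Callable[[float, float], float],
    grid_size: int = 101,
) -> float:
    """Return the grid value whose expected cost, under the normalized likelihood, is lowest."""
    grid: List[float] = [i / (grid_size - 1) for i in range(grid_size)]
    raw = [
        comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)
        for p in grid
    ]
    total = sum(raw)
    weights = [r / total for r in raw]

    def expected_cost(guess: float) -> float:
        return sum(w * cost(guess, truth) for truth, w in zip(grid, weights))

    return min(grid, key=expected_cost)


def squared_cost(guess: float, truth: float) -> float:
    """Symmetric cost: penalty grows with the square of the error."""
    return (guess - truth) ** 2


def asymmetric_cost(guess: float, truth: float) -> float:
    """Hypothetical cost: overestimating is three times as expensive as underestimating."""
    error = guess - truth
    return 3.0 * error if error > 0 else -error


estimate_squared = min_expected_cost_estimate(6, 10, squared_cost)
estimate_asymmetric = min_expected_cost_estimate(6, 10, asymmetric_cost)
```

Under the symmetric cost the chosen estimate lands near the mean of the curve; making overestimates more expensive pulls the chosen estimate lower.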