# -45

<sharing it here, too, though I can already imagine the reaction...>

~ a community of Bayes-enthusiasts fumble statistical inference ~

TL;DR — Industry uses Dirichlet Process and SAS, NOT Bayes. Bayes is persistently *wrong* and lacks a great deal of important information. Supposed ‘rationalists’ cling to Bayes as the Ultimate Truth, without knowing enough Mathematics to know they’re wrong.

“Oh, well, my Prior was <preferred assumption>, but I guess I have to update with that one data-point that wandered into my life.” - multiple ‘Rationalists’ in my year of invading their gatherings

A weird thing is happening in the Bay Area, slowly creeping into the Zeitgeist: a group of non-mathematicians have decided they found the BEST statistical technique ever, and they want to use it to understand the whole world… but their technique is 260 YEARS OLD, and we’ve done a LOT better since then. It’s called Bayes’ Theorem, published in 1763 — literally 260 candles this year.

Let’s get a sense of just how out-dated and bizarre it is, to insist you have the One-True-Method when it’s 260 years old: back in 1763, when Bayes was published, there was another new-fangled invention sweeping Europe — the Dutch Plough. That’s the plough used today by the Amish. Literally, relying on Bayes to draw conclusions is like farming with an Amish plough; it’s hilariously inadequate, and completely dismissed by industry.

That quote at the top is an amalgam of multiple conversations with the Effective Altruists and Astral Codex Ten ‘Rationalists’ (they made that term up to describe themselves); it’s a persistent theme in their conversations. And, it’s not even the *correct* use of Bayes! Let’s see why:

In Bayes’ Theorem, you begin with a Prior. These Rationalists pick the Prior that they *prefer*. Neutral Bayesian Priors, however, are the average of all possible assumptions, NOT your preferred place to start. These folks’ first step is a disastrous error. Then, when they say “I guess I should update my Prior…” Wait! Why in the world would you ever feel confidence about a belief, when the ONLY thing you have is a Prior? A Prior is, by definition, the state of “no information”, when one should have intellectual humility, not certainty!

Then, they are updating their Bayesian estimate using… a *few* examples? The Rationalists repeatedly rely upon sparse evidence, while claiming certainty, as if “Statistically Significant Sample Size” just isn’t a thing. Bayes doesn’t *need* statistical significance, apparently! Finally, those examples they use are culled from personal experience. I hope I don’t have to explain to anyone why we need to collect a random sample from representative sub-populations. The supposedly rational Bayes-fans fail on each possible count.

So, if they correct those mistakes, can they then rely on Bayes to find their precious truths? Nope. Bayes is consistently wrong, reliably. That’s why industry doesn’t use it. They’d lose money. Dirichlet lets them make money, because it works better. That’s a stronger proof, empirically, than all the rationalizations of their community’s prominent Bayes-trumpeters: a fiction writer and a psych counselor, both of whom lack relevant experience with statistical analysis software and techniques.

In particular, the blog of that psych counselor, “Astral Codex Ten”, has a tag-line: it quotes Bayes’ Theorem, and follows by saying “all else is commentary.” Everyone who reads his blog, and who then DOESN’T check what statistical techniques are used in the real world, stays there as part of the community. They have self-selected for a community of people who call Bayes the be-all-end-all, all of them agreeing they’re right, and they don’t know that they’re horribly wrong because they don’t check!

Think about this for a moment: if you state Bayes’ Theorem, and then claim “all else is commentary” while recommending readers use Bayes, you are implicitly claiming “NO further improvements in statistical analysis have occurred in the 260 years since Bayes was published; Student’s t-distributions, Lévy distributions, they don’t even need to exist!” That’s the core tenet of the Bay Area Rationalists’ luminary, addicted to Bayes.

Wait, so why and how is Dirichlet such an improvement?

Let’s imagine you took a survey in some big city, and found (unsurprisingly) a majority Democrats — it was a 60/40 split, on the nose. That sample’s split is also the “maximum likelihood” for the potential Population. Said another way, “The real-world population which is most likely to give you a 60/40 sample is a 60/40 population.” But, does that make 60/40 your best guess for the real population? No.

Imagine each possible population, one at a time. There’s the 100% Democrat population, first — what is the *likelihood* of such a population producing a 60/40 sample? Zero. What about 99% Democrat? Well, then it’ll depend upon how *many* people you surveyed, but there is just a tiny chance the real population is 99% Democrat! Keep doing that, for every population, all the way to 99% Republican, then 100% Republican. Whew! Now, you have a *likelihood* distribution, the “likelihood of population X generating sample Y.”

When we look at this distribution, for data that falls in two buckets (D/R), then we’ll notice something: the *peak* likelihood is at 60/40, but there’s ALSO a bunch of probability-mass on the 50/50 side of the curve, creating a tilt to the over-all probability. While the ‘mode’ of the likelihood distribution is still the 60/40 estimate, the actual ‘mean’ of that distribution is closer to 50/50, every time! You *should* expect that the true population is closer to an *equal division* among buckets. When you collect more samples, you narrow that distribution of likelihoods, so you see less drift toward 50/50. That’s the reason you want a ‘statistically significant sample size’.
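To make the mode-versus-mean point concrete, here is a minimal sketch in Python. It assumes a two-bucket (Democrat/Republican) survey and weights every candidate population equally before looking at the data, which is my reading of the procedure described above rather than anything the post spells out. The function names `normalized_likelihood` and `mode_and_mean` are hypothetical.

```python
import math

def normalized_likelihood(k, n, grid_size=2000):
    """Weight each candidate Democrat share p by how likely it is to
    produce k Democrats in a sample of n, then normalize to sum to 1."""
    # Midpoints of the grid cells avoid log(0) at p = 0 and p = 1.
    ps = [(i + 0.5) / grid_size for i in range(grid_size)]
    # Work in log space so large samples do not underflow.
    logs = [k * math.log(p) + (n - k) * math.log(1 - p) for p in ps]
    peak = max(logs)
    raw = [math.exp(v - peak) for v in logs]
    total = sum(raw)
    return ps, [r / total for r in raw]

def mode_and_mean(ps, weights):
    """Return the most likely population share and the weighted average."""
    mode = max(zip(weights, ps))[1]
    mean = sum(p * w for p, w in zip(ps, weights))
    return mode, mean

if __name__ == "__main__":
    for n in (10, 100, 1000):
        k = round(0.6 * n)  # a 60/40 sample at every sample size
        mode, mean = mode_and_mean(*normalized_likelihood(k, n))
        print(f"n={n:5d}  mode={mode:.3f}  mean={mean:.3f}")
```

Under this equal-weighting assumption the mean has the closed form (k + 1) / (n + 2), so a 6-of-10 sample gives 7/12, roughly 0.583, while the mode stays at 0.6. The gap shrinks as n grows.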

Let’s look at that other aspect Dirichlet possesses, which Bayes wholly lacks: Confidence!

When you look at the likelihood of each population, the chance of it producing your observed sample, you can also ask: “How far AWAY from our best guess would we need to place boundaries, such that we include 95% of the possible populations’ likelihoods within our bounds?” That’s called your Confidence Interval! You may have only learned the trimmed-down simplicities and z-score tables in your Stat 101 class, but there’s a reason for why they can claim confidence: that interval of population-estimates contains 95% of the likelihood-distribution’s probability-mass!
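The interval described above can be sketched as follows. This assumes the same two-bucket survey and an equal weighting of all candidate populations. Under that assumption, textbooks would usually call the result a credible interval, so treat the naming here as loose. `normalized_likelihood` and `central_interval` are hypothetical helper names.

```python
import math

def normalized_likelihood(k, n, grid_size=2000):
    """Normalized likelihood of k successes in n trials over a grid of shares."""
    ps = [(i + 0.5) / grid_size for i in range(grid_size)]
    logs = [k * math.log(p) + (n - k) * math.log(1 - p) for p in ps]
    peak = max(logs)
    raw = [math.exp(v - peak) for v in logs]
    total = sum(raw)
    return ps, [r / total for r in raw]

def central_interval(ps, weights, mass=0.95):
    """Bounds that leave (1 - mass) / 2 of the weight in each tail."""
    tail = (1 - mass) / 2
    cumulative = 0.0
    lower = upper = None
    for p, w in zip(ps, weights):
        cumulative += w
        if lower is None and cumulative >= tail:
            lower = p
        if cumulative >= 1 - tail:
            upper = p
            break
    return lower, upper

if __name__ == "__main__":
    for n in (10, 100, 1000):
        k = round(0.6 * n)
        low, high = central_interval(*normalized_likelihood(k, n))
        print(f"n={n:5d}  95% of the mass lies in [{low:.3f}, {high:.3f}]")
```

Running it at several sample sizes shows the interval tightening around 60% as more people are surveyed.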

Finally, let’s consider “the cost of being wrong”. Bayes doesn’t balance your prediction according to the cost of being wrong; Dirichlet’s distribution over potential populations can simply be *multiplied* by the cost of each error-distance, and then the mode of that distribution will “minimize the COST of being WRONG.” You can even multiply by costs which are discontinuous or ranges, producing high and low bounds and nuanced thresholds of risk. Definitely better than Bayes.
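One common way to make "minimize the cost of being wrong" concrete is to score every candidate guess by its expected cost under the distribution over populations, then pick the cheapest. That is a standard formulation I am swapping in, not necessarily the exact procedure the post has in mind. The 3-to-1 `asymmetric_cost` below is purely an illustrative assumption, and all function names are hypothetical.

```python
import math

def normalized_likelihood(k, n, grid_size=400):
    """Normalized likelihood of k successes in n trials over a grid of shares."""
    ps = [(i + 0.5) / grid_size for i in range(grid_size)]
    logs = [k * math.log(p) + (n - k) * math.log(1 - p) for p in ps]
    peak = max(logs)
    raw = [math.exp(v - peak) for v in logs]
    total = sum(raw)
    return ps, [r / total for r in raw]

def asymmetric_cost(guess, truth, over=3.0, under=1.0):
    """Hypothetical cost: overestimating hurts `over` per unit of error,
    underestimating hurts `under` per unit of error."""
    error = guess - truth
    return over * error if error > 0 else under * -error

def least_cost_guess(ps, weights, cost):
    """Return the candidate guess with the lowest expected cost."""
    def expected_cost(guess):
        return sum(w * cost(guess, p) for p, w in zip(ps, weights))
    return min(ps, key=expected_cost)

if __name__ == "__main__":
    ps, weights = normalized_likelihood(6, 10)
    symmetric = least_cost_guess(ps, weights, lambda g, t: abs(g - t))
    skewed = least_cost_guess(ps, weights, asymmetric_cost)
    print(f"symmetric cost -> guess {symmetric:.3f}")
    print(f"overestimates cost 3x -> guess {skewed:.3f}")
```

Because the cost is just a function of guess and truth, discontinuous or threshold-style costs can be passed in the same way.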

Now, Dirichlet isn’t even the be-all-end-all… it was published in 1973, 50 years old THIS year! SAS has held trade secrets since the ’70s, and invests 2.5x more into R&D than the TECH-industry average! If you want to pass muster for pharmaceuticals in front of the FDA, you send all your data to SAS. It’s required, because they’re soooo damn GOOD! So, unless you work at SAS (which has the highest profits per employee hour of all companies on Earth, and has expanded consistently since 1976… consistently rated one of the best employers on the planet…) then you DON’T know the be-all-end-all statistical technique, and neither does Scott Alexander or Eliezer Yudkowsky, as much as they’d like you to believe otherwise. Just for reference, when “you think you’re right BECAUSE you don’t know enough to know you’re wrong,” that’s called the Dunning-Kruger Effect, dear Rationalists.

Roughly speaking, we can divide Bayesianism into two, maybe three or more, separate but related meanings:

1. Adherence to a form of Bayesian epistemology. You think that knowledge comes in degrees of belief, and the correct way to update your beliefs on seeing new information is to use Bayes' theorem. It's usually done informally.

2. Adherence to Bayesian statistics. You believe that frequentist inference is invalid and that frequentist measures of an estimator's quality should not be used. Instead, you prefer to use precisely defined priors and likelihoods, derive their posteriors, and report a quantity based solely on that. Moreover, you would often espouse some form of Bayesian decision theory - i.e., you have a loss function in addition to your prior and likelihood, and report (or act on) the optimal decision according to your framework. All of this is usually done formally.

Your comments about Dirichlet don't make sense. Are you thinking about the Dirichlet distribution? If so, it is more widely used in Bayesian statistics than frequentist statistics, as it is the conjugate prior to the multinomial distribution. Regarding your comments about the SAS institute, I can say this: Most of the members of this forum are deeply interested in deep learning. Is deep learning Bayesian? No. Not even Bayesian deep learning is properly Bayesian. Does that matter to you, as a Bayesian epistemologist? No, as deep learning has little to nothing to do with epistemology. Does it matter to you, as a Bayesian statistician? No, as deep learning is not about inference or decision theory, which is what Bayesian statisticians care about (for the most part).
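For readers unfamiliar with the conjugacy mentioned above, here is a minimal sketch in Python. With a Dirichlet prior over category shares and multinomial counts as data, the posterior is again a Dirichlet whose parameters are the prior parameters plus the counts. The survey numbers and function names are made up for illustration.

```python
import random

def dirichlet_posterior(prior_alpha, counts):
    """Conjugate update: add observed counts to the prior parameters."""
    return [a + c for a, c in zip(prior_alpha, counts)]

def dirichlet_mean(alpha):
    """Expected share of each category under Dirichlet(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]

def sample_dirichlet(alpha, rng=random):
    """Draw one set of shares by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

if __name__ == "__main__":
    prior = [1.0, 1.0, 1.0]   # flat prior over three categories (assumption)
    counts = [55, 35, 10]     # hypothetical survey tallies
    posterior = dirichlet_posterior(prior, counts)
    print("posterior parameters:", posterior)
    print("posterior mean shares:", dirichlet_mean(posterior))
    print("one posterior draw:", sample_dirichlet(posterior, random.Random(0)))
```

With only two categories this reduces to the Beta-binomial case: a flat prior plus 6 and 4 observations gives Dirichlet(7, 5), whose mean share for the first category is 7/12.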

By the way, Bayes' theorem isn't a "statistical technique", it's just a theorem, used by all statisticians without a second thought. It's when you use it to do inference that you become a Bayesian statistician.

I haven't observed any rationalists here using Dirichlet, and no, I wasn't talking about Bayesian vs. Frequentist; Bayesians are correct. Using Bayes' Theorem when you didn't consider the probability of each possible population producing your observed sample? That's definitely you doing it wrong. Instrumentation has variability; Dirichlet is how you include that, too.

Criticizing the use of Bayes Theorem because it's 260 years old is such a weird take.

The Pythagorean theorem is literally thousands of years old. But it's still useful, even though lots of progress has been made in trigonometry since then. Should we abandon $a^2 + b^2 = c^2$ as a result?

This does seem like a laughable conclusion. Imagine the implications for the world if this line of reasoning became the accepted paradigm!

Though I'm hesitant to outright dismiss anyone willing to put in effort into writing a post, the author here really needs to rewrite their post to remove all the self-imposed absurdities.

If you have a real argument that the prior is reliably best obtained via a Dirichlet process and no other method of coming up with a prior is ever more useful, then make the argument.

I see:

• argument from authority/prestige
• argument from age (as if math changes over time)
• straw/weakmanning ("These Rationalists pick the Prior that they *prefer*."; "The Rationalists repeatedly rely upon sparse evidence, while claiming certainty")

Dirichlet is used by industry, NOT Bayes. What is your rebuttal to that, to show that Bayes is in fact superior to Dirichlet?

The Wikipedia article on the Dirichlet process includes:

In other words, a Dirichlet process is a probability distribution whose range is itself a set of probability distributions. It is often used in Bayesian inference to describe the prior knowledge about the distribution of random variables—how likely it is that the random variables are distributed according to one or another particular distribution.

I.e. it isn't an alternative to Bayes, but rather a way of coming up with a prior.
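To illustrate what "a distribution over distributions" means, here is a rough Python sketch of the stick-breaking construction, one standard way to draw a single random discrete distribution from a Dirichlet process. The truncation at a fixed number of sticks, the uniform base distribution, and the function names are all assumptions made for illustration.

```python
import random

def stick_breaking_weights(concentration, num_sticks, rng):
    """Break a unit-length stick into pieces; each piece is one weight.
    Truncating at num_sticks leaves a small unassigned remainder."""
    weights = []
    remaining = 1.0
    for _ in range(num_sticks):
        fraction = rng.betavariate(1.0, concentration)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights

def draw_random_distribution(concentration, num_sticks, base_sampler, rng):
    """One draw from the (truncated) process: a list of (value, weight)
    pairs, with values sampled from the base distribution."""
    weights = stick_breaking_weights(concentration, num_sticks, rng)
    atoms = [base_sampler(rng) for _ in weights]
    return list(zip(atoms, weights))

if __name__ == "__main__":
    rng = random.Random(0)
    # Base distribution: uniform on [0, 1], chosen only for illustration.
    dist = draw_random_distribution(2.0, 100, lambda r: r.random(), rng)
    heaviest = sorted(dist, key=lambda pair: pair[1], reverse=True)[:5]
    for value, weight in heaviest:
        print(f"value={value:.3f}  weight={weight:.3f}")
```

Each call returns a different discrete distribution. In a Bayesian model, that random distribution plays the role of the prior described in the quoted passage.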

And, I never claimed that priors are better obtained with Dirichlet than Bayes... I'm not sure what you were reading, could you quote the section where you thought I was making that claim?

Why are you trying to attack instead of educate?

90% of your article is “rationalists do it wrong”.  Why?  Who cares?  Teach us how to do it better instead of focusing on how we’re doing it wrong.

I don't know if I'm missing something, but it sounds like you are arguing for a particular method of picking a prior within a Bayesian context, but you are not arguing against Bayes itself. If anything, it seems to me this is pro-Bayes, just using Dirichlet processes as a prior.

Erm, is SAS using Bayes? That's the actual best in class.

Well, I don't know SAS at all, but a quick search of the SAS documentation for Dirichlet calls it a "nonparametric Bayes approach"...

https://documentation.sas.com/doc/en/casactml/8.3/casactml_nonparametricbayes_details12.htm

SAS has developed their own trade-secret that outperforms all public methods; by definition, that MUST not be what YOU do when you apply Bayes to a few personal examples.

The downvotes are predictable: not only does the post attack a strawman of the group's position, it also uses a lot of exclamation points to emphasize how stupid we all are.

However, it's also got some pretty good points, especially as some of the adjacent social groups are exploding, in part due to untenable extrapolation over unstable premises.

[-]LVSN2-1

I can't tell if you're right, because no one has ever laid out Bayesianism as a set of definitions and instruction steps, explained what Bayesianism uniquely relevantly achieves, and explored the relevant consequences of making various tempting-from-some-perspective mutations on the instruction set; those are the steps required to elevate a person's grasp of Bayesianism to true understanding.

You have also not followed those steps with Dirichlet and SAS, and compared it to Bayesianism.

Still I have an intuition that your complaint about using personal experience is not virtuous. Everything you learn has to pass through personal experience. If your other ways of becoming informed had not been conceived, learning from personal experience would still be possible. Information is information no matter how seriously you take it, and I think personal experience is worth taking seriously, as a person who concerns themself with misleadingness, in a world full of people who attend only to the truth of what they hear and not to the misleadingness.

You claim of Bayes and Dirichlet that "no one has ever laid (them) out", and to prove your claim, you link to another post that YOU wrote, where you claim it again? Check math textbooks; I don't have to teach you what's already available in the public sphere.

It was not to prove my claim; the post I wrote elaborates more fully on what I believe is the correct teaching process. If you read the post, it would become clear to you that my teaching standards have never been met in textbooks, and can hardly even in principle be met through textbooks. My teaching standards are not arbitrary; if these standards are not met then I will not truly understand the subject.

Your difficulty understanding it is NOT equivalent to "no one has ever laid them out". Those are two wildly different statements. A dyslexic person would have similar difficulty reading a novel, yet that is NOT equal to "no one ever wrote a book."

[-]LVSN-10

Feeling like you understand is not the same as actual understanding. People who read the existing explanations and feel like they understand, when the explanations did not follow the process I described, do not truly understand. My complaint is not that when I read the explanations I don't feel like I understand them; my complaint is that the extent to which Bayesianism has ever been laid out is insufficient for creating true understanding upon first reading.

Astounding! Then my argument that "NOT including Dirichlet is wrong" must have been wrong? Or else, why are you mentioning that no one taught you to your own satisfaction?

[-]LVSN1-2

> Then my argument that "NOT including Dirichlet is wrong" must have been wrong?

It could be right, actually. The only objection I made was in response to your objection to using personal experience, and I only talked about my intuition rather than what must or must not be the case.

> Or else, why are you mentioning that no one taught you to your own satisfaction?

You seem to want to proselytize better epistemic methods, and I am telling you what I need from you in order to adopt or reject your advised methods from an engineering angle (which I regard as superior); until then I can only follow clues of lesser quality (such as the correlation between caring about misleadingness and tendency to say things that impress me as insightful); the detective angle.

Screenshots are up! I'll be glad when more members of the public see the arguments you give for ignoring mine. :P cheers!

The single biggest question I have is "what is Dirichlet?"

[-][anonymous]11

As I understand it, in the event that you are correct and Dirichlet is better, rational Rationalists must switch to the better algorithm.  Because rationality is about systematized winning, and if you are correct, this is a measurably better algorithm to win.

Yes! And, ever since Dirichlet was published in 1973, it has ONLY ever been run on super-computers, using statistically significant sample sizes! You CANNOT do Dirichlet in your head, unless you are a Savant, and no math class will ask you to Dirichlet on a quiz. I'm not sure how ANYONE can claim Bayes is reliable, when NO ONE in industry touches it... your community has an immense blind-spot to real-world methods, yet you claim certainty and confidence - that's the Dunning-Krugers self-selecting into a pod that all agree they're right to use Bayes.

[-][anonymous]11

That's only one piece of rationality, and I think the general conclusion was that "ask an artificial intelligence you can trust" would be the only scalable way for humans to be genuinely rational in their decision-making. It does not matter what algorithm that machine uses internally, merely that it is the best performing one from the class of "sufficiently trustworthy" choices.

Note this is a feasible thing to do; for example, the activation function Swish was found this way.

A lot of the rest of it was dismissing obviously wrong individuals and institutions?  You saw how you dismissed the idea of "start with a prior from the median of mainstream knowledge" and "update with each anecdote"?

The thing is, that method is arguably better than many institutions and individuals are. At least it uses information to make its decision.

One of the tenets, "what does \$authority_figure claim to know and how does he know it", allows you to dismiss obviously wrong/misaligned authorities on subjects.

Such as the FDA, or machine learning scientists setting 2060 as the date for AGI. (The FDA is misaligned: it serves its own interests, not the interests of living Americans wanting to remain that way. The ML scientists did not account for an increase in investment or recursive improvement.)

There are a lot of other ideas and societal practices that are simply based on bullshit, no actual thought or process was even followed to generate them, they are usually just parroting some past flawed idea.  Like what you said regarding Bayes.

Then why does industry use Dirichlet, not Bayes? You keep pretending yours is better, when everyone who has to publish physics uses additional methods, from this century. None of you explain why industry would use Dirichlet, if Bayes is superior. Further, why would Dirichlet even be PUBLISHED unless it's an improvement? You completely disregard these blinding facts. More has happened in the last 260 years than just Bayes' Theorem, and your suspicion of the FDA doesn't change that fact.