Is statistics beyond introductory statistics important for general reasoning?

Ideas such as regression to the mean, the fact that correlation does not imply causation, and the base rate fallacy are very important for reasoning about the world in general. One gets these from a deep understanding of statistics 101 and the basics of the Bayesian statistical paradigm. Up until one year ago, I was under the impression that more advanced statistics is technical elaboration that doesn't offer major additional insights into thinking about the world in general.

Nothing could be further from the truth: ideas from advanced statistics are essential for reasoning about the world, even on a day-to-day level. In hindsight my prior belief seems very naive – as far as I can tell, my only reason for holding it is that I hadn't heard anyone say otherwise. But I hadn't actually looked at advanced statistics to see whether or not my impression was justified :D.

Since then, I've learned some advanced statistics and machine learning, and the ideas that I've learned have radically altered my worldview. The "official" prerequisites for this material are calculus, multivariable differential calculus, and linear algebra. But one doesn't actually need to have detailed knowledge of these to understand ideas from advanced statistics well enough to benefit from them. The problem is pedagogical: I need to figure out how to communicate them in an accessible way.

Advanced statistics enables one to reach nonobvious conclusions

To give a bird's eye view of the perspective that I've arrived at, in practice, the ideas from "basic" statistics are generally useful primarily for disproving hypotheses. This pushes in the direction of a state of radical agnosticism: the idea that one can't really know anything for sure about lots of important questions. More advanced statistics enables one to become justifiably confident in nonobvious conclusions, often even in the absence of formal evidence coming from the standard scientific practice.

IQ research and PCA as a case study

In the early 20th century, the psychologist and statistician Charles Spearman discovered the g-factor, which is what IQ tests are designed to measure. The g-factor is one of the most powerful constructs that's come out of psychology research. There are many factors that played a role in enabling Bill Gates's ability to save perhaps millions of lives, but one of the most salient factors is his IQ being in the top ~1% of his class at Harvard. IQ research helped the Gates Foundation to recognize iodine supplementation as a nutritional intervention that would improve socioeconomic prospects for children in the developing world.

The work of Spearman and his successors on IQ constitutes one of the pinnacles of achievement in the social sciences. But while Spearman's discovery of IQ was a great discovery, it wasn't his greatest discovery. His greatest discovery was a discovery about how to do social science research. He pioneered the use of factor analysis, a close relative of principal component analysis (PCA).

The philosophy of dimensionality reduction

PCA is a dimensionality reduction method. Real world data often has the surprising property of "dimensionality reduction": a small number of latent variables explain a large fraction of the variance in data.

This is related to the effectiveness of Occam's razor: it turns out to be possible to describe a surprisingly large amount of what we see around us in terms of a small number of variables. Only, the variables that explain a lot usually aren't the variables that are immediately visible; instead they're hidden from us, and in order to model reality, we need to discover them, which is the function that PCA serves. The small number of variables that drive a large fraction of variance in data can be thought of as a sort of "backbone" of the data. That enables one to understand the data at a "macro / big picture / structural" level.
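
As a rough illustration of the claim above, here is a minimal sketch in Python with NumPy. Everything in it is an assumption made for the example: the data is synthetic, the choice of 2 latent and 10 observed variables is arbitrary, and `explained_variance_ratio` is a hypothetical helper name, not a library function. Two hidden variables are mixed into ten observed ones with some noise, and PCA (computed here via a singular value decomposition of the centered data) is used to ask how much of the variance the first two components carry.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                  # center each observed variable
    # Squared singular values of the centered data are proportional
    # to the variances along the principal components (largest first).
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()

rng = np.random.default_rng(0)               # fixed seed, illustrative only
n_samples, n_observed, n_latent = 1000, 10, 2

latent = rng.normal(size=(n_samples, n_latent))       # the hidden variables
loadings = rng.normal(size=(n_latent, n_observed))    # how they show up in the data
noise = 0.3 * rng.normal(size=(n_samples, n_observed))
X = latent @ loadings + noise                         # what we actually get to see

ratios = explained_variance_ratio(X)
top_two_share = ratios[:2].sum()
print(f"share of variance in the first 2 of {n_observed} components: {top_two_share:.2f}")
```

Because the data was built from two latent variables, most of the variance should land in the first two components. Real data will not be this clean, and deciding how many components to keep is a judgment call.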

This is a very long story that will take a long time to flesh out, and doing so is one of my main goals. 


"impression that more advanced statistics is technical elaboration that doesn't offer major additional insights"

Why did you have this impression?

Sorry for the off-topic, but I see this a lot in LessWrong (as a casual reader). People seem to focus on textual, deep-sounding, wow-inducing expositions, but often dislike the technicalities, getting hands dirty with actually understanding calculations, equations, formulas, details of algorithms etc (calculations that don't tickle those wow-receptors that we all have). As if these were merely some minor additions over the really important big picture view. As I see it this movement seems to try to build up a new backbone of knowledge from scratch. But doing this they repeat the mistakes of the past philosophers. For example going for the "deep", outlook-transforming texts that often give a delusional feeling of "oh now I understand the whole world". It's easy to have wow-moments without actually having understood something new.

So yes, PCA is useful and most statistics and maths and computer science is useful for understanding stuff. But then you swing to the other extreme and say "ideas from advanced sta...

Probably because of the human tendency to overestimate the importance of any knowledge one happens to have and underestimate the importance of any knowledge one doesn't. (Is there a name for this bias?)
Groupthink I guess: other people who I knew didn't think that it's so important (despite being people who are very well educated by conventional standards, top ~1% of elite colleges). Disclaimer: I know that I'm not giving enough evidence to convince you: I've thought about this for thousands of hours (including working through many quantitative examples) and it's taking me a long time to figure out how to organize what I've learned. I already have been using dimensionality reduction (qualitatively) in my day to day life, and I've found that it's greatly improved my interpersonal relationships because it's made it much easier to guess where people are coming from (before people's social behavior had seemed like a complicated blur because I saw so many variables without having started to correctly identify the latent ones). You seem to be making overly strong assumptions with insufficient evidence: how would you know whether this was the case, never having met me? ;-)
Qualitative day-to-day dimensionality reduction sounds like woo to me. Not a bit more convincing than quantum woo (Deepak Chopra et al.). Whatever you're doing, it's surely not like doing SVD on a data matrix or eigen-decomposition on the covariance matrix of your observations. Of course, you can often identify motivations behind people's actions. A lot of psychology is basically trying to uncover these motivations. Basically an intentional interpretation and a theory of mind are examples of dimensionality reduction in some sense. Instead of explaining behavior by reasoning about receptors and neurons, you imagine a conscious agent with beliefs, desires and intentions. You could also link it to data compression (dimensionality reduction is a sort of lossy data compression). But I wouldn't say I'm using advanced data compression algorithms when playing with my dog. It just sounds pretentious and shows a desperate need to signal smartness. So, what is the evidence that you are consciously doing something similar to PCA in social life? Do you write down variables and numbers, or how can I imagine qualitative dimensionality reduction? How is it different from somebody just getting an opinion intuitively and then justifying it afterwards?
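
For readers unfamiliar with the two procedures named in the comment above (SVD on a data matrix, and eigen-decomposition of the covariance matrix), here is a small sketch in Python with NumPy showing that they are two routes to the same principal components. The data is made up for the example and the variable names are illustrative assumptions; nothing here is taken from the thread.

```python
import numpy as np

rng = np.random.default_rng(1)                # fixed seed, illustrative only
# Made-up correlated data: 200 observations of 5 variables.
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
Xc = X - X.mean(axis=0)                       # center the columns
n = Xc.shape[0]

# Route 1: eigen-decomposition of the sample covariance matrix.
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # largest first

# Route 2: SVD of the centered data matrix itself.
_, s, vt = np.linalg.svd(Xc, full_matrices=False)
svd_vals = s ** 2 / (n - 1)                   # component variances

same_variances = np.allclose(eigvals, svd_vals)
# Directions agree only up to sign, so compare absolute inner products.
same_directions = np.allclose(np.abs(eigvecs.T @ vt.T), np.eye(5), atol=1e-6)
print(same_variances, same_directions)
```

The comparison of directions uses absolute values because each principal direction is only defined up to a sign flip.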
See Rationality is about pattern recognition, not reasoning. Your tone is condescending, far outside of politeness norms. In the past I would have uncharitably written this off to you being depraved, but I've realized that I should be making a stronger effort to understand other people's perspectives. So can you help me understand where you're coming from on an emotional level?

You asked about emotional stuff so here is my perspective. I have extremely weird feelings about this whole forum that may affect my writing style. My view is constantly popping back and forth between different views, like in the rabbit-duck gestalt image. On one hand I often see interesting and very good arguments, but on the other hand I see tons of red flags popping up. I feel that I need to maintain extreme mental efforts to stay "sane" here. Maybe I should refrain from commenting. It's a pity because I'm generally very interested in the topics discussed here, but the tone and the underlying ideology are pushing me away. On the other hand I feel an urge to check out the posts despite this effect. I'm not sure what aspect of certain forums has this psychological effect on my thinking, but I've felt it on various reddit communities as well.

Seconded, actually, and it's particular to LessWrong. I know I often joke that posting here gets treated as submitting academic material and skewered accordingly, but that is very much what it feels like from the inside. It feels like confronting a hostile crowd of, as Jonah put it, radical agnostics, every single time one posts, and they're waiting for you to say something so they can jump down your throat about it. Oh, and then you run into the issue of having radically different priors and beliefs, so that you find yourself on a "rationality" site where someone is suddenly using the term "global warming believer" as though the IPCC never issued multiple reports full of statistical evidence. I mean, sure, I can put some probability on, "It's all a conspiracy and the official scientists are lying", but for me that's in the "nonsense zone" -- I actually take offense to being asked to justify my belief in mainstream science. As much as "good Bayesians" are never supposed to agree to disagree, I would very much like if people would be up-front about their priors and beliefs, so that we can both decide whether it's worth the energy spent on long threads of trying to convince people of things.
Thanks so much for sharing. I'm astonished by how much more fruitful my relationships have become since I've started asking. I think that a lot of what you're seeing is a cultural clash: different communities have different blindspots and norms for communication, and a lot of times the combination of (i) blindspots of the communities that one is familiar with and (ii) respects in which a new community actually is unsound can give one the impression "these people are beyond the pale!" when the actual situation is that they're no less rational than members of one's own communities. I had a very similar experience to your own coming from academia, and wrote a post titled The Importance of Self-Doubt in which I raised the concern that Less Wrong was functioning as a cult. But since then I've realized that a lot of the apparently weird beliefs of LWers are in fact also believed by very credible people: for example, Bill Gates recently expressed serious concern about AI risk. If you're new to the community, you're probably unfamiliar with my own credentials which should reassure you somewhat: * I did a PhD in pure math under the direction of Nathan Dunfield, who coauthored papers with Bill Thurston, who formulated the geometrization conjecture which Perelman proved and in doing so solved one of the Clay Millennium Problems. * I've been deeply involved with math education for highly gifted children for many years. I worked with the person who won the American Math Society prize for best undergraduate research when he was 12. * I worked at GiveWell, which partners with Good Ventures, Dustin Moskovitz's foundation. * I've done fullstack web development, making an asynchronous clone of StackOverflow (link). * I've done machine learning, rediscovering logistic regression, collaborative filtering, hierarchical modeling, the use of principal component analysis to deal with multicollinearity, and cross validation. (I found the expositions so poor that it was faster
Of course, Christiano tends to issue disclaimers with his MIRI-branded AGI safety work, explicitly stating that he does not believe in alarmist UFAI scenarios. Which is fine, in itself, but it does show how people expect someone associated with these communities to sound. And Jacob Steinhardt hasn't exactly endorsed any "Twilight Zone" community norms or propaganda views. Errr, is there a term for "things everyone in a group thinks everyone else believes, whether or not they actually do"?
I'm not claiming otherwise: I'm merely saying that Paul and Jacob don't dismiss LWers out of hand as obviously crazy, and have in fact found the community to be worthwhile enough to have participated substantially.
I think in this case we have to taboo the term "LWers" ;-). This community has many pieces in it, and two large parts of the original core are "techno-libertarian Overcoming Bias readers with many very non-mainstream beliefs that they claim are much more rational than anyone else's beliefs" and "the SL4 mailing list wearing suits and trying to act professional enough that they might actually accomplish their Shock Level Four dreams." On the other hand, in the process of the site's growth, it has eventually come to encompass those two demographics plus, to some limited extent, almost everyone who's willing to assent that science, statistical reasoning, and the neuro/cognitive sciences actually really work and should be taken seriously. With special emphasis on statistical reasoning and cognitive sciences. So the core demographic consists of Very Unusual People, but the periphery demographics, who now make up most of the community, consist of only Mildly Unusual People.
Yes, this seems like a fair assessment of the situation. Thanks for disentangling the issues. I'll be more precise in the future.
Those are indeed impressive things you did. I agree very much with your post from 2010. But the fact that many people have this initial impression shows that something is wrong. What makes it look like a "twilight zone"? Why don't I feel the same symptoms for example on Scott Alexander's Slate Star Codex blog? Another thing I could pinpoint is that I don't want to identify as a "rationalist", I don't want to be any -ist. It seems like a tactic to make people identify with a group and swallow "the whole package". (I also don't think people should identify as atheist either.)
Nobody forces you to do so. Plenty of people in this community don't self identify that way.
I'm sympathetic to everything you say. In my experience there's an issue of Less Wrongers being unusually emotionally damaged (e.g. relative to academics) and this gives rise to a lot of problems in the community. But I don't think that the emotional damage primarily comes from the weird stuff that you see on Less Wrong. What one sees is them having borne the brunt of the phenomenon that I described here disproportionately relative to other smart people, often because they're unusually creative and have been marginalized by conformist norms. Quite frankly, I find the norms in academia very creepy: I've seen a lot of people develop serious mental health problems in connection with their experiences in academia. It's hard to see it from the inside: I was disturbed by what I saw, but I didn't realize that math academia is actually functioning as a cult, based on retrospective impressions, and in fact by implicit consensus of the best mathematicians of the world (I can give references if you'd like).
I'm sure you're aware that the word "cult" is a strong claim that requires a lot of evidence, but I'd also issue a friendly warning that to me at least it immediately set off my "crank" alarm bells. I've seen too many Usenet posters who are sure they have a P=/!=NP proof, or a proof that set theory is false, etc., and who ultimately claim that because "the mathematical elite" are a cult, no one will listen to them. A cult generally engages in active suppression, often defamation, and not simply exclusion. Do you have evidence of legitimate mathematical results or research being hidden/withdrawn from journals or publicly derided, or is it more of an old boys' club that's hard for outsiders to participate in and that plays petty politics to the damage of the science? Grothendieck's problems look to be political and interpersonal. Perelman's also. I think it's one thing to claim that mathematical institutions are no more rational than any other politicized body, and quite another to claim that it's a cult. Or maybe most social behavior is too cult-like. If so, perhaps don't single out mathematics. I question the direction of causation. Historically many great mathematicians have been mentally and socially atypical and ended up not making much sense with their later writings. Either mathematics has always had an institutional problem or mathematicians have always had an incidence of mental difficulties (or a combination of both; but I would expect one to dominate). Especially in Thurston's On Proof and Progress in Mathematics I can appreciate the problem of trying to grok specialized areas of mathematics. The terminology and symbology is opaque to the uninitiated. It reminds me of section 1 of the Metamath Book which expresses similar unhappiness with the state of knowledge between specialist fields of mathematics and the general difficulty of learning mathematics. I had hoped that Metamath would become more popular and tie various subfields together through unifyi
Thanks, yeah, people have been telling me that I need to be more careful in how I frame things. :-) The latter, but note that that's not necessarily less damaging than active suppression would be. Yes, this is what I believe. The math community is just unusually salient to me, but I should phrase things more carefully. Most of the people who I have in mind did have preexisting difficulties. I meant something like "relative to a counterfactual where academia was serving its intended function." People of very high intellectual curiosity sometimes approach academia believing that it will be an oasis and find this not to be at all the case, and that the structures in place are in fact hostile to them. This is not what the government should be supporting with taxpayer dollars. What are your own interests?
I suppose there's one scant anecdote for estimating this; cryptography research seemed to lag a decade or two behind actively suppressed/hidden government research. Granted, there was also less public interest in cryptography until the 80s or 90s, but it seems that suppression can only delay publication, not prevent it. The real risk of both suppression and exclusion seems to be in permanently discouraging mathematicians who would otherwise make great breakthroughs, since affecting the timing of publication/discovery doesn't seem as damaging. I think I would be surprised if Basic Income was a less effective strategy than targeted government research funding. Everything from logic and axiomatic foundations of mathematics to practical use of advanced theorems for computer science. What attracted me to Metamath was the idea that if I encountered a paper that was totally unintelligible to me (say Perelman's proof of the Poincaré conjecture or Wiles' proof of Fermat's Last Theorem) I could backtrack through sound definitions to concepts I already knew, and then build my understanding up from those definitions. Alas, just having a cross-reference of related definitions between various fields would be helpful. I take it that model theory is the place to look for such a cross-reference, and so that is probably the next thing I plan to study. Practically, I realize that I don't have enough time or patience or mental ability to slog through formal definitions all day, and so it would be nice to have something even better. A universal mathematical educator, so to speak. Although I worry that without a strong formal understanding I will miss important results/insights. So my other interest is building the kind of agent that can identify which formal insights are useful or important, which sort of naturally leads to an interest in AI and decision theory.
I would like to see some of those references (simply because I have no relation to Academia, and don't like things I read somewhere to gestate into unfounded intuitions about a subject).
I've only been in CS academia, and wouldn't call that a cult. I would call it, like most of the rest of academia, a deeply dysfunctional industry in which to work, but that's the fault of the academic career and funding structure. CS is even relatively healthy by comparison to much of the rest. How much of our impression of mathematics as a creepy, mental-health-harming cult comes from pure stereotyping?
Jonah happens to be a math PhD. How can you engage in pure stereotyping of mathematicians while you get your PhD?
I was more positing that it's a self-reinforcing, self-creating effect: people treat Mathematics in a cultish way because they think they're supposed to.
I don't believe there's any such thing, on the general grounds of "no fake without a reality to be a fake of."
Who do you mean when you say "people"?
For what it's worth, I have observed a certain reverence in the way great mathematicians are treated by their lesser-accomplished colleagues that can often border on the creepy. This is something specific to math, in that it seems to exist in other disciplines with lesser intensity. But I agree, "dysfunctional" seems to be a more apt label than "cult." May I also add "fashion-prone?"
Er, what? Who do you mean by "we"? The link says of Turing: This is a staggeringly wrong account of how he died.
Hence my calling it "pure stereotyping"!
I don't have direct exposure to CS academia, which, as you comment, is known to be healthier :-). I was speaking in broad brushstrokes; I'll qualify my claims and impressions more carefully later.
I don't really understand what you mean about math academia. Those references would be appreciated.

The top 3 answers to the MathOverflow question Which mathematicians have influenced you the most? are Alexander Grothendieck, Mikhail Gromov, and Bill Thurston. Each of these have expressed serious concerns about the community.

  • Grothendieck was actually effectively excommunicated by the mathematical community and then was pathologized as having gone crazy. See pages 37-40 of David Ruelle's book A Mathematician's Brain.

  • Gromov expresses strong sympathy for Grigory Perelman having left the mathematical community starting on page 110 of Perfect Rigor. (You can search for "Gromov" in the pdf to see all of his remarks on the subject.)

  • Thurston made very apt criticisms of the mathematical community in his essay On Proof and Progress In Mathematics. See especially the beginning of Section 3: "How is mathematical understanding communicated?" Terry Tao endorses Thurston's essay in his obituary of Thurston. But the community has essentially ignored Thurston's remarks: one almost never hears people talk about the points that Thurston raises.

I don't know about Grothendieck, but the two other sources appear to have softer criticism of the mathematical community than "actually functioning as a cult".
The links you give are extremely interesting, but, unless I am missing something, it seems that they fall short of justifying your earlier statement that math academia functions as a cult. I wonder if you would be willing to elaborate further on that?
I'll be writing more about this later. The most scary thing to me is that the most mathematically talented students are often turned off by what they see in math classes, even at the undergraduate and graduate levels. Math serves as a backbone for the sciences, so this may be badly undercutting scientific innovation at a societal level. I honestly think that it would be an improvement on the status quo to stop teaching math classes entirely. Thurston characterized his early math education as follows: "I hated much of what was taught as mathematics in my early schooling, and I often received poor grades. I now view many of these early lessons as anti-math: they actively tried to discourage independent thought. One was supposed to follow an established pattern with mechanical precision, put answers inside boxes, and 'show your work,' that is, reject mental insights and alternative approaches." I think that this characterizes math classes even at the graduate level, only at a higher level of abstraction. The classes essentially never offer students exposure to free-form mathematical exploration, which is what it takes to make major scientific discoveries with significant quantitative components.
I distinctly remember having points taken off of a physics midterm because I didn't show my work. I think I dropped the exam in the waste basket on the way out of the auditorium. I've always assumed that the problem is three-fold: generating a formal proof is NP-hard, getting the right answer via shortcuts can include cheating, and the faculty's time is limited. Professors/graders do not have the capacity to rigorously demonstrate to themselves that the steps a student has written down actually pinpoint the unique answer. Without access to the student's mind graders are unable to determine if students cheat or not; being able to memorize and/or reproduce the exact steps of a calculation significantly decreases the likelihood of cheating. Even if graders could do one or both of the previous for a single student, they are not 30x or 100x as smart as their students, making it impractical to repeat the process for every student. That said, I had some very good mathematics teachers in higher level courses who could force students to think, and one in particular who could encourage/demand novelty from students simply by asking them to solve problems that they hadn't yet learned to solve. I didn't realize the power of the latter approach until later (and at the time everyone complained about exams with a median score well under 50%), but his classes were always my favorite.
Thank you for all these interesting references. I enjoyed reading all of them, and rereading in Thurston's case. Do people pathologize Grothendieck as having gone crazy? I mostly think people think of him as being a little bit strange. The story I heard was that because of philosophical disagreements with military funding and personal conflicts with other mathematicians he left the community and was more or less refusing to speak to anyone about mathematics, and people were sad about this and wished he would come back.
His contribution to math is too great for people to have explicitly adopted a stance that was too unfavorable to him, and many mathematicians did in fact miss him a lot. But as Perelman said: "Of course, there are many mathematicians who are more or less honest. But almost all of them are conformists. They are more or less honest, but they tolerate those who are not honest." He has also said that "It is not people who break ethical standards who are regarded as aliens. It is people like me who are isolated." If pressed, many mathematicians downplay the role of those who behaved unethically toward him and the failure of the community to give him a job in favor of a narrative "poor guy, it's so sad that he developed mental health problems."
What failure? He stepped down from the Steklov Institute and has refused every job offer and prize given to him.
From the details I'm aware of "gone crazy" is not a bad description of what happened.

I would probably use different words, but I believe I fit Jonah's description. Before finding LW, I felt strongly isolated. Like, surrounded by human bodies, but intellectually alone. Thinking about topics that people around me considered "weird", so I had no one to debate them with. Having a large range of interests, and while I could find people to debate individual interests with, I had no one to talk with about the interesting combinations I saw there.

I felt "weird", and from people around me I usually got two kinds of feedback. When I didn't try to pretend anything, they more or less confirmed that I am weird (of course, many were gentle, trying not to hurt me). When I tried to play a role of someone "less weird" (that is, I ignored most of the things I considered interesting, and just tried to fit)... well, it took a lot of time and practice to do this correctly, but then people accepted me. So, for a long time it felt like the only way to be accepted would be to suppress a large part of what I consider to be "myself"; and I suspect that it would never work perfectly, that there would still be some kind of intellectual hunger.

Then I fou...

I am not giving up, and I hope I will still achieve some big success. In the shortest term... I have a baby now, which turned my life upside down a bit, so I need to solve some logistic problems first (e.g. to buy a new flat) and get used to the new situation. It might take a year. -- Not complaining here; I always wanted to have children, but it's taking time and energy and money, so my options are now more limited than usual. I believe it will be okay in a few months, but today, I am rather busy and tired. Also, having a family limits my options; for example if I decided that moving to another city would make my life better, it is no longer only my own decision. My hands are a bit more tied than they would be if I were 25 again. I still didn't give up completely on starting a rationalist community in my own city, and I have two specific plans. (1) These days I am finishing the translation of the LW Sequences book; when it is ready, I will distribute it freely and try to make it popular, and hope that people who enjoy it will contact me. (2) In September, I plan to do some rationality "lectures" (advertising for LW and for the translated book) at at least one high school and one university. I will probably not do anything scientific, ever; that train has already gone. Cannot compete with 20-year-olds with fresh brains and fresh memories of their university lectures, who don't have a family to feed. It would be wiser to focus fully on my personal life and making money, because that's what I have to do anyway. -- The current plan is writing computer games, because the entry costs are almost zero, and I can do it at home in the evenings when the baby sleeps. (I have to keep the day job to pay bills.) Later, when the baby grows up and starts attending school, I may try something more ambitious. But still, even if my plans succeed and I live till 80, I will not be able to do as much as in the hypothetical parallel universe where I would find a LW community as
It is so painful to have an easily available possible world in which you find LessWrong earlier than in the real world. I ran into LW/OB five times since I was 16 and didn't stick around until I was 21. I can't imagine what I would be like with five years of exposure to the important things that I've been exposed to in the past six months, as well as having grown alongside the community, seeing as how I came around near the time that LW began.
I also didn't stick with LW the first time. I found an article linked from somewhere, I believe it was "Well-Kept Gardens Die By Pacifism", I was impressed, but then I left. A year or two later, I again randomly found an article, then I saw it was the same website as the previous one, so I was like "Oh, this website contains multiple interesting articles" and started clicking on random links in text. Then I cautiously posted a few comments in the Open Thread -- some got downvotes, some got upvotes -- and kept reading... So, somewhere in the parallel Everett branch there is a version of me that didn't return to LW anymore, or just returned, read one article, and left again. Poor guy; he probably spends a lot of time having stupid debates on other websites. What do you believe you would have done differently, if you had stuck around here at 16?
I'm speaking based on many interactions with many members of the community. I don't think this is true of everybody, but I have seen a difference at the group level.
This doesn't address the issue of the claimed difference in Jonah's perception of LWers from his perception of other groups.
I've always thought that calling yourself a "rationalist" or "aspiring rationalist" is rather useless. You're either winning or not winning. Calling yourself by some funny term can give you the nice feeling of belonging to a community, but it doesn't actually make you win more, in itself.
That sounds like you engage in binary thinking and don't value shades of grey of uncertainty enough. You feel the need to judge arguments for whether they are true or aren't and don't have mental categories for "might be true, or might not be true". Jonah makes strong claims for which he doesn't provide evidence. He's clear about the fact that he hasn't provided the necessary evidence. Given that, you pattern match to "crackpot" instead of putting Jonah in the mental category where you don't know whether what he says is right or wrong. If you start to put a lot of claims into the "I don't know"-pile you don't constantly pop between belief and non-belief. Popping back and forth means that the size of your updates when presented with new evidence is too large. Being able to say "I don't know" is part of genuine skepticism.
I'm not talking about back and forth between true and false, but between two explanations. You can have a multimodal probability distribution and two distant modes are about equally probable, and when you update, sometimes one is larger and sometimes the other. Of course one doesn't need to choose a point estimate (maximum a posteriori), the distribution itself should ideally be believed in its entirety. But just as you can't see the rabbit-duck as simultaneously 50% rabbit and 50% duck, one sometimes switches between different explanations, similarly to an MCMC sampling procedure. I don't want to argue this too much because it's largely a preference of style and culture. I think the discussions are very repetitive and it's an illusion that there is much to be learned by spending so much time thinking meta. Anyway, I evaporate from the site for now.
I would be very interested in hearing elaboration on this topic, either publicly or privately.

I prefer public discussions. First, I'm a computer science student who took courses in machine learning, AI, wrote theses in these areas (nothing exceptional), I enjoy books like Thinking Fast and Slow, Black Swan, Pinker, Dawkins, Dennett, Ramachandran etc. So the topics discussed here are also interesting to me. But the atmosphere seems quite closed and turning inwards.

I feel similarities to reddit's Red Pill community. Previously "ignorant" people feel the community has opened a new world to them, they lived in darkness before, but now they found the "Way" ("Bayescraft") and all this stuff is becoming an identity for them.

Sorry if it's offensive, but I feel as if many people had no success in the "real world" matters and invented a fiction where they are the heroes by having joined some great organization much higher above the general public, who are just irrational automata still living in the dark.

I dislike the heavy use of insider terminology that makes communication with "outsiders" about these ideas quite hard because you get used to referring to these things by the in-group terms, so you get kind of isolated from your real-l...

Thanks for the detailed response! I'll respond to a handful of points:

Previously "ignorant" people feel the community has opened a new world to them, they lived in darkness before, but now they found the "Way" ("Bayescraft") and all this stuff is becoming an identity for them.

I certainly agree that there are people here who match that description, but it's also worth pointing out that there are actual experts too.

the general public, who are just irrational automata still living in the dark.

One of the things I find most charming about LW, compared to places like RationalWiki, is how much emphasis there is on self-improvement and your mistakes, not mistakes made by other people because they're dumb.

It seems that people try to prove they know some concept by using the jargon and including links to them. Instead, I'd prefer authors who actively try to minimize the need for links and jargon.

I'm not sure this is avoidable, and in full irony I'll link to the wiki page that explains why.

In general, there are lots of concepts that seem useful, but the only way we have to refer to concepts is either to refer to a label or to explain the concept. A nu...

I agree that LW is much better than RationalWiki, but I still think that the norms for discussion are much too far in the direction of focus on how other commenters are wrong as opposed to how one might oneself be wrong. I know that there's a selection effect (with respect to the more frustrating interactions standing out). But people not infrequently mistakenly believe that I'm wrong about things that I know much more about than they do, with very high confidence, and in such instances I find the connotations that I'm unsound to be exasperating. I don't think that this is just a problem for me rather than a problem for the community in general: I know a number of very high quality thinkers in real life who are uninterested in participating on LW explicitly because they don't want to engage with commenters who are highly confident that these thinkers' positions are incorrect. There's another selection effect here: such people aren't salient because they're invisible to the online community.
I agree that those frustrating interactions both happen and are frustrating, and that it leads to a general acidification of the discussion as people who don't want to deal with it leave. Reversing that process in a sustainable way is probably the most valuable way to improve LW in the medium term.
There's also the whole LessWrong-is-dying thing that might contribute to the vibe you're getting. I've been reading the forum for years and it hasn't felt very healthy for a while now. A lot of the impressive people from earlier have moved on, we don't seem to be getting that many new impressive people coming in, and hanging out a lot on the forum turns out not to make you that much more impressive. What's left is turning increasingly into a weird sort of cargo cult of a forum for impressive people.
Actually, I think that LessWrong used to be worse when the "impressive people" were posting about cryonics, FAI, many-worlds interpretation of quantum mechanics, and so on.
It has seemed to me that a lot of the commenters who come with their own solid competency are also less likely to get unquestioningly swept away following EY's particular hobbyhorses.
The applicable word is metaphysics. Acausal trade is dabbling in metaphysics to "solve" a question in decision theory, which is itself mere philosophizing, and thus one has to wonder: what does Nature care for philosophies? By the way, for the rest of your post I was going, "OH MY GOD I KNOW YOUR FEELS, MAN!" So it's not as though nobody ever thinks these things. Those of us who do just tend to, in perfect evaporative cooling fashion, go get on with our lives outside this website, being relatively ordinary science nerds.
Sorry, avoiding metaphysics doesn't work. You just end up either reinventing them (badly) or using a bad 5th hand version of some old philosopher's metaphysics. Incidentally, Eliezer also tried avoiding metaphysics and wound up doing the former.
I don't like Eliezer's apparent mathematical/computational Platonism myself, but most working scientists manage to avoid metaphysical buggery by simply dealing with only those things with which they can actually causally interact. I recall an Eliezer post on "Explain/Worship/Ignore", and would add myself that while "Explain" eventually bottoms out in the limits of our current knowledge, the correct response is to hit "Ignore" at that stage, not to drop to one's knees in Worship of a Sacred Mystery that is in fact just a limit to current evidence. EDIT: This is also one of the reasons I enjoy being in this community: even when I disagree with someone's view (eg: Eliezer's), people here (including him) are often more productive and fun to talk to than someone who hits the limits of their scientific knowledge and just throws their hands up to the tune of "METAPHYSICS, SON!", and then joins the bloody Catholic Church, as if that solved anything.
That works up until the point where you actually have to think about what it means to "causally interact" with something. Also questions like "does something that falls into a black hole cease to exist since it's no longer possible to interact with it"?
But there are trivially easy answers to questions like that. Basically you have to ask "Cease to exist for whom?" i.e. it obviously ceases to exist for you. You just have to taboo words like "really" here, as in "does it really cease to exist", since they are meaningless: they don't lead to predictions. What people often consider "really" reality is the perception of a perfect god-like omniscient observer, but there is no such thing. Essentially there are just two extremes to avoid, the po-mo "nothing is real, everything is mere perception" and the traditional, classical "but how things really really REALLY are?" and the middle way here is "reality is the sum of what could be perceived in principle". A perception is right or wrong based on how much it meshes with all the other things that can in principle be perceived. Everything that cannot even be perceived in theory is not part of reality. There is no how things "really" are; the closest we have to that is the sum of all potential, possible perceivables about a thing. I picked up this approach from Eric S. Raymond, I think he worked it out decades before Eliezer did, possibly both working from Peirce. This is basically anti-metaphysics.
Does this imply that only things that exist in my past light cone are real for me at any given moment?
I don't know what real-for-me means here. Everything that in principle, in theory, could be observed, is real. Most of those you didn't observe. This does not make them any less real. I meant the "for whom?" not in the sense of me, you, or the barkeeper down the street. I meant it in the sense of normal beings who know only things that are in principle knowable, vs. some godlike being who can know how things really "are" regardless of whether they are knowable or not.
Well, that's where it starts to break down; because what you can, in theory, observe is different from what I can, in theory, observe. This is because, as far as anyone can tell, observations are limited by the speed of light. I cannot, even in principle, observe the 2015 Alpha Centauri until at least 2019 (if I observe it now, I am seeing light that left it around 2011). If Alpha Centauri had suddenly exploded in 2013, I have no way of observing that until at least 2017 - even in principle. So if the barkeeper, instead of being down the street, is rather living on a planet orbiting Alpha Centauri, then the set of what he can observe in principle is not the same as the set of what I can observe in principle.
I'd like to congratulate you on developing your own "makes you sound insane to the man in the street" theory of metaphysics.
Man on the street needs to learn what counterfactual definiteness is.
Ilya, can you give me a definition of "counterfactual definiteness" please?
Physicists are not very precise about it, may I suggest looking into "potential outcomes" (the language some statisticians use to talk about counterfactuals): Potential outcomes let you think about a model that contains a random variable for what happens to Fred if we give Fred aspirin, and a random variable for what happens to Fred if we give Fred placebo. Even though in reality we only gave Fred aspirin. This is "counterfactual definiteness" in statistics. This paper uses potential outcomes to talk about outcomes of physics experiments (so there is an exact isomorphism between counterfactuals in physics and potential outcomes):
Sounds like this is perhaps related to the counterfactual-consistency statement? In its simple form, that the counterfactual or potential outcome under policy "a" equals the factual observed outcome when you in fact undertake policy "a", or formally, Y^a = Y when A = a. Pearl has a nice (easy) discussion in the journal Epidemiology. Is this what you are getting at, or am I missing the point?
No, not quite. Counterfactual consistency is what allows you to link observed and hypothetical data (so it is also extremely important). Counterfactual definiteness is even more basic than that. It basically sets the size of your ontology by allowing you to talk about Y(a) and Y(a') together, even if we only observe Y under one value of A. ---------------------------------------- edit: Stephen, I think I realized who you are, please accept my apologies if I seemed to be talking down to you, re: potential outcomes, that was not my intention. My prior is people do not know what potential outcomes are. ---------------------------------------- edit 2: Good talks by Richard Gill and Jamie Robins at JSM on this:
No offense taken. I am sorry I did not get to see Gill & Robins at JSM. Jamie also talks about some of these issues online back in 2013 at
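
To make the two notions in this exchange concrete, here is a minimal Python sketch (an illustration under made-up assumptions, not anything taken from the talks or papers mentioned). Every simulated "Fred" carries both potential outcomes Y(a=0) and Y(a=1) at once, which is the counterfactual-definiteness part; consistency is the line that sets the observed outcome equal to the potential outcome under the treatment actually received. The effect size, noise levels, and the 50/50 randomization are all hypothetical choices.

```python
import random

random.seed(0)


def simulate(n_units=10_000):
    """Simulate units that each carry BOTH potential outcomes."""
    units = []
    for _ in range(n_units):
        y_placebo = random.gauss(0.0, 1.0)               # Y(a=0)
        y_aspirin = y_placebo + random.gauss(2.0, 0.5)   # Y(a=1), hypothetical effect near 2
        treated = random.random() < 0.5                  # randomized assignment A
        # Consistency: the observed Y equals the potential outcome
        # under the treatment that was actually received.
        observed = y_aspirin if treated else y_placebo
        units.append((y_placebo, y_aspirin, treated, observed))
    return units


units = simulate()

# Only the simulation can see both potential outcomes for the same unit.
true_effect = sum(y1 - y0 for y0, y1, _, _ in units) / len(units)

# An analyst sees one outcome per unit and compares group means instead.
treated_obs = [y for _, _, a, y in units if a]
control_obs = [y for _, _, a, y in units if not a]
estimated_effect = (sum(treated_obs) / len(treated_obs)
                    - sum(control_obs) / len(control_obs))

print(f"true average effect: {true_effect:.3f}")
print(f"estimate from observed data: {estimated_effect:.3f}")
```

The estimate tracks the true average effect here only because assignment was randomized; with self-selected treatment the same comparison of means could be badly off.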
Well, this whole thread started because minusdash and eli_sennesh objected to the concept of acausal trade for being too metaphysical.
I just need to translate that for him to street lingo. "There is shit we know, shit we could know, and shit we could not know no matter how good tech we had, we could not even know the effects it has on other stuff. So why should we say this latter stuff exists? Or why should we say this does not exist? We cannot prove either."
My serious point is that one cannot avoid metaphysics, and that way too many people start out from "all this metaphysics stuff is BS, I'll just use common sense" and end up with their own (bad) counter-intuitive metaphysical theory that they insist is "not metaphysics".
You could charitably understand everything that such people (who assert that metaphysics is BS) say with a silent "up to empirical equivalence". Doesn't the problem disappear then?
No, because you need a theory of metaphysics to explain what "empirical equivalence" means.
To be honest, I don't see that at all.
So how would you define "empirical equivalence"?
It's insufficiently appreciated that physicalism is metaphysics too.
How about you just jump right to the details of your method, and then backtrack to help other people understand the necessary context to appreciate the method? Otherwise, you will lose your audience.
See my edit. Part of where I'm coming from is realizing how socially undeveloped people in our reference class tend to be, such that apparent malice often comes from misunderstandings.
Interesting - what are some examples of the latent ones?
I think having the concept of PCAs prevents some mistakes in reasoning on an intuitive day to day level of reasoning. It nudges me towards fox thinking instead of hedgehog thinking. Normal folk intuition grasps at the most cognitively available and obvious variable to explain causes, and then our System 1 acts as if that variable explains most if not all the variance. Looking at PCAs many times (and being surprised by them) makes me less likely to jump to conclusions about the causal structure of clusters of related events. So maybe I could characterize it as giving a System 1 intuition for not making the post hoc ergo propter hoc fallacy. Maybe part of the problem Jonah is running into in explaining it is that doing many, many example problems with System 2 loaded it into his System 1, and the System 1 knowledge is what he really wants to communicate?
What do you mean by getting surprised by PCAs? Say you have some data, you compute the principal components (eigenvectors of the covariance matrix) and the corresponding eigenvalues. Were you surprised that a few principal components were enough to explain a large percentage of the variance of the data? Or were you surprised about what those vectors were? I think this is not really PCA or even dimensionality reduction specific. It's simply the idea of latent variables. You could gain the same intuition from studying probabilistic graphical models, for example generative models.
Surprised by either. Just finding a structure of causality that was very unexpected. I agree the intuition could be built from other sources.
PCA doesn't tell much about causality though. It just gives you a "natural" coordinate system where the variables are not linearly correlated.
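
For readers who have not seen this in action, here is a minimal standard-library sketch of the point about latent variables and "natural" coordinates. It assumes a made-up setup: one hidden factor drives five observed variables through hypothetical loadings, plus independent noise. The first principal component is found by power iteration on the covariance matrix (one simple way to get the top eigenvector, swapped in here for a full eigendecomposition).

```python
import math
import random

random.seed(0)

# Hypothetical loadings: how strongly one hidden factor moves each observable.
LOADINGS = [1.0, -0.5, 2.0, 0.8, 1.5]
NOISE_STD = 0.3
N_SAMPLES = 2000


def make_data():
    """Each row is five noisy readings driven by a single latent value z."""
    rows = []
    for _ in range(N_SAMPLES):
        z = random.gauss(0.0, 1.0)
        rows.append([w * z + random.gauss(0.0, NOISE_STD) for w in LOADINGS])
    return rows


def covariance(rows):
    """Sample covariance matrix of the columns."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        c = [r[j] - means[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j]
    return [[v / (n - 1) for v in row] for row in cov]


def top_component(cov, iterations=200):
    """Power iteration: repeatedly apply cov and renormalize."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v


cov = covariance(make_data())
eigenvalue, direction = top_component(cov)

# Fraction of total variance (trace of cov) carried by the first component.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

# Cosine between the recovered direction and the true loading vector.
load_norm = math.sqrt(sum(w * w for w in LOADINGS))
alignment = abs(sum(d * w for d, w in zip(direction, LOADINGS))) / load_norm

print(f"variance explained by first component: {explained:.3f}")
print(f"alignment with the hidden loadings: {alignment:.4f}")
```

The component lines up with the hidden factor here only because the data were built that way. The computation itself uses nothing but correlations, so it cannot say which variable causes which.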
Right, one needs to use additional information to determine causality.
Yes, you seem to have a very clear understanding of where I'm coming from. Thanks.
Don't say the p-word, please ;-). I do agree that more real-life understanding is gained from just obtaining a broad scientific education than from going wow-hunting. But of course, I would say that, since I'm a fanatical textbook purchaser.

I don't believe you can obtain an understanding of the idea that "correlation does not imply causation" from even a very deep appreciation of the material in Statistics 101. These courses usually make no attempt to define confounding, comparability etc. If they try to define confounding, they tend to use incoherent criteria based on changes in the estimate. Any understanding is almost certainly going to have to originate from outside of Statistics 101; unless you take a course on causal inference based on directed acyclic graphs it will be very challenging to get beyond memorizing the teacher's password.

Agree completely, and I'll also point out that at least for me, a very shallow understanding of the ideas in Causality did much more to help me understand correlation vs. causation, confounding etc. than any amount of work with Statistics 101. And this was enormously practical–I was able to make significantly better financial decisions at Fundation due to understanding concepts like Simpson's Paradox on a system 1 level.
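
Since Simpson's Paradox is easier to feel than to describe, here is a small Python sketch with illustrative counts (patterned after the commonly cited kidney-stone illustration; treat the numbers as an example, not as data from this thread). Treatment A does better within each stratum, yet looks worse when the strata are pooled, because A was given mostly to the harder cases.

```python
# (successes, trials) for each treatment within each stratum.
# Illustrative counts only.
COUNTS = {
    "small stones": {"A": (81, 87), "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, trials):
    """Success rate as a fraction."""
    return successes / trials


def pooled(treatment):
    """Add up successes and trials over all strata for one treatment."""
    successes = sum(COUNTS[g][treatment][0] for g in COUNTS)
    trials = sum(COUNTS[g][treatment][1] for g in COUNTS)
    return successes, trials


# Success rates of (A, B) inside each stratum.
within = {g: (rate(*COUNTS[g]["A"]), rate(*COUNTS[g]["B"])) for g in COUNTS}

# Success rates of (A, B) when the strata are ignored.
overall = (rate(*pooled("A")), rate(*pooled("B")))

for group, (a, b) in within.items():
    print(f"{group}: A={a:.3f}  B={b:.3f}")
print(f"pooled: A={overall[0]:.3f}  B={overall[1]:.3f}")
```

Which comparison to trust is not something the table can answer. It depends on the causal story, here that stone size influences both the choice of treatment and the outcome.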

To chime in as well: my own understanding of 'correlation does not imply causation' does not come from the basic statistics courses and articles and tutorials I read. While I knew the saying and the concepts and a little bit about causal graphs, it took years of failed self-experiments and the intensely frustrating experience of seeing correlate after correlate fail randomized experiments before I truly accepted it.

I don't know how helpful, exactly, this has been on a practical level, but at least it's good for me on an epistemic level in that I have since accepted many fewer new beliefs than I would otherwise have.

Me four. ---------------------------------------- Although you know, there is no reason in principle you couldn't get all that stuff Anders_H is talking about from intro stats, it's just that stats isn't taught as well as it can be.

PCA and other dimensionality reduction techniques are great, but there's another very useful technique that most people (even statisticians) are unaware of: dimensional analysis, and in particular, the Buckingham pi theorem. For some reason, this technique is used primarily by engineers in fluid dynamics and heat transfer despite its broad applicability. This is the technique that allows scale models like wind tunnels to work, but it's more useful than just allowing for scaling. I find it very useful to reduce the number of variables when developing models and conducting experiments.

Dimensional analysis recognizes a few basic axioms about models with dimensions and sees what they imply. You can use these to construct new variables from the old variables. The model is usually complete in a smaller number of these new variables. The technique does not tell you which variables are "correct", just how many independent ones are needed. Identifying "correct" variables requires data, domain knowledge, or both. (And sometimes, there's no clear "best" variable; multiple work equivalently well.)
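
As a rough sketch of the counting step described above, assuming the textbook drag-on-a-body setup as the example (my choice, not something from the comment): five variables, three base dimensions, and the number of independent dimensionless groups is the number of variables minus the rank of the dimension matrix. The variable list and the two candidate groups are illustrative.

```python
from fractions import Fraction

# Columns are the variables; rows are base dimensions (mass, length, time).
# F = drag force, rho = density, v = speed, L = size, mu = viscosity.
VARIABLES = ["F", "rho", "v", "L", "mu"]
DIMENSION_MATRIX = [
    [1, 1, 0, 0, 1],      # mass exponents
    [1, -3, 1, 1, -1],    # length exponents
    [-2, 0, -1, 0, -1],   # time exponents
]


def matrix_rank(rows):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def is_dimensionless(exponents):
    """True if the product of variables raised to these powers has no units."""
    return all(sum(d * e for d, e in zip(row, exponents)) == 0
               for row in DIMENSION_MATRIX)


rank = matrix_rank(DIMENSION_MATRIX)
n_groups = len(VARIABLES) - rank

# Two candidate groups, written as exponents on (F, rho, v, L, mu).
reynolds = [0, 1, 1, 1, -1]             # rho * v * L / mu
drag_coefficient = [1, -1, -2, -2, 0]   # F / (rho * v^2 * L^2)

print(f"{len(VARIABLES)} variables, rank {rank}: {n_groups} dimensionless groups")
print(is_dimensionless(reynolds), is_dimensionless(drag_coefficient))
```

The count says two groups suffice, but, as noted above, it does not say which two to pick. The Reynolds number and a drag coefficient are the conventional choice for this problem.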

Dimensional analysis does not help with categorical variables, or nu...

In general, if your problem displays any kind of symmetry* you can exploit that to simplify things. I think most people are capable of doing this intuitively when the symmetry is obvious. The Buckingham pi theorem is a great example of a systematic way to find and exploit a symmetry that isn't so obvious. * By "symmetry" I really mean "invariance under a group of transformations".
This is a great point. Other than fairly easy geometric and time symmetries, do you have any advice or know of any resources which might be helpful towards finding these symmetries? Here's what I do know: Sometimes you can recognize these symmetries by analyzing a model differential equation. Here's a book on the subject that I haven't read, but might read in the future. My PhD advisor tells me I already know one reliable way to find these symmetries (e.g., like how to find the change of variables used here), so reading this would be a poor use of time in his view. This approach also requires knowing a fair bit more about a phenomenon than just which variables it depends on.
The book you linked is the sort of thing I had in mind. The historical motivation for Lie groups was to develop a systematic way to use symmetry to attack differential equations.
Are you familiar with Noether's Theorem? It comes up in some explanations of Buckingham pi, but the point is mostly "if you already know that something is symmetric, then something is conserved." The most similar thing I can think of, in terms of "resources for finding symmetries," might be related to finding Lyapunov stability functions. It seems there's not too much in the way of automated function-finding for arbitrary systems; I've seen at least one automated approach for systems with polynomial dynamics, though.
Not familiar with Noether's theorem. Seems useful for constructing models, and perhaps determining if something else beyond mass, momentum, and energy is conserved. Is the converse true as well, i.e., does conservation imply that symmetries exist? I'm also afraid I know nearly nothing about non-linear stability, so I'm not sure what you're referring to, but it sounds interesting. I'll have to read the Wikipedia page. I'd be interested if you know any other good resources for learning this.
I think this is what Lie groups are all about, but that's a bit deeper in group theory than I'm comfortable speaking on. I learned it the long way by taking classes, and don't recall being particularly impressed by any textbooks. (I can lend you the ones I used.) I remember thinking that reading through Akella's lecture notes was about as good as taking the course, and so if you have the time to devote to it you might be able to get those from him by asking nicely.
Conservation gives a local symmetry but there may not be a global symmetry. For instance, you can imagine a physical system with no forces at all, so everything is conserved. But there are still some parameters that define the location of the particles. Then the physical system is locally very symmetric, but it may still have some symmetric global structure where the particles are constrained to lie on a surface of nontrivial topology.
Noether's theorem has nothing to do with Buckingham's theorem. Buckingham's theorem is quite general (and vacuous), while Noether's theorem is only about Hamiltonian/Lagrangian mechanics. Added: Actually, Buckingham and Noether do have something in common: they both taught at Bryn Mawr.
Both of them are relevant to the project of exploiting symmetry, and deal with solidifying a mostly understood situation. (You can't apply Buckingham's theorem unless you know all the relevant pieces.) The more practical piece that I had in mind is that someone eager to apply Noether's theorem will need to look for symmetries; they may have found techniques for hunting for symmetries that will be useful in general. It might be worth looking into material that teaches it, not because it itself is directly useful, but because the community that knows it may know other useful things.
It's quite a bit more general than Lagrangian mechanics. You can extend it to any functional that takes functions between two manifolds to complex numbers.
In what sense do you mean Buckingham's theorem is vacuous?
I've always been amazed at the power of dimensional analysis. To me the best example is the problem of calculating the period of an oscillating mass on a spring. The relevant values are the spring constant K (kg/s^2) and the mass M (kg), and the period T is in (s). The only way to combine K and M to obtain a value with dimensions of (s) is sqrt(M/K), and that's the correct form of the actual answer - no calculus required!
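
The "only way to combine K and M" step can be written as a tiny linear system. This is a minimal sketch, assuming we track only the two base units that appear (kg and s): find exponents a and b so that K^a * M^b has units of seconds. The function name and the example numbers are made up for illustration.

```python
from fractions import Fraction
import math

# Unit exponents written as (kg, s).
K_DIM = (1, -2)   # spring constant: kg / s^2
M_DIM = (1, 0)    # mass: kg
T_DIM = (0, 1)    # period: s


def solve_exponents(dim_a, dim_b, target):
    """Solve a*dim_a + b*dim_b = target for (a, b) using Cramer's rule."""
    det = dim_a[0] * dim_b[1] - dim_b[0] * dim_a[1]
    if det == 0:
        raise ValueError("the two quantities do not have independent units")
    a = Fraction(target[0] * dim_b[1] - dim_b[0] * target[1], det)
    b = Fraction(dim_a[0] * target[1] - target[0] * dim_a[1], det)
    return a, b


a, b = solve_exponents(K_DIM, M_DIM, T_DIM)
print(f"T scales as K^({a}) * M^({b})")

# Hypothetical numbers: a 0.5 kg mass on a 200 kg/s^2 spring.
spring_constant, mass = 200.0, 0.5
time_scale = spring_constant ** float(a) * mass ** float(b)
print(f"time scale sqrt(M/K) = {time_scale:.4f} s")
print(f"textbook period 2*pi*sqrt(M/K) = {2 * math.pi * time_scale:.4f} s")
```

The exponents come out as -1/2 and 1/2, i.e. sqrt(M/K). The factor of 2π in the textbook period is dimensionless, so this method cannot find it.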
Actually, there's another parameter, the displacement. It turns out that the spring period does not depend on the displacement, but that's a miracle that is special to springs. Instead, look at the pendulum. The same dimensional analysis gives the square root of the length divided by gravitational acceleration. That's off by a dimensionless constant, 2π. Moreover, even that is only approximately correct. The real answer depends on the displacement in a complicated way.
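
To put numbers on the pendulum caveat, here is a short sketch using the standard closed form for the ideal pendulum, T = 2π·sqrt(L/g) / AGM(1, cos(θ0/2)), where AGM is the arithmetic-geometric mean and θ0 is the release angle. The length and the value of g are arbitrary illustrative choices, and the formula assumes no friction and an amplitude below 180 degrees.

```python
import math


def agm(a, b, tol=1e-12):
    """Arithmetic-geometric mean of two positive numbers."""
    while abs(a - b) > tol:
        a, b = (a + b) / 2.0, math.sqrt(a * b)
    return a


def small_angle_period(length, g=9.81):
    """The usual approximation: 2*pi times the dimensional-analysis scale."""
    return 2.0 * math.pi * math.sqrt(length / g)


def exact_period(length, amplitude_rad, g=9.81):
    """Period of an ideal pendulum released from rest at amplitude_rad."""
    return small_angle_period(length, g) / agm(1.0, math.cos(amplitude_rad / 2.0))


length = 1.0  # metres, illustrative
ratio_small = exact_period(length, 0.01) / small_angle_period(length)
ratio_large = exact_period(length, math.pi / 2) / small_angle_period(length)

print(f"amplitude 0.01 rad: exact / small-angle = {ratio_small:.6f}")
print(f"amplitude 90 deg:   exact / small-angle = {ratio_large:.4f}")
```

At tiny amplitudes the ratio is essentially 1, while at 90 degrees the true period is roughly 18% longer. The amplitude is itself a dimensionless quantity, so dimensional analysis has no way to say how the answer depends on it.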
This is a good point. At best you can figure out that period is proportional to (not equal to) sqrt(M/K) multiplied by some function of other parameters, say, one involving displacement and another characterizing the non-linearity (if K is just the initial slope, as I've seen done before). It's a fortunate coincidence if the other parameters are unimportant. You cannot determine based solely on dimensional analysis whether certain parameters are unimportant.
That's because outside of physics (and possibly chemistry) there are enough constants running around that all quantities are effectively dimensionless. I'm having a hard time seeing a situation in say biology where I could propose dimensional analysis with a straight face, to say nothing of softer sciences.
As I said, dimensional analysis does not help with categorical variables. And when the number of dimensions is low and/or the number of variables is large, dimensional analysis can be useless. I think it's a necessary component of any model builder's toolbox, but not a tool you will use for every problem. Still, I would argue that it's underutilized. When dimensional analysis is useful, it definitely should be used. (For example, despite its obvious applications in physics, I don't think most physics undergrads learn the Buckingham pi theorem. It's usually only taught to engineers learning fluid dynamics and heat transfer.) Two very common dimensionless parameters are the ratio and fraction. Both certainly appear in biology. Also, the subject of allometry in biology is basically simple dimensional analysis. I've seen dimensional analysis applied in other soft sciences as well, e.g., political science, psychology, and sociology are a few examples I am aware of. I can't comment much on the utility of its application in these cases, but it's such a simple technique that I think it's worth trying whenever you have data with units. Speaking more generally, the idea of simplification coming from applying transformations to data has broad applicability. Dimensional analysis is just one example of this.
One thing that most scientists in these soft sciences already have a good grasp on, but a lot of laypeople do not, is the idea of appropriately normalizing parameters. For instance dividing something by the mass of the body, or the population of a nation, to do comparisons between individuals/nations of different sizes. People will often make bad comparisons where they don't normalize properly. But hopefully most people reading this article are not at risk for that.

What resources would you recommend for learning advanced statistics?

What would you call "advanced" statistics? But let's start listing classes:

1) Intro to Discrete and Continuous Probability -- you'll need this for every possible path

Now we need to start branching out. Choose your adventure: applied or theoretical? Frequentist, Bayesian, Likelihoodist, or "Machine" Learning? Your normal university statistics sequence will probably give you Intro to Frequentist Statistics 1 at this point. That's a fine way to go, but it's not the only way. In fact, many departments in the empirical sciences will teach Data Analysis classes, or the like, which introduce applied statistics before teaching you the theory, which would mean you've actually dealt with real data before you learn the theory. I think that might be a Very Good Idea.

Now let's hope you've taken one of the following paths:

* Data Analysis and Intro to Frequentist Stats 1
* Intro to Bayesian Statistics 1
* Intro to Machine Learning (with laboratory exercises to get experience)

From there I would recommend knowing linear algebra decently well before moving on. Then you can start taking courses/reading textbooks in more advanced/theoretical machine learning, computational Bayesian methods, multidimensional frequentist statistics, causal analysis, or just more and more applied data analysis. You should probably check what sort of statistical methods are favored "in the field" that you actually care about.

Real world data often has the surprising property of "dimensionality reduction": a small number of latent variables explain a large fraction of the variance in data.

Why is that surprising? The causal structure of the world is very sparse, by the nature of causality. One cause has several effects, so once you scale up to lots of causative variables, you expect to find that large portions of the variance in your data are explained by only a few causal factors.

Causality is indeed the skeleton of data. And oh boy, wait until you hit hierarchic...

Can you expand your reasoning? We do see around us sparse — that is, understandable — causal systems. And even chaotic ones often give rise to simple properties (e.g. motion of huge numbers of molecules → gas laws). But why (ignoring anthropocentric arguments) would one expect to see this?
There are really just three ways the causal structure of reality could go:

* Many causes -> one effect
* One cause -> one effect, strictly
* One cause -> many effects

Since the latter will generate more (apparent) random variables, most observables will end up deriving from a relatively sparse causal structure, even if we assume that the causal structures themselves are sampled uniformly from this selection of three. So, for instance, parameter-space compression (which is its own topic to explain, but oh well), aka: the hierarchical structure of reality, actually does follow that first item: many micro-level causes give rise to a single macro-level observable. But you'll still find that most observables come from non-compressive causal structures. This is why we actually have to work really hard to find out about micro-scale phenomena (things lower on the hierarchy than us): they have fewer observables whose variance is uniquely explicable by reference to a micro-scale causal structure.
I need that expanded a lot more. Why not many causes -> many effects, for example?
Ah, you mean a densely interconnected "almost all to almost all" causal structure. Well, I'd have to guess: because that would look far more like random behavior than causal order, so we wouldn't even notice it as something to causally analyze!
We do notice turbulence as something that doesn't look random, and is hard-to-impossible to causally analyze. Here's an anecdote. I can't copy and paste it, but it's in the middle column.
This is a very interesting point. PCA (or as its time and/or space series version is called, the Karhunen-Loève expansion and/or POD) has not been found to be useful for turbulence modeling, as I recall. There's a brief section in Pope's book on turbulence about modeling with this. From what I understand, POD is mostly used for visualization purposes, not to help build models. (It's worth noting that while my background in fluid dynamics is strong, I know little to nothing about PCA and the like aside from what they apparently do.) Maybe I don't actually understand causality, but I think in terms of modeling, we do have a good model (the Navier-Stokes, or N-S, equations) and so in some sense, it's clear what causes what. In principle, if you run a computer simulation with these equations and the correct boundary conditions, the result will be reasonably accurate. This has been demonstrated through direct simulations of some relatively simple cases like flow through a channel. So that's not the issue. The actual issue is that you need a lot of computing power to simulate even basic flows, and attempts to develop lower order models have been fairly unsuccessful. So as a model, N-S is of limited utility as-is. In my view, the "turbulence problem" comes down to two facts: 1. the N-S equations are chaotic (sensitive to initial conditions, so small changes can cause big effects) and 2. they exhibit large scale separation (so the smallest details you need to resolve, the Kolmogorov scales in most cases, are much smaller than the physical dimensions of a problem, say the length of a wing). To understand these points better, imagine that rigid body dynamics was inaccurate (say, modeling the trajectory of a baseball), and you had to model all the individual atoms to get it right. And if one was off that might possibly have a big effect. Obviously that's a lot harder, and it's probably computationally intractable outside of a few simple cases. (The chaos part is "avoided" b

I disagree that you can get an understanding of the idea that "correlation does not imply causation" from Stats 101. I don

[This comment is no longer endorsed by its author]