Epistemic Status: This was written in 2010 and has existed in LW's editing purgatory ever since. It doesn't seem extremely wrong. ("Remember 'light side epistemology'? Pepperidge Farm remembers.") Perhaps the biggest flaw is that I didn't hit the publication button sooner, and this meta-flaw is implicit in the perspective being described? Maybe the last 10 years would have gone better, but it is hard to reverse the action of publishing. Some links don't work. I fixed a few typos. Perhaps link archiving will one day receive more systematic attention, and links here which once worked, and now do not, can be restored via some kind of archival reconciliation?

A classic information cascade is a situation where early decisions by individuals in a group bias the decisions of other people in the same group in the same direction. When people talk about information cascades they're generally talking about "dumb herd behavior".

If one person observes something and justifies a behavior of their own in a robust way, then their behavior, once observed, is indirect evidence of their observations and reasoning. If ten people update based on a shallow observation of that first person, their collective behavior is not actually ten times more evidence. It would be more informative to have eleven independent people who all just happened to think and act alike.
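
To put a number on that intuition, here is a minimal sketch (my own illustration, with invented numbers, not anything from the cascade literature): treat each genuinely independent observation as multiplying the odds by some likelihood ratio, and treat imitators as adding roughly nothing beyond the person they copied.

```python
# A toy sketch of why ten imitators are weaker evidence than ten independent
# observers. Assumes a binary hypothesis and a made-up likelihood ratio: each
# *independent* observer's action reflects a private signal that is 3x more
# likely if the hypothesis is true.

def posterior_odds(prior_odds, likelihood_ratios):
    """Multiply prior odds by the likelihood ratio of each piece of evidence."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

prior_odds = 1.0   # 50/50 before seeing anyone act
signal_lr = 3.0    # strength of one genuinely independent observation

# Case A: 11 independent people who each observed something for themselves.
independent = posterior_odds(prior_odds, [signal_lr] * 11)

# Case B: 1 genuine observer plus 10 imitators. The imitators' behavior is
# (roughly) determined by the first person's, so it adds ~no extra evidence.
cascade = posterior_odds(prior_odds, [signal_lr] + [1.0] * 10)

print(f"11 independent observers -> posterior odds {independent:.0f}:1")
print(f"1 observer + 10 imitators -> posterior odds {cascade:.0f}:1")
```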

The classic example of multi-agent information cascades is probably an economic bubble, where early investment decisions are "irrationally" mimicked by later speculators, creating a temporary increase in demand for the investment.  Purchasing behavior expands through a community like a wave, and the "genuine" opportunity here is to buy ahead of the wave and "flip" the investment shortly afterwards to a "greater fool" who found out about the bubble later than you did.  Aside from obvious moral concerns, the selfish worry is that bubbles are basically self-organized Ponzi schemes where the last round of investors is the largest, and they lose a substantial amount of money to early adopters who successfully executed the "flip" maneuver before the crash.  The "average investor" has no easy way to tell that they are a late entrant who will lose money.

In this community, RichardKennaway called attention to a paper about classical cascades in academic citations, and Johnicholas wrote about similar potential in voting and karma on Less Wrong.

The rest of this article will discuss a process similar to classical information cascades, except that it happens within the mind of a single person in a way that requires no mass delusion of any kind.  The similarity between "internal information cascades" and the more traditional "external" information cascades arises from the fact that an initial idea non-obviously predetermines subsequent conclusions.  Perhaps a way to think about it is that your own past behaviors and circumstances are not independent evidence, because they all had you in common.

Below, there are two rather different examples of the phenomenon.  One example involves moral behavior based on theories of human nature. The other example uses economically rational skill acquisition.  The post concludes with abstract observations and possible behavioral implications.

Expert Specialization As An Internal Information Cascade

In this essay I wanted to make sure to provide two examples of internal cascades, and I wanted one of the examples to be positive, because the critical feature of internal information cascades is not that they are "bad".  The critical feature is that "initial" prior beliefs can feed forward into evidence gathering processes with potentially dramatic effects on subsequent beliefs.  The final beliefs that grow from internal information cascades are deeply justified for the agent that believes them, in the sense that they explain the agent's sensory expectations... the thing these beliefs don't have is the property of predicting the sensory data that other agents should expect (unless, perhaps, those agents start experiencing the world via the behavioral patterns and environmental contexts that are consistent with the belief's implied actions, performed long enough and skillfully enough).

The easiest way I can think of to explain this is with the example of a friend of mine who adjusted her life in her 40's.  She'd been a programmer for years and had a sort of aura of wisdom for technology and consulting and startups.  Some of my jobs in the past have involved computers, and it was uncanny the way she could predict what was going on in my jobs from snippets of stories about work.  The thing is, she's a nurse now, because she got sick of programming and decided to change careers.  The career change was a doozy in terms of personal finance, but the real cost wasn't the nursing school tuition... the really expensive part was the opportunity cost of not utilizing her programming skills while learning to be a nurse.

Something to notice here is how the pop culture "ten thousand hours till expertise" rule implies that you can work on almost anything and get progressively better and better at it.  There is also a substantial academic literature on "expertise", but the key point in terms of internal information cascades is that people get better at whatever they do, not at what they don't do.  My friend is a counterexample to the typical internal information cascade (she successfully changed contexts), but the costs she bore in doing so highlight the natural barriers to novel skill acquisition: the more useful your skills in any domain become, the larger the opportunity costs for subsequent skill acquisition in other areas.
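
As a rough back-of-the-envelope sketch (all numbers below are invented for illustration and are not my friend's actual figures), the retraining cost scales with how valuable the skill you are setting aside already is:

```python
# Toy calculation: the forgone earnings during retraining scale with how
# valuable the old skill already is, so better current skills raise the cost
# of acquiring new ones. All inputs are hypothetical.

def retraining_cost(current_hourly_rate, retraining_hours, tuition):
    forgone_earnings = current_hourly_rate * retraining_hours
    return tuition + forgone_earnings

# A junior programmer vs. a senior one, each spending ~2 years (4000 working
# hours) in nursing school with the same hypothetical tuition bill.
for rate in (30, 90):
    total = retraining_cost(rate, 4000, tuition=40_000)
    print(f"${rate}/hr programmer: total cost ~ ${total:,}")
```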

Economists are very big on the benefits of specialization, and point to the real benefits of trade as, in part, simply enabling expertise to develop so that people can do something enough times to become really good at it.  From Adam Smith's theories about people to individual bees gaining productivity from flower specialization, the economic value of specialization seems relatively clear.[1]

So while gaining expertise is generally a good thing, keep in mind that your world model after you become an expert will gain ridiculously high resolution (compared to non-experts) in your area of expertise, while other areas of your world model will have much less resolution.  The high resolution will tend to become ever more pronounced over time, so the cascade here is primarily in the "quantity" of beliefs within a domain, though the quality of the beliefs might also go up.  Despite the benefits, the process may bring along certain "free-riding beliefs" if you generate the occasional motivated belief at the edges, perhaps justifying the value of your expertise (which is probably real) via reasoning that doesn't hold for everyone.  When people from different professions interact in an area outside either of their expertise you'll see some of this, and the biases revealed in this way can make for amusing jokes about "The mathematician, the physicist, and the engineer..."
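
Here is a tiny sketch of how that lopsidedness can compound (my own toy model, not anything from the expertise literature): if each day you practice whichever domain you are currently best at, an arbitrarily small initial edge snowballs into a dramatically uneven skill profile.

```python
# Toy model of self-reinforcing specialization: practice goes to the current
# strongest skill, and practice compounds only where you practice. The domains
# and numbers are invented for illustration.

skills = {"programming": 1.05, "nursing": 1.0, "carpentry": 1.0}  # tiny initial edge

for day in range(1000):
    best = max(skills, key=skills.get)   # spend today on your current strongest skill
    skills[best] *= 1.003                # small compounding gain from practice

for domain, level in sorted(skills.items(), key=lambda kv: -kv[1]):
    print(f"{domain:12s} relative skill {level:8.1f}")
```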

Uncooperative Misanthropy As An Internal Information Cascade

Game theory has been inspiring psychological studies for many decades now.  Kelley & Stahelski were early researchers in this area who proposed the "Triangle Hypothesis", which states "that competitors hold homogeneous views of others by assuming that most others are competitive, whereas cooperators or pro-social people hold more heterogeneous views by assuming that others are either cooperative or competitive" (source).

To vividly understand the triangle hypothesis, imagine that you're a participant in a study on actual *human performance* in an iterated prisoner's dilemma (especially in the 1960's and 1970's, before the prisoner's dilemma paradigm had diffused into popular culture).  The standard "tit for tat" strategy involves cooperating as your first move and thereafter simply mirroring the other person's behavior back at them.  It is a very simple strategy that frequently works well.  Suppose, however, you defected initially, just to see what would happen?  When the other person defects on the next turn (as a natural response), there's a significant chance that repeated retaliation will be the outcome.  It is a tragic fact that retaliatory cycles are sometimes actually observed with real human participants, even in iterated prisoner's dilemmas where the structure of the game should push people into cooperation.
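
A minimal sketch of that retaliation dynamic (my own illustration, not part of the original studies): two players who both follow tit for tat, except that one of them defects on the opening move just to see what happens.

```python
# How one exploratory defection locks two otherwise tit-for-tat players into a
# retaliation cycle: after the first round, each player simply mirrors what the
# other did on the previous round.

def play(rounds=8, first_move_a="D", first_move_b="C"):
    history_a, history_b = [first_move_a], [first_move_b]
    for _ in range(rounds - 1):
        history_a.append(history_b[-1])   # A copies B's previous move
        history_b.append(history_a[-2])   # B copies A's previous move
    return history_a, history_b

a, b = play()
print("Player A (defects first, then tit for tat):", " ".join(a))
print("Player B (pure tit for tat):               ", " ".join(b))
# Mutual cooperation never returns; the defections echo back and forth forever.
```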

Supposing you found yourself in a retaliatory cycle at the beginning of the study, a grumpy mood from your first session could lead to another retaliatory cycle with a second partner.  At this point you might start to wonder if everyone in this experiment was just automatically hostile?  Perhaps the world is just generally full of assholes and idiots?  The more you believe something like this, the more likely you are to preemptively defect in your subsequent sessions with other experimental participants.  Your preemptive defection will lead to more conflict, thereby confirming your hypothesis.  At the end of the experiment you'll have built up an impressive body of evidence supporting the theory that the world is full of evil idiots.
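
Here is a toy simulation of that loop (my own sketch; the update rule and all numbers are invented): a player whose suspicion level drives preemptive defection, playing against partners who are all pure tit-for-tat and therefore only ever retaliate. Because retaliation gets read as proof of hostility, the suspicion ratchets upward.

```python
import random

# Self-confirming misanthropy in miniature: believing "people here are hostile"
# drives preemptive defection, the tit-for-tat partner retaliates, and the
# retaliation is counted as confirmation -- even though every partner starts
# out perfectly cooperative.

random.seed(0)

def session(belief_hostile, rounds=10):
    """One iterated PD against a pure tit-for-tat partner.
    Returns True if the partner ever defected (i.e. ever retaliated)."""
    partner_move = "C"                      # tit for tat opens with cooperation
    partner_ever_defected = False
    for _ in range(rounds):
        my_move = "D" if random.random() < belief_hostile else "C"
        if partner_move == "D":
            partner_ever_defected = True
        partner_move = my_move              # next round the partner mirrors this move
    return partner_ever_defected

belief = 0.3                                # mild suspicion after one bad first session
for s in range(1, 7):
    retaliated = session(belief)
    # One crude way to model a biased reading of the evidence: a single retaliation
    # is treated as strong proof of hostility, while a fully cooperative session
    # only slightly softens the belief.
    belief = min(1.0, belief + 0.2) if retaliated else max(0.0, belief - 0.05)
    print(f"session {s}: partner retaliated: {retaliated}, belief in hostility -> {belief:.2f}")
```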

In 1974 Braver confirmed that the perceived intentions of partners in game theoretic contexts were frequently more important for predicting behavior than a subject's personal payoff matrix, and in 1978 Goldman tried pre-sorting participants into predicted cooperators and defectors and found (basically as predicted) that defectors tended not to notice opportunities to cooperate even when those opportunities actually existed.

Consider the tragedy here: People can update on the evidence all they want, but initial social hypotheses can still channel them into a social dynamic where they generate the evidence necessary to confirm their prior beliefs, even when those beliefs lead to suboptimal results.  This is especially worrisome in light of the way the first few pieces of evidence can be acquired based on guesses growing out of marginally related environmental noise in the moments before learning "formally starts".

Summarizing The Concept And Its Implications

First, here are four broad observations about internal information cascades:

  1. Internal information cascades happen within a single mind, by definition, with no intermediate step requiring the agreement of others. I was tempted to just call them single-agent information cascades. They probably even happen in the minds of solitary agents, as with skill specialization for a survivor marooned on a desert island who repeats their relatively random early successes in food acquisition.  I have a pet hypothesis that isolation-induced, unusual internal information cascades around academic subjects are part of the story for autodidacts who break into academic respectability, like Ramanujan and Jane Jacobs.
  2. The beliefs that result from internal information cascades may be either good or bad for an individual or for the group they are a part of.  In the broader scheme of things (taking into account the finite channel capacity of humans trying to learn, and the trivial ease of barter) it would probably be objectively bad for everyone in the world to attempt to acquire precisely the same set of beliefs about everything.  Nonetheless, at least one example already exists (the triangle hypothesis mentioned above) of misanthropic beliefs cascading into anti-social behavioral implications that reinforce the misanthropy.  To be precise, it is the misanthropy that almost certainly deserves a clear negative emotional valence, not the cascade as such.
  3. Internal information cascades put one in a curious state of "Bayesian sin" because one's prior beliefs "contaminate" the evidence stream from which one forms posterior beliefs.  There is a sense in which they may be inescapable for people running on meat brains, because our "prior beliefs" are in some sense a part of the "external world" that we face moment-to-moment.  Perhaps there are people who can update their entire belief network instantaneously? Perhaps computer-based minds will be able to do this in the future?  But I don't seem to have this ability.  And despite the inescapability of this "Bayesian sin", the curiosity is that such cascades can sometimes be beneficial, which is a weird result from a sinful situation...
  4. Beliefs formed in the course of internal information cascades appear to have very weird "universality" properties.  Generally "rationality", "reason", and "science" are supposed to have the property of universality because, in theory, people should all converge to the same conclusions by application of "reason".  However, using a pragmatic interpretation of "the universe" as the generator of sense data that one can feasibly access, it may be the case that some pairs of people could find themselves facing "incommensurate data sources".  This would happen when someone is incapable of acting "as if" they believed what another person believes because the inferential distance is too great.  The radically different sensory environments of two people deep into their own internal information cascades may pose substantial barriers to coordination and communication, especially if they are not recognized and addressed by at least one of the parties.

Second, in terms of behavioral implications, I honestly don't have any properly validated insights for how to act in light of internal information cascades.  Experimentally tested advice for applying these insights in predictably beneficial ways would be great, but I'm just not aware of any.  Here are some practical "suggestions" to take with a grain of salt:

  1. Pay a lot of attention to epistemic reversibility.  Before I perform an experiment like (1) taking a drug, (2) hanging out with a "community of belief", or (3) taking a major public stand on an issue, I try to look farther into the process to see how hard it is for people to get out later on.  A version of rationality that can teach one to regret the adoption of that version of rationality is one that has my respect, and that I'm interested in trying out.  If none of your apostates are awesome, maybe your curriculum is bad?  For a practical application, my part in this comment sequence was aimed at testing Phillip Eby's self-help techniques for reversibility before I jumped into experimenting with them.
  2. Try to start with social hypotheses that explain situations by attributing inadequacy and vice to "me" and wisdom and virtue to "others" (but keep an exit option in your back pocket in case you're wrong).  The triangle hypothesis was my first vivid exposure (I think I learned about it in 2003 or so) to the idea that internal information cascades can massively disrupt opportunities to cooperate with people.  I have personally found it helpful to keep in mind.  While hypotheses that are self-insulting and other-complimenting are not necessarily the place that "perfectly calibrated priors" would always start out, I'm not personally trying to be perfectly calibrated at every possible moment.  My bigger picture goal is to have a life I'll look back on, from the near and far future, as rewarding and positive taken as an integrated whole.  The idea is that if my initial working hypothesis about being at fault and able to learn from someone is wrong, then the downside isn't that bad and I'll be motivated to change, but if I'm right then I may avoid falling into the "epistemic pit" of a low-value internal information cascade.  I suspect that many traditional moral injunctions work partly to help here, keeping people out of dynamics from which they are unlikely to escape without substantial effort, and in which they may become trapped because they won't even notice that their situation could be otherwise.
  3. Internal information cascades generally make the world that you're dealing with more tractable.  For example, if you think that everyone around you is constantly trying to rip you off and/or sue you, you can take advantage of this "homogeneity in the world" by retaining a good lawyer for the kind of lawsuits you find yourself in over and over.  I was once engaged in a "walk and talk" and the guy I was with asked out of the blue whether I had noticed anything weird about the people we'd passed.  Apparently, a lot of people had been smiling at us, but people don't smile at strangers (or at my conversational partner?) very much.  I make a point of trying to smile, make eye contact, and wave any time I pass someone on a sparsely populated street, but I hadn't realized that this caused me to be out of calibration with respect to the generically perceivable incidence of friendly strangers.  I was tempted to not even mention it here?  Practicing skills to gain expertise in neighborliness seems like an obvious application of the principle, but maybe it is complicated.
  4. Another use for the concept involves "world biasing actions" as a useful way of modeling "dark-side epistemology".  My little smile-and-waves to proximate strangers were one sort of world biasing action.  Threatening someone that you'll call your lawyer is another sort of world biasing action.  In both cases they are actions that are potentially part of internal information cascades.  I think one reason (in the causal explanation sense) that people become emotionally committed to dark-side epistemology is that some nominally false beliefs really do imply personally helpful world biasing actions.  When people can see these outcomes without mechanistically understanding where the positive results come from, they may (rightly) think that loss of "the belief" might remove life benefits that they are, in actual fact, deriving from the belief.  My guess is that a "strict light side epistemologist" would argue that false beliefs are false beliefs are false beliefs, and the correct thing to do is (1) engage in positive world biasing actions anyway, (2) while refraining from negative world biasing actions, (3) knowing full well that reality probably isn't the way the actions "seem to assume".  Personally, I think strict light side epistemology gives humans more credit for mindfulness than we really have.  On the one hand I don't want to believe crazy stuff, but on the other hand I don't want to have to remember five facts and derive elaborate consequences from them just to say hi to someone.  I have found it helpful to sometimes just call the theory that endorses my preferred plan my "operating hypothesis" without worrying about the priors or what I "really believe".  It's easier for me to think about theories held at a distance and their implications than to "truly believe" one thing and "do" another.  In any case, internal information cascades help me think about the question of "practically useful beliefs" and world biasing actions in a more mechanistic and conceptually productive way.

Notes:

[1] While hunting down the link about bee specialization I found an anomalous species of ants where nothing seemed to be gained from specialization.  I'm not sure what to make of this, but it seemed irresponsible to suppress the surprise.

Comments (8)

In 1974 Braver confirmed that the perceived intentions of partners in game theoretic contexts were frequently more important for predicting behavior than a subject's personal payoff matrix, and in 1978 Goldman tried pre-sorting participants into predicted cooperators and defectors and found (basically as predicted) that defectors tended not to notice opportunities to cooperate even when those opportunities actually existed.

Consider the tragedy here: People can update on the evidence all they want, but initial social hypotheses can still channel them into a social dynamic where they generate the evidence necessary to confirm their prior beliefs, even when those beliefs lead to suboptimal results.

Seems related to different worlds:

A few years ago I had lunch with another psychiatrist-in-training and realized we had totally different experiences with psychotherapy.

We both got the same types of cases. We were both practicing the same kinds of therapy. We were both in the same training program, studying under the same teachers. But our experiences were totally different. In particular, all her patients had dramatic emotional meltdowns, and all my patients gave calm and considered analyses of their problems, as if they were lecturing on a particularly boring episode from 19th-century Norwegian history.

I’m not bragging here. I wish I could get my patients to have dramatic emotional meltdowns. As per the textbooks, there should be a climactic moment where the patient identifies me with their father, then screams at me that I ruined their childhood, then breaks down crying and realizes that she loved her father all along, then ???, and then their depression is cured. I never got that. I tried, I even dropped some hints, like “Maybe this reminds you of your father?” or “Maybe you feel like screaming at me right now?”, but they never took the bait. So I figured the textbooks were misleading, or that this was some kind of super-advanced technique, or that this was among the approximately 100% of things that Freud just pulled out of his ass.

And then I had lunch with my friend, and she was like “It’s so stressful when all of your patients identify you with their parents and break down crying, isn’t it? Don’t you wish you could just go one day without that happening?”

And later, my supervisor was reviewing one of my therapy sessions, and I was surprised to hear him comment that I “seemed uncomfortable with dramatic expressions of emotion”. I mean, I am uncomfortable with dramatic expressions of emotion. I was just surprised he noticed it. As a therapist, I’m supposed to be quiet and encouraging and not show discomfort at anything, and I was trying to do that, and I’d thought I was succeeding. But apparently I was unconsciously projecting some kind of “I don’t like strong emotions, you’d better avoid those” field, and my patients were unconsciously complying.

This reminds me of 'The Medium is the Message' and the Sapir-Whorf hypothesis and Quine's ontological commitments. Namely, that leaky abstractions don't just leak sideways across your different abstractions, but also up and down across levels of abstraction. Thus your epistemology leaks into your ontology and vice versa, which leak into which goals you can think about etc.

One takeaway from thinking this way was that I radically increased the priority on figuring out which skills are worth putting serious time into. Which were more 'upstream' of more good things. Two answers I came up with were expert judgment, since I can't do the vast majority of things on my own I need to know who to listen to, and introspection in order to not be mistaken about what I actually want.

Poetic summary: priors lay heavy.

Thanks for posting this! I was wondering if you might share more about your "isolation-induced unusual internal information cascades" hypothesis/musings! Really interested in how you think this might relate to low-chance occurrences of breakthroughs/productivity.

So, I think Thomas Kuhn can be controversial to talk about, but I feel like maybe "science" isn't even "really recognizable science" until AFTER it becomes riddled with prestige-related information cascades?

Kuhn noticed, descriptively, that when you look at actual people trying to make progress in various now-well-defined "scientific fields" all the way back at the beginnings, you find heterogeneity of vocabulary, re-invention of wheels, arguments about epistemology, and so on.  This is "pre-science" in some sense. The books are aimed at a general audience. Everyone starts from scratch. There is no community that considers itself able to ignore the wider world and just geek out together, but instead there is just a bunch of boring argumentative Tesla-caliber geniuses doing weird stuff that isn't much copied or understood by others.

THEN, a Classic arises. Historically almost always a book. Perhaps a mere monograph. There have been TWO of them named Principia Mathematica already! 

It sweeps through a large body of people and everyone who reads it can't help but feel like conversations with people who haven't read it are boring retreads of old ideas. The classic lays out a few key ideas, a few key experiments, and a general approach that implies a bunch of almost-certainly-tractable open problems. Then people solve those almost-certainly-tractable problems like puzzles, one after another, and write to each other about it, thereby "making progress" with durable logs of the progress in the form of the publications. That "puzzle and publish" dynamic is "science as usual".

Subtract the classic, and you don't have a science... and it isn't that you don't necessarily have something fun or interesting or geeky or gadgety or mechanistic or relevant to the effecting of all things possible... it's just that it lacks that central organizing "memetic sweep" (which DOES kind of look like a classic sociological information cascade in some ways) and lacks a community that will replicate and retain the positive innovations over deep time while talking about them the same way and citing the heroes forever.

There was no textbook or professor (not that I'm aware of, anyway) that taught John Carmack how to create the Doom 3D engine. Alex Krizhevsky's GPU work for computer vision was sort of out of left field, and a lot of it related to just hunkering down with the minutiae of how a GPU's memory pipeline could be reconciled with plausible vision-centric neural net architectures.  One layer of meta up from there, Peter Thiel has a line he has repeated for years about how being socially attuned might actually make people less good at doing interesting projects. Oh... and Montessori kids showing up all over the place doing new shit.

I'm not saying here that left-field innovators SHOULD be role models. There could be good public and private reasons for optimizing in a way that is more socially aware and built around identifying and copying the greats? But cascades sort of violate Bayes, and the cascade perspective suggests that not all the low-hanging fruit has yet been picked in science, and there are reasons to suspect that being aware of the choices "to cascade or not to cascade" might make the choice more of a CHOICE rather than an accidental default. Mostly people seem to DEFAULT into copying. Then a weird number of innovators also have weird starts.

This reminds me of the garden of forking paths a la Andrew Gelman. Good post with helpful suggestions.

Corollary: If you want to invent an at-least-human-level AI then it needs to have a mechanism for internal information cascades.

Would deep dives into facial feature extraction for proximal ethnicities, compared to ethnicities one is not exposed to on a regular basis, be a form of internal information cascade? It seems like it would also prompt an externality of seemingly reduced respect between affected groups.

It stands to reason that if I form a fear of dogs due to an early experience, the same world model would predispose me to judge cats in a different light when first encountering them.