[ Question ]

Understanding information cascades

by jacobjacob, Ben Pace2 min read13th Mar 201942 comments


Information CascadesBounties (closed)Rationality

Meta: Because we think understanding info cascades are important, we recently spent ~10 hours trying to figure out how to quantitatively model them, and have contributed our thinking as answers below. While we currently didn't have the time to continue exploring, we wanted to experiment with seeing how much the LW community could together build on top of our preliminary search, so we’ve put up a basic prize for more work and tried to structure the work around a couple of open questions. This is an experiment! We’re looking forward to reading any of your contributions to the topic, including things like summaries of existing literature and building out new models of the domain.


Consider the following situation:

Bob is wondering whether a certain protein injures the skeletal muscle of patients with a rare disease. He finds a handful papers with some evidence for the claim (and some with evidence against it), so he simply states the claim in his paper, with some caution, and adds that as a citation. Later, Alice comes across Bob’s paper and sees the cited claim, and she proceeds to cite Bob, but without tracing the citation trail back to the original evidence. This keeps happening, in various shapes and forms, and after a while a literature of hundreds of papers builds up where it’s common knowledge that β amyloid injures the skeletal muscle of patients with inclusion body myositis -- without the claim having accumulated any more evidence. (This real example was taken from Greenberg, 2009, which is a case study of this event.)

An information-cascade occurs when people update on each others beliefs, rather than sharing the causes of those beliefs, and those beliefs end up with a vestige of support that far outstrips the evidence for them. Satvik Beri might describe this as the problem of only sharing the outputs of your thinking process, not your inputs.

The dynamics here are perhaps reminiscent of those underlying various failures of collective rationality such as asset bubbles, bystander effects and stampedes.

Note that his effect is different from other problems of collective rationality like the replication crisis, which involve low standards for evidence (such as unreasonably lax p-value thresholds or coordination problems preventing publishing of failed experiments), or the degeneracy of much online discussion, which involves tribal signalling and UI encouraging problematic selection effects. Rather, information cascades involve people rationally updating without any object-level evidence at all, and would persist even if the replication crisis and online outrage culture disappeared. If nobody lies or tells untruths, you can still be subject to an information cascade.


Ben and I are confused about how to think about the negative effects of this problem. We understand the basic idea, but aren't sure how to reason quantitatively about the impacts, and how to trade-off solving these problems in a community versus doing other improvements to overall efficacy and efficiency of a community. We currently know only how to think about these qualitatively.

We’re posting a couple of related questions that we have some initial thoughts on, that might help clarify the problem.

If you have something you’d like to contribute, but that doesn’t seem to fit into the related questions above, leave it as an answer to this question.


We are committing to pay at least either $800 or (No. of answers and comments * $25), whichever is smaller, for work on this problem recorded on LW, done before May 13th. The prize pool will be split across comments in accordance with how valuable we find them, and we might make awards earlier than the deadline (though if you know you’ll put in work in x weeks, it would be good to mention that to one of us via PM).

Ben and Jacob are each responsible for half of the prize money.

Jacob is funding this through Metaculus AI, a new forecasting platform tracking and improving the state-of-the-art in AI forecasting, partly to help avoid info-cascades in the AI safety and policy communities (we’re currently live and inviting beta-users, you can sign-up here).

Examples of work each of us are especially excited about:


  • Contributions to our Guesstimate model (linked here), such as reducing uncertainty on the inputs or using better models.

  • Extensions of the Guesstimate model beyond biomedicine, especially in ways that make it more directly applicable to the rationality/effective altruism communities

  • Examples and analysis of existing interventions that deal with this and what makes them work, possibly suggestions for novel ones (though avoiding the trap of optimising for good-seeming ideas)

  • Discussion of how the problem of info-cascades relates to forecasting


  • Concise summaries of relevant papers and their key contributions

  • Clear and concise explanations of what other LWers have found (e.g. turning 5 long answers into 1 medium sized answer that links back to the others while still conveying the key info. Here’s a good example of someone distilling an answer section).


New Answer
Ask Related Question
New Comment

5 Answers

I'm unfortunately swamped right now, because I'd love to spend time working on this. However, I want to include a few notes, plus reserve a spot to potentially reply more in depth when I decide to engage in some procrastivity.

First, the need for extremizing forecasts (See: Jonathan Baron, Barbara A. Mellers, Philip E. Tetlock, Eric Stone, Lyle H. Ungar (2014) Two Reasons to Make Aggregated Probability Forecasts More Extreme. Decision Analysis 11(2):133-145. http://dx.doi.org/10.1287/deca.2014.0293) seems like evidence that this isn't typically the dominant factor in forecasting. However, c.f. the usefulness of teaming and sharing as a way to ensure actual reasons get accounted for ( Mellers, B., Ungar, L., Baron, J., Ramos, J., Gurcay, B., Fincher, K., ... & Murray, T. (2014). Psychological strategies for winning a geopolitical forecasting tournament. Psychological science, 25(5), 1106-1115. )

Second, the solution that Pearl proposed for message-passing to eliminate over-reinforcement / double counting of data seems to be critical and missing from this discussion. See his book: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. I need to think about this more, but if Aumann agreement is done properly, people eventually converge on correct models of other reasoners, which should also stop info-cascades. The assumption of both models, however, is that there is iterated / repeated communication. I suspect that we can model info-cascades as a failure at exactly that point - in the examples given, people publish papers, and there is no dialogue. For forecasting, explicit discussion of forecasting reasons should fix this. (That is, I might say "My model says 25%, but I'm giving that only 50% credence and allocating the rest to the consensus value of 90%, leading to my final estimate of 57.5%")

Third, I'd be really interested in formulating testable experimental setups in Mturk or similar to show/not show this occurring, but on reflection this seems non-trivial, and I haven't thought much about how to do it other than to note that it's not as easy as it sounded at first.

Here's a quick bibliography we threw together.


Previous LessWrong posts referring to info cascades:

And then here are all the LW posts we could find that used the concept (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) . Not sure how relevant they are, but might be useful in orienting around the concept.

Two recent articles that review the existing economic literature on information cascades:

  • Sushil Bikhchandani, David Hirshleifer and Ivo Welch, Information cascades, The new Palgrave dictionary of economics (Macmillan, 2018), pp. 6492-6500.
  • Oksana Doherty, Informational cascades in financial markets: review and synthesis, Review of behavioral finance, vol. 10, no. 1 (2018), pp. 53-69.
  • An earlier review:

  • Maria Grazia Romano, Informational cascades in financial economics: a review, Giornale degli Economisti e Annali di Economia, vol. 68, no. 1 (2009), pp. 81-109.
  • Information Cascades in Multi-Agent Models by Arthur De Vany & Cassey Lee has a section with a useful summary of the relevant economic literature up to 1999. (For more recent overviews, see my other comment.) I copy it below, with links to the works cited (with the exception of Chen (1978) and Lee (1999), both unpublished doctoral dissertations, and De Vany and Walls (1999b), an unpublished working paper):

    A seminal paper by Bikhchandani et al (1992) explains the conformity and fragility of mass behavior in terms of informational cascades. In a closely related paper Banerjee (1992) models optimizing agents who engage in herd behavior which results in an inefficient equilibrium. Anderson and Holt (1997) are able to induce information cascades in a laboratory setting by implementing a version of Bikhchandani et al (1992) model.
    The second strand of literature examines the relationship between information cascades and large fluctuations. Lee (1998) shows how failures in information aggregation in a security market under sequential trading result in market volatility. Lee advances the notion of “informational avalanches” which occurs when hidden information (e.g. quality) is revealed during an informational cascade thus reversing the direction of information cascades.
    The third strand explores the link between information cascades and heavy tailed distributions. Cont and Bouchaud (1998) put forward a model with random groups of imitators that gives rise to stock price variations that are heavy-tailed distributed. De Vany and Walls (1996) use a Bose-Einstein allocation model to model the box office revenue distribution in the motion picture industry. The authors describe how supply adapts dynamically to an evolving demand that is driven by an information cascade (via word-of-mouth) and show that the distribution converges to a Pareto-Lévy distribution. The ability of the Bose-Einstein allocation model to generate the Pareto size distribution of rank and revenue has been proven by Hill (1974) and Chen (1978). De Vany and Walls (1996) present empirical evidence that the size distribution of box office revenues is Pareto. Subsequent work by Walls (1997), De Vany and Walls (1999a), and Lee (1999) has verified this finding for other markets, periods and larger data sets. De Vany and Walls (1999a) show that the tail weight parameter of the Pareto-Levy distribution implies that the second moment may not be finite. Lastly, De Vany and Walls (1999b) have shown that motion picture information cascades begin as action-based, noninformative cascades, but undergo a transition to an informative cascade after enough people have seen it to exchange “word of mouth” information. At the point of transition from an uninformed to an informed cascade, there is loss of correlation and an onset of turbulence, followed by a recovery of week to week correlation among high quality movies.

    Generally, there is a substantial literature on the topic within the field of network science. The right keywords for Google scholar are something like spreading dynamics in complex networks. Information cascades does not seem to be the best choice of keywords.

    There are many options how you can model the state of the node (discrete states, oscillators, continuous variables, vectors of anything of the above,...), multiple options how you may represent the dynamics (something like Ising model / softmax, versions of voter model, oscillator coupling, ...) and multiple options how you model the topology (graphs with weighted or unweighted edges, adaptive wiring or not, topologies based on SBM, or scale-free networks, or Erdős–Rényi, or Watts-Strogatz, or real-world network data,... This creates somewhat large space of options, which were usually already explored somewhere in the literature.

    What is possibly the single most important thing to know about this, there are universality classes of systems which exhibit similar behaviour; so you can often ignore the details of the dynamics/topology/state representation.

    Overall I would suggest to approach this with some intellectual humility and study existing research more, rather then try to reinvent large part of network science on LessWrong. (My guess is something like >2000 research years were spent on the topic often by quite good people.)