Understanding information cascades

Ben Pace

New Answer

New Comment

5 Answers sorted by
top scoring

Mar 14, 2019

300

I'm unfortunately swamped right now, because I'd love to spend time working on this. However, I want to include a few notes, plus reserve a spot to potentially reply more in depth when I decide to engage in some procrastivity.

First, the need for extremizing forecasts (See: Jonathan Baron, Barbara A. Mellers, Philip E. Tetlock, Eric Stone, Lyle H. Ungar (2014) Two Reasons to Make Aggregated Probability Forecasts More Extreme. Decision Analysis 11(2):133-145. http://dx.doi.org/10.1287/deca.2014.0293) seems like evidence that this isn't typically the dominant factor in forecasting. However, c.f. the usefulness of teaming and sharing as a way to ensure actual reasons get accounted for ( Mellers, B., Ungar, L., Baron, J., Ramos, J., Gurcay, B., Fincher, K., ... & Murray, T. (2014). Psychological strategies for winning a geopolitical forecasting tournament. Psychological science, 25(5), 1106-1115. )

Second, the solution that Pearl proposed for message-passing to eliminate over-reinforcement / double counting of data seems to be critical and missing from this discussion. See his book: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. I need to think about this more, but if Aumann agreement is done properly, people eventually converge on correct models of other reasoners, which should also stop info-cascades. The assumption of both models, however, is that there is iterated / repeated communication. I suspect that we can model info-cascades as a failure at exactly that point - in the examples given, people publish papers, and there is no dialogue. For forecasting, explicit discussion of forecasting reasons should fix this. (That is, I might say "My model says 25%, but I'm giving that only 50% credence and allocating the rest to the consensus value of 90%, leading to my final estimate of 57.5%")

Third, I'd be really interested in formulating testable experimental setups in Mturk or similar to show/not show this occurring, but on reflection this seems non-trivial, and I haven't thought much about how to do it other than to note that it's not as easy as it sounded at first.

[-]Pablo7y180

Thanks for this.

Re extremizing, the recent (excellent) AI Impacts overview of good forecasting practices notes that "more recent data suggests that the successes of the extremizing algorithm during the forecasting tournament were a fluke."

[-]Davidmanheim7y140

That's a great point. I'm uncertain if the analyses account for the cited issue, where we would expect a priori that extremizing slightly would on average hurt the accuracy, but in any moderately sized sample (like the forecasting tournament,) it is likely to help. It also relates to a point I made about why proper scoring rules are not incentive compatible in tournaments in a tweetstorm here; https://twitter.com/davidmanheim/status/1080460223284948994 .

Interestingly, a similar dynamic may happen in tournaments, and could be part of where info-cascades occur. I can in expectation outscore everyone else slightly and minimize my risk of doing very poorly by putting my predictions a bit to the extreme of the current predictions. It's almost the equivalent of betting a dollar more than the current high bid in price is right - you don't need to be close, you just need to beat the other people's scores to win. But if I report my best strategy answer instead of my true guess, it seems that it could cascade if others are unaware I am doing this.

3Bird Concept7y

Do you have a link to this data?

3Davidmanheim7y

As I replied to Pablo below, "...it's an argument from first principles. Basically, if you extremize guesses from 90% to 95%, and 90% is a correct estimate, 9/10 times you do better due to extremizing. "

2Pablo7y

I only read the AI Impacts article that includes that quote, not the data to which the quote alludes. Maybe ask the author?

6Davidmanheim7y

You don't need the data - it's an argument from first principles. Basically, if you extremize guesses from 90% to 95%, and 90% is a correct estimate, 9/10 times you do better due to extremizing.

3Bird Concept7y

One should be able to think quantitatively about that, eg how many questions do you need to ask until you find out whether your extremization hurt you. I'm surprised by the suggestion that GJP didn't do enough, unless their extremizations were frequently in the >90% range.

3Davidmanheim7y

Each season, there were too few questions for this to be obvious, rather than a minor effect, and the "misses" were excused as getting an actually unlikely event wrong. It's hard to say, post-hoc, that the ~1% consensus opinion about a "freak event" were accurate, but there was a huge surprise (and yes, this happened at least twice) or if the consensus was simply overconfident. (I also think that the inability to specify estimates <0.5% or >99.5% reduced the extent to which the scores were hurt by these events.)

3Bird Concept7y

I did, he said a researcher mentioned it in conversation.

3Pablo7y

[meta] Not sure why the link to the overview isn't working. Here's how the comment looks before I submit it: https://imgur.com/MF5Z2X4 (The same problem is affecting this comment.) In any case, the URL is: https://aiimpacts.org/evidence-on-good-forecasting-practices-from-the-good-judgment-project-an-accompanying-blog-post/

4habryka7y

It's because I am a bad developer and I broke some formatting stuff (again). Will be fixed within the hour. Edit: Fixed now

4Pablo7y

Thanks, Oli!

[-]abramdemski5y100

I need to think about this more, but if Aumann agreement is done properly, people eventually converge on correct models of other reasoners, which should also stop info-cascades.

There has been work on this. I believe this is a relevant reference, but I can't tell for sure without paying to access the article:

Protocols Forcing Consensus, Paul Krasucki

The idea is this: Aumann agreement is typically studied with two communicating agents. We can instead study networks of agents, with various protocols (ie, rules for when agents talk to each other). However, not all such protocols reach consensus, the way we see with two agents!

I believe the condition for reaching consensus is directly analogous to the condition for correctness of belief prop in Bayesian networks, IE, the graph should be a tree.

4Davidmanheim5y

Good find - I need to look into this more. The paper is on scihub, and it says it needs to be non-cyclical, so yes. "All the examples in which communicating values of a union-consistent function fails to bring about consensus... must contain a cycle; if there are no cycles in the communication graph, consensus on the value of any union consistent function must be reached."

6abramdemski5y

So, epistemically virtuous social graphs should contain no cycles? ;3 "I can't be your friend -- we already have a mutual friend." "I can't be your friend -- Alice is friends with you and Bob; Bob is friends with Carol; Carol is friends with Dennis; Dennis is friends with Elane; and Elane is my friend already." "Fine, I could be your friend so long as we never discuss anything important."

4Davidmanheim5y

Or perhaps less unreasonably, we need clear epistemic superiority hierarchies, likely per subject area. And it occurs to me that this could be a super-interesting agent-based/graph theoretic modeling study of information flow and updating. As a nice bonus, this can easily show how ignoring epistemic hierarchies will cause conspiracy cascades - and perhaps show that it will lead to the divergence of rational agent beliefs which Jaynes talks about in PT:LoS.

4abramdemski5y

Another more reasonable solution is to always cite sources. There is an analogous solution in belief propagation, where messages carry a trace of where their information came from. Unfortunately I've forgotten what that algorithm is called.

[-]Bird Concept6y20

We (jacobjacob and Benito) decided to award $150 (out of the total bounty of $800) to this answer (and the additional points made in the discussion).

It offers relevant and robust evidence about the role of info-cascades in forecasting environments, together with a discussion of its interpretation.

I'll PM you about payment details.

Bird Concept

Mar 13, 2019

170

Here's a quick bibliography we threw together.

Background:

Information Cascades and Rational Herding: An Annotated Bibliography and Resource Reference (Bikchandani et al. 2004). The best resource on the topic, see in particular the initial papers on the subject.
Y2K Bibliography of Experimental Economics and Social Science Information Cascades and Herd Effects (Holt, 1999. Less thorough, but catches some papers the first one misses.
“Information cascade” from Wikipedia. An excellent introduction.
“Understanding Information Cascades” from Investopedia.

Previous LessWrong posts referring to info cascades:

Information cascades, by Johnicholas, 2009
Information cascades in scientific practice, by RichardKennaway, 2009
Information cascades, LW Wiki

And then here are all the LW posts we could find that used the concept (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) . Not sure how relevant they are, but might be useful in orienting around the concept.

Pablo

Mar 19, 2019

140

Two recent articles that review the existing economic literature on information cascades:

Sushil Bikhchandani, David Hirshleifer and Ivo Welch, Information cascades, The new Palgrave dictionary of economics (Macmillan, 2018), pp. 6492-6500.

Oksana Doherty, Informational cascades in financial markets: review and synthesis, Review of behavioral finance, vol. 10, no. 1 (2018), pp. 53-69.

An earlier review:

Maria Grazia Romano, Informational cascades in financial economics: a review, Giornale degli Economisti e Annali di Economia, vol. 68, no. 1 (2009), pp. 81-109.

Pablo

Mar 19, 2019

130

Information Cascades in Multi-Agent Models by Arthur De Vany & Cassey Lee has a section with a useful summary of the relevant economic literature up to 1999. (For more recent overviews, see my other comment.) I copy it below, with links to the works cited (with the exception of Chen (1978) and Lee (1999), both unpublished doctoral dissertations, and De Vany and Walls (1999b), an unpublished working paper):

A seminal paper by Bikhchandani et al (1992) explains the conformity and fragility of mass behavior in terms of informational cascades. In a closely related paper Banerjee (1992) models optimizing agents who engage in herd behavior which results in an inefficient equilibrium. Anderson and Holt (1997) are able to induce information cascades in a laboratory setting by implementing a version of Bikhchandani et al (1992) model.

The second strand of literature examines the relationship between information cascades and large fluctuations. Lee (1998) shows how failures in information aggregation in a security market under sequential trading result in market volatility. Lee advances the notion of “informational avalanches” which occurs when hidden information (e.g. quality) is revealed during an informational cascade thus reversing the direction of information cascades.

The third strand explores the link between information cascades and heavy tailed distributions. Cont and Bouchaud (1998) put forward a model with random groups of imitators that gives rise to stock price variations that are heavy-tailed distributed. De Vany and Walls (1996) use a Bose-Einstein allocation model to model the box office revenue distribution in the motion picture industry. The authors describe how supply adapts dynamically to an evolving demand that is driven by an information cascade (via word-of-mouth) and show that the distribution converges to a Pareto-Lévy distribution. The ability of the Bose-Einstein allocation model to generate the Pareto size distribution of rank and revenue has been proven by Hill (1974) and Chen (1978). De Vany and Walls (1996) present empirical evidence that the size distribution of box office revenues is Pareto. Subsequent work by Walls (1997), De Vany and Walls (1999a), and Lee (1999) has verified this finding for other markets, periods and larger data sets. De Vany and Walls (1999a) show that the tail weight parameter of the Pareto-Levy distribution implies that the second moment may not be finite. Lastly, De Vany and Walls (1999b) have shown that motion picture information cascades begin as action-based, noninformative cascades, but undergo a transition to an informative cascade after enough people have seen it to exchange “word of mouth” information. At the point of transition from an uninformed to an informed cascade, there is loss of correlation and an onset of turbulence, followed by a recovery of week to week correlation among high quality movies.

[-]Bird Concept6y20

We (jacobjacob and Ben Pace) decided to award $100 (out of the total bounty of $800) to this answer.

It compiles a useful summary of the literature (we learnt a lot from going through on of the papers linked), and it attaches handy links to everything, which is a task which is on the one hand very helpful to other people, and on the other tedious and without many marginal benefits for the writer, and so likely to be under-incentivised.

I'll PM you for payment details.

Jan_Kulveit

Mar 14, 2019

100

Generally, there is a substantial literature on the topic within the field of network science. The right keywords for Google scholar are something like spreading dynamics in complex networks. Information cascades does not seem to be the best choice of keywords.

There are many options how you can model the state of the node (discrete states, oscillators, continuous variables, vectors of anything of the above,...), multiple options how you may represent the dynamics (something like Ising model / softmax, versions of voter model, oscillator coupling, ...) and multiple options how you model the topology (graphs with weighted or unweighted edges, adaptive wiring or not, topologies based on SBM, or scale-free networks, or Erdős–Rényi, or Watts-Strogatz, or real-world network data,... This creates somewhat large space of options, which were usually already explored somewhere in the literature.

What is possibly the single most important thing to know about this, there are universality classes of systems which exhibit similar behaviour; so you can often ignore the details of the dynamics/topology/state representation.

Overall I would suggest to approach this with some intellectual humility and study existing research more, rather then try to reinvent large part of network science on LessWrong. (My guess is something like >2000 research years were spent on the topic often by quite good people.)

[-]Bird Concept7y90

I haven't looked through your links in much detail, but wanted to reply to this:

Overall I would suggest to approach this with some intellectual humility and study existing research more, rather then try to reinvent large part of network science on LessWrong. (My guess is something like >2000 research years were spent on the topic often by quite good people.)

I either disagree or am confused. It seems good to use resources to outsource your ability to do literature reviews, distillation or extrapolation, to someone with higher comparative advantage... (read more)

8Jan_Kulveit7y

I was a bit confused by we but aren't sure how to reason quantitatively about the impacts, and how much the LW community could together build on top of our preliminary search, which seemed to nudge toward original research. Outsourcing literature reviews, distillation or extrapolation seem great.

[-]Ben Pace7y110

Agreed. I realise the OP could be misread; I've updated the first paragraph with an extra sentence mentioning that summarising and translating existing work/literature in related domains is also really helpful.

[-]Ben Pace7y70

Thanks for the pointers to network science Jan, I don't know this literature, and if it's useful here then I'm glad you understand it well enough to guide us (and others) to key parts of it. I don't see yet how to apply it to thinking quantitatively about scientific and forecasting communities.

If you (or another LWer) thinks that the theory around universality classes is applicable in thinking about how to ensure good info propagation in e.g. a scientific community, and you're right, then I (and Jacob and likely many others) would ... (read more)

[-]Jan_Kulveit7y150

Short summary of how is the lined paper important: you can think about bias as some sort of perturbation. You are then interested in the "cascade of spreading" of the perturbation, and especially factors like the distribution of sizes of cascades. The universality classes tell you this can be predicted by just a few parameters (Table 1 in the linked paper) depending mainly on local dynamic (forecaster-forecaster interactions). Now if you have a good model of the local dynamic, you can determine the parameters and determine into which universality class the problem belongs. Also you can try to infer the dynamics if you have good data on your interactions.

I'm afraid I don't know enough about how "forecasting communities" work to be able to give you some good guesses what may be the points of leverage. One quick idea, if you have everybody on the same platform, may be to do some sort fo A/B experiment - manipulate the data so some forecasters would see the predictions of other with an artificially introduced perturbation, and see how their output will be different from the control group. If you have data on "individual dynamics" liken that, and some knowledge of network structure, the theory can help you predict the cascade size distribution.

(I also apologize for not being more helpful, but I really don't have time to work on this for you.)

[-]Pablo7y50

Information cascades does not seem to be the best choice of keywords.

I wouldn't say that 'information cascades' isn't the best choice of keywords. What's happening here is that the same phenomenon is studied by different disciplines in relative isolation from each other. As a consequence, the phenomenon is discussed under different names, depending on the discipline studying it. 'Information cascades' (or, as it is sometimes spelled, 'informational cascades') is the name used in economics, while network science seems to use a variety of related expressions, such as the one you mention.

[-]Bird Concept6y20

We (jacobjacob and Ben Pace) decided to award $200 (out of the total bounty of $800) to this answer (and the additional comment below).

It seems to offer a learnt summary of the relevance of network science (which offers a complementary perspective on the phenomenon to the microeconomic literature linked by other commenters), which not implausibly took Jan at least an order of magnitude less time to compile than it would have taken us. (For example, the seemingly simple fact of using a different Google scholar keyword than "information cascade" m... (read more)

Rendering 11/37 comments, sorted by

top scoring

(show more) Click to highlight new comments since: Today at 7:04 PM

[-]DanielFilan7y130

This paper looks at the dynamics of information flows in social networks using multi-agent reinforcement learning. I haven't read it, but am impressed by the work of the second author. Abstract:

We model the spread of news as a social learning game on a network. Agents can either endorse or oppose a claim made in a piece of news, which itself may be either true or false. Agents base their decision on a private signal and their neighbors' past actions. Given these inputs, agents follow strategies derived via multi-agent deep reinforcement learning and receive utility from acting in accordance with the veracity of claims. Our framework yields strategies with agent utility close to a theoretical, Bayes optimal benchmark, while remaining flexible to model re-specification. Optimized strategies allow agents to correctly identify most false claims, when all agents receive unbiased private signals. However, an adversary's attempt to spread fake news by targeting a subset of agents with a biased private signal can be successful. Even more so when the adversary has information about agents' network position or private signal. When agents are aware of the presence of an adversary they re-optimize their strategies in the training stage and the adversary's attack is less effective. Hence, exposing agents to the possibility of fake news can be an effective way to curtail the spread of fake news in social networks. Our results also highlight that information about the users' private beliefs and their social network structure can be extremely valuable to adversaries and should be well protected.

[-]Davidmanheim7y70

There's better, simpler results that I recall but cannot locate right now on doing local updating that is algebraic, rather than deep learning. I did find this, which is related in that it models this type of information flow and shows it works even without fully Bayesian reasoning; Jadbabaie, A., Molavi, P., Sandroni, A., & Tahbaz-Salehi, A. (2012). Non-Bayesian social learning. Games and Economic Behavior, 76(1), 210–225. https://doi.org/https://doi.org/10.1016/j.geb.2012.06.001

Given those types of results, the fact that RL agents can learn to do this should be obvious. (Though the social game dynamic result in the paper is cool, and relevant to other things I'm working on, so thanks!)

[-]Shmi7y120

Have you read the backreaction blog where Sabine Hossenfelder details much the same phenomenon in high-energy physics? She claims that the prevailing groupthink ended up believing into String Theory without a shred of evidence for (only some vague hints), and so far with every single prediction of it refuted?

[-]Bird Concept6y110

UPDATE.

We (jacobjacob and Ben Pace) have finally settled on the allocation of the $800 bounty for this question. All the motivations are summarised in this comment, together with links to the relevant prize-winning answer/comment.

We will also post individual notices with motivations next to each comment for ease of discussing them.

We'll PM all prize winners to sort out logistical details of payment.

Main post

David Manheim (answer and additional points made in discussion) $150

This answer offers relevant and robust evidence about the role of info-cascades in forecasting environments, together with a discussion of its interpretation.

Jan Kulveit (answer and additional comment below) $200

This answer seems to offer a learnt summary of the relevance of network science (which offers a complementary perspective on the phenomenon to the microeconomic literature linked by other commenters), which not implausibly took Jan at least an order of magnitude less time to compile than it would have taken us. (For example, the seemingly simple fact of using a different Google scholar keyword than "information cascade" might have taken several hours to realise for a non-expert.) It also attempts to apply these to the case of forecasting (despite Jan's limited knowledge of the domain), which is a task that would likely have been even harder to do without deep experience of the field.

Pablo (1 and 2) $100

These answers compile a useful summary of the literature (we learnt a lot from going through on of the papers linked), and it attaches handy links to everything, which is a task which is on the one hand very helpful to other people, and on the other tedious and without many marginal benefits for the writer, and so likely to be under-incentivised.

Michael McLaren $50

This answer:

It offers a novel mechanism which is relevant to the context of intellectual progress, and ties it in with literature cited in the OP
Rather than just linking the paper, it distills a technical paper, which is a valuable service that is usually underfunded (academic institutions comparatively incentivise novel and surprising insights)

Ways of responding

David Manheim $50

This answer offers a practical example of a cascade-like phenomenon, which is both generally applicable and has real economic consequences. Also, the fact that it comes with a game to understand and practice responding is rare and potentially quite valuable (I (jacobjacob) am of the opinion that deliberate practice is currently a neglected virtue in the rationality/EA spheres).

rossry $250

This answer does several important things.

It references existing (and novel) work in economics and mechanism design, which might have been time-consuming to discover otherwise
It distills a technical paper, which is a valuable service that is usually underfunded (academic institutions comparatively incentivise novel and surprising insights)
The insights provided are quite action-guiding, and caused me (jacobjacob) to have ideas for how one can experiment with new kinds of forecasting tournaments that use a threshold-mechanism to change participant incentives

[-]Kaj_Sotala7y90

Is it necessarily a good idea to break up the topic into so many separate questions before having a general discussion post about it first? I would imagine that people might have comments which were related to several of the different questions, but now the discussion is going to get fragmented over many places.

E.g. if someone knows about a historical info cascade in academia and how people failed to deal with that, then that example falls under two different questions. So then the answer with that example either has to be be split into two or to be posted in an essentially similar form on both pages, neither of which is good for keeping the entire context of the discussion in one place.

[-]Kaj_Sotala7y110

Separately, there's a part of me that finds it viscerally annoying to have multiple questions around the same theme posted around the same time. It feels like it incentivizes people with a pet topic to promote that topic by asking a lot of questions about it so that other topics get temporarily drowned out. Even if the topic is sometimes important enough to be worth it, it still feels like the kind of thing to discourage.

[-]Trinley Goldenberg7y90

I also have this visceral feeling. It feels like a "subquestions" feature could fix both these issues.

[-]Bird Concept7y60

Seems like a sensible worry, and we did consider some version of it. My reasoning was roughly:

1) The questions feature is quite new, and if it will be very valuable, most use-cases and the proper UI haven't been discovered yet (these can be hard to predict in advance without getting users to play around with different things and then talking to them).

No one has yet attempted to use multiple questions. So it would be valuable for the LW team and the community to experiment with that, despite possible countervailing considerations (any good experiment will have sufficient uncertainty that such considerations will always exist).

2) Questions 1/2, 3 and 4 are quite different, and it seems good to be able to do research on one sub-problem without taking mindshare from everyone working on any subproblem.

[-]Bird Concept7y50

This is an update on the timeline for paying out the bounties on this question. They will be awarded for work done before May 13th, but we're delayed by another few weeks in deciding on the allocation. Apologies!

[-]Michael McLaren7y40

Nissen et al 2016 ("Publication bias and the canonization of false facts") give a simple model for how publication bias in academic research can have a similar effect to the "information cascades" described in the OP. False scientific claims are likely to be falsified by an experiment, but will sometimes be found to be true. Positive results supporting a claim may be more likely to be published than negative results against the claim. The authors' model assumes that the credence of the scientific community in the claim is determined by the number of published positive and negative results, and that new studies will be done to repeatedly test the claim until the credence becomes sufficiently close to 0 or 1. The publication bias favoring false results can overpower the odds against getting a positive result in any given experimental replication and lead to false claims becoming canonized as fact with a non-negligible probability.

The mechanism here differs in a sense from the "information cascade" examples in the OP and on the Wikipedia page in that the false claim is being repeatedly tested with new experiments. However, I think it could be seen as fundamentally the same as the citation bias example of Greenberg 2009 in the OP, if we think of the scientific community rather than an individual scientist as being the actor. In the Greenberg 2009 example, the problem is that individual scientists tend only to cite positive findings; in the Nissen et al model, the scientific community tends to only publish positive findings. (Of course, this second problem feeds into the first.)

[-]Bird Concept7y30

See this post for a good, simple mathematical description of the discrete version of the phenomenon.

Moderation Log

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

50

[ Question ]

Understanding information cascades

50

50

5 Answers sorted by
top scoring

Mar 14, 2019

Mar 13, 2019

Mar 19, 2019

Mar 19, 2019

Mar 14, 2019

Background

Questions

Bounties

50

[ Question ]

Understanding information cascades

50

50

5 Answers sorted by top scoring

Mar 14, 2019

Mar 13, 2019

Mar 19, 2019

Mar 19, 2019

Mar 14, 2019

Background

Questions

Bounties

5 Answers sorted by
top scoring