Cross-posted from the EA forum here
The Unjournal commissioned two evaluations of "Meaningfully reducing consumption of meat and animal products is an unsolved problem: A meta-analysis" by Seth Ariel Green, Benny Smith, and Maya B Mathur. See our evaluation package here.
My take: the research was ambitious and useful, but it seems to have important limitations, as noted in the critical evaluations; Matthew Jané's evaluation provided constructive and actionable insights and suggestions.
I'd like to encourage follow-up research on this same question, starting with this paper's example and its shared database (demonstrating commendable transparency), taking these suggestions on board, and building something even more comprehensive and rigorous.
Do you agree? I come back to some 'cruxes' below:
The authors discussed this paper in a previous post.
We conclude that no theoretical approach, delivery mechanism, or persuasive message should be considered a well-validated means of reducing MAP [meat and animal products] consumption
The authors characterize this as evidence of "consistently small effects ... upper confidence bounds are quite small" for most categories of intervention.[1]
From the Evaluation Manager's summary (Tabaré Capitan):
... The evaluators identified a range of concerns regarding the transparency, design logic, and robustness of the paper’s methods—particularly in relation to its search strategy, outcome selection, and handling of missing data. Their critiques reflect a broader tension within the field: while meta-analysis is often treated as a gold standard for evidence aggregation, it remains highly sensitive to subjective decisions at multiple stages.
Paraphrasing these (mostly from E2, Matthew Jané, though many of the critiques were raised by both evaluators):
Improper missing-data handling: Assigning SMD = 0.01 to non-significant, unreported effects introduces systematic bias and ignores imputation variance (see the sketch after this list)
Single-outcome selection wastes data: Extracting only one effect per study discards valuable information, despite the authors having multilevel modeling capacity
Risk-of-bias assessment is inadequate: The informal approach omits critical bias sources like selective reporting and attrition
Missing "a fully reproducible search strategy, clearly articulated inclusion and exclusion criteria ..., and justification for screening decisions are not comprehensively documented in the manuscript or supplement."
No discussion of attrition bias in RCTs, which is "concerning given the known non-randomness of attrition in dietary interventions"
... And a critique that we hear often in evaluations of meta-analyses: "The authors have not followed standard methods for systematic reviews..."
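To make the first critique above concrete, here is a toy simulation, with invented numbers and deliberately simplified pooling (not the paper's data or either evaluator's code), of what plugging in SMD = 0.01 for unreported non-significant effects can do, compared with an imputation approach that propagates uncertainty:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy meta-analysis: 20 studies, true SMD of 0.15, roughly 50 per arm
k, true_smd, se = 20, 0.15, np.sqrt(2 / 50)
observed = rng.normal(true_smd, se, size=k)

# Suppose studies with non-significant estimates report only "n.s.", no effect size
missing = np.abs(observed / se) < 1.96
w = 1 / se**2  # inverse-variance weights (common SE here, so this is just a mean)

# Strategy A: plug in SMD = 0.01 for the unreported effects, treating it as known
plug_in = np.where(missing, 0.01, observed)
pooled_a = np.sum(w * plug_in) / np.sum(w)

# Strategy B: multiple imputation -- draw values consistent with "non-significant"
# and keep the between-imputation spread as extra uncertainty.
# (In this toy we draw around the known true effect; a real analysis would use a model.)
pooled_draws = []
for _ in range(500):
    draws = observed.copy()
    for i in np.where(missing)[0]:
        d = rng.normal(true_smd, se)
        while abs(d / se) >= 1.96:  # rejection-sample the "non-significant" region
            d = rng.normal(true_smd, se)
        draws[i] = d
    pooled_draws.append(np.sum(w * draws) / np.sum(w))

print(f"True SMD:                {true_smd:.3f}")
print(f"Plug-in 0.01 pooled SMD: {pooled_a:.3f}  (no imputation uncertainty)")
print(f"MI pooled SMD:           {np.mean(pooled_draws):.3f} "
      f"(between-imputation SD {np.std(pooled_draws):.3f})")
```

In a setup like this, the plug-in rule tends to pull the pooled estimate toward zero and reports it with no imputation uncertainty. The point is not the specific numbers but that the choice is consequential and its uncertainty should be carried through.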
Epistemic audit: Here is RoastMyPoast's epistemic and factual audit of Jané's evaluation. It gets a B- grade (which seems to be the modal grade with this tool). RMP is largely positive, but offers some constructive criticism (asking for "more explicit discussion of how each identified flaw affects the magnitude and direction of potential bias in the meta-analysis results.")
Seth Ariel Green responded here.
Epistemic/factual audit: Here is RoastMyPoast's epistemic and factual audit of Seth's response. It gets a C- grade; it raises some (IMO) useful critiques of the response and a few factual disagreements about the cited methodological examples (these should be double-checked). It flags "defensive attribution bias" and emphasizes that "the response treats innovation as self-justifying rather than requiring additional evidence of validity."
"Why no systematic search?"
...We were looking at an extremely heterogeneous, gigantic literature — think tens of thousands of papers — where sifting through it by terms was probably going to be both extremely laborious and also to yield a pretty low hit rate on average.
we employed what could be called a ‘prior-reviews-first’ search strategy. Of the 985 papers we screened, a full 73% came from prior reviews. ... we employed a multitude of other search strategies to fill in our dataset, one of which was systematic search.
David Reinstein:
Seth's response to these issues might be characterized as ~"the ivory tower protocol is not practical; you need to make difficult choices if you want to learn anything in these messy but important contexts and avoid 'only looking under the streetlamp' -- so we did what seemed reasonable."
I'm sympathetic to this. The approach described seems intuitively reasonable to me. I'm genuinely uncertain as to whether 'following the meta-analysis rules' is the most useful approach for researchers aiming to make practical recommendations. I'm not sure the rules were built for the contexts and purposes we're dealing with.
On the other hand, I think the lack of a systematic protocol limits our potential to build on and improve this work, and to make transparent, fair comparisons.
And I would have liked the response to take on the methodological issues raised more directly -- yes, there are always tradeoffs, but you can justify your choices explicitly, especially when you are departing from convention.
"Why no formal risk of bias assessment?"
The main way we try to address bias is with strict inclusion criteria, which is a non-standard way to approach this, but in my opinion, a very good one (Simonsohn, Simmons & Nelson (2023) articulates this nicely).
After that baseline level of focusing our analysis on the estimates we thought most credible, we thought it made more sense to focus on the risks of bias that seemed most specific to this literature.
... I hope that our transparent reporting would let someone else replicate our paper and do this kind of analysis if that was of interest to them.
David: Again, this seems reasonable, but also a bit of a false dichotomy: you can have both strict inclusion criteria and a risk-of-bias assessment.
"About all that uncertainty"
Matthew Jané raises many issues about ways in which he thinks our analyses could (or in his opinion, should) have been done differently. Now I happen to think our judgment calls on each of the raised questions were reasonable and defensible. Readers are welcome to disagree.
Matthew raises an interesting point about the sheer difficulty in calculating effect sizes and how much guesswork went into it for some papers. In my experience, this is fundamental to doing meta-analysis. I’ve never done one where there wasn’t a lot of uncertainty, for at least some papers, in calculating an SMD.
More broadly, if computing effect sizes or variance differently is of interest, by all means, please conduct the analysis, we’d love to read it!
David: This characterizes Seth's response to a number of the issues: (1) this is challenging, (2) you need to make judgment calls, and (3) we are being transparent, allowing others to follow up.
I agree with this, to a point. But again, I'd like to see them explicitly engage with the issues, the careful and formal treatments, and the specific practical solutions that Matthew provided. And, as I get to below, there are some systemic barriers to anyone actually following up on this.
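As background on where that effect-size guesswork enters (a standard formulation, not the paper's specific procedure): the standardized mean difference is typically computed from the group means and a pooled standard deviation,

$$
d = \frac{\bar{X}_T - \bar{X}_C}{S_{\text{pooled}}},
\qquad
S_{\text{pooled}} = \sqrt{\frac{(n_T - 1)\,S_T^2 + (n_C - 1)\,S_C^2}{n_T + n_C - 2}}.
$$

When a primary study reports only a test statistic or a p-value rather than means and SDs, the SMD has to be backed out indirectly (for an independent-samples t-test, $d \approx t\sqrt{1/n_T + 1/n_C}$), and cluster designs, change scores, or transformed outcomes require further conversion assumptions. That is the uncertainty both the evaluator and the authors are pointing at.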
Again, from the Evaluation Manager's synthesis (mostly Tabaré Capitan):
... the authors themselves acknowledge many of these concerns, including the resource constraints that shaped the final design. Across the evaluations and the author response, there is broad agreement on a central point: that a high degree of researcher judgment was involved throughout the study. Again, this may reflect an important feature of synthesis work beyond the evaluated paper—namely, that even quantitative syntheses often rest on assumptions and decisions that are not easily separable from the analysts' own interpretive frameworks. These shared acknowledgements may suggest that the field currently faces limits in its ability to produce findings with the kind of objectivity and replicability expected in other domains of empirical science.
David Reinstein:
... I’m more optimistic than Tabaré about the potential for meta-analysis. I’m deeply convinced that there are large gains from trying to systematically combine evidence across papers, and even (carefully) across approaches and outcomes. Yes, there are deep methodological differences over the best approaches. But I believe that appropriate meta-analysis will yield more reliable understanding than ad-hoc approaches like ‘picking a single best study’ or ‘giving one’s intuitive impressions based on reading’. Meta-analysis could be made more reliable through robustness checking, reporting a range of bounded estimates under a wide set of reasonable choices, and providing data and dashboards for multiverse analysis, replication, and extensions.
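To make 'multiverse analysis' concrete, here is a minimal sketch, using invented effect sizes and simplified pooling rules rather than the paper's data, code, or any particular package, of running the same pooled estimate across a grid of reasonable analytic choices:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Stand-in dataset: per-study SMDs and standard errors (invented numbers)
yi = rng.normal(0.10, 0.15, size=30)
sei = rng.uniform(0.08, 0.25, size=30)

def pool(y, se, model="random"):
    """Inverse-variance pooling; DerSimonian-Laird tau^2 for the random-effects model."""
    w = 1 / se**2
    mu_fe = np.sum(w * y) / np.sum(w)
    if model == "fixed":
        return mu_fe
    q = np.sum(w * (y - mu_fe) ** 2)
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)
    w_re = 1 / (se**2 + tau2)
    return np.sum(w_re * y) / np.sum(w_re)

# The "multiverse": every combination of a few defensible analytic choices
choices = itertools.product(
    ["fixed", "random"],   # pooling model
    [0.30, 0.25, 0.20],    # drop studies with SE above this cutoff (precision filter)
    [False, True],         # trim the top 10% of effects as potential outliers
)

results = []
for model, se_cut, trim in choices:
    keep = sei <= se_cut
    y, se = yi[keep], sei[keep]
    if trim:
        mask = y <= np.quantile(y, 0.90)
        y, se = y[mask], se[mask]
    results.append(((model, se_cut, trim), pool(y, se, model)))

for spec, est in results:
    print(spec, round(est, 3))
ests = [est for _, est in results]
print("Pooled SMD ranges from", round(min(ests), 3), "to", round(max(ests), 3),
      "across", len(ests), "specifications")
```

If the pooled estimate is stable across a grid like this, that is reassuring; if it swings widely, the judgment calls are doing real work, and readers should be able to see that.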
I believe a key obstacle to this careful, patient, open work is the current system of incentives and tools offered by academia, with traditional journal publication treated as a career outcome and ‘end state’. The author’s response, “But at some point, you declare a paper ‘done’ and submit it”, exemplifies this challenge. The Unjournal aims to build and facilitate a better system.
Will anyone actually follow up on this? Once the "first paper" is published in an academic journal, can anyone be given a career incentive, or direct compensation, to improve upon it? Naturally, this gets at one of my usual gripes with the traditional academic journal model, a problem that The Unjournal's continuous evaluation tries to solve.
This also depends on... whether the animal welfare and EA community believes that rigorous, academic-style research is useful in this area, and whether it wants to fund and support a program to gradually and continually improve our understanding and evidence on perhaps a small number of crucial questions like this one.
(And, preaching to the choir here, I also think it depends on good epistemic norms.)
[1] However, they say "the largest effect size, ... choice architecture, comes from too few studies to say anything meaningful about the approach in general. So for that case we're dealing with an absence of evidence, i.e., wide posteriors."