OpenPhil on "GiveWell’s Top Charities Are (Increasingly) Hard to Beat"

Raemon

This is a linkpost for https://www.openphilanthropy.org/blog/givewells-top-charities-are-increasingly-hard-beat

This post by Alex Berger of OpenPhil outlines some shifts in OpenPhil's thinking about what bar they set for their grantmaking. It seemed noteworthy:

as potentially relevant to the Drowning Children are Hard to Find discussion.
as an update on how OpenPhil thinks about making grants relating to US policy. (I think the intended thesis of the post was 'It's harder than we thought for US giving to outperform GiveWell top charities.')

Americans giving random Americans dollars as "null hypothesis."

[note: I'm not 100% sure I understood this framework, but here's my understanding]

For a while, there was a background assumption that "unconditional cash transfers to the world's poorest" was a default charitable option.

This blogpost articulated a root benchmark underlying that assumption: if you're an American, you might consider yourself representative of the "random American" reference class. The default thing you can do with money is spend it about as productively as another random US citizen. If you're altruistic, the root benchmark is something like "would you spend this money in a way that generated more utility than giving it to a random member of your reference class?"

[obviously this is all pretty American-centric; I'm guessing it applies roughly equally throughout the western world]

This article claims:

Unconditional cash transfers to the world's poorest are about 100x more useful than giving a dollar to a random other American.
GiveWell top charities are about 10x more impactful than GiveDirectly (i.e. 1000x relative to the baseline).
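
To make the chaining of these multipliers concrete, here is a minimal sketch. The constant and function names are my own, purely for illustration; the only numbers taken from the post are the 100x and 10x figures.

```python
# Multipliers quoted in the post. The baseline (1x) is "a dollar given to a
# random American". All names here are illustrative, not from the post.
CASH_VS_BASELINE = 100     # cash transfers to the world's poorest
TOP_CHARITY_VS_CASH = 10   # GiveWell top charities relative to cash transfers


def to_baseline(multiple_of_cash):
    """Convert 'N times cash transfers' into 'N times the baseline'."""
    return multiple_of_cash * CASH_VS_BASELINE


def bars_cleared(botec_multiple):
    """Return which of the two bars a rough estimate (vs. baseline) clears."""
    return {
        "100x": botec_multiple >= CASH_VS_BASELINE,
        "1000x": botec_multiple >= to_baseline(TOP_CHARITY_VS_CASH),
    }


top_charity_vs_baseline = to_baseline(TOP_CHARITY_VS_CASH)
example = bars_cleared(300)  # a hypothetical grant estimated at 300x
```

Under these numbers, a hypothetical grant estimated at 300x clears the old 100x bar but not the ~1000x bar, which is the situation the rest of the post describes.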

The application of this is in evaluating US policy and scientific research. These are domains that seem relevant if you have near-termist "help people alive today" goals. They were also domains that OpenPhil had expected to outperform GiveWell-type charities (even after paying for cost-of-living in the US).

OpenPhil used to set its "near-termist" grantmaking standard at "100x". But their current experience is that it's increasingly hard to beat the 1000x value of top GiveWell charities.

While we think a lot of our “near-termist, human-centric” grantmaking clears the 100x bar, we see less evidence that it will clear a ~1,000x bar.

Since adopting the cash transfer benchmark in 2015, we’ve made roughly 300 grants totalling $200 million in our near-termist, human-centric focus areas of criminal justice reform, immigration policy, land use reform, macroeconomic stabilization policy, and scientific research. To get a sense of our estimated returns for these grants, we looked at the largest grants and found 33 grants totalling $73M for which the grant investigator conducted an ex ante “back-of-the-envelope-calculation” (“BOTEC”) to roughly estimate the expected cost-effectiveness of the potential grant for Open Philanthropy decision-makers’ consideration.

These 33 grants were estimated by their investigator to have an expected cost-effectiveness of at least 100x. (This makes sense given the existence of our “100x bar.”) Of those 33, only eight grants, representing approximately $32 million, had BOTECs of 1,000x or greater. Our large grant to Target Malaria accounts for more than half of that.
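
As a quick check on the proportions involved, here is a rough sketch using only the grant counts and dollar totals quoted above. The variable names are mine.

```python
# Figures quoted in the excerpt (dollar amounts in millions of USD).
TOTAL_DOLLARS_M = 200    # all near-termist, human-centric grants since 2015
BOTEC_GRANTS = 33        # large grants with an ex ante BOTEC
BOTEC_DOLLARS_M = 73
HIGH_GRANTS = 8          # grants whose BOTEC was 1,000x or greater
HIGH_DOLLARS_M = 32

# Share of BOTEC'd grants clearing 1,000x, by count and by dollars.
share_by_count = HIGH_GRANTS / BOTEC_GRANTS
share_by_dollars = HIGH_DOLLARS_M / BOTEC_DOLLARS_M

# Share of all near-termist dollars covered by the BOTEC sample.
sample_coverage = BOTEC_DOLLARS_M / TOTAL_DOLLARS_M

print(f"{share_by_count:.0%} of BOTEC'd grants, "
      f"{share_by_dollars:.0%} of BOTEC'd dollars, clear 1,000x")
print(f"BOTEC sample covers {sample_coverage:.1%} of total dollars")
```

The dollar share is higher than the count share largely because, as the excerpt notes, a single large grant makes up more than half of the $32 million.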

Although we don’t typically make our internal BOTECs public, we compiled a set here (redacted somewhat to protect some grantees’ confidentiality) to give a flavor of what they look like. As you can see, they are exceedingly rough, and take at face value many controversial and uncertain claims (e.g., the cost of a prison-year, the benefit of a new housing unit in a supply-constrained area, the impact of monetary policy on wages, the likely impacts of various other policy changes, stated probabilities of our grantees’ work causing a policy change).

We would guess that these uncertainties would generally lead our BOTECs to be over-optimistic (rather than merely adding unbiased noise) for a variety of reasons:

Program officers do the calculations themselves, and generally only do the calculations for grants they’re already inclined to recommend. Even if there’s zero cynicism or intentional manipulation to get “above the bar,” grantmakers (including me) seem likely to be more charitable to their grants than others would be.

Many of these estimates don’t adjust for considerations that would systematically push towards lower estimated cost-effectiveness, like declining marginal returns to funding at the grantee level, time discounting, or potential non-replicability of the research our policy goals are based on. The comparison with the level of care in the GiveWell cost-effectiveness models on these features is pretty stark.

We think it’s notable that despite likely being systematically over-optimistic in this way, it’s still rare for us to find grant opportunities in U.S. policy and scientific research that appear to score better than GiveWell’s top charities.

Of course, compared to GiveWell, we make many more grants, to more diverse activities, and with an explicit policy of trying to rely more on program officer judgment than these BOTECs. So the idea that our models look less robust than GiveWell’s is not a surprise – we’ve always expected that to be the case – but combining that with GiveWell’s rising bar is a more substantive update.

In spite of these calculations, we think there are some good arguments to consider in favor of our current grantmaking in these areas. [More]

We continue to think it is likely that there are causes aimed at helping people today (potentially including our current ones) that could be more cost-effective than GiveWell’s top charities, and we are hiring researchers to work on finding and evaluating them. [More]

They discuss some reasons to think science and policy may still be better bets for OpenPhil than GiveWell-style charities. Of the reasons listed, the one that made the most sense within my own worldview is "hits-based giving", i.e. even if most science and policy interventions seem to be in the 100x range, they have a stronger chance of extreme upside than GiveWell-style charities.

What bar to hold donations to?

How many opportunities should we expect to find to channel dollars into "relatively straightforward charitable interventions?" What bar does it make sense to hold charitable donations to?

In 2015, when we first wrote about adopting the cash transfer benchmark, it looked like GiveWell could plausibly “run out” of their more-cost-effective-than-cash giving opportunities. At the time, they had three non-cash-transfer top charities they estimated to be in the 5-10x cash range (i.e., 5 to 10 times more cost-effective than cash transfers), with ~$145 million of estimated short-term room for more funding. That, plus uncertainty about the amount of weight to put on these figures, led us to adopt the cash transfer benchmark. (In the remainder of this post, I occasionally shorten “cash transfer” to just “cash.”)

But by the end of 2018, GiveWell had expanded to seven non-cash-transfer top charities estimated to be in the ~5-15x cash range, with $290 million of estimated short-term room for more funding, and with the top recommended unfilled gaps at ~8x cash transfers. If we combine cash transfers at “100x” and large unfilled opportunities at ~5-15x cash transfers, the relevant “bar to beat” going forward may be more like 500-1,500x.
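
The "500-1,500x" figure is just the cash-relative range multiplied by the 100x cash benchmark. Here is a small sketch of that conversion for the 2015 and 2018 figures quoted above; the helper name is illustrative.

```python
CASH_BENCHMARK = 100  # cash transfers, as a multiple of the baseline


def bar_range(low_x_cash, high_x_cash):
    """Map a 'times cash' range onto the 'times baseline' scale."""
    return (low_x_cash * CASH_BENCHMARK, high_x_cash * CASH_BENCHMARK)


bar_2015 = bar_range(5, 10)    # three top charities at 5-10x cash
bar_2018 = bar_range(5, 15)    # seven top charities at ~5-15x cash
unfilled_gap_2018 = 8 * CASH_BENCHMARK  # top unfilled gaps at ~8x cash

print(bar_2015, bar_2018, unfilled_gap_2018)
```

The lower end of the range is unchanged between the two years; what moved is the upper end and the amount of room for more funding (~$145 million to $290 million).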

And earlier this year GiveWell suggested that they expected to find more cost-effective opportunities in the future, and they are staffing up in order to do so.

Another approach to this question is to ask, how much better than direct cash transfers should we expect the best underfunded interventions to be? I find scalable interventions worth ~5-15x cash a bit surprising, but not wildly so. It’s not obvious where to look for a prior on this point, and it seems to correlate strongly with general views about broad market efficiency: if you think broad “markets for doing good” are efficient, finding a scalable ~5-15x baseline intervention might be especially surprising; conversely if you think markets for doing good are riddled with inefficiencies, you might expect to find many even more cost-effective opportunities.

One place to potentially look for priors on this point might be compilations of the cost-effectiveness of various evidence-based interventions. I know of five compilations of the cost-effectiveness of different interventions within a given domain that contain easily available tabulations of the interventions reviewed:

— The Washington State Institute for Public Policy benefit-costs results database (archive), focused on U.S. social policies.

— Two reviews of public health interventions considered by the UK’s National Institute for Health and Care Excellence (NICE).

— The Disease Control Priorities report 2nd Edition (archive), focused on global health interventions.

— The Disease Control Priorities report 3rd Edition (archive), focused on global health interventions.

— WHO Choice results (archive) for the AFR E region (archive), focused on global health interventions.

For this purpose, I was just curious about the general distribution of the estimates, and didn’t attempt to verify any of them, and was very rough in discarding estimates that were negative or didn’t have numerical answers, which may bias my conclusions. In general, we regard the calculations included in these compilations as challenging and error-prone, and we would caution against over-reliance on them.

I made a sheet summarizing the sources’ estimates here. All five distributions appear to be (very roughly) log-normal, with standard deviations of ~0.7-1, implying that a one-standard-deviation increase in cost-effectiveness would equate to a 5-10x improvement. However, any errors in these calculations would typically inflate that figure, and we think they are structurally highly error-prone, so these standard deviations likely substantially overstate the true ones.
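
The step from "standard deviation of ~0.7-1" to "a 5-10x improvement" works if the standard deviations are measured in base-10 log units. That is an assumption on my part, inferred from the fact that the quoted numbers line up under it. A minimal sketch:

```python
# ASSUMPTION: the quoted standard deviations are in log10 units of
# cost-effectiveness. The excerpt does not state the base explicitly.
OBSERVED_SD_LOG10 = (0.7, 1.0)


def one_sd_multiplier(sd_log10):
    """Multiplicative change in cost-effectiveness for a +1 SD move."""
    return 10 ** sd_log10


multipliers = [one_sd_multiplier(sd) for sd in OBSERVED_SD_LOG10]
print([round(m, 2) for m in multipliers])
```

A standard deviation of 0.7 corresponds to roughly a 5x step and a standard deviation of 1.0 to exactly a 10x step, matching the "5-10x" in the excerpt.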

We don’t know what the mean of the true distribution of cost-effectiveness of global development opportunities might be, but assuming it’s not more than a few times different from cash transfers (in either direction), and that measurement error doesn’t make up more than half of the variance in the cost-effectiveness compilations reviewed above (a non-trivial assumption), then these figures imply we shouldn’t be too surprised to see top opportunities ~5-15x cash. A normal distribution would imply that an opportunity two standard deviations above the mean is in the ~98th percentile. These figures would support more skepticism towards an opportunity from the same rough distribution (evidence-based global health interventions) that is claimed to be even more cost-effective (e.g., 100x or 1,000x cash rather than 10x).
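
Here is one way to reproduce the reasoning in this paragraph. It is a sketch under my own reading, not the author's actual calculation: I assume the standard deviations are in log10 units and that "half of the variance is measurement error" means the true variance is half the observed variance.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Two standard deviations above the mean: roughly the 98th percentile.
percentile_2sd = normal_cdf(2.0)

# ASSUMPTIONS (mine): SDs are in log10 units, and measurement error
# accounts for half of the observed variance.
OBSERVED_SD_LOG10 = (0.7, 1.0)
ERROR_SHARE_OF_VARIANCE = 0.5

true_sd = tuple(
    sd * math.sqrt(1.0 - ERROR_SHARE_OF_VARIANCE) for sd in OBSERVED_SD_LOG10
)

# Multiple of the distribution's center reached by a +2 SD opportunity.
two_sd_multiples = tuple(10 ** (2.0 * sd) for sd in true_sd)

print(f"+2 SD percentile: {percentile_2sd:.1%}")
print(f"+2 SD multiples of the center: {two_sd_multiples[0]:.1f}x "
      f"to {two_sd_multiples[1]:.1f}x")
```

Under these assumptions a two-standard-deviation opportunity sits roughly 10-26x above the center of the distribution. If that center is within a few times of cash transfers, top opportunities at ~5-15x cash are unsurprising, while 100x or 1,000x cash would lie much further out in the tail.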

Stepping back from the modeling, given the vast difference in treatment costs per person for different interventions (~$5 for bednets, $0.33-~$1 for deworming, ~$250 for cash transfers), it does seem plausible to have large (~10x) differences in cost-effectiveness.

Even if scalable global health interventions were much worse than we currently think, and, say, only ~3x as cost-effective as cash transfers, I expect GiveWell’s foray into more leveraged interventions to yield substantial opportunities that are at least several times more cost-effective, pushing back towards ~10x cash transfers as a more relevant future benchmark for unfunded opportunities.

Overall, given that GiveWell’s numbers imply something more like “1,000x” than “100x” for their current unfunded opportunities, that those numbers seem plausible (though by no means ironclad), and that they may find yet-more-cost-effective opportunities in the future, it looks like the relevant “bar to beat” going forward may be more like 1,000x than 100x.
