‘GiveWell for AI Safety’: Lessons learned in a week
On prioritizing orgs by theory of change, identifying effective giving opportunities, and how Manifund can help.

Epistemic status: I spent ~20h thinking about this. If I were to spend 100+ hours thinking about this, I expect I’d write quite different things. I was surprised to find that early GiveWell ‘learned in public’: perhaps this is worth trying.

The premise: EA was founded on cost-effectiveness analysis, so why not try this for AI safety, aside from all the obvious reasons¹?

A good thing about early GiveWell was its transparency. Some wish OpenPhil were more transparent today; that sometimes seems hard, due to strategic or personnel constraints. Can Manifund play GiveWell’s role for AI safety, publishing rigorous, evidence-backed evaluations?

With that in mind, I set out to evaluate the cost-effectiveness of marginal donations to AI safety orgs². Since I was evaluating effective giving opportunities, I only looked at nonprofits³.

I couldn’t evaluate all 50+ orgs in one go. An initial thought was to pick a category like ‘technical’ or ‘governance’ and narrow down from there, but that didn’t feel like the most natural division. What’s going on here? I found it more meaningful to distinguish between ‘guarding’ and ‘robustness’⁴ work:

| Org type | Guarding | Robustness |
|---|---|---|
| What they’re trying to do | Develop checks to ensure AI models can be developed and/or deployed only when it is safe to do so. Includes both technical evals/audits and policy (e.g. pause, standards) advocacy. | Develop the alignment techniques and safety infrastructure (e.g. control, formal verification) that will help models pass such checks, or operate safely even in the absence of such checks. |

Some reasons you might boost ‘guarding’:

1. You think it can reliably get AI developers to handle ‘robustness’, and you think they can absorb this responsibility well.
2. You think ‘robustness’ work is intractable, slow, or unlikely to be effective outside large AI companies.
3. You prioritize introducing disinterested third-party