‘GiveWell for AI Safety’: Lessons learned in a week
On prioritizing orgs by theory of change, identifying effective giving opportunities, and how Manifund can help.

Epistemic status: I spent ~20h thinking about this. If I were to spend 100+ hours thinking about this, I expect I’d write quite different things. I was surprised to find that early GiveWell ‘learned in public’: perhaps this is worth trying.

The premise: EA was founded on cost-effectiveness analysis, so why not try this for AI safety, aside from all the obvious reasons¹?

A good thing about early GiveWell was its transparency. Some wish OpenPhil were more transparent today; that sometimes seems hard, due to strategic or personnel constraints. Can Manifund play GiveWell’s role for AI safety, publishing rigorous, evidence-backed evaluations?

With that in mind, I set out to evaluate the cost-effectiveness of marginal donations to AI safety orgs². Since I was evaluating effective giving opportunities, I only looked at nonprofits³.

I couldn’t evaluate all 50+ orgs in one go. An initial thought was to pick a category like ‘technical’ or ‘governance’ and narrow down from there, but that didn’t feel like the most natural division. What’s going on here? I found it more meaningful to distinguish between ‘guarding’ and ‘robustness’⁴ work:

| Org type | Guarding | Robustness |
|---|---|---|
| What they’re trying to do | Develop checks to ensure AI models can be developed and/or deployed only when it is safe to do so. Includes both technical evals/audits and policy (e.g. pause, standards) advocacy. | Develop the alignment techniques and safety infrastructure (e.g. control, formal verification) that will help models pass such checks, or operate safely even in the absence of such checks. |

Some reasons you might boost ‘guarding’:

1. You think it can reliably get AI developers to handle ‘robustness’, and you think they can absorb this responsibility well.
2. You think ‘robustness’ work is intractable, slow, or unlikely to be effective outside large AI companies.
3. You prioritize introducing disinterested third-party