It seems likely to me that maybe 50% of people who start seriously studying or working on AI safety in the next year will be below the intelligence escape velocity, where they forever lag behind frontier AI systems in AI research ability. If I were working in capacity building, I would already start to deprioritize the earlier parts of the funnel for this reason. For reference, my TAI timelines are 2028.
Very little of the impact of people working in AI Safety is downstream of their research, so this seems wrong. Maybe you also think the same will be true for policy/communications/evals work?
Separately, I don't super buy this even for research. Capacity-building work attracts the smartest people, and those people will have been using the models to upgrade themselves anyways. It's not the case that you need to have 5+ years of research to usefully contribute, especially with models providing uplift to fill in the gaps.
Even if new people's impact isn't downstream of research, the inability to contribute to research significantly hinders their progress, because potential mentors and collaborators won't benefit from working with them, and they generally get fewer opportunities to advance their careers.
Yeah this argument applies much less to policy/communications work.
I agree that you don't need 5+ years of research to contribute, but I expect that at some point soon, if you don't have some minimum amount of context on AI safety, your job (managing a bunch of LLM agents) can just be automated by an LLM agent.
and they generally get fewer opportunities to advance their careers.
I don't know what this means? Like, they won't get invited to talk to policy makers? Or won't be successful writers informing people about the risks? Are you talking about lab employment here?
I agree that you (hopefully) won't get that many new lab employees out of recent recruitment efforts, but that's a plus not a minus!
I guess one consideration for prioritizing direct work over capacity-building is AI timelines. Do you think this should be an important consideration?
I think it bites somewhat on work engaging with much younger age groups (e.g. high school students). Outside of that, I think empirically the turnaround time I've seen between people getting engaged and doing useful work is really short in many cases (like < 1 year, even for some university students), such that even in very short timelines, most capacity-building work still looks very good.
My big concern is, to what extent do capacity-building efforts create pipelines into AI capabilities work that shortens timelines? This happens to at least some extent, but I don't know how big an effect it is.
Loosely construed, the foundings of DeepMind, OpenAI, and Anthropic were the result of x-risk capacity-building. If you count those (and it's not clear that you should), then capacity-building has probably been strongly net negative to date.
I don't think it's obvious that even if you count those, capacity-building has been strongly net-negative to date, but I do think it's pretty plausible.
Like, if you were to count the costs as broadly as "all the labs are downstream of capacity-building work" then you also need to count the benefits as broadly. A broadly known public track record of being concerned about these problems for a long time, of being motivated by altruism, of having tried to solve the problem for a long time, and of being one of the few memetic centers in the world that people draw on to figure out what to do about this whole AI situation is quite valuable, possibly more valuable than the acceleration effects of things like DeepMind, OpenAI, and Anthropic.
(that said, my actual take here is that the biggest issue with most capacity-building work is that it actively undermines the things that other capacity-building work has been highly successful at, so that ultimately some capacity building work is predictably extremely good for the world, and some is predictably extremely bad for the world).
the biggest issue with most capacity-building work is that it actively undermines the things that other capacity-building work has been highly successful at, so that ultimately some capacity building work is predictably extremely good for the world, and some is predictably extremely bad for the world
Wait what does this mean? Is there some kind of dichotomy I'm not aware of?
Maybe? I am not saying the dichotomy is common-knowledge, but I feel pretty confident predicting which capacity-building work will be quite bad in-expectation and which will be quite good (this doesn't mean there isn't variance within those categories with many orgs or people having sign-flipped impact from their reference class, but that I am happy to register predictions at the class level with like reasonably-high confidence).
I would then like to know which is which (DM is okay if you feel that would be somewhat controversial, it's also alright if you want to keep your opinions to yourself)
Sorry, I am not saying there is a classifier here that is like one sentence long. At a high level I think "is it largely funneling people into places where the incentives will point towards building more powerful AI systems and/or becoming personally more powerful, or is it putting people into positions where their primary incentives are to help other people make sense of what is going on, with some grounding in accuracy of their beliefs" is the best short classifier I have, but I didn't intend to communicate that there is some super short description of the classifier!
I’ve heard variants of this argument, and I overall haven't found them that persuasive for reasons close to the ones Habryka gives-- I think if you carve things up such that capacity-building work has been responsible for e.g. speeding up the creation of frontier AI labs, you should also credit it for the broader movement focused on catastrophic risks, and my intuition is the counterfactual world without that movement would be worse off overall. I don’t buy that the acceleration has been substantial enough that the unaccelerated world would have bought a lot of time for societal improvements useful for addressing catastrophic risks; instead, it feels to me like the unaccelerated world would be facing these risks more blindly and with less time to usefully prepare.
I also think given the massive amount of non-GCR-related interest and resources in AI now, the forward-looking acceleration effects seem likely to be much smaller than any historic effect. I generally think the ratio of “meaningfully adding to the talent pool of people working on catastrophic risks” to “meaningfully accelerating AI capabilities” for most CB programs will look extremely favorable.
See Ryan Kidd on How MATS addresses “mass movement building” concerns. One useful-ish framing from that post is that safety is more neglected than capabilities, so you should expect the marginal FTE on safety to add more than the marginal FTE on capabilities.
I'll make up numbers to illustrate the point: let's say there are 10,000 capabilities researchers and 1,000 AI safety researchers — that's a rough 10:1 ratio. Now say MATS funnels 500 people into capabilities and 500 into AI safety. Now it's a 10,500:1,500 ratio, or 7:1, which is much better. I think 50% of the MATS mentees going into capabilities is very pessimistic.
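The arithmetic above can be sketched as a quick check (all numbers here are the made-up illustrative figures from this comment, not real data):

```python
def ratio(capabilities: int, safety: int) -> float:
    """Capabilities-to-safety researcher ratio."""
    return capabilities / safety

# Made-up baseline: 10,000 capabilities researchers, 1,000 safety researchers.
before = ratio(10_000, 1_000)            # 10.0, i.e. 10:1
# Pessimistic assumption: a program adds 500 people to each side.
after = ratio(10_000 + 500, 1_000 + 500)  # 7.0, i.e. 7:1
print(before, after)
```

Even under the pessimistic 50/50 split, the ratio improves, because safety is the smaller pool and the same absolute inflow moves it proportionally more.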
According to the MATS Alumni Impact Analysis, the ratios are 78%:
If we (perhaps unfairly) ignore the independent researcher, then that's 3 to capabilities for every 100 to safety!
This also ignores the effects of having capabilities researchers more clued up / familiar with AI safety — you might think this is net-positive (bc they are marginally more likely to update on new evidence and pivot to safety) or net-negative (bc they misuse our technical insights). I think this probably shakes out to be net-positive.
But if locally robust alignment makes faster progress on capabilities without being a version of locally robust alignment that can become asymptotically robust alignment and give us a win node, then it's not only less obviously good, but might well mostly backfire. I don't believe the most extreme version of this claim, but I used to, and I know people who do. The "people going into capabilities" count also includes "local alignment that doesn't generalize to asymptotic alignment".
The naive way to get asymptotic alignment, and possibly the only way, is to have locally robust alignment that is reliably always ahead of the impactfulness of deployment. But if impactfulness of deployment is always operating at 110% of local robustness, that seems like a recipe for disaster. So the question is ultimately one about how to stay in the local validity window indefinitely, even as that local window gets harder to be sure we've maintained.
Also there's the whole thing about how alignment is fundamentally about figuring out what to ask for, and being robust about that is a confusing question at best. Most of MATS seems too focused on the (very hard!) problems of achieving local-alignment-good-enough-to-keep-going to spend the necessary effort to succeed at the really hard parts later (which is not obviously a mistake on MATS's part, but might well be).
It's much easier to get reliable robustness for local alignment when you don't have to solve "what should a thing overwhelmingly smarter than us do?" yet, but if there's a hole coming where a thing moderately smarter than us has the ability to confuse us into choosing a plan that seems good to it but which is not what we want, we may fall off the manifold we would have wanted to be on if we had been able to see things coming without getting confused.
One additional reason that capacity-building for AI safety seems good right now is that very soon I expect there to be a lot more funding available for AI safety work, from Anthropic donors (see Front-Load Giving Because of Anthropic Donors? and this comment of mine) as well as a broader societal wakeup about risks from AI.
When money becomes more available, the bottleneck becomes "good opportunities/people to spend money on", which is what capacity-building produces. Also, starting asap seems important -- capacity-building takes time to set up and bear fruit, and some kinds of capacity-building have snowball-y effects (e.g. MATS).
TL;DR:
Cross-posted from Multiplier
I work on the capacity-building team on the Global Catastrophic Risks half of Coefficient Giving (formerly known as Open Philanthropy). Our remit is, roughly, to increase the amount of talent aiming to prevent unprecedented, globally catastrophic events. These days, we’re mostly focused on AI, and we’ve funded a number of projects and grantees that readers of this post might be familiar with, including MATS, BlueDot Impact, Constellation, 80,000 Hours, CEA, The Curve, FAR.AI’s events, university groups, and many other workshops and projects.
The post aims to make the case that broadly, capacity-building work (including on AI risk) has been and continues to be extremely impactful, and to encourage people to consider pursuing relevant projects and careers.
This post is written from my personal perspective; that said, my sense is that a number of CG staff and others in the AI safety space share my views. I include some quotes from them at the end of this post.
I’m writing this post partly to correct what I perceive as an asymmetry between how excited I and others at Coefficient Giving are about this kind of work and how excited people in the EA and AI safety communities seem to be to work on it. The capacity-building team is one of three major teams working on AI risk at Coefficient; we currently have 11 staff, which is ⅓ of the total AI grantmaking capacity, and gave away over $150M in 2025. I started my stint at Coefficient Giving in 2021, working half-time on technical AI safety grantmaking and half-time on capacity-building grantmaking; I ultimately switched to working full-time on capacity-building, among other reasons because my sense was that the latter team was several times (maybe an order of magnitude) more impactful. Things seem somewhat different to me now (I think the set of opportunities in technical AI safety grantmaking looks significantly better than it did in 2021), but my sense is that capacity-building as an area of work is still massively underrated relative to its impact.
The case for capacity-building work
The naive case for this kind of work (often called the multiplier effect argument) goes something like this: say you can spend a little time doing direct work yourself, or spend that same amount of time getting one of your equally talented friends into direct work for the rest of their life. Getting your friend into direct work is most likely the more impactful option, because you get to “multiply” your lifetime impact (in this case, by almost a factor of 2) by getting a whole additional person to spend their career on work you think is important.
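The multiplier arithmetic can be made concrete with a small sketch (the function and all parameter values below are illustrative assumptions, not figures from the post):

```python
def impact_multiplier(career_years: float, recruiting_years: float,
                      p_success: float) -> float:
    """Expected career-years of direct work if you spend `recruiting_years`
    trying to bring one equally talented person into the field, relative
    to just doing direct work yourself the whole time."""
    direct = career_years
    recruited = (career_years - recruiting_years) + p_success * career_years
    return recruited / direct

# Illustrative assumption: 40-year careers, 1 year spent recruiting,
# and the recruitment is certain to succeed.
print(impact_multiplier(40, 1, 1.0))  # 1.975, i.e. "almost a factor of 2"
```

The interesting lever is `p_success`: the multiplier stays above 1 whenever the chance of success exceeds the fraction of your career spent recruiting, which is why the next section's question of tractability matters so much.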
In fact, whether this argument goes through depends on a few premises: namely, how good the direct work you would have done would be, and how tractable it is to convince others as talented as you. I’m going to skip over the first premise for now (and attempt to address it in a later section) and present evidence that our team has collected over the years that makes me think this work is very tractable, and in particular, that there are easy-to-execute interventions that reliably influence people’s career trajectories in substantial ways. A priori, you might think that people’s career choices happen randomly and chaotically enough that it’s difficult to make a substantive impact by trying to change what people work on. But in fact, both anecdotal evidence we’ve observed and larger-scale data collection we’ve attempted (both presented below) suggest that intentional efforts make a big difference to individual career trajectories (including those of individuals who go on to do highly impactful work). I think that core stylized fact makes up the main case for why capacity-building work is worthwhile.
I will briefly note that while the below case is focused on successes from capacity-building, I do think this work has the potential for harm, though my overall view is that efforts in this space executed by thoughtful, high-context individuals will be very positive in expectation. I discuss this in this appendix.
Surveys
In 2020 and 2023, our team ran two similar, in-depth surveys where we asked low-hundreds of people currently working on (or relatively likely to work on) impactful GCR work what influenced their career trajectories. Survey respondents included employees at AI labs, staff at key technical, policy, and capacity-building organizations in AI, and promising-seeming early career individuals. The aim of the surveys was to provide some evaluation of the impacts of the grants our team had made, as well as to generate some evidence informing Coefficient Giving’s views on capacity-building work as a whole.
The survey used a variety of prompts to elicit evidence from respondents about what had influenced their career choices. One of the sections asked respondents to list, unprompted, the top 4 influences that they thought were most important to their current career trajectory (these included things like “my partner”, “inherent curiosity”, etc.).
In 2023, 60% of respondents listed a capacity-building program or organization that our team was funding in their top four influences, with the most common being university groups (listed by 25% of respondents), 80,000 Hours (listed by 20% of respondents), and EAG/EAGxes (listed by 12% of respondents).
See the table below for a longer list of the commonly listed influences, sorted manually into (somewhat subjectively decided) buckets. Note that:
[Table: commonly listed influences, with counts out of 329]
Testimonials
I’m not able to share the individual free-write responses from the survey above, but I recently personally asked some individuals who I think are doing high-impact work to tell me how they came to be doing that work, followed by what they thought the most important or counterfactual influences on their trajectories were.
Below, I include Claude summaries of their overall stories along with their description of the most important influences, lightly edited. Some notes on the testimonials I've included:
Neel Nanda (Senior Research Scientist at Google DeepMind)
“Here's a list of the salient influences on me:
Max Nadeau (Associate Program Officer (Technical AI Safety) at Coefficient Giving)
Claude’s summary:
Max got it into his head in high school that human-level AI was coming during his lifetime and that it was important to make sure the process went well, but he had no idea anyone was working on it. In college, he got connected with Stephen Casper, where he learned practical ML skills, and to someone who connected him to the people running the Impact Generator retreat [Asya note: this was a small GCR-focused workshop series run in the Bay in 2022], which he was later invited to. He talked to Tao Lin at that retreat, and Tao offered him a TA position at the ML bootcamp Redwood was running, with three weeks to learn the material. He thought he'd be in the Bay for three days, but stayed six weeks. TA'ing turned into an internship at Redwood, which he took a semester off college to do. While interning he got to know Ajeya, and by the time he graduated she offered him a job.
Max on what was most important:
Rachel Weinberg (founder and former head of The Curve, currently at AI Futures Project)
Claude’s summary:
Rachel got into effective altruism in high school through friends, and started a group at her university. She spent some time interning running retreats and ended up helping with Future Forum, a futurism conference that required a last-minute venue switch. She took a semester off to study AI safety, but decided she wasn't interested in research, and did web dev for a while. After running Manifest 2024, she started The Curve, and is now working on other field-building projects.
Rachel on what was most important:
Marius Hobbhahn (CEO and founder of Apollo Research)
Claude’s summary:
During his first week of university in 2015, someone handed him Superintelligence. He studied cognitive science, did a CS bachelor's in parallel, then a machine learning master's and PhD to prepare for AI safety work. In 2022 he started doing AI safety research on the side with a grant from the Long-Term Future Fund. He paused his PhD, did MATS in early 2023, concluded that deceptive alignment was the biggest problem and that no one was doing evals for it, and started Apollo, which he’s been running since.
Marius on what was most important:
Adam Kaufman (member of technical staff at Redwood Research)
Claude’s summary:
Adam knew from an early age that superintelligence would be scary if someone built it, but assumed it wasn't going to happen in his lifetime. When he got to college, he joined the AI Safety Fundamentals reading group that the Harvard AI Safety group (HAIST) was running, thought the people were extremely cool, and made most of his close friends there. He became increasingly convinced the problem was urgent as language models kept getting smarter. He met Buck Shlegeris at a HAIST retreat, talked to him, and applied to MATS. He did MATS at Redwood, enjoyed it so much he took time off school, and has been working there since.
Adam on what was most important:
Gabriel Wu (member of technical staff (alignment) at OpenAI)
Claude’s summary:
Gabe was given a copy of The Precipice when he started as a freshman at Harvard. There was no formal AI safety team at the time, but a group of 7-10 people would gather weekly to talk about x-risk in a dining hall, so he joined, and ended up going to a long workshop in Orinda [California]. He did REMIX [Asya note: this was a mechanistic interpretability bootcamp] the following winter, which introduced him to the Constellation community, and then applied for a Redwood internship for the next summer. After others graduated, he became the new director of HAIST (the Harvard AI Safety Team). He worked with the Alignment Research Center, applied to labs, and was eventually convinced by several people to join OpenAI.
Gabe on what was most important:
Catherine Brewer (Senior Program Associate (AI Governance) at Coefficient Giving)
Claude’s summary:
Catherine found 80,000 Hours before university through internet searching about careers, then read Doing Good Better. They engaged with the Oxford effective altruism university group, going to events and helping run programming. Through the group they made friends who were into AI safety and argued with them a bunch, which got them interested in AI safety. They applied for the ERA fellowship (then called CERI) after someone from the group told them to, and spent a summer thinking about AI safety with other people. Then they did the GovAI fellowship, which they found even more helpful, via meeting people and developing their own takes on relevant topics. After that they were interested in AI governance, and applied to Open Philanthropy when they were graduating.
Catherine on what was most important:
Aric Floyd (video host for AI in Context)
Claude’s summary:
Aric found GiveWell by Googling for the most effective charities in his late teens, but didn't find the broader effective altruism community until 2020, when a friend found an online student summit that CEA ran. He knew the people who led the Stanford effective altruism group, but never had time to get involved, and was then invited by those people to help with some community-building efforts at MIT. He was also invited to Icecone [Asya note: this was an AI-risk-focused workshop run in 2022], and came out of it persuaded that AI safety was a big deal, but less convinced that theoretical alignment work was the way to proceed. He did a bunch of short sprints of community-building work and met Chana Messinger while teaching at the Atlas Fellowship, and later the Apollo program in the UK. When 80K started thinking about video production, Chana brought him on because they'd worked well together before, and because Aric had prior experience in film & television acting. Aric had previously been encouraged by [experienced EA leaders / Will MacAskill, among others] to do public-facing content creation, and decided to give it a shot.
Aric on what was most important:
Ryan Kidd (Director of MATS)
Claude’s summary:
Ryan read HPMOR and LessWrong in high school, but he didn't anticipate near-term AGI until rediscovering the idea through effective altruism around 2020. He co-organized the effective altruism group at the University of Queensland during his physics PhD, where his interest in catastrophic risk evolved from climate change activism to nuclear winter modeling to AI risk after reading The Precipice. He completed the first AI Safety Fundamentals course, applied unsuccessfully to FHI and CLR, then did the SERI MATS pilot program. He attended Icecone [Asya note: this was an AI-risk-focused workshop run in 2022] in Berkeley, where he met Holden Karnofsky, Ajeya Cotra, Buck Shlegeris, and many future colleagues. While completing the MATS research phase with John Wentworth as his mentor, he sent the co-organizer a document explaining how he would improve the program and got invited to join the organizing team. He's co-led MATS with Christian Smith since late 2022.
What Ryan says was most significant (in order of importance):
What tends to work?
While some of the interventions affecting people’s career trajectories are fairly idiosyncratic, we’ve noticed a few broad categories that tend to be impactful on people’s careers (many of which are featured in the testimonials above).
Notably, unlike content, in our experience programs and events can have a sizable impact even if they don’t meet an exceedingly high-quality bar, making them a good bet for a wider range of people to work on. Generalizing from anecdotes, I speculate that programs and events (especially in-person ones with other participants at a similar point in their careers) often have the effect of causing someone to take changing their career more seriously as a possibility, whereas previously they had been engaging e.g. online in a fairly abstract or detached way.
What’s good to do now?
Our recent request for proposals gives some examples of the kinds of projects we’d be interested in seeing on the current margin. Briefly highlighting some specific things that I or others on my team think would be good, based on our sense of both what’s worked in the past and the current AI risk landscape:
Who should be doing this work?
The above makes the case for why you might think capacity-building work is valuable, but doesn’t in itself provide a point of comparison for what someone could be doing otherwise (namely direct work, which itself could have its own capacity-building benefits, e.g. by creating evidence that there’s important work to be done in an area).
I don’t have a rigorous method of comparing the value of potential direct vs. CB interventions, and I think there’s room to make a variety of plausible cases. That said, I will share my intuitions, as well as the intuitions of some others at Coefficient.
I generally encourage people to think about their career choices at an individual level, but from an overall talent allocation perspective, my current take is that many of the marginal hires at larger organizations doing technical or policy work right now (including e.g. Apollo, Redwood, METR, RAND TASP, GovAI, Epoch, UKAISI, and Anthropic’s safety teams) would be capable of founding or being an early strategy-setting employee at a top capacity-building organization, and would have more impact by doing so.
I think individuals who are most well-suited to capacity-building work are those who are (some subset of) entrepreneurial, socially skilled, operationally strong, or strong communicators in the relevant subject areas. I think work running programs or events is particularly loaded on the first three of these, whereas e.g. producing content is much more loaded on the last.
What would doing this work look like?
If you think you might be someone who should plausibly be doing capacity-building work, here are some things you could consider:
Working at an organization doing good work in the space
There are a number of actively-hiring organizations that I think are doing impactful capacity-building work (see some of them in this filtered 80K job board), but here I’m going to plug some organizations where I feel a strong hire could be particularly impactful.
If you think you might be interested in any of the below but are on the fence, you can DM me or fill out this form and I’ll aim to take at least a 15-minute call with you (and longer if it seems useful; up to a limit of 20 such calls).
Constellation - CEO
Constellation is a research center and field-building organization located in Berkeley, California, that hosts a number of organizations and individuals doing impactful work in the AI safety space. In addition to running the space itself, it’s historically run programming through the space, including the Astra Fellowship, the Visiting Fellows Program, and a number of one-off workshops and events.
Given the dense concentration of high-context talent working there, I think Constellation has huge potential to be impactful both as a convening place for people doing this work, and as a host of a number of programs and events, including (potentially) ones aiming to engage policymakers, AI lab employees, and other high-stakes actors relevant to the AI space.
Constellation is looking for a new CEO who I expect to be the primary individual setting Constellation’s strategic direction. I think that position will be extremely impactful and I'd like them to get a strong hire.
Kairos – various early generalist positions
Kairos runs SPAR, a remote AI safety research mentorship program, provides advice and monetary support for AI safety university groups, and has taken on running workshops for promising young people. I think there’s a massive amount of evidence about the effectiveness of all three of these interventions (some of which you can see in the testimonials above), and I think university groups and workshops for young people in particular are (still) extremely neglected relative to their historic impact.
I think Kairos has a very strong leadership team and important, neglected priorities (plus, Agus is a great Tweeter), and I think it would be very impactful for them to have early hires who are strong generalists that could own priority areas-- they plan to open multiple new hiring rounds very soon, and you can fill out their General Expression of Interest form to be added to their potential candidate pool for those roles.
Starting or running your own capacity-building project or organization
Our team is always accepting applications for funding. This section above as well as our request for proposals describes some kinds of projects in AI capacity-building that we might be particularly excited to fund, but I also encourage people to form their own views about what might be effective and not anchor too strongly to past work.
Working on a capacity-building project part-time
We’ve seen a lot of successful capacity-building work started or run entirely by people or organizations doing it on the side of their day-to-day work, including MATS (which was started by full-time Stanford students), a number of impactful workshops and events, and a lot of widely-read public communications.
Subscribing to Multiplier, a Substack with thoughts from our team (and other AI grantmaking staff at CG)
Letting our team know
If you think you might be interested or a good fit for this kind of work, but aren’t sure where to start, we would love it if you let us know by filling out this very short expression of interest form. We’ll reach out if there are projects or opportunities on our radar that we think might be a particularly good fit for you. (Note that we don’t expect to reach out to most respondents).
Social proof
This post is coming from my personal perspective, but my sense is my position here is directionally shared by at least some at CG and elsewhere in the AI safety space. I asked a few people who were not working on capacity-building, but who I felt had substantial context on capacity-building efforts, to share their takes below:
Julian Hazell, AI governance and policy at Coefficient Giving
“As I've written about before, I'm really into capacity building.
Funny enough, a Coefficient Giving career development grant and the GovAI fellowship were very important inputs into my current career trajectory. I probably would've eventually found my way into AI governance work regardless, but these programs jumpstarted my career and turned me into a useful contributor much faster than I otherwise would've been.
On the grantmaking side, I funded a number of projects where capacity building was a core part of the theory of change, and I've seen results that have been genuinely exciting.
If I could wave a magic wand to reorganize talent allocation in the AI safety community at my whim, I'd move a decent number of people currently in research and policy roles into capacity building. I think it's that underrated.”
Trevor Levin, AI governance and policy at Coefficient Giving
“I co-sign this post. There's so much to do to make the world more ready for transformative AI, and the ecosystem is full of projects that need a founder or are a couple more great hires from being much more impactful. We desperately need more talented and motivated people to keep showing up. Also, for me and I think for many others, the work can be deeply rewarding -- it often has more social contact and shorter feedback loops than other types of work.”
Ryan Greenblatt, Chief Scientist at Redwood Research:
"I agree with Asya's post and think that capacity-building work is underdone and underrated. One delta is that I would emphasize the importance of capacity-building-type work done by people who are doing object-level work in the field: I think object-level work is complementary to capacity building, and people doing object-level work should spend a larger fraction of their time doing or helping with capacity building."
Buck Shlegeris, CEO of Redwood Research
Asya: I'd broadly be interested in you giving your take on the kind of work that my team funds.
Buck: I don’t know the current distribution.
Asya: Our biggest grantees are MATS, CEA, Constellation, BlueDot, LISA, Tarbell, 80K, FAR AI's events, a bunch of university groups, and a bunch of other stuff.
Buck: Many of those seem pretty good. I think that overall, capacity building that tries to get people to think through a bunch of issues related to transformative AI, especially helping people develop scope-sensitive beliefs about it, has gone quite well historically and put us in probably a much better position than we'd be in without it. I'm excited for that work happening on the margin, and I feel like every year we're somewhat better off because of capacity building that was done that year or the previous year, or because of projects done by those organizations. That all seems great.
Asya: A claim I make in my post is that ‘many of the marginal hires at larger organizations doing technical or policy work right now (including e.g. Apollo, Redwood, METR, RAND TASP, GovAI, Epoch, UKAISI, and Anthropic’s safety teams) would be capable of founding or being an early strategy-setting employee at a top capacity-building organization, and would have more impact by doing so.’ I'm curious for your immediate takes on that proposition.
Buck: I don't know how many of them have that capability. I think if they have that capability, they should strongly consider doing so.
Maybe something like this: I think MATS and Redwood represented two different philosophies on how to increase the amount of technical AI safety research done, and it's very unclear which one was better. MATS looks at the very least competitive. It's been involved in the production of a huge amount of AI safety research that I'm happy exists. And a heuristic that would have suggested you shouldn't work on MATS early on seems to have gotten wrecked by posterity.
Asya: Cool, those are the main questions I want to ask you. Any other commentary you'd want to include here?
Buck: Capacity-building work seems good. I encourage Redwood staff to participate in capacity-building work; I think it's worth their time on the margin. I'm going to be involved in a bunch of it myself.
Appendix
My post in large part focuses on the case for successes from capacity building, but I do think there are a number of mechanisms through which work in this category can do harm, e.g. by misrepresenting key ideas to broad audiences, alienating people who would otherwise have been sympathetic to this work, or empowering individuals who ultimately make the ecosystem worse. While these effects are real and material, my overall view is that the negative impacts in the space have likely been substantially outweighed by the positives. My expectation is that most efforts in this space executed by thoughtful, high-context individuals will be very positive in expectation, so I feel good about publishing broad encouragement to pursue this work on the current margin.
Without going into detail, my intuitions here come from an overall assessment of the work done by global-catastrophic-risk-focused groups over the years, which my personal best guess is has been very positive on net, even accounting for substantial negatives (e.g. the actions of Sam Bankman-Fried). That said, I've heard a number of arguments for why that may not be the case, or for why certain large classes of efforts may have been disproportionately harmful, which I largely won't cover here. Ultimately, addressing these is not the main focus of this post; if this feels like a major crux for your views on this kind of work, I encourage you to come chat with me about it in person sometime.
I will briefly say that I think it makes sense to evaluate capacity building at the level of individual interventions affecting specific groups of people, and that being skeptical of some work is compatible with being excited about other work. Given that this work is (according to me) very high-leverage, I'd encourage even broadly skeptical individuals to think about whether there are specific interventions that it would make sense for them to pursue.