Action: Help expand funding for AI Safety by coordinating on NSF response

Evan R. Murphy

Thanks to Frances Lorenz and Shiri for their feedback on a draft of this post.

tl;dr: Please fill out this short form if you might be willing to take a few small, simple actions in the next ~5 weeks that have the chance to dramatically increase funding for AI Safety through the NSF.

What is this?

The National Science Foundation (NSF) has put out a Request for Information relating to topics they will be funding in 2023 as part of their NSF Convergence Accelerator program. A group of us are working on coordinating responses to maximize chances that they'll choose AI Safety as one of their topics. This has the potential to add millions of dollars to the available grant pool for AI Safety in 2023.

Shiri Dori-Hacohen originally posted about this on the 80,000 Hours AI Safety email list. Here's an excerpt from her email which explains the situation well (some emphasis hers, some mine):

To make a long story short(ish), the responses they get to this RfI now (by Feb 28) will influence the call for proposals they put out in this program in 2023.
This RfI is really quite easy to respond to, and it could be a potentially very influential thing to propose AI Safety as a topic. It's the kind of thing that could have a disproportionate impact on the field by influencing downstream funding, which would then have a ripple effect on additional researchers learning more about AI safety and possibly shifting their work to it. This impact would last over and above any kind of research results funded by this specific call, and I sincerely believe this is one of the highest-impact actions we can take right now.
In my experience, it would be incredibly powerful to mount an orchestrated / coordinated response to this call, i.e. having multiple folks replying with distinct but related proposals. For example, I know that a large group of [redacted] grantees had mounted such a coordinated response a couple of years ago in response to this exact RfI, and that was what led the NSF to pick disinformation as one of the two topics for the 2021 Convergence call, leading to $9M in federal funding (including my own research!) -- and many many additional funding opportunities downstream for the NSF grantees.
Even if there was a relatively small probability of success for this particular call, the outsized impact of success would make the expected value of our actions quite sizable. Furthermore, the program managers reading the responses to these calls have incredible influence on the field, so even if we "fail" in setting the topic for 2023, but nonetheless manage to slightly shift the opinion of the PMs and inclining their perspective towards viewing AI safety as important, that could still have a downstream positive impact on the acceptance of this subfield.

Could this backfire?

Some people in the AI alignment community have raised concerns about how talking to governments about AI existential risk could do more harm than good. For example, in the "Discussion with Eliezer Yudkowsky on AGI interventions" post on Nov 21, 2021, Eliezer said:

Maybe some of the natsec people can be grownups in the room and explain why "stealing AGI code and running it" is as bad as "full nuclear launch" to their foreign counterparts in a realistic way. Maybe more current AGI groups can be persuaded to go closed; or, if more than one has an AGI, to coordinate with each other and not rush into an arms race. I'm not sure I believe these things can be done in real life, but it seems understandable to me how I'd go about trying - though, please do talk with me a lot more before trying anything like this, because it's easy for me to see how attempts could backfire

This is a valid concern in general, but it doesn't pertain to the present NSF RfI. The actions we're taking here are targeted at expanding grant opportunities for AI Safety through the NSF. They are unlikely to have any direct impact on US policies or regulations. Also, the NSF has a reputation for being quite nuanced and thoughtful in its treatment of research challenges.

What actions do I take?

If you're interested in helping out with this, all you have to do right now is fill out this short form so that we can follow up with you:

https://airtable.com/shrk0bAxm0EeJbyPC

Then, over the next several weeks before the NSF's RfI deadline (Feb 28), we'll ask you to take a few quick, coordinated actions to help us make the best case we can to the NSF on why AI Safety should be prioritized as a funding area for their 2023 Convergence Accelerator.

It seems likely to me that, when evaluating the impact of this, changes in available funding are a smaller consideration than changes in the field's status with the government and the academic community. NSF grants carry different obligations from the most prominent funding streams for AI safety research, and they function as a credential of sorts for being a Legitimate Line of Inquiry.

I'm pretty uncertain about how this plays out. I would expect policy work to benefit from research that is more legibly credible to people outside the community, and NSF support should help with that. On the other hand, the more traditional scientific community is full of perverse incentives and it may be bad to get tangled up in it. I imagine there are other considerations I'm not aware of.

How much AI safety work is already receiving federal funding? Maybe there's already some evidence about how this is likely to go?

AI safety research is receiving very little federal funding at this time, and is almost entirely privately funded, AFAIK. I agree with you that NSF funding leads to a field being perceived as more legitimate, which IMO is in fact one of the biggest benefits if we manage to get this through. If you ask me, the AI safety community tends to overplay the perverse incentives in academia and underplay the value of having many, many more (on average) very intelligent people thinking about what is arguably one of the bigger problems of our time. Color me skeptical, but I don't see any universe in which having AI safety research go mainstream is a bad thing.

Plausible universe in which AI safety research going mainstream now would be bad: The mainstream version of the field uses the current dominant paradigms for understanding the problem, and those cause network effects such that new paradigms are very slow to be introduced. In the counterfactual world, the non-mainstream version of the field one year later develops a paradigm which, if introduced to academia immediately, solves the problem 2 years faster than the old paradigm would after being introduced to academia.

This is a conceivable universe, but do you really think it's likely? It seems to me much more likely that additional funding opportunities would help AI safety research move at least a little bit faster.

I don't know enough about the dynamics in academia, or the rate of progress in alignment, to be confident in my assessment. But I don't think the probability that something similar to this happens is below 6%, so if people are introducing the field to mainstream academia, they should take precautions to minimize the chance that the effect I described results in significant slowdowns.

Two views that I have seen from the AI risk community on perverse incentives within academia are:

1. Incentives within the academic community are such that researchers are unable to pursue or publish work that is responsible, high-quality, and high-value. With few exceptions, anyone who attempts to do so will be outcompeted and miss out on funding, tenure, and social capital, which will eventually lead to their exit from academia.
2. Money, effort, and attention within the academic community are allocated by a process that is only loosely aligned with the goal of producing good research. Some researchers are able to avoid the trap and competently pursue valuable projects, but many will not, and a few will even go after projects that are harmful.

I think 1 is overplayed. There may be fields/subfields that are like that, but I think there is room in most fields for the right kind of people to pursue high-value research while succeeding in academia. I think 2 is a pretty big deal, though. I'm not too worried about all the people who will get NSF grants for unimportant research, though I am a little concerned that a flood of papers that are missing the point will go against the goal of legitimizing AI risk research for policy impact.

What I'm more worried about is research that is actively harmful. For example, my understanding is that a substantial portion of gain-of-function research has been funded by the federal government. This strikes me as frighteningly analogous to the kind of work that we should be concerned about in AI risk. I think this was mostly NIH, not NSF, so maybe there are good reasons for thinking the NSF is less likely to support dangerous work? Is there a strategy or an already-in-place mechanism for preventing people from using NSF funds for high-risk work? Or maybe there's an important difference in incentives here that I'm not seeing?

For what it's worth, I'm mostly agnostic on this, with a slight lean toward NSF attention being bad. Many of the people I most admire for their ability to solve difficult problems are academics, and I'm excited about the prospect of getting more people like that working on these problems. I really don't want to dismiss it unfairly. I find it pretty easy to imagine worlds in which this goes very badly, but I think the default outcome is probably that a bunch of money goes to pointless stuff, a smaller amount goes to very valuable work, the field grows and diversifies, and (assuming timelines are long enough) the overall result is a reduction in AI risk. But I'm not very confident of this, and the downsides seem much larger than the potential benefits.

Sure. I don't think you can fit the entire problem of AI alignment within CS, but I think the time is somewhat ripe for people to get more grants for better interpretability, and for progressively more ambitious attempts to make general-ish AI (right now language models because that's where the cheap data about humans is, but we might also imagine near-future AI that fuses language models with limited action spaces amenable to RL [MineRL season 4?]).

Judging by the voting and comments so far (both here and on the EA Forum crosspost), my sense is that many here support this effort, but some definitely have concerns. A few of the concerns are based in hardcore skepticism about academic research that I'm not sure is compatible with responding to the RfI. Many concerns, though, seem to be about this generating vague NSF grants that are in the name of AI safety but don't actually contribute to the field.

For these latter concerns, I wonder whether there is a way we could resolve them by limiting the scope of topics in our NSF responses or giving them enough specificity. For example, what if we convinced the NSF that all they should make grants for is mechanistic interpretability projects like the Circuits Thread? This is an area that most researchers in the alignment community seem to agree is useful; we just need a lot more people doing it to make substantial progress. And maybe there is less room to go adrift or mess up this kind of concrete and empirical research compared to some of the more theoretical research directions.

It doesn't have to be just mechanistic interpretability, but my point is, are there ways we could shape or constrain our responses to the NSF like this that would help address your concerns?
