Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
After a year of negotiation, the NSF has announced a $20 million request for proposals for empirical AI safety research.
Here is the detailed program description.
The request for proposals is broad, as is common for NSF RfPs. Many safety avenues, such as transparency and anomaly detection, are in scope:
- "reverse-engineering, inspecting, and interpreting the internal logic of learned models to identify unexpected behavior that could not be found by black-box testing alone"
- "Safety also requires... methods for monitoring for unexpected environmental hazards or anomalous system behaviors, including during deployment."
Note that research that has high capabilities externalities is explicitly out of scope:
"Proposals that increase safety primarily as a downstream effect of improving standard system performance metrics unrelated to safety (e.g., accuracy on standard tasks) are not in scope."
Thanks to OpenPhil for funding a portion of the RfP. Their support was essential to creating this opportunity!
With the advent of Sydney and now this, I'm becoming more inclined to believe that AI safety and policies related to it are very close to being in the Overton window of most intellectuals (I wouldn't say the general public, yet). Like, maybe within a year, more than 60% of academic researchers will have heard of AI safety. I don't feel at all confident about the claim, but it now seems more than ~20% likely. Does this seem to be a reach?
I was watching an interview with that NYT reporter who had the newsworthy Bing chat interaction, and he used some language that made me think he'd searched for people talking about Bing chat and read Evan's post or a direct derivative of it.
Basically yes, I'd say that AI safety is in fact in the Overton window. What I see as the problem is more that a bunch of other stupid stuff is also in the Overton window.
One can hope, although I see very little evidence for it.
Most of the evidence I see is an educated and very intelligent person writing about AI (not their field), and when reading it I could easily have been a chemist reading about how the 4 basic elements make it abundantly clear that bla bla. You get the point.
And I don't even know how to respond to that. The ontology displayed is just too fundamentally wrong, and tackling that feels like trying to explain differential equations to my 8-year-old daughter (to the point where she groks it).
There is also the problem of engaging such a person: it's very easy to end up alienating them and just cementing their thinking.
That doesn't mean I think it is not worth doing, but it's not some casual off-the-cuff thing.
This is a pretty common problem. If anyone ever needs to explain AI safety to someone, with minimal risk of messing up, I think that giving them pages 137-149 from Toby Ord's The Precipice is the best approach. It's simple, one shot, and does everything right.
I think the language here is importantly different from placing capabilities externalities as out of scope. It seems to me that it only excludes work that creates safety merely by removing incompetence as measured by standard metrics. For example, it's not clear to me that this excludes work that improves a model's situational awareness or that creates tools or insights into how a model works with more application to capabilities than to safety.
Wow, this is an incredible achievement given how AI safety is still a relatively small field. For example, this post by 80,000 Hours said that $10-$50 million was spent globally on AI safety in 2020, according to The Precipice. So this grant is comparable to an entire year of global AI safety funding!
Why do they need to be in the US?
The NSF has political stakeholders.
Hmm, I suppose if, as some might hypothesize, AI just fails to manifest, considerations like "which country contains people with experience doing research" remain live.
In a large and prosperous country, the biggest decision makers probably have dozens, if not hundreds, of interest groups competing for their attention every day, with even more influence, resources, and even smarter folks than the AI alignment community, and with even more adroitly crafted arguments.
Is this through the NSF Convergence Accelerator or a different NSF program?
Looking into it more, I'm pretty sure it's a different NSF program. The Convergence Accelerator process is still underway, and topics for possible funding, potentially including AI safety, will likely be selected in the coming months.