Today, Anthropic, Google, Microsoft, and OpenAI are announcing the selection of Chris Meserole as the first Executive Director of the Frontier Model Forum, and the creation of a new AI Safety Fund, a more than $10 million initiative to promote research in the field of AI safety. The Frontier Model Forum, an industry body focused on ensuring safe and responsible development of frontier AI models, is also releasing its first technical working group update on red teaming to share industry expertise with a wider audience as the Forum expands the conversation about responsible AI governance approaches.
Chris Meserole comes to the Frontier Model Forum with deep expertise on technology policy, having worked extensively on the governance and safety of emerging technologies and their future applications. Most recently he served as Director of the Artificial Intelligence and Emerging Technology Initiative at the Brookings Institution.
In this new role, Meserole will be responsible for helping the Forum fulfill its mission to (i) advance AI safety research to promote responsible development of frontier models and minimize potential risks, (ii) identify safety best practices for frontier models, (iii) share knowledge with policymakers, academics, civil society and others to advance responsible AI development; and (iv) support efforts to leverage AI to address society’s biggest challenges.
“The most powerful AI models hold enormous promise for society, but to realize their potential we need to better understand how to safely develop and evaluate them. I’m excited to take on that challenge with the Frontier Model Forum.”
Over the past year, industry has driven significant advances in the capabilities of AI. As those advances have accelerated, new academic research into AI safety is required. To address this gap, the Forum and philanthropic partners are creating a new AI Safety Fund, which will support independent researchers from around the world affiliated with academic institutions, research institutions, and startups. The initial funding commitment for the AI Safety Fund comes from Anthropic, Google, Microsoft, and OpenAI, and the generosity of our philanthropic partners and the Patrick J. McGovern Foundation, the David and Lucile Packard Foundation, Eric Schmid, and Jaan Tallinn. Together this amounts to over $10 million in initial funding.
Earlier this year, the members of the Forum signed on to voluntary AI commitments at the White House, which included a pledge to facilitate third-party discovery and reporting of vulnerabilities in our AI systems. The Forum views the AI Safety Fund as an important part of fulfilling this commitment by providing the external community with funding to better evaluate and understand frontier systems. The global discussion on AI safety and the general AI knowledge base will benefit from a wider range of voices and perspectives.
The primary focus of the Fund will be supporting the development of new model evaluations and techniques for red teaming AI models to help develop and test evaluation techniques for potentially dangerous capabilities of frontier systems. We believe that increased funding in this area will help raise safety and security standards and provide insights into the mitigations and controls industry, governments, and civil society need to respond to the challenges presented by AI systems.
The Fund will put out a call for proposals within the next few months. Meridian Institute will administer the Fund — their work will be supported by an advisory committee comprised of independent external experts, experts from AI companies, and individuals with experience in grantmaking.
Over the last few months the Forum has worked to help establish a common set of definitions of terms, concepts, and processes so we have a baseline understanding to build from. This way researchers, governments, and other industry peers are all able to have the same starting point in discussions about AI safety and governance issues.
In support of building a common understanding, the Forum is also working to share best practices on red teaming across the industry. As a starting point, the Forum has come together to produce a common definition of “red teaming” for AI and a set of shared case studies in a new working group update. We defined red teaming as a structured process for probing AI systems and products for the identification of harmful capabilities, outputs, or infrastructural threats. We will build on this work and are committed to work together to continue our red teaming efforts.
We are also developing a new responsible disclosure process, by which frontier AI labs can share information related to the discovery of vulnerabilities or potentially dangerous capabilities within frontier AI models — and their associated mitigations. Some Frontier Model Forum companies have already discovered capabilities, trends, and mitigations for AI in the realm of national security. The Forum believes that our combined research in this area can serve as a case study for how frontier AI labs can refine and implement a responsible disclosure process moving forward.
Over the coming months, the Frontier Model Forum will establish an Advisory Board to help guide its strategy and priorities, representing a range of perspectives and expertise. Future releases and updates, including updates about new members, will come directly from the Frontier Model Forum — so stay tuned to their website for further information.
The AI Safety Fund will issue its first call for proposals in the coming months, and we expect grants to be issued shortly after.
The Frontier Model Forum will also be issuing additional technical findings as they become available.
The Forum is excited to work with Meserole and to deepen our engagements with the broader research community, including the Partnership on AI, MLCommons, and other leading NGOs and government and multi-national organizations to help realize the benefits of AI while promoting its safe development and use.
I was really excited about the Frontier Model Forum. This update seems... lacking; I was expecting more on commitments and best practices.
I'm not familiar with the executive director, Chris Meserole.
The associated red-teaming post seems ~worthless; it fails to establish anything like best practices or commitments. Most of the post is the four labs saying how they've done red-teaming; picking on Microsoft because its deployment of Bing Chat was the most obviously related to a failure of red-teaming, Microsoft fails to acknowledge this or discuss what they plan to do differently.
The AI Safety Fund seems better than nothing and I tentatively expect it to mostly be used well. It's not big enough to be a big deal, and it's not clear how much of it comes from the companies vs philanthropists.
The Forum has said nothing on extreme risks or the alignment problem, I think.
Also no details on how the Forum works.
My guess is that most of the Forum's value will come from sharing model evals and standards which can be incorporated into regulation and binding standards. Not clear how good those evals/standards will be. If it's just codifying what these four labs are already doing, that would be very insufficient. I unfortunately don't get a vibe of ambitious-best-practices-setting from today's update.
Look at Chris Meserole's twitter. He retweeted this opposition to pausing AI research, and the main AI worries he seems to retweet are about whether his political enemies will use it to generate propaganda that support themselves and oppose him and his allies. Looks to me like Frontier Model Forum is fundamentally compromised.
I checked, definitely directionally true, but "enemies will use it to generate propaganda" is a bad summary of legitimate concern about influence operations.
Bad summary by what criterion?
Something like it'll lead you to make worse predictions?
Scary possible AI influence ops include like making friends on Discord via text chats. I predict that if your predictions about influence-ops-concerns are only about political-propaganda, you'll make worse predictions.
Making friends on Discord and then using those friendships to disseminate propaganda through those friendships, no? Or to test the effectiveness of propaganda, or various other things. It's still centrally mediated by the propaganda.
Like I agree that there are other potential AI dangers involving AIs making friends on Discord than just propaganda, but that doesn't seem to be what "influence ops" are about? And there are other actors than political actors who could do it (e.g. companies could), but he seems to be focusing on geopolitical enemies rather than those actors.
Maybe he has concerns beyond this, but he doesn't seem to emphasize them much?
Reference class: I'm old enough to remember the founding of the Partnership on AI. My sense from back in the day was that some (innocently misguided) folks wanted in their hearts for it to be an alignment collaboration vehicle. But I think it's decayed into some kind of epiphenomenal social justice thingy. (And for some reason they have 30 staff. I wonder what they all do all day.)
I hope Frontier Model Forum can be something better, but my hopes ain't my betting odds.
Mostly agree, but PAI's new guidance (released yesterday) includes some real safety stuff for frontier models — model evals for dangerous capabilities, staged release before sharing weights.