*** This is an edited and expanded version of a post I made on X in response to GovAI’s new report, “Open-Sourcing Highly Capable Foundation Models.” I think the report points in the right direction, but it also leaves me with some additional questions. Also, thanks for significant feedback from @David_Kristoffersson, @Elliot_Mckernon, @Corin Katzke, and @cwdicarlo. ***


From my vantage point the debate around open-sourcing foundation models became heated as Yann LeCun began advocating for open-sourcing (in particular) Meta's foundation models. This prompted a knee-jerk reaction in the AI Safety community. 

The arguments went something like "of course open-sourcing foundation models is a good idea, just LOOK at all the BENEFITS open-sourcing has given us!" for the "pro" crowd, and something like "of course open-sourcing foundation models is a terrible idea, just THINK about how it increases RISK" for the "anti" crowd. 

Given this, I was excited to see the release of GovAI’s new report which, as Elizabeth A. Seger highlights in her brief summary, outlines both the noted benefits and risks of open-sourcing more generally, and how these benefits and risks might apply to foundation models in particular. In this report, titled “Open-Sourcing Highly Capable Foundation Models,” Seger and her numerous co-authors walk us through these benefits and risks and also explore alternative policies that arguably provide benefits similar to those of open-sourcing while mitigating the risks of open-sourcing foundation models.


After reading that report, I have a few summarizing thoughts: 

To open source or not to open source foundation models is a false dichotomy. Instead, there is a gradient of options to consider.

This should be more obvious than it is, but it seems to be lost in the current debate. The gradient runs from fully closed to fully open, with intermediate categories such as gradual/staged release, hosted access, cloud-based/API access, and downloadable models. (“Box 1: Further research is needed to define open-source gradients” from the paper illustrates this well.) It’s worth noting that even Meta’s Llama 2 is not fully open.

Structured access seems to be a particularly useful option. It provides many of the benefits of fully open-sourcing while protecting against some of the risks of both fully closed models and fully open-sourced models.

The report cites work from Toby Shevlane, including his paper “Structured access: an emerging paradigm for safe AI deployment,” which is also a chapter in The Oxford Handbook of AI Governance. Shevlane describes the idea of structured access in the following way:


“Structured access involves constructing, through technical and often bureaucratic means, a controlled interaction between an AI system and its user. The interaction is structured to both (a) prevent the user from using the system in a harmful way, whether intentional or unintentional, and (b) prevent the user from circumventing those restrictions by modifying or reproducing the system.” 

The GovAI report echoes these benefits, and I find them compelling. 

A rigorous and healthy ecosystem for auditing foundation models could alleviate substantial risks of open sourcing.

The report references work by Deb Raji and colleagues titled “Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing.” The authors

“outline the components of an initial internal audit framework, which can be framed as encompassing five distinct stages— Scoping, Mapping, Artifact Collection, Testing and Reflection (SMACTR)— all of which have their own set of documentation requirements and account for a different level of the analysis of a system.”

The report makes modest, cautious, and reasonable recommendations for governance.

The recommendations are as follows: 

1. Developers and governments should recognise that some highly capable models will be too risky to open-source, at least initially. These models may become safe to open-source in the future as societal resilience to AI risk increases and improved safety mechanisms are developed.

2. Decisions about open-sourcing highly capable foundation models should be informed by rigorous risk assessments. In addition to evaluating models for dangerous capabilities and immediate misuse applications, risk assessments must consider how a model might be fine-tuned or otherwise amended to facilitate misuse.

3. Developers should consider alternatives to open-source release that capture some of the same [distributive, democratic, and societal] benefits, without creating as much risk. Some promising alternatives include gradual or “staged” model release, model access for researchers and auditors, and democratic oversight of AI development and governance decisions.

4. Developers, standards setting bodies, and open-source communities should engage in collaborative and multi-stakeholder efforts to define fine-grained standards for when model components should be released. These standards should be based on an understanding of the risks posed by releasing (different combinations of) model components.

5. Governments should exercise oversight of open source AI models and enforce safety measures when stakes are sufficiently high. AI developers may not voluntarily adopt risk assessment and model sharing standards. Governments will need to enforce such measures through options such as liability law and regulation (e.g. via licensing requirements, fines, or penalties). Governments will also need to build the capacity to enforce such oversight mechanisms effectively.

Here, I’d also like to point to another post from April of this year, “Navigating the Open-Source AI Landscape: Data, Funding, and Safety,” which offers company-centered recommendations that somewhat overlap with those in the GovAI report. These recommendations focus more specifically on what developers and companies (rather than governments) can do, and I think the list is a good one for developers to consider. Their 10 recommendations are:

  1. Prioritize safety over speed in publication. Focus on AI models’ alignment before release to minimize risks.
  2. Regularly audit and update models to reduce risks and maintain AI alignment.
  3. Seek external feedback through red teams, beta-testers, and third parties to identify potential issues before release.
  4. Sign open letters on AI safety to publicly commit to responsible AI practices.
  5. Favor structured access to ensure that only authorized users with proper understanding can deploy the technology.
  6. Avoid developing agentic systems to reduce the risks of autonomous decision-making misaligned with human values.
  7. Communicate model’s limitations, helping users make informed decisions and reducing unintended consequences.
  8. Omit harmful datasets during training, preventing undesirable content outputs (e.g. gpt-4chan was trained on 4chan’s politically-incorrect board.)
  9. Enhance open-source content moderation datasets, such as OIG-moderation, to develop more robust and safer AI systems.
  10. Develop an infohazard policy, like this one by Conjecture, to prevent the leakage of information that could accelerate AGI.

I really appreciate and endorse the conclusion of the report. 

The authors of the report conclude with the following:

"Overall, openness, transparency, accessibility, and wider community input are key to facilitating a future for beneficial AI. The goal of this paper is therefore not to argue that foundation model development should be kept behind closed doors. Model sharing, including open-sourcing, remains a valuable practice in most cases. Rather, we submit that decisions to open-source increasingly capable models must be considered with great care. Comprehensive risk assessments and careful consideration of alternative methods for pursuing open-source objectives are minimum first steps."

I enjoyed reading this report, which at least attempts to place the benefits and risks of open-sourcing (OS) side by side and discusses how they might apply in the context of foundation models, rather than repeating the oversimplifications that have been dominating this discussion.

In addition to my thoughts above, I'm wondering whether concerned communities could provide insight on the following two questions: 

How large are the risks from fully open-sourced foundation models? More specifically, how significant are these risks compared to the overall risks inherent in the development and deployment of foundation models?

It seems to me that many of the risks identified in the GovAI report are not that distinguishable from the inherent risks of the development and widespread deployment of foundation models. Does keeping foundation models more closed actually help prevent some of the more serious risks presented by the development and deployment of more capable foundation models? 

Is there any reasonable way to prevent leaks in a world with stricter regulation of fully OS foundation models?

I don't have a good sense of how easy it would be for a company or even a rogue employee to leak weights and architecture, other than that it has already happened at least once.


Thanks again to the authors of this paper for providing a detailed and nuanced treatment of this topic, and many thanks to GovAI for sponsoring this important and interesting work! I would be very interested in any additional thoughts on these two questions.


5 comments

Thanks for the post.

A rigorous and healthy ecosystem for auditing foundation models could alleviate substantial risks of open sourcing.

The problem here is that fine-tuning easily strips any safety changes and easily adds all kinds of dangerous things (as long as capability is there).

The only form of auditing that might work is if a model can only be run from within a protected framework, which is doing quite a bit of auditing on the fly, before allowing an inference to go through...

I can see how this can be compatible with open-sourcing encrypted weights (which, by the way, might prevent the ability to fine-tune at all as well)...

It's more difficult to imagine how this might work for weights represented by plain tensors (assuming that the model architecture is understood, and that it's not too difficult to write an unprotected version of the fine-tuning and inference engines).

Thank you for this comment! 

I think your point that "The problem here is that fine-tuning easily strips any safety changes and easily adds all kinds of dangerous things (as long as capability is there)" is spot on. It maps to my intuitions about the weaknesses of fine-tuning, and it is one of the strongest arguments that open-sourcing foundation models carries significant risks.

I appreciate your suggestions for other methods of auditing that could possibly work, such as running a model within a protected framework and open-sourcing encrypted weights. I think these allow for something like risk mitigation under partial open-sourcing, but they would be less feasible for fully open-sourced models, where weights represented by plain tensors would be more likely to be available.

Your comment is helpful and gave me some additional ideas to consider. Thanks!

One thing I would add is that the idea I had in mind for auditing was more of a broader process than a specific tool. The paper I mention to support this idea of a healthy ecosystem for auditing foundation models is “Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing.” Here the authors point to an auditing process that would guide a decision of whether or not to release a specific model and the types of decision points, stakeholders, and review process that might aid in making this decision. At the most abstract level the process includes scoping, mapping, artifact collection, testing, reflection, and post-audit decisions of whether or not to release the model. 

Open source or not open source. 
Is that the question?
Whether tis nobler in the mind to share 
the bits and weights of outrageous fortune 500 models? 
or to take arms against superintelligence
and through privacy, end them? to hide.
to share, no more. and by a share to say we end
the headache and the thousand artificial shocks
the brain is heir to: tis a conversation
devoutly to be wished. to hide.
to encrypt, perchance to silence - aye, there's the rub. 
for in that closed off world, what solutions may arise, 
that may save us from the models we build, 
may give us our pause?

This article talks a lot about risks from AI. I wish the author would be more specific about what kinds of risks they are thinking about. For example, it is unclear which parts are motivated by extinction risks and which are not. The same goes for the benefits of open-sourcing these models. (Note: I haven't read the reports this article is based on; these might have been more specific.)

Thanks for this comment. I agree there is some ambiguity here about the types of risks being considered with respect to the question of open-sourcing foundation models. I believe the report favors the term "extreme risks," which is defined as "risk of significant physical harm or disruption to key societal functions." I believe they avoid the terms "extinction risk" and "existential risk," but are implying something not too different with their choice of extreme risks.

For me, I pose the question above as:

"How large are the risks from fully open-sourced foundation models? More specifically, how significant are these risks compared to the overall risks inherent in the development and deployment of foundation models?"  

What I'm looking for is something like "total risk" versus "total benefit." In other words, if we take all the risks together, just how large are they in this context? In part I'm not sure if the more extreme risks really come from open sourcing the models or simply from the development and deployment of increasingly capable foundation models.

I hope this helps clarify!