Cross-posted on the EA Forum. This article is part of a series of ~10 posts comprising a 2024 State of the AI Regulatory Landscape Review, conducted by the Governance Recommendations Research Program at Convergence Analysis. Each post will cover a specific domain of AI governance, such as incident reporting, safety evals, model registries, and more. We’ll provide an overview of existing regulations, focusing on the US, EU, and China as the leading governmental bodies currently developing AI legislation. Additionally, we’ll discuss the relevant context behind each domain and conduct a short analysis.
This series is intended to be a primer for policymakers, researchers, and individuals seeking to develop a high-level overview of the current AI governance space. We’ll publish individual posts on our website and release a comprehensive report at the end of this series.
What are open-source models, and what are their effects on AI safety?
Some software developers choose to open-source their software; they freely share the underlying source code and allow anyone to use, modify, and deploy their work. This can encourage friendly collaboration and community-building, and has produced many popular pieces of software, including operating systems like Linux, programming languages and platforms like Python and Git, and many more.
Similarly, AI developers are open-sourcing their models and algorithms, though the details can vary. Generally, open-sourcing of AI models involves some combination of:
Sharing the model weights. These are the specific parameters that make the model function, and are set during training. If these are shared, others can reconstruct the model without doing their own training, which is the most expensive part of developing such AI.
Sharing the training data used to train the model.
Sharing the underlying source code.
Licensing for free commercial usage.
For example, Meta released the model weights of their LLM, Llama 2, but not their training code, methodology, original datasets, or model architecture details. In their excellent article on Openness In Language Models, Prompt Engineering labels this an example of an “open weight” model. Such an approach allows external parties to use the model for inference and fine-tuning, but doesn’t allow them to meaningfully improve or analyze the underlying model. Prompt Engineering points out a drawback of this approach:
So, open weights allows model use but not full transparency, while open source enables model understanding and customization but requires substantially more work to release [...] If only open weights are available, developers may utilize state-of-the-art models but lack the ability to meaningfully evaluate biases, limitations, and societal impacts. Misalignment between a model and real-world needs can be difficult to identify.
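The open-weight versus open-source distinction can be made concrete with a small sketch. The Python below is illustrative only: the `ReleaseProfile` fields mirror the four components listed above, while the class name, the `openness_label` function, and the labels it returns are assumptions made for this example, not an established taxonomy or any regulator's definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReleaseProfile:
    """Which parts of a model a developer has made public (hypothetical schema)."""
    weights: bool
    training_data: bool
    source_code: bool
    # None means "not specified"; the label below does not depend on licensing.
    free_commercial_license: Optional[bool] = None


def openness_label(profile: ReleaseProfile) -> str:
    """Map a release profile to a coarse, illustrative label."""
    if profile.weights and profile.training_data and profile.source_code:
        return "open source"
    if profile.weights:
        return "open weight"
    if profile.training_data or profile.source_code:
        return "partially open"
    return "closed"


# Llama 2 as described above: weights released, training code and datasets withheld.
llama_2 = ReleaseProfile(weights=True, training_data=False, source_code=False)
print(openness_label(llama_2))  # open weight
```

Treating each component as a separate flag, rather than a single open/closed switch, reflects the point that a release can be open in some respects and closed in others.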
Further, while we were writing this article in April 2024, Meta released Llama 3 with the same open-weights policy, claiming that it is “the most capable openly available LLM to date”. This has brought fresh attention to the trade-offs of open-sourcing, as the potential harms of freely sharing a model grow with the model’s capabilities. Even those who are fond of sharing wouldn’t want everyone in the world to have easy access to the instructions for a 3D-printable rocket launcher, and freely sharing powerful AI could present similar risks; such AI could be used to generate instructions for assembling homemade bombs or even designing deadly pathogens. Information that is dangerous to distribute widely in this way is termed an information hazard.
To prevent these types of hazards, AI models like ChatGPT have safeguards built in during the fine-tuning phase towards the end of their development (using techniques such as Reinforcement Learning from Human Feedback, or RLHF). Such fine-tuning can keep AI models from producing harmful or undesired content.
Some people find ways to get around this fine-tuning, but experts have pointed out that malicious actors could circumvent the problem entirely. ChatGPT and Claude, the two most prominent LLMs, are closed-source (and their model weights are closely guarded secrets), but open-source models can be used and deployed without fine-tuning safeguards. This was demonstrated practically with Llama 2, a partly open-source LLM developed by Meta, in Palisade Research’s paper BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B. To quote an interview with one of its authors, Jeffrey Ladish:
You can train away the harmlessness. You don’t even need that many examples. You can use a few hundred, and you get a model that continues to maintain its helpfulness capabilities but is willing to do harmful things. It cost us around $200 to train even the biggest model for this. Which is to say, with currently known techniques, if you release the model weights there is no way to keep people from accessing the full dangerous capabilities of your model with a little fine tuning.
Therefore, these models and their underlying software may themselves be information hazards, and many argue that open-sourcing advanced AI should be legally prohibited, or at least prohibited until developers can guarantee the safety of their software. In “Will releasing the weights of future large language models grant widespread access to pandemic agents?”, the authors conclude that
Our results suggest that releasing the weights of future, more capable foundation models, no matter how robustly safeguarded, will trigger the proliferation of capabilities sufficient to acquire pandemic agents and other biological weapons.
Others counter that openness is necessary to stop the power and wealth generated by powerful AI falling into the hands of a few, and that prohibitions won’t be effective safeguards, as argued in GitHub’s Supporting Open Source and Open Science in the EU AI Act and Mozilla’s Joint Statement on AI Safety and Openness, which was signed by over 1,800 people and states:
Yes, openly available models come with risks and vulnerabilities — AI models can be abused by malicious actors or deployed by ill-equipped developers. However, we have seen time and time again that the same holds true for proprietary technologies — and that increasing public access and scrutiny makes technology safer, not more dangerous. The idea that tight and proprietary control of foundational AI models is the only path to protecting us from society-scale harm is naive at best, dangerous at worst.
Finally, some argue that open-sourcing or not is a false dichotomy, putting forward intermediate policies such as structured access:
Instead of openly disseminating AI systems, developers facilitate controlled, arm's length interactions with their AI systems. The aim is to prevent dangerous AI capabilities from being widely accessible, whilst preserving access to AI capabilities that can be used safely.
There are more perspectives and arguments than we can concisely include here, and you might be interested in the following discussions:
Current Regulatory Policies
The US
The US AI Bill of Rights doesn’t discuss open-source models, but the Executive Order on AI does initiate an investigation into the risk-reward tradeoff of open-sourcing. Section 4.6 calls for soliciting input on foundation models with “widely available model weights”, specifically targeting open-source models. It summarizes the risk-reward tradeoff of publicly sharing model weights, which offers “substantial benefits to innovation, but also substantial security risks, such as the removal of safeguards within the model”. In particular, Section 4.6 calls for the Secretary of Commerce to:
Section 4.6(a): Set up a public consultation with the private sector, academia, civil society, and other stakeholders on the impacts and appropriate policy related to dual-use foundation models with widely available weights (“such models” below), including:
4.6(a)(i): Risks associated with fine-tuning or removing the safeguards from such models;
4.6(a)(ii): Benefits to innovation, including research into AI safety and risk management, of such models;
4.6(a)(iii): Potential voluntary, regulatory, and international mechanisms to manage risk and maximize the benefits of such models;
4.6(b): Submit a report to the President based on the results of 4.6(a), on the impacts of such models, including policy and regulatory recommendations.
The EU
The EU AI Act states that open-sourcing can increase innovation and economic growth. The act therefore exempts open-source models and developers from some restrictions and responsibilities placed on other models and developers. Note though that these exemptions do not apply to foundation models (meaning generative AI like ChatGPT), or if the open-source software is monetized or is a component in high-risk software.
Section 57: Places responsibilities on providers throughout the “AI value chain”, i.e. anyone developing components or software that’s used in AI. Third parties should be exempt if their products are open-source, though it encourages open-source developers to implement documentation practices, such as model cards and data sheets.
Section 60i & i+1: Clarifies that GPAI models released under free and open-source licenses count as satisfying “high levels of transparency and openness” if their parameters are made publicly available, and a license should be considered free and open-source when users can run, copy, distribute, study, change, and improve the software and data. This exception does not apply if the component is monetized in any way.
Section 60f: Exempts providers of open-source GPAI models from the transparency requirements unless they present a systemic risk. This does not exempt GPAI developers from the obligation to produce a summary about training data or to enact a copyright policy.
Section 60o: Specifies that developers of GPAI models should notify the AI Office if they’re developing a GPAI model that exceeds certain thresholds (therefore conferring systemic risk), and that this is especially important for open-source models.
Article 2(5g): States that obligations shall not apply to AI systems released under free and open-source licenses unless they are placed on the market or put into service as high-risk AI systems.
Article 28(2b): States that providers of high-risk AI systems and third parties providing components for such systems must have a written agreement on what information the provider will need to comply with the act. However, third parties publishing “AI components other than GPAI models under a free and open licence” are exempt from this.
Article 52c(-2) & 52ca(5): Exempt providers of AI models under a free and open licence that publicly release the weights and information on their model from (1) the obligation to draw up technical documentation and (2) the requirement to appoint an authorized representative in the EU. Neither of these exemptions applies if the GPAI model has systemic risks.
Notably, the treatment of open-source models was contentious during the development of the EU AI Act (see also here).
China
There is no mention of open-source models in China’s regulations between 2019 and 2023; open-source models are neither exempt from any aspects of the legislation, nor under any additional restrictions or responsibilities.
Convergence’s Analysis
The boundaries and terminology around open-sourcing are often underspecified.
Open-sourcing vs closed-sourcing AI models is not binary, but a spectrum. Developers must choose whether to publicly release multiple aspects of each model: the weights and parameters of the model; the data used to train the model; the source code and algorithms underlying the model and its training; licenses for free use; and so on.
Existing legislation does not clearly delineate how partially open-sourced models should be categorized and legislated. It’s unclear, for example, whether Meta’s open-weight Llama-2 model would be considered open-source under EU legislation, as its source code is not public.
Open-sourcing models improves transparency and accountability, but also gives the public broader access to dangerous information and reduces the efficacy of legislation. No one agrees on the right balance.
Through their training on vast swathes of data, LLMs contain hazardous information. Although RLHF is not sufficient to stop users accessing underlying hazardous information, it is a barrier, and one that can be much more easily bypassed in open-sourced models.
The more powerful a model is, the greater harm its misuse could lead to, and the more open-source a model is, the more easily misused it is. This means the potential harms of open-source models will increase over time.
Open-source models can be easily used and altered by potentially any motivated party, making it harder to implement and enforce safety legislation.
However, many experts are still staunch advocates for open-sourcing (as listed in the Context section), and believe it is essential for an accountable and transparent AI ecosystem. There is profound disagreement on the right balance between open and closed-source models, and such disagreement is likely to persist.
Developers of open-source models are not currently under any additional legal obligations compared to developers of private or commercial models.
In particular, the US Executive Order and Chinese regulations currently have no particular rules unique to open-source models or developers, though the US does recognize the risk-reward tradeoff presented by open-source AI, and has commissioned a report into its safety and appropriate policy.
The EU legislation treats open-source models favorably.
Unlike the US Executive Order, the EU AI Act only describes the potential benefits of open-sourcing powerful models, without mentioning potential risks.
The EU AI Act exempts open-source developers from many obligations faced by commercial competitors, unless the open-sourced software is part of a general-purpose or high-risk system.
Despite this favorable treatment and these exemptions, proponents of open-sourcing have criticized the EU regulations for what they perceive as over-regulation of open-source models.