Watercolour image of a frontier AI lab, bathed in warm sunlight. People are working together to discuss pausing of frontier AI research
Generated by DALL-E 3

Note: this is a cross-post from my blog, Thoughts on AI, where I discuss a variety of topics in AI governance, particularly corporate governance.

Introduction

The corporate governance team at the Centre for the Governance of AI recently published a great paper, “Coordinated pausing: An evaluation-based coordination scheme for frontier AI developers”, authored by Jide Alaga and Jonas Schuett. The following post contains a set of responses and comments on the paper. These are based on my personal insights and opinions about its content, which I hope add to the conversation. Any negative tone that may come across in the post does not represent my feelings on the paper overall - I think it is a fantastic, practical piece of research that I hope is read by policymakers in both frontier AI labs and governments.

A Note on Good Intentions

It may appear that some of my comments take a rather pessimistic view of frontier AI labs and their interests. In truth, I believe that many of these labs are full of individuals genuinely trying to do the right thing, who are aware of the risks they are dealing with. In my mind, this good faith should be extended to almost any individual working at a frontier lab, but it absolutely should not be extended to the labs themselves. Any such organisation exists in an environment that strongly rewards certain types of decision making, and a collection of entirely justifiable, well-meant decisions can still lead to very bad outcomes. Good governance should not rely on the good intentions of organisations; it should instead make it as likely as possible that good intentions translate into good outcomes, whilst making any bad intentions as painful and cumbersome as possible to act on.

Limitations of the Mutual Auditor Model

The main area where my current opinions disagree with those of the authors is the efficacy and feasibility of the Mutual Auditor model in the paper. Two key disagreements are presented below.

It is unlikely that there will be a single auditor for the industry

Many of the strengths of the mutual auditor model come from the coordination that is possible when all frontier AI labs use a single auditor. I believe this scenario is very unlikely to exist in practice, primarily because auditing is a profitable industry, with both the demand and the room for multiple organisations to enter the market.

Unless there are legal requirements to use one auditor, a frontier lab will seek out any organisation that can a) evaluate its systems sufficiently to demonstrate that it has found a reasonable auditor, and b) be loose enough with its audits that the lab’s own opinions on the safety of its models can heavily influence the outcome of the audit. This incentive mechanism has proved sufficient to attract new market entrants in many other industries, and I can find no compelling reason to believe it would not do so in frontier AI research. Amongst others, the Big Four are known to build teams of experts ready to enter new auditing markets in a variety of technical fields.

Once there is more than one auditor, the coordinated pausing model breaks down. Agreements on the terms of coordinated pausing would need to be established between auditing firms, and there is no reason to assume that these would be sufficiently cautious to prevent the severe risk scenarios that the paper intends to address. In such a world, a new race to the bottom may well begin between the auditors as they seek to attract firms away from their competitors.

There are two things that I can currently imagine would change my mind about this:

  • If I were to see examples of other industries that were largely audited by one or two firms, I would be much more optimistic that the single auditor model is feasible.
  • If there were to be a set of practical and sound policies that could be implemented between multiple auditing firms, I would be much more convinced that the mutual auditor model could still work with multiple auditors in the market.

Auditors will face legal challenges when enforcing pauses

Any auditor that asks frontier labs to pause model deployments, or even research, will face significant legal challenges from its clients if it attempts to enforce such policies. Organisations that attempt to restrict the competitiveness of private firms without very strong grounds for doing so may be held liable for the loss of profit they cause. Any pause will be met with strong challenges over why it was started, as well as challenges against the conditions for its ending. This can be seen often in the finance industry, where lengthy and expensive legal battles ensue. This disincentivises auditors from implementing such pauses, decreasing their efficacy significantly.

There are significant legal questions to be answered here, and I am not qualified to give opinions on them. I would be enthusiastic to see research demonstrating that this issue is less important than I currently believe it to be.

Limitations of the Voluntary Pausing and Pausing Agreement Models

I would first like to state that I agree with the authors of the paper that both the Voluntary Pausing and Pausing Agreement models are valuable intermediate steps towards longer-term solutions. However, there are a couple of limitations of these models that I don’t believe were addressed in the original paper, which I would like to mention here.

Private Deployments

One issue with both of these models is that they do not sufficiently address the risks posed by the private deployment of dangerous models to individual clients of the frontier labs. As such deals are likely to be considered confidential, proprietary information, the enforcers of pauses in either model (the public and other AI labs) are unlikely to be aware of these deployments. Though I do not have financial information to back up the following claim, I suspect that such private deployments constitute a major proportion of frontier labs’ revenue. As such, the incentives to make risky decisions are stronger in these deals.

These risks would concern me less if regulations were introduced to force the disclosure of such private deployments to the public, or if data showed that private deployments constitute a much smaller proportion of revenue than I imagine.

Contractual Penalties

Similar to the point raised above about auditors enforcing penalties, and to the antitrust issues described in the paper, I have strong concerns over the efficacy of contractual penalties as described in the Pausing Agreements model. My main concern is that there are very few business incentives for AI firms to maintain and uphold such contractual agreements with their competitors. The point I made at the start of this post about good intentions is relevant here: as much as everyone at these companies wants to do the right thing, this is too often overridden by the nature of the environment they operate in. Organisations that are at all driven by the incentives of capital markets cannot be relied on to self-regulate when the dangers are so great. Disagreements on the terms of any such contractual penalties will inevitably arise, and the penalties will most likely be quietly dropped to avoid reproach from authorities enforcing antitrust laws.

I am more optimistic about a model where enforcement comes through suspending or revoking membership of industry bodies, where the enforcer is a related but separate entity from any of its individual members. Work to establish and deepen the relationships between a frontier AI industry body and its members would be valuable.

Observations on Key Research Areas

The work in this paper strongly encourages me that there are feasible options for implementing a coordinated pausing policy in the future. From the paper, I see a few key research areas that require prioritisation before any such policy could be implemented, which I thought were worth listing below.

Model Evaluations

It hardly needs pointing out, but the development of effective model evaluation methodologies is a fundamental requirement for the development of any pausing policies. For this and many other reasons, evaluations research and threshold definitions are a must for the industry.

Model Similarity and Research Relevance

For any pauses to be implemented, measures of model similarity must be created. Without them, it will be impossible to define which work at labs needs to be paused. Besides model evaluation research, this is probably the single largest bottleneck to any such policies being implemented.

Any enforcement of a pause is likely to be met with legal challenge, even when the enforcer is a regulator. Research into relevant case studies from other industries, as well as research into the development of strongly binding contracts, will be extremely valuable going into the future.

Incident Reporting Schemes

In order for coordinated pausing strategies to work successfully, risk incidents must be correctly identified and reported to the relevant organisations. Practical incident reporting, whistleblowing and safe harbour schemes should be developed as a priority to enable this.

Model Registers and Disclosure Requirements

One key requirement for the success of a pausing policy is the development of model registers. These registers should categorise models by their capabilities, architecture and deployment, and would ideally be coordinated by regulators that can enforce disclosure requirements, especially at the pre-training and pre-deployment stages. Specific, practical policy proposals for disclosure and notification schemes should be considered a high priority, as should work to build the infrastructure for a register of models and their capabilities.
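
To make this concrete, below is a minimal sketch of the kind of information a single register entry might record. The field names and values are purely my own illustrative assumptions, not a schema proposed in the paper or used by any regulator.

```python
# Illustrative sketch only: hypothetical fields for a model register entry,
# assuming a regulator wants capability, architecture and deployment data.
from dataclasses import dataclass, field
from enum import Enum


class LifecycleStage(Enum):
    PRE_TRAINING = "pre-training"
    PRE_DEPLOYMENT = "pre-deployment"
    DEPLOYED = "deployed"
    PAUSED = "paused"


@dataclass
class ModelRegisterEntry:
    developer: str                 # organisation responsible for the model
    model_name: str
    stage: LifecycleStage          # where the model sits in its lifecycle
    architecture_summary: str      # coarse description, not proprietary detail
    training_compute_flop: float   # estimated training compute
    evaluation_results: dict = field(default_factory=dict)   # e.g. {"bio-uplift": "passed"}
    deployment_contexts: list = field(default_factory=list)  # e.g. ["public API", "private enterprise deal"]


# Example entry with entirely made-up values.
entry = ModelRegisterEntry(
    developer="Example Lab",
    model_name="example-model-v1",
    stage=LifecycleStage.PRE_DEPLOYMENT,
    architecture_summary="dense transformer",
    training_compute_flop=1e25,
    evaluation_results={"bio-uplift": "passed"},
    deployment_contexts=["private enterprise deal"],
)
```

Even a simple structure like this would let a regulator or auditor see which models exist, how capable they appear to be, and where they are deployed, which is exactly the information a coordinated pause would depend on.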

Open-Sourcing Regulation

Once models are open-sourced, work done to restrict their usage becomes almost entirely useless. Further research into policy proposals to prevent the open-sourcing of frontier models will be important for ensuring that the regulation of proprietary models remains relevant.

Corporate Governance

For pauses to be effectively implemented within organisations, strong corporate governance structures need to be developed. Without them, research may be conducted despite the formal position of the company, potentially still leading to dangerous outcomes.

Comment by mic:

Great writeup! I recently wrote a brief summary and review of the same paper.

Alaga & Schuett (2023) propose a framework for frontier AI developers to manage potential risk from advanced AI systems by coordinating pauses when models are assessed to have dangerous capabilities, such as the capacity to develop biological weapons.

The scheme has five main steps:

  1. Frontier AI models are evaluated by developers or third parties to test for dangerous capabilities.
  2. If a model is shown to have dangerous capabilities (“fails evaluations”), the developer pauses training and deployment of that model, restricts access to similar models, and delays related research.
  3. Other developers are notified whenever a dangerous model is discovered, and also pause similar work.
  4. The failed model's capabilities are analyzed and safety precautions are implemented during the pause.
  5. Developers only resume paused work once adequate safety thresholds are met.

The report discusses four versions of this coordination scheme:

  1. Voluntary – developers face public pressure to evaluate and pause but make no formal commitments.
  2. Pausing agreement – developers collectively commit to the process in a contract.
  3. Mutual auditor – developers hire the same third party to evaluate models and require pausing.
  4. Legal requirements – laws mandate evaluation and coordinated pausing.

The authors of the report prefer the third and fourth versions, as they consider these to be the most effective.

Strengths and weaknesses

The report addresses the important and underexplored question of what AI labs should do in response to evaluations finding dangerous capabilities. Coordinated pausing is a valuable contribution to this conversation. The proposed scheme seems relatively effective and potentially feasible, as it aligns with the efforts of the dangerous-capability evaluation teams of OpenAI and the Alignment Research Center.

A key strength is the report’s thorough description of multiple forms of implementation for coordinated pausing. This ranges from voluntary participation relying on public pressure, to contractual agreements among developers, shared auditing arrangements, and government regulation. Having flexible options makes the framework adaptable and realistic to put into practice, rather than a rigid, one-size-fits-all proposal.

The report acknowledges several weaknesses of the proposed framework, including potential harms from its implementation. For example, coordinated pausing could provide time for competing countries (such as China) to “catch up,” which may be undesirable from a US policy perspective. Pausing could also mean that capabilities increase rapidly once a pause ends, as algorithmic improvements discovered during the pause are applied, which may be less safe than a “slow takeoff.”

Additionally, the paper acknowledges concerns with feasibility, such as the potential that coordinated pausing may violate US and EU antitrust law. As a countermeasure, it suggests making “independent commitments to pause without discussing them with each other,” with no retaliation against non-participating AI developers, but defection would seem to be an easy option under such a scheme. It recommends further legal analysis and consultation regarding this topic, but the authors are not able to provide assurances regarding the antitrust concern. The other feasibility concerns – regarding enforcement, verifying that post-deployment models are the same as evaluated models, potential pushback from investors, and so on – are adequately discussed and appear possible to overcome.

One weakness of the report is that the motivation for coordinated pausing is not presented in a compelling manner. The report provides twelve pages of implementation details before explaining the benefits. These benefits, such as “buying more time for safety research,” are indirect and may not be persuasive to a skeptical reader. AI lab employees and policymakers often take a stance that technological innovation, especially in AI, should not be hindered unless otherwise demonstrated. Even if the report intends to take a balanced perspective rather than advocating for the proposed framework, the arguments provided in favor of the framework seem weaker than what is possible.

It seems intuitive that deployment of a dangerous AI system should be halted, though it is worth clearly noting that “failing” a dangerous-capability evaluation does not necessarily mean that the AI system has dangerous capabilities in practice. However, it is not clear why the development of such a system must also be paused. As long as the dangerous AI system is not deployed, further pretraining of the model does not appear to pose risks. AI developers may be worried about falling behind competitors, so the costs incurred from this requirement must be clearly motivated for them to be on board.

While the report makes a solid case for coordinated pausing, it has gaps around considering additional weaknesses of the framework, explaining its benefits, and solving key feasibility issues. More work may be done to strengthen the argument to make coordinated pausing more feasible.

Thank you for sharing this!