Avoiding AI Races Through Self-Regulation

LESSWRONG
is fundraising!
LW

Avoiding AI Races Through Self-Regulation — LessWrong

Summary

The first group to build artificial general intelligence (AGI) stands to gain a significant strategic and market advantage over competitors, so companies, universities, militaries, and other actors have strong incentives to race to build AGI first. An AGI race would be dangerous, though, because it would prioritize capabilities over safety and increase the risk of existential catastrophe. A self-regulatory organization (SRO) for AGI may be able to change incentives to favor safety over capabilities and encourage cooperation rather than racing.

Introduction

The history of modern technology has often been a history of technological races. A race starts when a new technology becomes cost-effective: companies, states, and other actors then hurry to develop it in hopes of capturing market share before others can, gaining a strategic advantage over competitors, or otherwise benefiting from a first-mover position. Notable recent examples include the races over rockets, personal computers, and DNA sequencing.

Although most of these races have been broadly beneficial to society, quickly increasing productivity and expanding the economy, others, like races over weapons, have generally made us less safe. In particular, the race to build nuclear weapons dramatically increased humanity’s capability to extinguish itself and exposed us to existential risks we had not previously faced. Technological races can thus harm as much as they help, and nowhere is that more true than in the burgeoning race to build AI.

In particular, recent advances in deep learning may put us near the start of a race to build artificial general intelligence (AGI). Unlike existing narrow AI, which outperforms humans only on very specific tasks, AGI will be as good as or better than humans at all tasks, such that an AGI could replace a human in any context. The promise of replacing humans with AGI is extremely appealing to many organizations, since AGI could be cheaper, more productive, and more loyal than humans, so the incentives to race to build the first AGI are strong. But the very capabilities that make AGI so compelling also make it extremely dangerous, and we may actually be better off not building AGI at all if we cannot build it safely!

The risks of AGI have been widely discussed, but we may briefly summarize them: AGI will eventually become more capable than humans, AGI may not share human values, and so AGI may eventually act against humanity’s wishes in ways we will be powerless to prevent. AGI thus presents a new existential risk similar to, but far more unwieldy than, the one created by nuclear weapons. And whereas nuclear weapons can be controlled with relatively prosaic methods, controlling AGI demands solving the much harder problem of value aligning an “alien” agent. An AGI race is therefore especially dangerous: given the likely tradeoff between capabilities and safety, a race would create incentives to build capabilities in advance of our ability to control them.

All of this suggests that building safe AGI requires, in part, resolving the coordination problem of avoiding an AGI race. To that end, we consider the creation of a self-regulatory organization for AGI to help coordinate AGI research efforts, ensure safety, and avoid a race.

An SRO for AGI

Self-regulatory organizations (SROs) are non-governmental organizations (NGOs) set up by companies and individuals in an industry to serve as voluntary regulatory bodies. Although governments sometimes grant them statutory power, they usually operate as free associations that coordinate to encourage participation by actors in their industries, often by shunning those who do not participate and conferring benefits on those who do. They are especially common in industries that have either a potentially adversarial relationship with society, like advertising and arms, or a safety concern, like medicine and engineering. To briefly review the form and function of some existing SROs:

TrustArc (formerly TRUSTe) has long provided voluntary certification services that help web companies assure the public that they follow best practices allowing consumers to protect their privacy. It has been successful enough that, outside the EU, governments have done little to regulate online privacy.
The US Green Building Council offers multiple levels of LEED certification that give real estate developers targets to meet and give the public proof that developers are protecting environmental commons.
The European Advertising Standards Alliance and the International Council for Ad Self-Regulation encourage advertisers to self-regulate and adopt voluntary standards that benefit the public, avoiding the imposition of potentially less favorable and more fractured government regulation of advertising.
The American Medical Association, the American Bar Association, the National Society of Professional Engineers, and the National Association of Realtors are SROs that function as de facto official regulators of their industries in the United States. They act to ensure doctors, lawyers, engineers, and realtors, respectively, follow practices that serve the public interest in the absence of more comprehensive government regulation.
Although governments have progressively taken a stronger hand in financial regulation over the past 100 years, many segments of the financial industry rely in part on SROs to shape their actions and avoid unwanted legislative regulation.

Currently, computer programmers, data scientists, and other IT professionals are largely unregulated except insofar as their work touches other regulated industries. There are professional associations like the IEEE and ACM and best-practice frameworks like ITIL, but no SROs oversee the work of companies and researchers pursuing either narrow AI or AGI. Yet, as outlined above, narrow AI and especially AGI are areas with strong incentives to build capabilities that may unwittingly violate societal preferences and damage the public commons. Consequently, there may be reason to form an AGI SRO. Some reasons in favor:

An SRO could offer certification of safety and alignment efforts being taken by AGI researchers.
An SRO may be well positioned to reduce the risk of an AGI race by coordinating efforts that would otherwise result in competition.
An SRO could encourage AGI safety in industry and academia while being politically neutral (not tied to a single university, company, or nation).
An SRO may allow AGI safety experts to manage the industry rather than letting it fall to other actors who may be less qualified or have different concerns that do not as strongly include prevention of existential risks.
An SRO could act as a “clearinghouse” for AGI safety research funding.
An SRO could give greater legitimacy to prioritizing AGI safety efforts among capabilities researchers.

Some reasons against:

An SRO might form a de facto “guild” and keep out qualified researchers.
An SRO could create the appearance that more is being done than really is and thus disincentivize safety research.
Relatedly, an SRO could promote the wrong incentives and actually result in less safe AGI.
An SRO might divert funding and effort from technical research in AGI safety.

On the whole, this suggests an SRO for AGI would be net positive so long as it were well managed, focused on promoting safety, and responsive to developments in AGI safety research. In particular, it may offer a way to avoid an AGI race by changing incentives to avoid the game-theoretic equilibria that cause races.

Using an SRO to Reduce AGI Race Risks

To see how an SRO could reduce the risk of an AGI race, consider the following simplified example.

Suppose that there are two entities trying to build AGI: company A and company B. It costs $1 trillion to develop AGI, a cost both companies must pay, and the market for AGI is worth $4 trillion. If one company beats the other to market, it will capture the entire market thanks to its first-mover advantage, netting $3 trillion in profit, while the company that is last to market earns no revenue and loses $1 trillion. If the companies tie, though, they split the market and each earn $1 trillion in profit. This scenario yields the following payout matrix, in trillions of dollars, with each cell giving A’s payout / B’s payout:

+-----------------+-----------------+----------------+
| A/B Payout      | Company A First | Company A Last |
+-----------------+-----------------+----------------+
| Company B First | 1/1             | -1/3           |
| Company B Last  | 3/-1            | 1/1            |
+-----------------+-----------------+----------------+

This tells us that the expected value of trying to win is 0.5(-1)+0.5(3)=1, the expected value of tying is 0.5(1)+0.5(1)=1, and the expected value of competing is 0.25(1)+0.25(1)+0.25(-1)+0.25(3)=1, so companies A and B should be indifferent between trying to win and tying. Given this, it seems it should be easy to convince both companies to cooperate for a tie and coordinate their efforts so they can focus on safety. But this immediately creates a new game in which each company must choose whether to cooperate honestly or pretend to cooperate while racing in secret. If both race or both cooperate, their expected values remain 1, but if one races and the other cooperates, the racer stands to win at the expense of the cooperator.
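The three expected values above can be checked with a short Python sketch. It rebuilds company A’s payouts from the example’s two parameters (a $1 trillion development cost and a $4 trillion market, in units of trillions of dollars) and averages them under the same assumption the arithmetic makes, namely that each uncertain outcome is equally likely. The names `payoff` and `expected` are illustrative, not part of the original argument.

```python
from fractions import Fraction

COST = 1    # development cost, in trillions of dollars
MARKET = 4  # total market value, in trillions of dollars


def payoff(a_first: bool, b_first: bool) -> int:
    """Company A's profit (in trillions) given who reaches the market first.

    If both are first or both are last, the companies tie and split the market.
    """
    if a_first == b_first:
        return MARKET // 2 - COST  # tie: split the market
    if a_first:
        return MARKET - COST       # A captures the whole market
    return -COST                   # A is last and earns no revenue


def expected(outcomes):
    """Average of equally likely outcomes, kept exact with Fraction."""
    outcomes = list(outcomes)
    return sum(Fraction(x) for x in outcomes) / len(outcomes)


# Trying to win: A either wins outright or loses outright, 50/50.
ev_try_to_win = expected([payoff(True, False), payoff(False, True)])
# Tying: both possible outcomes are ties.
ev_tie = expected([payoff(True, True), payoff(False, False)])
# Competing: all four cells of the matrix are equally likely.
ev_compete = expected(payoff(a, b) for a in (True, False) for b in (True, False))

print(ev_try_to_win, ev_tie, ev_compete)  # 1 1 1
```

Using `Fraction` keeps the averages exact, so the indifference between the three strategies shows up as equal values rather than nearly equal floats.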

The payout matrix for this new game:

+----------------------+-----------------+----------------------+
| A/B Payout           | Company A Races | Company A Cooperates |
+----------------------+-----------------+----------------------+
| Company B Races      | 1/1             | -1/3                 |
| Company B Cooperates | 3/-1            | 1/1                  |
+----------------------+-----------------+----------------------+

In this case the expected value of racing is 0.5(1)+0.5(3)=2 and the expected value of cooperating is 0.5(-1)+0.5(1)=0. In fact, racing strictly dominates cooperating: whatever the other company does, racing pays $2 trillion more (1 versus -1 if the other races, 3 versus 1 if it cooperates). Both companies should therefore be inclined to race lest they lose by cooperating while the other races, and an easy way to get ahead in a race is to ignore safety in favor of capabilities. Unfortunately for us, this game considers only the companies’ financial gains and ignores the externalities that unsafe AGI imposes. Assuming safety is always ignored when racing and always attained when cooperating, those externalities yield a rather different set of outcomes:

+----------------------+-----------------+----------------------+
| Humanity's Payout    | Company A Races | Company A Cooperates |
+----------------------+-----------------+----------------------+
| Company B Races      | -∞              | -∞                   |
| Company B Cooperates | -∞              | ∞                    |
+----------------------+-----------------+----------------------+

Thus we are all better off if both companies cooperate so that neither has to ignore safety, but the companies are not incentivized to do so. If we wish to change the equilibrium of the AGI race so that both companies cooperate, we must change the payoff matrix. One way to do this would be with an SRO for AGI, which could reshape the companies’ payoffs through various methods, including:

inspections to demonstrate to each company that the other is cooperating
contractual financial penalties that would offset any gains from defecting
social sanctions via public outreach that would reduce gains from defecting
sharing discoveries between companies
required shutdown of any uncooperatively built AGI

In this example, companies that race must face penalties in excess of $2 trillion before they prefer to cooperate. In the real world, clearing this bar would likely require combining several strategies so that it is cleared even if one or more sources of penalties fail. Some of these strategies may also require enforcement by state actors, which further complicates the situation, since militaries may also be participating in the race. This suggests an SRO may be insufficient to prevent an AGI race unless it partners with an intergovernmental organization, such as the United Nations (cf. the international bodies involved in enforcing weapons treaties). That said, a more traditional SRO could act faster and with fewer political entanglements, so there seems to be room for both an SRO focused on industrial and academic AGI research and an intergovernmental organization working in collaboration with it to adjust the incentives of state actors.
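The $2 trillion threshold can be checked by subtracting a penalty from each company’s payout whenever it races and re-solving for equilibria. This Python sketch assumes, purely for illustration, that the combined penalty is one fixed amount charged to any racer whether or not it wins, and that it is always enforced; the text leaves open how an SRO would actually levy and combine penalties. The names `with_penalty` and `pure_nash` are hypothetical.

```python
from itertools import product

RACE, COOP = "race", "cooperate"
STRATEGIES = (RACE, COOP)

# Base (A, B) payouts in trillions, from the race/cooperate matrix.
BASE = {
    (RACE, RACE): (1, 1),
    (RACE, COOP): (3, -1),
    (COOP, RACE): (-1, 3),
    (COOP, COOP): (1, 1),
}


def with_penalty(penalty):
    """Charge `penalty` (trillions) to each company that races, win or lose."""
    return {
        (a, b): (pa - (penalty if a == RACE else 0),
                 pb - (penalty if b == RACE else 0))
        for (a, b), (pa, pb) in BASE.items()
    }


def pure_nash(payoffs):
    """Strategy pairs where neither company gains by deviating on its own."""
    return [
        (a, b)
        for a, b in product(STRATEGIES, repeat=2)
        if all(payoffs[(alt, b)][0] <= payoffs[(a, b)][0] for alt in STRATEGIES)
        and all(payoffs[(a, alt)][1] <= payoffs[(a, b)][1] for alt in STRATEGIES)
    ]


for penalty in (0, 1, 2, 2.5):
    print(penalty, pure_nash(with_penalty(penalty)))
# 0 [('race', 'race')]
# 1 [('race', 'race')]
# 2 [('race', 'race'), ('race', 'cooperate'), ('cooperate', 'race'), ('cooperate', 'cooperate')]
# 2.5 [('cooperate', 'cooperate')]
```

At exactly $2 trillion every cell is a weak equilibrium because each company is indifferent between racing and cooperating, which is why the penalty must be strictly greater than $2 trillion; above that, mutual cooperation is the only pure-strategy equilibrium.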

The key takeaway is that even if an SRO is not the best way to shift the equilibrium of the AGI race, some organization needs to impose costs that make racing less appealing than it is when externalities can be ignored, and so reduce the chance of an AGI race. SROs provide a clear template for this sort of organization, though addressing the AGI race specifically may require innovative policy solutions beyond those SROs normally adopt. An SRO for AGI is thus likely to be a key component in avoiding an AGI race if it is willing to evolve in ways that help it address the issue.

Conclusion

An SRO for AGI is likely valuable and may be particularly helpful in counteracting the incentives to race to develop AGI. Although there is currently no SRO for AGI, several organizations are already positioned to take on an SRO role if they so choose, some more than others. They include:

If none of these groups wishes to take on the task, then creating an SRO for AGI is likely a neglected cause for those concerned about the existential risks posed by AGI. This post recommends that either an existing organization or a new one take up the task of serving as an SRO for AGI to reduce the risk of an AGI race and otherwise foster safety in AGI research.

NB: I wrote this as part of the “Solving the AI Race” round of the General AI Challenge.
