Shaping economic incentives for collaborative AGI

Kaj_Sotala

In "An AI Race for Strategic Advantage: Rhetoric and Risks" (2018), Stephen Cave and Seán S ÓhÉigeartaigh argue that we should try to promote a cooperative AI narrative over a competitive one:

The next decade will see AI applied in an increasingly integral way to safety-critical systems; healthcare, transport, infrastructure to name a few. In order to realise these benefits as quickly and safely as possible, sharing of research, datasets, and best practices will be critical. For example, to ensure the safety of autonomous cars, pooling expertise and datasets on vehicle performances across as wide as possible a range of environments and conditions (including accidents and near-accidents) would provide substantial benefits for all involved. This is particularly so given that the research, data, and testing needed to refine and ensure the safety of such systems before deployment may be considerably more costly and time-consuming than the research needed to develop the initial technological capability.

Promoting recognition that deep cooperation of this nature is needed to deliver the benefits of AI robustly may be a powerful tool in dispelling a ‘technological race’ narrative; and a ‘cooperation for safe AI’ framing is likely to become increasingly important as more powerful and broadly capable AI systems are developed and deployed. [...]

There have been encouraging developments promoting the above narratives in recent years. ‘AI for global benefit’ is perhaps best exemplified by the 2017’s ITU summit on AI for Global Good (Butler 2017), although it also features prominently in narratives being put forward by the IEEE’s Ethically Aligned Design process (IEEE 2016), the Partnership on AI, and programmes and materials put forward by Microsoft, DeepMind and other leading companies. Collaboration on AI in safety-critical settings is also a thematic pillar for the Partnership on AI. Even more ambitious cooperative projects have been proposed by others, for example the call for a ‘CERN for AI’ from Professor Gary Marcus, through which participants “share their results with the world, rather than restricting them to a single country or corporation” (Marcus 2017).

So.

In order to make future AGI projects more collaborative and co-operation focused, could we create incentives (via e.g. government policy) that would push today's machine learning researchers towards more collaborative attitudes?

This might seem irrelevant, given that today's machine learning researchers are mostly not working on AGI. However, external incentives can shape the internal norms of a culture. For example, holding companies responsible for accidents at their workplaces gives them an incentive to reduce accidents, and therefore an incentive to create an internal culture of safety where everyone takes safety concerns seriously. Once such a culture is established, it takes on a life of its own: it is propagated to future workers through the usual sociological mechanisms by which norms and cultures propagate themselves, and it may survive even if the external incentives that originally created it change.

So my idea is something like:

1. Figure out which external incentives, applied to today's machine learning companies and research, would push them in a more collaborative direction.
2. Implementing these incentives via the right policy will cause the field to more generally adopt values and norms under which collaboration is seen as a good thing.
3. To the extent that the field which ends up developing AGI is a descendant of today's AI research field, it will inherit today's collaborative norms and values, shifting its prevailing attitudes away from "arms race" framings and increasing the chances of AGI being developed collaboratively.

In a discussion, James Miller suggested that codes of conduct, intellectual property laws, antitrust laws, tort law, and international agreements/tariffs, among other things, might be policy tools for shaping external incentives.

A possible addition that comes to mind is privacy law. Current ML systems, at least, require a lot of data, and there have been many demands to rein in companies' ability to collect information on people, information which could, among other things, be used to train ML systems. The GDPR (which might be enforced more strictly after the recent Facebook revelations), for example, establishes that "Automated individual decision-making, including profiling [...] is contestable [...] Citizens have rights to question and fight significant decisions that affect them that have been made on a solely-algorithmic basis". To the extent that people affected by algorithmic decisions can contest them, companies may have an incentive to cooperate, e.g. by developing shared standards they can follow to ensure that decisions made by their systems will hold up in court. (Doshi-Velez et al. (2017) is a paper attempting to establish some standards for how a legal right to explanation from AI systems could be met.)

Some other thoughts:

It might be worth thinking about a more specific definition for "cooperativeness". For instance, one form of "cooperativeness" might be openness in AI development. Openness seems worth distinguishing from other forms of cooperation, since while general cooperativeness may make things safer, openness may make them less safe. But I would intuitively think that non-openness would be hard to reconcile with cooperativeness. Maybe it's unavoidable for cooperativeness to lead to at least some degree of openness. (Bostrom (2017) notes on page 9 that openness could make AI development more competitive, but also more cooperative, if it removes incentives for competition: "The more that different potential AI developers (and their backers) feel that they would fully share in the benefits of AI even if they lose the race to develop AI first, the less motive they have for prioritizing speed over safety, and the easier it should be for them to cooperate with other parties to pursue a safe and peaceful course of development of advanced AI designed to serve the common good.")
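Bostrom's point can be illustrated with a toy two-developer race game. Everything in the sketch below is a hypothetical assumption chosen for illustration, not taken from Bostrom (2017) or from this post: the total benefit is B = 10, each racing developer adds a 1/5 chance of catastrophe (which leaves both with nothing), a racer always beats a careful developer, and equal strategies win with probability 1/2. The winner keeps a fraction 1 - s of the benefit and the remaining fraction s is split evenly, so s measures how fully the loser shares in the benefits.

```python
from fractions import Fraction
from typing import Optional

# Hypothetical toy parameters (illustrative assumptions, not from the post).
B = Fraction(10)                   # total value of successfully developed AI
RISK_PER_RACER = Fraction(1, 5)    # catastrophe probability added per racing developer
ACTIONS = ("safe", "race")


def payoff(me: str, other: str, s: Fraction) -> Fraction:
    """Expected payoff to `me` given both actions and a benefit share s in [0, 1].

    The winner keeps (1 - s) of B; the shared part s*B is split evenly between
    both developers. A catastrophe leaves everyone with 0.
    """
    racers = (me == "race") + (other == "race")
    survive = 1 - RISK_PER_RACER * racers
    win_share = B * (1 - s) + B * s / 2
    lose_share = B * s / 2
    if me == other:
        p_win = Fraction(1, 2)             # equal strategies: coin flip
    else:
        p_win = Fraction(1) if me == "race" else Fraction(0)
    return survive * (p_win * win_share + (1 - p_win) * lose_share)


def dominant_action(s: Fraction) -> Optional[str]:
    """Return the strictly dominant action, or None if neither dominates."""
    for a in ACTIONS:
        b = "race" if a == "safe" else "safe"
        if all(payoff(a, o, s) > payoff(b, o, s) for o in ACTIONS):
            return a
    return None


for s in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(s, dominant_action(s), payoff("race", "race", s), payoff("safe", "safe", s))
```

With these numbers, racing is the dominant strategy for s = 0 and s = 1/2, even though both developers racing pays each of them 3 while both being careful pays 5: everyone has a rational incentive to do something nobody wants. For s = 1, caution becomes dominant, and at s = 3/4 neither action dominates. The exact threshold is an artifact of the chosen numbers; only the direction of the effect is meant to carry over.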

As Baum (2017) points out, it's important to consider how AI developer communities react to external rules: if e.g. safety regulations are viewed as pointless annoyances, that may cause a lot of resentment. And it's easy to adopt a patronizing mindset in thinking about this: "how could we get AI developers to understand that they shouldn't destroy the world?". We shouldn't think about this that way (that's not a particularly collaborative mindset 😉).

Rather, the better mindset is something like this: most people don't want to destroy the world, AI developers included. But it's easy to end up in situations where everyone has a rational incentive to do something that nobody wants. So what we want is to collaboratively design mechanisms that end up supporting people in better fulfilling their own preference of not destroying the world.

(thanks to James Miller as well as my colleagues at the Foundational Research Institute for discussions that contributed to this article)

I agree that establishing a cooperative mindset in the AI / ML community is very important. I'm less sure if economic incentives or government policy are a realistic way to get there. Can you think of a precedent or example for such external incentives in other areas?

Also, collaboration between the researchers that develop AI may be just one piece of the puzzle. You could still get military arms races between nations even if most researchers are collaborative. If there are several AI systems, then we also need to ensure cooperation between these AIs, which isn't necessarily the same as cooperation between the researchers that build them.

Can you think of a precedent or example for such external incentives in other areas?

Good question. I was somewhat inspired by civil engineering, where, as I understand it, there is a rather strong culture of safety, driven in part by historical accidents that killed many people and caught the attention of regulators, insurers, and others. I don't know exactly how many of the resulting reforms came from external pressure versus people generally shaping up and not wanting to kill more people. But given how often good intentions get neglected in the face of bad incentives (AFAIK, several historical accidents were known to be disasters waiting to happen well ahead of time), I would guess that external incentives and consequences have played a major role.

Neat paper, congrats!

(Btw I think you may have switched your notation from theta to x in section 5.)

My vague impression was that FHI was skeptical about the value of openness in AI development. Is that incorrect? AI strikes me as "dual use" technology analogous to nuclear physics (can be used for both nuclear power plants (benign) and nuclear bombs (not benign)). Not sure whether it's good to make dual use technologies public. Also, if you're a believer in the capabilities vs alignment model, it seems like maybe you'd want people working on alignment to collaborate more (in order to speed alignment research) but you'd prefer for those working on capabilities to be fumbling alone in the dark?

I suppose one bright spot is that insofar as ML researchers believe in something like the capabilities vs. alignment model, they will naturally be incentivized to keep capabilities research to themselves but to publish alignment research, in order to help others avoid triggering an unfortunate accident?

Edit: sorry, I see you addressed this; unfortunately, I don't see a delete comment button.

I am optimistic about developing AGI collaboratively, especially through AI researchers cooperating. I'm not sure whether external incentives from government are the right way to achieve this: if such regulation originated from government rather than from AI researchers themselves, it seems likely to be aimed at the wrong problems. I'm more optimistic about AI researchers developing guidelines and incentive structures themselves, which researchers buy into voluntarily and which might later be codified into law by governments or adopted by companies for their AI research.

I would definitely want AI developers to participate in figuring this out! As I said in the post, the system is supposed to support them in creating the kind of environment they want, rather than imposing something unwanted from the outside.

That said, voluntary arrangements only work to the extent that everyone has an incentive to follow them. In an arms race, everyone has an incentive to participate even if nobody wants to, and that incentive keeps people from opting into voluntary arrangements meant to avoid the race. That is exactly the problem this kind of approach is trying to help solve.

I guess I'm confused about the path by which you hope to get the external incentives to be created. I'm advocating for a voluntary version that some people buy into (but not everyone, because of incentives) and that later gets codified into actual external incentives (e.g. law), whereas I read your post as suggesting that we push for external incentives like law, and get input from AI researchers when drafting them. These seem very different to me.

I guess I'm confused about the path by which you hope to get the external incentives to be created.

The way I'm thinking of it, this sentence seems to imply that AI development wouldn't be facing any external incentives right now. But everyone is always operating under some set of external incentives, which unavoidably shape their behavior. And if they're not intentionally designed ones, they are likely to be bad ones.

So the way I'd phrase it now, my proposal is neither "push for external incentives like law and get input from AI researchers in drafting them", nor "establish voluntary codes to buy into and make them into external incentives later". Rather it's "get the AI researchers to give their input on what the current external incentives are like and how they could be better, and then use whatever policy instruments are available to shift those incentives to be more like the better ones".

To take the specific example of liability legislation: there are already existing laws that will be applied if an AI system gets out of control and kills people. Is that existing legal framework, and the way it's likely to be applied, good or bad for encouraging the kinds of behavior we'd like to see from AI developers? I don't know, but I do know that it was never designed with this specific intent in mind, so there may be things to improve on there.

Ah, I see, that makes sense, thanks for clarifying!

I agree you want to choose according to your current utility function (preferences), and this explains all of the examples you have in basically the same way you explained it (that modifying taste is not the same as modifying the utility function/preferences).

I can see this being a problem, though. What is to stop us from doing the following? Let Ut(x) be my true utility function and let Up(y) be my so-called *practical* utility function. Furthermore, let Up(y) = x, so that Ut(x) = Ut(Up(y)). If we agree that changing the taste function doesn't alter the utility function, then changing Up(y) shouldn't alter my utility function either, even though Up(y) is all that it is based on!

You seem to be taking the position that as long as you can define your utility function/preferences in terms of another function, it's fine to change that function. I agree this seems wrong. In the apple/chocolate/banana case, I prefer worlds in which I have the subjective feeling of good taste. That preference is not getting modified. In this new case, I care directly about y, so you can't just go and modify Up(y) and expect not to be changing my preferences.

Btw, side note: If you aren't dealing with probability (as in this post), then "having a utility function" just means "having transitive preferences about all possible world-histories" (or world-states if you don't care about actions or paths to states). So it's worth thinking about this in terms of transitive preferences. I think that makes my argument clearer, and probably would help with other issues you raise in the post.
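For a finite set of outcomes, the step from consistent preferences to a utility function can be made concrete. The sketch below is a minimal illustration under two assumptions the comment does not spell out: the set of world-histories is finite, and the preferences are complete (any two outcomes can be compared) as well as transitive. It scores each outcome by how many outcomes it is weakly preferred to. The function name `utility_from_preferences` and the toy fruit preferences are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Hashable, Iterable


def utility_from_preferences(
    outcomes: Iterable[Hashable],
    weakly_prefers: Callable[[Hashable, Hashable], bool],
) -> Dict[Hashable, int]:
    """Return u such that u[a] >= u[b] exactly when weakly_prefers(a, b).

    Hypothetical helper for illustration. Assumes a finite set of outcomes and
    raises ValueError if the relation is not complete and transitive.
    """
    items = list(outcomes)
    # Completeness: every pair of outcomes must be comparable.
    for a, b in product(items, repeat=2):
        if not (weakly_prefers(a, b) or weakly_prefers(b, a)):
            raise ValueError(f"incomparable outcomes: {a!r}, {b!r}")
    # Transitivity: a >= b and b >= c must imply a >= c.
    for a, b, c in product(items, repeat=3):
        if weakly_prefers(a, b) and weakly_prefers(b, c) and not weakly_prefers(a, c):
            raise ValueError(f"intransitive: {a!r} >= {b!r} >= {c!r} but not {a!r} >= {c!r}")
    # Score each outcome by how many outcomes it is weakly preferred to.
    return {a: sum(weakly_prefers(a, b) for b in items) for a in items}


# Toy example: hypothetical strict preferences chocolate > apple > banana.
strict = {("chocolate", "apple"), ("apple", "banana"), ("chocolate", "banana")}


def prefers(a: str, b: str) -> bool:
    return a == b or (a, b) in strict


print(utility_from_preferences(["apple", "banana", "chocolate"], prefers))
# {'apple': 2, 'banana': 1, 'chocolate': 3}
```

If the relation contains a cycle that breaks transitivity (say, banana also weakly preferred to chocolate), the function raises an error instead of returning scores, which is the sense in which intransitive preferences cannot be represented by a utility function.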

(this looks like it got posted in the wrong place?)

That's where this comment went! Yeah, sorry about that, ignore it.

(I did post this on the correct post, but when I clicked "Submit" it just vanished and I had no idea what happened to it. Somehow it made its way here.)

Blockchains may offer a model solution for the incentive-alignment and privacy problems you mention. Incentives are integrated into the network and encourage good-faith actors to cooperate honestly. Zero-knowledge proofs could enable ML algorithms to interact and compute on data without compromising privacy.

Could you give a more specific example?