I agree that establishing a cooperative mindset in the AI / ML community is very important. I'm less sure if economic incentives or government policy are a realistic way to get there. Can you think of a precedent or example for such external incentives in other areas?
Also, collaboration between the researchers that develop AI may be just one piece of the puzzle. You could still get military arms races between nations even if most researchers are collaborative. If there are several AI systems, then we also need to ensure cooperation between these AIs, which isn't necessarily the same as cooperation between the researchers that build them.
> Can you think of a precedent or example for such external incentives in other areas?
Good question. I was somewhat inspired by civil engineering, where my understanding is that there's a rather strong culture of safety, driven in part by various historical accidents that killed a lot of people and caught the attention of regulators, insurers, and so on. I don't actually know how many of the resulting reforms were a result of external pressure, versus people just generally shaping up and not wanting to kill more people. But given how easily good intentions get neglected in the face of bad incentives (AFAIK, several historical accidents [e.g.] were known to be disasters just waiting to happen well ahead of time), I would guess that external incentives and consequences have played a major role.
My vague impression was that FHI was skeptical about the value of openness in AI development. Is that incorrect? AI strikes me as "dual use" technology analogous to nuclear physics (can be used for both nuclear power plants (benign) and nuclear bombs (not benign)). Not sure whether it's good to make dual use technologies public. Also, if you're a believer in the capabilities vs alignment model, it seems like maybe you'd want people working on alignment to collaborate more (in order to speed alignment research) but you'd prefer for those working on capabilities to be fumbling alone in the dark?
I suppose one bright spot is that insofar as ML researchers believe in something like the capabilities vs. alignment model, they will naturally be incentivized to keep capabilities research to themselves, but to publish alignment research in order to help someone else avoid triggering an unfortunate accident?
Edit: sorry, I see you addressed this, unfortunately I don't see a delete comment button
I am optimistic about developing AGI collaboratively, especially through AI researchers cooperating. I'm not sure whether external incentives from government are the right way to achieve this: it seems likely that such regulation would be aimed at the wrong problems if it originated from government rather than from AI researchers themselves. I'm more optimistic about AI researchers developing guidelines and incentive structures themselves, ones that researchers buy into voluntarily and that maybe later get codified into law by governments, or adopted by companies for their AI research.
I would definitely want AI developers to participate in figuring out this stuff! Like I said in the post, the system is supposed to support them in creating the kind of an environment they want, rather than imposing something unwanted from the outside.
That said, voluntary arrangements only work to the extent that everyone has an incentive to follow them. Arms races are a case in point: everyone may have an incentive to participate in an arms race even if nobody wants one, which prevents people from opting into voluntary arrangements intended to avoid the race. That is exactly the problem this kind of system is trying to help avoid.
I guess I'm confused about the path by which you hope to get the external incentives to be created. I'm advocating for a voluntary version that some people buy into (but not everyone because incentives), that later gets codified into actual external incentives (eg. law), whereas I read your post as suggesting that we push for external incentives like law, and when we're drafting them we make sure to get input from AI researchers. These seem very different to me.
> I guess I'm confused about the path by which you hope to get the external incentives to be created.
The way I'm thinking of it, this sentence seems to imply that AI development wouldn't be facing any external incentives right now. But everyone is always operating under some set of external incentives, which unavoidably shape their behavior. And if they're not intentionally designed ones, they are likely to be bad ones.
So the way I'd phrase it now, my proposal is neither "push for external incentives like law and get input from AI researchers in drafting them", nor "establish voluntary codes to buy into and make them into external incentives later". Rather it's "get the AI researchers to give their input on what the current external incentives are like and how they could be better, and then use whatever policy instruments are available to shift those incentives to be more like the better ones".
E.g., to take the specific example of liability legislation: there are already existing laws that would be applied if an AI system got out of control and killed people. Is that existing legal framework, and the way it's likely to be applied, good or bad for encouraging the kinds of behavior we'd like to see from AI developers? I don't know, but at least I know that it was never designed with this specific intent in mind, so there may be things to improve on there.
I agree that you want to choose according to your current utility function (preferences), and this explains all of your examples in basically the same way you explained them (modifying the taste function is not the same as modifying the utility function/preferences).
I can see this being a problem, though. What is to stop us from doing the following? Let Ut(x) be my true utility function and let Up(y) be my so-called *practical* utility function. Furthermore, let Up(y) = x, so that Ut(x) = Ut(Up(y)). If we agree that changing the taste function doesn't alter the utility function, then changing Up(y) shouldn't alter my utility function --- but this is all it is based on!
You seem to be taking the position that as long as you can define your utility function/preferences in terms of another function, it's fine to change that function. I agree this seems wrong. In the apple/chocolate/banana case, I prefer worlds in which I have the subjective feeling of good taste; that preference is not getting modified. In this new case, I care directly about y, so you can't just go and modify Up(y) and expect not to be changing my preferences.
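The reply can be spelled out in symbols (my own restatement of the argument, reusing the Ut/Up notation from the comments above): if the preferences are really over y, then the agent's effective utility over y is the composition of the two functions, and swapping out Up changes that composition even though Ut itself is untouched.

```latex
\begin{align*}
U(y)  &= U_t\bigl(U_p(y)\bigr)  && \text{effective utility over } y \\
U'(y) &= U_t\bigl(U_p'(y)\bigr) \neq U(y) && \text{in general, after replacing } U_p \text{ with } U_p'
\end{align*}
```

This is what distinguishes the case from the taste example: there, the preferences were over the *output* of the taste function (the subjective experience x), so changing the function that produces x didn't touch them; here, they are over the *input* y, so they do change.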
Btw, side note: If you aren't dealing with probability (as in this post), then "having a utility function" just means "having transitive preferences over all possible world-histories" (or world-states, if you don't care about actions or paths to states). So it's worth thinking about this in terms of transitive preferences. I think that makes my argument clearer, and would probably help with other issues you raise in the post.
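To illustrate the side note, here is a minimal sketch (my own, not from the post; the example states and the helper name `utility_from_preferences` are made up for illustration): given complete, transitive strict preferences over a finite set of world-states, one can always construct a numeric utility function that represents them, simply by ranking the states from least to most preferred.

```python
from functools import cmp_to_key

def utility_from_preferences(states, prefers):
    """Given a complete, transitive strict-preference relation
    prefers(a, b) (True iff a is strictly preferred to b),
    return a dict mapping each state to a numeric utility
    that represents those preferences."""
    def cmp(a, b):
        if prefers(a, b):
            return 1   # a strictly preferred to b: sort a later
        if prefers(b, a):
            return -1
        return 0       # indifferent: equal rank
    ordered = sorted(states, key=cmp_to_key(cmp))
    # Assign ranks, giving indifferent states equal utility.
    utility, rank = {}, 0
    for i, s in enumerate(ordered):
        if i > 0 and prefers(s, ordered[i - 1]):
            rank += 1
        utility[s] = rank
    return utility

# Example: chocolate preferred to apple, apple preferred to banana.
order = {"banana": 0, "apple": 1, "chocolate": 2}
u = utility_from_preferences(
    ["apple", "banana", "chocolate"],
    lambda a, b: order[a] > order[b],
)
assert u["chocolate"] > u["apple"] > u["banana"]
```

The construction only works because the preferences are transitive and complete; a cyclic preference (A over B, B over C, C over A) admits no such numeric representation, which is the content of the equivalence claimed above.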
Blockchains may offer a model solution for the incentive-alignment and privacy problems you mention. Incentives are integrated into the network and encourage good-faith actors to cooperate honestly, and zero-knowledge proofs could enable ML algorithms to interact and compute on data without compromising privacy.
In "An AI Race for Strategic Advantage: Rhetoric and Risks" (2018), Stephen Cave and Seán S ÓhÉigeartaigh argue that we should try to promote a cooperative AI narrative over a competitive one:
So.
In order to make future AGI projects more collaborative and cooperation-focused, could we create incentives (e.g. via government policy) that would push today's machine learning researchers towards more collaborative attitudes?
This might seem irrelevant, given that today's machine learning researchers are mostly not working on AGI. However, external incentives can shape the internal norms of a culture. For example, holding companies responsible for accidents at their workplaces means that they have an incentive to reduce accidents, which means that they have an incentive to create an internal culture of safety where everyone takes safety concerns seriously. And once such a culture is established, it starts having a life of its own: it is propagated to future workers through the various sociological mechanisms by which norms and cultures normally propagate themselves, and it may stay alive even if the external incentives which originally created it later change.
So my idea is something like:
In a discussion, James Miller suggested that - among other things - codes of conduct, intellectual property laws, antitrust laws, tort law, and international agreements/tariffs might be policy tools which could be used to shape external incentives.
A possible addition that comes to mind is privacy law: at least current ML systems require a lot of data, and there have been many demands (e.g.) to rein in the ability of companies to collect information on people, information which could, among other things, be used to train ML systems. And e.g. the GDPR (which might be enforced more strictly after the recent Facebook revelations) establishes things like "Automated individual decision-making, including profiling [...] is contestable [...] Citizens have rights to question and fight significant decisions that affect them that have been made on a solely-algorithmic basis". To the extent that decisions made by algorithms can be contested by the people affected by them, companies may have an incentive to be cooperative and e.g. develop the kinds of standards they can follow in order to ensure that decisions made by their systems will be held up in court. (Doshi-Velez et al. (2017) is a paper attempting to establish some kinds of standards for how a legal right to explanation from AI systems could be met.)
Some other thoughts:
It might be worth thinking about a more specific definition of "cooperativeness". For instance, one form of cooperativeness might be openness in AI development. Openness seems worth distinguishing from other forms of cooperation: while general cooperativeness may make things safer, openness may make them less safe. At the same time, I would intuitively think that non-openness would be hard to reconcile with cooperativeness, so maybe it's unavoidable for cooperativeness to lead to at least some degree of openness. (Bostrom (2017) notes on page 9 that openness could make AI development more competitive, but also more cooperative, if it removes incentives for competition: "The more that different potential AI developers (and their backers) feel that they would fully share in the benefits of AI even if they lose the race to develop AI first, the less motive they have for prioritizing speed over safety, and the easier it should be for them to cooperate with other parties to pursue a safe and peaceful course of development of advanced AI designed to serve the common good.")
As Baum (2017) points out, it's important to consider how AI developer communities react to external rules: if e.g. safety regulations are viewed as pointless annoyances, that may cause a lot of resentment. And it's easy to adopt a patronizing mindset in thinking about this: "how could we get AI developers to understand that they shouldn't destroy the world?". We shouldn't think about this that way (that's not a particularly collaborative mindset 😉).
Rather, the better mindset is something like this: most people don't want to destroy the world, AI developers included. But it's easy to end up in situations where everyone has a rational incentive to do something that nobody wants. So what we want is to collaboratively design mechanisms that end up supporting people in better fulfilling their own preference of not destroying the world.
(thanks to James Miller as well as my colleagues at the Foundational Research Institute for discussions that contributed to this article)