[ Question ]

Where are people thinking and talking about global coordination for AI safety?

by Wei_Dai · 22nd May 2019 · 8 comments

86


Many AI safety researchers these days are not aiming for a full solution to AI safety (e.g., the classic Friendly AI), but just trying to find good enough partial solutions that would buy time for or otherwise help improve global coordination on AI research (which in turn would buy more time for AI safety work), or trying to obtain partial solutions that would only make a difference if the world had a higher level of global coordination than it does today.

My question is, who is thinking directly about how to achieve such coordination (aside from FHI's Center for the Governance of AI, which I'm aware of) and where are they talking about it? I personally have a bunch of questions related to this topic (see below) and I'm not sure what's a good place to ask them. If there's not an existing online forum, it seems a good idea to start thinking about building one (which could perhaps be modeled after the AI Alignment Forum, or follow some other model).

  1. What are the implications of the current US-China trade war?
  2. Human coordination ability seems within an order of magnitude of what's needed for AI safety. Why the coincidence? (Why isn’t it much higher or lower?)
  3. When humans made advances in coordination ability in the past, how was that accomplished? What are the best places to apply leverage today?
  4. Information technology has massively increased certain kinds of coordination (e.g., email, eBay, Facebook, Uber), but at the international relations level, IT seems to have made very little impact. Why?
  5. Certain kinds of AI safety work could seemingly make global coordination harder, by reducing perceived risks or increasing perceived gains from non-cooperation. Is this a realistic concern?
  6. What are the best intellectual tools for thinking about this stuff? Just study massive amounts of history and let one's brain's learning algorithms build what models it can?



5 Answers

My question is, who is thinking directly about how to achieve such coordination (aside from FHI's Center for the Governance of AI, which I'm aware of) and where are they talking about it?

OpenAI has a policy team (this 80,000 Hours podcast episode is an interview with three people from that team), and I think their research areas include models for coordination between top AI labs, and improving publication norms in AI (e.g. maybe striving for norms that are more like those in computer security, where people are expected to follow some responsible disclosure process when publishing about new vulnerabilities). For example, the way OpenAI is releasing their new language model GPT-2 seems like a useful way to learn about the usefulness/feasibility of new publication norms in AI (see the "Release Strategy" section here).

I think related work is also being done at the Centre for the Study of Existential Risk (CSER).

Last year there was a prize for papers and the authors spoke on a panel about this subject at HLAI 2018.

I want to focus on your second question: "Human coordination ability seems within an order of magnitude of what's needed for AI safety. Why the coincidence? (Why isn’t it much higher or lower?)"

Bottom line up front: Humanity has faced a few potentially existential crises in the past: world wars, nuclear standoffs, and the threat of biological warfare. The fact that we survived those, plus selection bias, seems like a sufficient explanation for why we are near the threshold for our current crises.

I think this is a straightforward argument. At the same time, I'm not going to go deep into the anthropic reasoning, which is critical here but which I'm not clear enough on to discuss properly. (Side note: Stuart Armstrong recently mentioned to me that there are reasons, which I'm not yet familiar with, why anthropic shadows aren't large; the model below assumes this.)

If we assume that large-scale risks are distributed in some manner, such as draws from Bostrom's urn of technologies (see "The Vulnerable World Hypothesis" [PDF]), we should expect that the attributes of the problems, including the coordination needed to withstand or avoid them, are distributed with some mean and variance. Whatever that mean and variance are, there should be more "easy" risks (near or below the mean) than "hard" ones. Unless the tail is very, very fat, this means we are likely to encounter several moderate risks before we encounter more extreme ones. For a toy model, suppose one risk arrives at random each year, and the coordination capability needed to overcome it follows a standard normal distribution. If our capability were in the low single digits, we would already have been wiped out with high probability. Given that we've come worryingly close, however, it seems clear that we aren't in the high double digits either.
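The toy model above can be sketched numerically. This is a minimal illustration under the stated assumptions (one i.i.d. standard-normal risk per year, a fixed capability threshold); the function names are mine, and the 10,000-year horizon is an arbitrary choice for illustration:

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def survival_probability(capability: float, years: int) -> float:
    """P(no yearly risk exceeds our coordination capability over `years`),
    assuming one i.i.d. standard-normal risk arrives each year."""
    return normal_cdf(capability) ** years

# How survival odds change with capability over a long horizon.
for c in [1, 2, 3, 4, 5]:
    p = survival_probability(c, 10_000)
    print(f"capability {c}: P(surviving 10,000 years) = {p:.3g}")
```

Running this shows the claimed narrow band: a civilization with capability around 3 is very likely wiped out over a long horizon, while one around 5 survives almost surely, so observers who have had close calls but survived should expect to sit near the threshold.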

Given all of that, and the selection bias of asking the question when faced with larger risks, I think it's a posteriori likely that most of the salient risks we face are close to the limit of our ability to overcome them.

A source tells me there's a fair bit of non-public discussion of AGI-safety-relevant strategy/policy/governance issues, but it often takes a while for those discussions to cohere into a form that is released publicly (e.g. in a book or paper), and some of it is kept under wraps due to worries about infohazards (and worries about the unilateralist's curse w.r.t. infohazards).

For question 2, I think the human-initiated nature of AI risk could partially explain the small distance between ability and need. If we were completely incapable of coordinating as a civilization, other civilizations might be a threat, but we wouldn't have any AIs of our own, let alone general AIs.