Many AI safety researchers these days are not aiming for a full solution to AI safety (e.g., the classic Friendly AI), but just trying to find good enough partial solutions that would buy time for or otherwise help improve global coordination on AI research (which in turn would buy more time for AI safety work), or trying to obtain partial solutions that would only make a difference if the world had a higher level of global coordination than it does today.

My question is, who is thinking directly about how to achieve such coordination (aside from FHI's Center for the Governance of AI, which I'm aware of) and where are they talking about it? I personally have a bunch of questions related to this topic (see below) and I'm not sure what's a good place to ask them. If there's not an existing online forum, it seems a good idea to start thinking about building one (which could perhaps be modeled after the AI Alignment Forum, or follow some other model).

  1. What are the implications of the current US-China trade war?
  2. Human coordination ability seems within an order of magnitude of what's needed for AI safety. Why the coincidence? (Why isn’t it much higher or lower?)
  3. When humans made advances in coordination ability in the past, how was that accomplished? What are the best places to apply leverage today?
  4. Information technology has massively increased certain kinds of coordination (e.g., email, eBay, Facebook, Uber), but at the international relations level, IT seems to have made very little impact. Why?
  5. Certain kinds of AI safety work could seemingly make global coordination harder, by reducing perceived risks or increasing perceived gains from non-cooperation. Is this a realistic concern?
  6. What are the best intellectual tools for thinking about this stuff? Just study massive amounts of history and let one's brain's learning algorithms build what models it can?

A source tells me there's a fair bit of non-public discussion of AGI-safety-relevant strategy/policy/governance issues, but it often takes a while for those discussions to cohere into a form that is released publicly (e.g. in a book or paper), and some of it is kept under wraps due to worries about infohazards (and worries about the unilateralist's curse w.r.t. infohazards).

I have since been given access to a sample of such non-public discussions. (The sample is small but I think at least somewhat representative.) Worryingly, it seems that there's a disconnect between the kind of global coordination that AI governance researchers are thinking and talking about, and the kind that technical AI safety researchers often talk about nowadays as necessary to ensure safety.

In short, the Google docs I've seen all seem to assume that a safe and competitive AGI can be achieved at some reasonable level of investment into technical safety, and that the main coordination problem is how to prevent a "race to the bottom" whereby some actors try to obtain a lead in AI capabilities by underinvesting in safety. However, current discussion among technical AI safety researchers suggests that a safe and competitive AGI perhaps can't be achieved at any feasible level of investment into technical safety, and that at a certain point we'll probably need global coordination to stop, limit, or slow down progress in and/or deployment/use of AI capabilities.

Questions I'm trying to answer now: 1) Is my impression from the limited sample correct? 2) If so, how best to correct this communications gap (and prevent similar gaps in the future) between the two groups of people working on AI risk?

I appreciate how you turned the most useful private info into public conversation while largely minimising the amount of private info that had to become public.

To respond directly, yes, your observation matches my impression of folks working on governance issues who aren’t very involved in technical alignment (with the exception of Bostrom). I have no simple answer to the latter question.

Rohin Shah (4y):
Seems right to me, yes. Convince the researchers at OpenAI, FHI and Open Phil, and maybe DeepMind and CHAI, that it's not possible to get safe, competitive AI; then ask them to pass it on to governance researchers.

I have a feeling it's not that simple. See the last part of “Generate evidence of difficulty” as a research purpose on biases. So for example I know at least one person who quit from an AI safety org (in part) because they became convinced that it's too difficult to achieve safe, competitive AI (or at least the approach pursued by the org wasn't going to work). Another person privately told me they have little idea how their research will eventually contribute to a safe, competitive AI, but hasn't written anything like that publicly AFAIK. (And note that I don't actually have that many opportunities to speak privately with other AI safety researchers.) Another thing is that most AI safety researchers probably don't think it's part of their job to "generate evidence of difficulty" so I have to convince them of that first.

Unless these problems are solved, I might be able to convince a few safety researchers to go to governance researchers and tell them they think it's not possible to get safe, competitive AI, but their concerns will probably just be dismissed as outliers. I think a better step forward would be to build a private forum where these kinds of concerns can be more frankly discussed, as well as a culture where doing so is normative. This addresses some of the possible biases and I'm still not sure about the others.

Rohin Shah (4y):
This is pretty strongly different from my impressions, but I don't think we could resolve the disagreement without talking about specific examples of people, so I'm inclined to set this aside.
I would guess three main disagreements are: i) are the kinds of transformative AI that we're reasonably likely to get in the next 25 years unalignable? ii) how plausible are the extreme levels of cooperation Wei Dai wants? iii) how important is career capital/credibility? I'm perhaps midway between Wei Dai's view and the median governance view so may be an interesting example. I think we're ~10% likely to get transformative general AI in the next 20 years, and ~6% likely to get an incorrigible one, and ~5.4% likely to get incorrigible general AI that's insufficiently philosophically competent. Extreme cooperation seems ~5% likely, and is correlated with having general AI. It would be nice if more people worked on that, or on whatever more-realistic solutions would work for the transformative unsafe AGI scenario, but I'm happy for some double-digit percentage of governance researchers to keep working on less extreme (and more likely) solutions to build credibility.
> My question is, who is thinking directly about how to achieve such coordination (aside from FHI's Center for the Governance of AI, which I'm aware of) and where are they talking about it?

OpenAI has a policy team (this 80,000 Hours podcast episode is an interview with three people from that team), and I think their research areas include models for coordination between top AI labs, and improving publication norms in AI (e.g. maybe striving for norms that are more like those in computer security, where people are expected to follow some responsible disclosure process when publishing about new vulnerabilities). For example, the way OpenAI is releasing their new language model GPT-2 seems like a useful way to learn about the usefulness/feasibility of new publication norms in AI (see the "Release Strategy" section here).

I think related work is also being done at the Centre for the Study of Existential Risk (CSER).

I want to focus on your second question: "Human coordination ability seems within an order of magnitude of what's needed for AI safety. Why the coincidence? (Why isn’t it much higher or lower?)"

Bottom line up front: Humanity has faced a few potentially existential crises in the past: world wars, nuclear standoffs, and the threat of biological warfare. The fact that we survived those, plus selection bias, seems like a sufficient explanation of why we are near the threshold for our current crises.

I think this is a straightforward argument. At the same time, I'm not going to get deep into the anthropic reasoning, which is critical here but which I don't understand well enough to discuss clearly. (Side note: Stuart Armstrong recently mentioned to me that there are reasons I'm not yet familiar with for why anthropic shadows aren't large, which is assumed in the below model.)

If we assume that large-scale risks are distributed in some manner, such as from Bostrom's urn of technologies (see: Vulnerable World Hypothesis - PDF), we should expect that the attributes of the problems, including the coordination needed to withstand / avoid them, are distributed with some mean and variance. Whatever that mean and variance is, we expect that there should be more "easy" risks (near or below the mean) than "hard" ones. Unless the tail is very, very fat, this means that we are likely to see several moderate risks before we see more extreme ones. For a toy model, let's assume risks show up at random yearly, and follow a standard normal distribution in terms of capability needed. If we had capability in the low single digits, we would be wiped out already with high probability. Given that we've come worryingly close, however, it seems clear that we aren't in the high double digits either.
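To make the toy model a bit more concrete, here is a minimal sketch in Python. It is only one possible reading of the model, not a calculation from the original answer: it assumes exactly one independent risk per year, that a risk is survived whenever its standard-normal draw is at or below our capability, and a 100-year horizon. The function names and the horizon are hypothetical choices made for illustration.

```python
import math


def standard_normal_cdf(x: float) -> float:
    """Probability that a standard normal draw is at or below x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def survival_probability(capability: float, years: int) -> float:
    """Chance of surviving `years` independent yearly risks.

    Assumption (hypothetical reading of the toy model): one risk per year,
    each requiring capability drawn from a standard normal distribution,
    and we survive a year iff the draw is at or below `capability`.
    """
    return standard_normal_cdf(capability) ** years


if __name__ == "__main__":
    # The 100-year horizon is an arbitrary illustrative choice.
    for capability in (1.0, 2.0, 3.0, 5.0):
        p = survival_probability(capability, years=100)
        print(f"capability={capability:.0f}: P(survive 100 years)={p:.6f}")
```

Under these assumptions, survival odds climb steeply as capability rises from about 1 to about 3 standard deviations, and are close to certain by 5, which is the rough shape the argument relies on.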

Given all of that, and the selection bias of asking the question when faced with larger risks, I think it's a posteriori likely that most salient risks we face are close to our level of ability to overcome.

Would the calculation be moved by this being the last crisis we will face?

No, that's implicit in the model - either *some* crisis requiring higher capacity than we have will overwhelm us and we'll all die (and it doesn't matter which), or the variance is relatively small so no such event occurs, and/or our capacity to manage risks grows quickly enough that we avoid the upper tail.

Last year there was a prize for papers and the authors spoke on a panel about this subject at HLAI 2018.

Oh interesting, I wasn't aware of this prize. Where are these papers being discussed? It seems like it's mostly in person, at conferences, and through published papers? Are you aware of an online forum similar to LW/AF where such papers and ideas are being discussed?

ETA: Are the papers being discussed, or are people just publishing their own papers and not really commenting on each other's ideas?

Gordon Seidoh Worley (5y):
There was some in-person conversation about the papers among us, but that's about it. I've not seen a strong community develop around this so far; mostly people just publish things one-off and then they go into a void where no one builds on each other's work. I think this mostly represents the early stage of the field and the lack of anyone very dedicated to it, though, as I got the impression that most of us were just dabbling in this topic because it was near things we were already interested in and we had some ideas about it.
Wei Dai (5y):
Ok, that's what I was afraid of, and what I'm hoping to see change. Since you seem to have thought about this for longer than I have, do you have any suggestions about what to do?
Gordon Seidoh Worley (4y):
Having just come back from EA Global in SF, I have a much stronger sense that a decent number of people are hoping to start thinking and talking about coordination for AI safety; a significant number of people (maybe as many as 30) were talking to each other about it at the conference. I'd now update my answer to say I am more confident that there is some dedicated effort happening in this direction, including from Center for Emerging Technologies, Global Catastrophic Risk Initiative, and others spread out over multiple organizations.

RE the title, a quick list:

  • FHI (and associated orgs)
  • CSER
  • OpenAI
  • OpenPhil
  • FLI
  • FRI
  • GovAI
  • PAI

I think a lot of orgs that are more focused on social issues which can or do arise from present-day AI / ADM (automated decision making) technology should be thinking more about global coordination, but seem focused on national (or subnational, or EU) level policy. It seems valuable to make the most compelling case for stronger international coordination efforts to these actors. Examples of this kind of org that I have in mind are AI Now and the Montreal AI Ethics Institute (MAIEI).

As mentioned in other comments, there are many private conversations among people concerned about AI-Xrisk, and (IMO, legitimate) info-hazards / unilateralist curse concerns loom large. It seems prudent to make progress on those meta-level issues (i.e. how to engage the public and policymakers on AI(-Xrisk) coordination efforts) as a community as quickly as possible, because:

  • Getting effective AI governance in place seems like it will be challenging and take a long time.
  • There is a rapidly growing number of organizations seeking to shape AI policy, which may have objectives that are counter-productive from the point of view of AI-Xrisk. And there may be a significant first-mover advantage (e.g. via setting important legal or cultural precedents, and framing the issue for the public and policymakers).
  • There is massive untapped potential for people who are not currently involved in reducing AI-Xrisk to contribute (consider the raw number of people who haven't been exposed to serious thought on the subject).
  • Info-hazard-y ideas are becoming public knowledge anyways, on the timescale of years. There may be a significant advantage to getting ahead of the "natural" diffusion of these memes and seeking to control the framing / narrative.

My answers to your 6 questions:

1. Hopefully the effect will be transient and minimal.

2. I strongly disagree. I think we (ultimately) need much better coordination.

3. Good question. As an incomplete answer, I think personal connections and trust play a significant (possibly indispensable) role.

4. I don't know. Speculating/musing/rambling: the kinds of coordination where IT has made a big difference (recently, i.e. starting with the internet) are primarily economic and consumer-facing. For international coordination, the stakes are higher; it's geopolitics, not economics; you need effective international institutions to provide enforcement mechanisms.

5. Yes, but this doesn't seem like a crucial consideration (for the most part). Do you have specific examples in mind?

6. Social science and economics seem really valuable to me. Game theory, mechanism design, behavioral game theory. I imagine there's probably a lot of really valuable stuff in some other fields as well (psychology? sociology? anthropology?) on how people/orgs make collective decisions that the stakeholders are satisfied with. We need experts in these fields (especially the softer fields, which I think are underrepresented) to inform the AI-Xrisk community about existing findings and create research agendas.

For question 2, I think the human-initiated nature of AI risk could partially explain the small distance between ability and need. If we were completely incapable of working as a civilization, other civilizations might be a threat, but we wouldn’t have any AIs of our own, let alone general AIs.

> When humans made advances in coordination ability in the past, how was that accomplished? What are the best places to apply leverage today?

I am confused by the general lack of interest I've encountered in how joint stock corporations came to be and underwent selection to get us to where we are now. It may be that I'm not looking in the right places. I know the founders of McKinsey are quite interested in this.

Do you have resources on this topic to recommend?

Not really. Although proto joint stock corps existed at the time, the Glorious Revolution and the importation of Dutch commercial trading practices were significant events.
> 4. Information technology has massively increased certain kinds of coordination (e.g., email, eBay, Facebook, Uber), but at the international relations level, IT seems to have made very little impact. Why?

I note the coordination is entirely at a lower level than those companies: mostly individuals are using these services for coordination, as well as small groups. It seems like coordination innovations aren't bottom-up, but rather top-down (even if the IT examples are mostly opt-in). This seems to match other large coordination improvements, like empire, monotheism, or corporations. There is no higher level of abstraction than governments from which to improve international relations, it seems to me.

Quite separately, we could ask: what are the specific challenges in international relations that IT could address? The problems mostly revolve around questions of trust, questions of the basic competence of human agents (diplomats, ambassadors, heads of state, etc.), and fundamental conflicts of interest. International relations has an irreducible component of face-to-face personal relationships, so I would expect tools built around those relationships, or built to facilitate them, to be the most relevant.

That being said, it's also clear that Facebook and Uber aren't even trying to target problems related to international relations. We know contracting with multiple governments is achievable, because companies like Google, Microsoft, and Palantir all manage it, selling IT for intelligence purposes. Dominic Cummings has a blog post, "High performance government, ‘cognitive technologies’, Michael Nielsen, Bret Victor, & ‘Seeing Rooms’", that speculates about how international relations could be improved by making the stupendous complexity of the information at work more readily available to decision makers, both for educational purposes and in real time. Maybe there would be an opportunity for a Situation Room Company, or similar. Following on the personal relationship observation, perhaps something like Salesforce-but-for-diplomacy would have some value.