
Does it become easier, or harder, for the world to coordinate around not building AGI as time goes on?

by elityre, 29th Jul 2019



(Or, is coordination easier in a long timeline?)

It seems like it would be good if the world could coordinate to not build AGI. That is, at some point in the future, some number of teams will have the technical ability to build and deploy an AGI, but they will all agree to voluntarily delay (perhaps on penalty of sanctions) until they’re confident that humanity knows how to align such a system.

Currently, this kind of coordination seems like a pretty implausible state of affairs. But I want to know if it seems like it becomes more or less plausible as time passes.

The following is my initial thinking in this area. I don’t know the relative importance of the factors that I listed, and there’s lots that I don’t understand about each of them. I would be glad for…

  • Additional relevant factors.
  • Arguments that some factor is much more important than the others.
  • Corrections, clarifications, or counterarguments to any of this.
  • Other answers to the question, that ignore my thoughts entirely.

If coordination gets harder over time, that’s probably because...

  • Compute increases make developing and/or running an AGI cheaper. The most obvious consideration is that the cost of computing falls each year. If one of the bottlenecks for an AGI project is having large amounts of compute, then “having access to sufficient compute” is a gatekeeper criterion on who can build AGI. As the cost of computing continues to fall, more groups will be able to run AGI projects. The more people who can build an AGI, the harder it becomes to coordinate all of them into not deploying it.
    • Note that it is unclear to what degree there is currently, or will be, a hardware overhang. If someone in 2019 could already run an AGI on only $10,000 worth of AWS compute, if only they knew how, then the cost of compute is not relevant to the question of coordination.
  • The number of relevant actors increases. If someone builds an AGI in the next year, I am reasonably confident that that someone will be DeepMind. I expect that in 15 years, if I knew that AGI would be developed one year from then, it would be much less overdetermined which group was going to build it, because there will be many more well-funded AI teams with top talent, and, most likely, none of them will have as strong a lead as DeepMind currently appears to have.
    • This consideration suggests that coordination gets harder over time. However, this depends heavily on other factors (like how accepted AI safety memes are) that determine how easily DeepMind could coordinate internally.
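The compute-cost argument above can be made concrete with a toy model that counts how many actors can afford a fixed compute requirement as its price falls. This is a sketch under loudly hypothetical assumptions: the $1 billion starting cost, the 2.5-year cost-halving time, and the distribution of actor budgets are all invented for illustration and are not estimates.

```python
def compute_cost(year, base_cost=1e9, base_year=2019, halving_years=2.5):
    """Hypothetical dollar cost of the compute an AGI project needs in `year`,
    assuming the price halves every `halving_years` years."""
    return base_cost * 0.5 ** ((year - base_year) / halving_years)


def capable_actors(budgets, year, **cost_kwargs):
    """Count actors whose budget covers the compute cost in `year`."""
    cost = compute_cost(year, **cost_kwargs)
    return sum(1 for budget in budgets if budget >= cost)


# Invented budget distribution: a few very large labs, more mid-sized ones,
# and a long tail of small groups.
budgets = [5e9] * 2 + [5e8] * 10 + [5e7] * 100 + [5e6] * 1000

for year in (2019, 2024, 2029, 2034):
    print(year, capable_actors(budgets, year))
```

Under these made-up numbers the count of capable actors rises from 2 in 2019 to 112 in 2034. The hardware-overhang caveat corresponds to a small `base_cost`: if the requirement were already cheap in 2019, nearly everyone would clear the bar from the start, and falling prices would change little.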

If coordination gets easier over time, that’s probably because…

  • AI safety memes become more and more pervasive and generally accepted. It seems that coordination is easier in worlds where it is uncontroversial and common knowledge that an unaligned AGI poses an existential risk, because everyone agrees that they will lose big if anyone builds an AGI.
    • Over the past 15 years, the key arguments of AI safety have gone from being extremely fringe to a reasonably regarded (if somewhat controversial) position, well inside the Overton window. Will this process continue? Will it be commonly accepted by ML researchers in 2030 that advanced AI poses an existential threat? Will it be commonly accepted by the leaders of nation-states?
    • What will the perception of safety be in a world where there is another AGI winter? Suppose that narrow ML proves to be extremely useful in a large number of fields, but there’s lots of hype about AGI being right around the corner; then that bubble bursts, and there is broad disinterest in AGI again. What happens to the perception of AI safety? Is there a sense of “It looks like AI Alignment wasn’t important after all”? How cautious will researchers be in developing new AI technologies?
  • [Partial subpoint to the above consideration] Individual AI teams develop more security-conscious processes. If some team at DeepMind discovered AGI today, and the DeepMind leadership opted to wait to ensure safety before deploying it, I don’t know how long it would be until some relevant employees left to build AGI on their own, or some other group (such as a state actor) stole their technology and deployed it.
    • I don’t know if this is getting better or worse over time.
  • The technologies for maintaining surveillance of would-be AGI developers improve. Coordination is made easier by technologies that aid in enforcement. If surveillance technology improves, that seems like it would make coordination easier. As a special case, highly reliable lie detection or mind-reading technologies would be a game-changer for making coordination easier.
    • Is there a reason to think that offense will beat defense in this area? Surveillance could get harder over time if the technology for detecting and defeating surveillance outpaces the technology for surveilling.
  • Security technology improves. Similarly, improvements in computer security (and traditional info security) would make it easier for actors to voluntarily delay deploying advanced AI technologies, because they could trust that their competitors (other companies and other nations) wouldn’t be able to steal their work.
    • I don’t know if this is plausible at all. My impression is that the weak point of all security systems is the people involved. What sort of advancements would make the human part of a security system more reliable?
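The claim that coordination is easier when everyone agrees they will lose big if anyone builds an AGI can be illustrated with a toy deploy-or-delay game. This is a minimal sketch, not a model from the post: the parameter names `benefit`, `perceived_risk`, `catastrophe_loss`, and `sanction` are hypothetical, payoffs are in arbitrary units, and it assumes the expected catastrophe cost falls on the deviator as well as everyone else. Mutual delay is self-enforcing only if no actor gains by unilaterally deploying.

```python
def deviation_payoff(benefit, perceived_risk, catastrophe_loss, sanction=0.0):
    """Payoff to an actor who deploys while everyone else delays.

    The deviator captures `benefit`, but (by assumption) bears the expected
    catastrophe cost perceived_risk * catastrophe_loss and pays any sanction.
    """
    return benefit - perceived_risk * catastrophe_loss - sanction


def delay_is_stable(benefit, perceived_risk, catastrophe_loss, sanction=0.0):
    """True if mutual delay is a Nash equilibrium (delaying pays 0 by convention)."""
    return deviation_payoff(benefit, perceived_risk, catastrophe_loss, sanction) <= 0.0


def required_sanction(benefit, perceived_risk, catastrophe_loss):
    """Smallest sanction that makes unilateral deployment unprofitable."""
    return max(0.0, benefit - perceived_risk * catastrophe_loss)


# Hypothetical numbers: the same benefit and loss, two levels of perceived risk.
for risk in (0.01, 0.2):
    print(risk, delay_is_stable(10, risk, 100), required_sanction(10, risk, 100))
```

In this toy, a higher perceived risk (AI safety memes becoming accepted) and a larger sanction both push the deviation payoff below zero, so they act as substitutes for one another.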



5 Answers

A missing point in favor of coordination getting easier: AI safety as a field seems likely to mature over time, and as it does the argument "let's postpone running this AGI code until we first solve x" may become more compelling, as x increases in legibility and tractability.

I suspect that one of the factors that will make coordinating to not build AGI harder is that the incentive to build AGI will become greater for a larger number of people. Right now, there's a large class of people who view AI as a benign technology that will bring about large amounts of economic growth, and whose effects are going to be widespread and positive. I think this position is best captured by Andrew Ng when he says "AI is the new electricity". Likewise, the White House states "Artificial intelligence holds the promise of great benefits for American workers, with the potential to improve safety, increase productivity, and create new industries we can’t yet imagine."

However, as time goes by, AI capabilities will grow, and so will public demonstrations of what's possible with AI. This will cause people to revise upwards their beliefs about the impact and power of AI and AGI, and drag far more actors into the game. I think that if the White House shared the views of DeepMind or OpenAI on AGI, it wouldn't hesitate to start the equivalent of a second Manhattan Project.

elityre makes a sincere effort to examine the question from the ground up. But this overlooks the work that's already been done in similar fields. A lot of what has been accomplished with regard to applied genetic research is likely to be transferable, for instance.

More generally, formal methods of safety engineering can provide a useful framework, when adapted flexibly to reflect novel aspects of the question.

elityre: Are there existing agreements constraining the deployment of applied genetic research? What are the keywords I should search for, if I want to know more? The only thing I know about this area is that an unaffiliated researcher used CRISPR to modify human embryos, and that most of the field rebuked him for it. This suggests that there are general norms about which experiments are irresponsible to try, but not strong coordination that prevents those experiments from being done.
Jimdrix_Hendri: Hi elityre, and thanks for responding. I am certainly no expert, but I do know there is legislation, both national and international, regulating genetic research. Quick queries to Professor Google turned up two international agreements that appear relevant:

  • Cartagena Protocol on Biosafety to the Convention on Biological Diversity
  • International Declaration on Human Genetic Data

Both are older documents which establish a kind of precedent for a basic framework for how national governments can cooperate to regulate a rapidly changing and critically dangerous technology. Another place to look would be the evolution of agreements on non-proliferation of weapons of mass destruction, especially in the early years, when the political and technological application of e.g. nuclear weapons was still in flux. Hope this helps.

New consideration: hyperbolic time discounting suggests it gets harder over time. It's easier to lose a benefit that seems far off in the future than to lose a benefit that seems imminent.

(Though usually I think of this consideration as suggesting that coordination right now will be easier than we think.)

mr-hire: Hyperbolic discounting applies to negatives as well, correct? Which means this could go either way.
rohinmshah: Yeah, that's fair. Nonetheless, I predict that people will find it easier to commit to not building AGI a long time before it is feasible, because of something like "it feels less aversive to lose the potential benefits", but I agree hyperbolic time discounting is not a sufficient explanation for this prediction.
LawrenceC: Something something near mode vs. far mode (https://wiki.lesswrong.com/wiki/Near/far_thinking)?
MakoYass: In what situation should a longtermist (a person who cares about people in the future as much as they care about people in the present) ever do hyperbolic discounting?
LawrenceC: Hyperbolic discounting leads to preference reversals over time: the classic example is always preferring a certain $1 now to $2 tomorrow, but preferring a certain $2 in a week to $1 in 6 days. This is a pretty clear sign that it never "should" be done. An agent with these preferences might find themselves paying a cent to switch from $1 in 6 days to $2 in 7, then, 6 days later, paying another cent to switch it back and get the $1 immediately. However, in practice, even rational agents might exhibit hyperbolic-discounting-like preferences (though no preference reversals): for example, right now I might not believe you're very trustworthy and worry you might forget to give me money tomorrow. So I prefer $1 now to $2 tomorrow. But if you actually are going to give me $1 in 6 days, I might update to thinking you're quite trustworthy and then be willing to wait another day to get $2 instead. (See this paper for a more thorough discussion of this possibility: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1689473/pdf/T9KA20YDP8PB1QP4_265_2015.pdf)
rohinmshah: In theory, never (either hyperbolic time discounting is a bias, and never "should" be done, or it's a value, but one that longtermists explicitly don't share). In practice, hyperbolic time discounting might be a useful heuristic, e.g. perhaps since we are bad at thinking of all the ways that our plans can go wrong, we tend to overestimate the amount of stuff we'll have in the future, and hyperbolic time discounting corrects for that.
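LawrenceC's preference-reversal example can be checked numerically. The sketch below assumes the standard one-parameter hyperbolic form V = A / (1 + kD) and, for contrast, exponential discounting V = A * delta**D; the values k = 2 per day and delta = 0.4 per day are arbitrary choices that make both agents prefer $1 now over $2 tomorrow.

```python
def hyperbolic_value(amount, delay_days, k=2.0):
    """Present value under hyperbolic discounting: A / (1 + k * D)."""
    return amount / (1.0 + k * delay_days)


def exponential_value(amount, delay_days, delta=0.4):
    """Present value under exponential discounting: A * delta ** D."""
    return amount * delta ** delay_days


def prefers_sooner(value, small, t_small, large, t_large):
    """True if the agent values the smaller-sooner reward above the larger-later one."""
    return value(small, t_small) > value(large, t_large)


for name, value in [("hyperbolic", hyperbolic_value), ("exponential", exponential_value)]:
    near = prefers_sooner(value, 1, 0, 2, 1)  # $1 now vs $2 tomorrow
    far = prefers_sooner(value, 1, 6, 2, 7)   # $1 in 6 days vs $2 in 7 days
    print(f"{name}: near={near}, far={far}, reversal={near != far}")
```

With these parameters the hyperbolic agent prefers $1 now to $2 tomorrow, yet prefers $2 on day 7 to $1 on day 6, which is the reversal. The exponential agent makes the same choice in both cases, because shifting both rewards by six days multiplies both values by the same factor delta**6.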

The question makes the assumption that "The World" is in any way coordinating or will ever coordinate to build AGI. I posit that "The World" has not and will not coordinate anything.

Are there global corporations and superpower governments that are coordinating AI projects? Yes, but it's not a single AI project; the AI-sphere contains multitudes.

Also, AGI is a specific term, and although it's become more popular, it's mostly a term Goertzel created because the term "AI" was being improperly used to label even simplistic statistical models like deep learning networks. At least, that is how I saw it when I first read the term. I'm still looking for a free copy of AGI Revolution.

14 comments
The technologies for maintaining surveillance of would-be AGI developers improve.

Yeah, when I was reading Bostrom's Black Ball paper I wanted to yell many times, "Transparent Society would pretty much totally preclude all of this".

We need to talk a lot more about the outcome where surveillance becomes so pervasive that it's not dystopian any more (in short, "It's not a panopticon if ordinary people can see through the inspection house"), because it seems like 95% of x-risks would be averted if we could just all see what everyone is doing and coordinate. And that's on top of the more obvious benefits like, you know, the reduction of violent crime and the economic benefits of a massive increase in openness.

Regarding technologies for defeating surveillance... I don't think falsification is going to be all that tough to solve (Scrying for outcomes where the problem of deepfakes has been solved).

If it gets to the point where multiple well-sealed cameras from different manufacturers are validating every primary source, where so many of the surrounding circumstances of every event are recorded as well, and where everything is signed and timestamped in multiple locations the moment it happens, it's going to become pretty much impossible to lie about anything. No matter how good your fabricated video is, and no matter how well you hid your dealings with your video fabricators operating in shady jurisdictions, we must ask: where do you think you could slot it in so that people wouldn't notice the seams?
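The "signed and timestamped in multiple locations" idea can be sketched as a hash-chained log whose heads are stored by several independent witnesses. This is a toy illustration under stated assumptions, not a proposal from the comment: it uses plain SHA-256 chaining from Python's standard library in place of real digital signatures or a trusted timestamping service (such as RFC 3161), and the event strings and witness setup are hypothetical. Because each entry's hash covers the previous one, slotting a fabricated event into the past changes every later hash, so the claimed history no longer matches what the witnesses stored.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the (empty) start of the log


def entry_hash(prev_hash, timestamp, payload):
    """SHA-256 over this record plus the previous head, chaining entries together."""
    record = json.dumps({"prev": prev_hash, "ts": timestamp, "data": payload}, sort_keys=True)
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def build_chain(events):
    """Return the running head hash after each (timestamp, payload) event."""
    heads, prev = [], GENESIS
    for timestamp, payload in events:
        prev = entry_hash(prev, timestamp, payload)
        heads.append(prev)
    return heads


def consistent_with_witnesses(events, witness_heads):
    """True only if the claimed history reproduces every witness's stored heads."""
    heads = build_chain(events)
    return all(stored == heads for stored in witness_heads)


# Hypothetical events. Three independent witnesses each keep their own copy of
# the heads (computed here in one go for brevity).
events = [(1, "camera A: door opens"), (2, "camera B: person enters"), (3, "camera A: door closes")]
witnesses = [build_chain(events) for _ in range(3)]

# A forger later tries to slot a fabricated clip into the record.
forged = events[:2] + [(2.5, "fabricated video")] + events[2:]

print(consistent_with_witnesses(events, witnesses))  # honest history
print(consistent_with_witnesses(forged, witnesses))  # forged history
```

A real system would also need the witnesses to be genuinely independent and to record heads promptly; the chaining only guarantees that a late insertion is detectable once compared against those records.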

But of course, this will require two huge cultural shifts: one to transparency, and another to actually legislate against AGI boxing, because right now if someone wanted to openly do that, no one could stop them. Lots of work to do.

This is a really good example of a possible cultural/technological change that would alter the coordination landscape substantially. Thanks.

FYI, here's a past Paul Christiano exploration of this topic:

Anyway, I did say that I thought there were lots of plausible angles, so I can try to give one. This is very off-the-cuff, it’s not a topic that I have yet thought about much though I expect to at some point.
Example: tagging advanced technology
Let’s say that a technology is “basic” if it is available in 2016; otherwise we say it is “advanced.” We would like to:
1. Give individuals complete liberty when dealing with basic technology.
2. Give individuals considerable liberty when dealing with advanced technology.
3. Prevent attackers from using advanced technologies developed by law-abiding society in order to help do something destructive.
We’ll try to engineer a property of being “tagged,” aiming for the following desiderata:
1. All artifacts embodying advanced technology, produced or partly produced by law-abiding citizens, are tagged.
2. All artifacts produced using tagged artifacts are themselves tagged.
3. Tagged artifacts are not destructive (in the sense of being much more useful for an agent who wants to destroy).
Property #1 is relatively easy to satisfy, since the law can require tagging advanced technology. Ideally tagging will be cheap and compatible with widely held ethical ideals, so that there is little incentive to violate such laws. The difficulty is achieving properties #2 and #3 while remaining cheap / agreeable.
The most brutish way to achieve properties #2 and #3 is to have a government agency X which retains control over all advanced artifacts. When you contribute an artifact to X they issue you a title. The title-holder can tell X what to do with an advanced artifact, and X will honor those recommendations so long as (1) the proposed use is not destructive, and (2) the proposed use does not conflict with X’s monopoly on control of advanced artifacts. The title-holder is responsible for bearing the costs associated with maintaining X’s monopoly; for example, if a title-holder would like to use advanced artifacts in a factory in Nevada, then X will need to physically defend that factory, and the title-holder must pay the associated costs.
(In this case, tagging = “controlled by X.”)
This system is problematic for a number of reasons. In particular: (1) it provides an objectionable level of power to the organization X itself, (2) it may impose significant overhead on the use of advanced artifacts, (3) it only works so long as X is able to understand the consequences of actions recommended by title-holders (further increasing overhead and invasiveness).
More clever tagging schemes can ameliorate these difficulties, and AI seems very helpful for that. For example, if we were better able to automate bureaucracies, we could ensure that power rests with a democratic process that controls X rather than with the bureaucrats who implement X (and could potentially address concerns with privacy). We could potentially reduce overhead for some artifacts by constructing them in such a way that their destructive power is limited without having to retain physical control. (This would be much easier if we could build powerful AI into advanced artifacts.) And so on. In general, the notion of “tagging” could be quite amorphous and subtle.
If we implemented some kind of tagging, then a would-be attacker’s situation in the future is not much better than it is today. They could attempt to develop advanced technology in parallel; if they did that without the use of other advanced artifacts then it would require the same kind of coordination that is currently beyond the ability of terrorist groups. If they did it with the use of tagged advanced artifacts, then their products would end up getting tagged.
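Desiderata #1 and #2 of the tagging scheme amount to a taint-propagation rule, which can be sketched in a few lines. This is a hypothetical toy, not Christiano's mechanism: the `Artifact` type and `produce` function are invented here, a boolean flag stands in for whatever "tagged" would really mean (control by X, built-in limits, and so on), and desideratum #3 (tagged artifacts are not destructive) is assumed rather than enforced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    name: str
    advanced: bool
    tagged: bool


def produce(name, inputs, uses_new_advanced_tech, law_abiding=True):
    """Build an artifact from inputs, applying the (hypothetical) tagging rules.

    Desideratum #1: advanced artifacts made by law-abiding producers are tagged.
    Desideratum #2: anything made using a tagged artifact is itself tagged.
    """
    advanced = uses_new_advanced_tech or any(a.advanced for a in inputs)
    inherits_tag = any(a.tagged for a in inputs)
    must_tag = advanced and law_abiding
    return Artifact(name, advanced, inherits_tag or must_tag)


steel = produce("steel", [], uses_new_advanced_tech=False)
chip = produce("advanced_chip", [steel], uses_new_advanced_tech=True)
# An attacker who builds on the law-abiding chip still ends up with a tagged product.
weapon = produce("weapon", [chip], uses_new_advanced_tech=False, law_abiding=False)
# Avoiding tags means re-developing advanced tech from basic inputs alone.
rogue = produce("rogue_chip", [steel], uses_new_advanced_tech=True, law_abiding=False)

for a in (steel, chip, weapon, rogue):
    print(a.name, a.advanced, a.tagged)
```

The last two cases mirror the attacker's options described in the final paragraph of the quote: building on tagged artifacts yields tagged products, while avoiding tags requires developing advanced technology in parallel.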

I'm curating this question.

I think I'd thought about each of the considerations Eli lists here, but I had not seen them listed out all at once and framed as a part of a single question before. I also had some sort of implicit background belief that longer timelines were better from a coordination standpoint. But as soon as I saw these concerns listed together, I realized that was not at all obvious.

So far none of the answers here seem that compelling to me. I'd be very interested in more comprehensive answers that try to weigh the various considerations at play.

Coordination to do something is hard, and possible only because it doesn't require everyone to agree, only enough people to do the thing. Coordination NOT to do something that's obviously valuable (but carries risks) is _MUCH_ harder, because it requires agreement (or at least compliance and monitoring) from literally everyone.

It's not a question of getting harder or easier to coordinate over time - it's not possible to prevent AGI research now, and it won't become any less or more possible later. It's mostly a race to understand safety well enough to publish mechanisms to mitigate and reduce risks BEFORE a major self-improving AGI can be built by someone.
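The asymmetry in this comment (doing something needs only enough participants, abstaining needs literally everyone) shows up even in a crude probability model. The sketch below assumes, purely for illustration, that each of n actors independently goes along with the group's plan with the same probability p (taking part, in the first case; abstaining, in the second); the numbers n = 20, p = 0.9, and k = 5 are hypothetical.

```python
from math import comb


def p_at_least_k_act(n, k, p):
    """P(at least k of n independent actors take part), each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def p_all_abstain(n, p):
    """P(all n independent actors abstain), each abstaining with probability p."""
    return p**n


n, p, k = 20, 0.9, 5
print(f"P(at least {k} of {n} act): {p_at_least_k_act(n, k, p):.6f}")
print(f"P(all {n} abstain): {p_all_abstain(n, p):.4f}")
for size in (5, 20, 100):
    print(size, round(p_all_abstain(size, p), 4))
```

With these numbers a project needing five willing participants is almost certain to find them, while unanimous abstention by 20 actors who each abstain 90% of the time holds only about 12% of the time, and that figure keeps falling as the number of capable actors grows.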

This is a nitpick, but I contest the $10,000 figure. If I had an incentive as strong as building an (aligned) AGI, I'm sure I could find a way to obtain upwards of a million dollars worth of compute.

Sure. $10,000 is an (almost) unobjectionable lower bound.

I'm a trained rationalist, and all the things I've previously read about AI being an existential risk were bullshit. But I know the LessWrong community (which I respect) is involved in AI risk. So where can I find a concise, exhaustive list of all sound arguments pro and con AGI being likely an existential risk? If no such curated list exists, do people really care about the potential issue?

I would like to update my belief about the risk. But I suppose that most people talking about AGI risk have not enough knowledge about what technically constitute an AGI. I'm currently building an AGI that aims to understand natural language and to optimally answer questions, internally satisfying a coded utilitarian, effective-altruist goal system. The AGI takes language as input and outputs natural language text. That's it. How text could be an existential risk remains to be answered... There's no reason to give effectors to an AGI; just asking her for her knowledge and optimal decisions would be sufficient for revolutionizing humanity's well-being (e.g. optimal politics), and the output would be analysed by rational humans, protecting against the AGI's mistakes. As for the idea that an AGI will become self-conscious, this is nonsense, and I would be fascinated to be proved otherwise.

So where can I find a concise, exhaustive list of all sound arguments pro and con AGI being likely an existential risk?

Nick Bostrom’s book ‘Superintelligence’ is the standard reference here. I also find the AI FOOM Debate especially enlightening; it hits a lot of the same points. You can easily find both using Google.

But I suppose that most people talking about AGI risk have not enough knowledge about what technically constitute an AGI.

I agree most people who talk about it are not experts in mathematics, computer science, or the field of ML, but the smaller set of people that I trust often are, such as researchers at UC Berkeley (Stuart Russell, Andrew Critch, many more), OpenAI (Paul Christiano, Chris Olah, many more), DeepMind (Jan Leike, Vika Krakovna, many more), MIRI, FHI, and so on. And of course just being an expert in a related technical domain does not make you an expert in long-term forecasting or even AGI, of which there are plausibly zero people with deep understanding.

And in this community Eliezer has talked often about actually solving the hard problem of AGI, not bouncing off and solving something easier and nearby, in part here but also in other places I’m having a hard time linking right now.

Bostrom's book is a bit out of date, and perhaps isn't the best reference on the AI safety community's current concerns. Here are some more recent articles:

  1. Disentangling arguments for the importance of AI safety
  2. A shift in arguments for AI risk
  3. The Main Sources of AI Risk?

Thanks. I'll further add Paul's post What Failure Looks Like, and say that the Alignment Forum sequences raise a lot more specific technical concerns.

The AI asks for lots of info on biochemistry, and gives you a long list of chemicals that it claims cure various diseases. Most of these are normal cures. One of these chemicals will mutate the common cold into a lethal super plague. Soon we start some clinical trials of the various drugs, until someone with a cold takes the wrong one and suddenly the world has a super plague.

The medical marvel AI is asked about the plague. It gives a plausible cover story for the plague's origins, along with describing an easy-to-make and effective vaccine. As casualties mount, humans rush to put the vaccine into production. The vaccine is designed to have an interesting side effect: a subtle modification of how the brain handles trust and risk. Soon the AI project leaders have been vaccinated. The AI says that it can cure the plague; it has a DNA file several billion base pairs long that should be put into a bacterium. We allow it to output this file. We inspect it in less detail than we should have, given the effect of the vaccine, then we synthesize the sequence and put it into a bacterium. A few minutes later, the sequence bootstraps molecular nanotech. Over the next day, the nanotech spreads around the world. Soon it's expanding exponentially across the universe, turning all matter into drugged-out brains in vats. This is the most ethical action according to the AI's total utilitarian ethics.

The fundamental problem is that any time that you make a decision based on the outputs of an AI, that gives it a chance to manipulate you. If what you want isn't exactly what it wants, then it has incentive to manipulate.

(There is also the possibility of a side channel: for example, manipulating its own circuits to produce a cell phone signal, spinning its hard drive in a way that makes a particular sound, etc. Making a computer output just text, rather than text plus traces of sound, microwaves, and heat, which can normally be ignored but might be maliciously manipulated by software, is hard.)

I'm a trained rationalist

What training process did you go through? o.o

My understanding is that we don't really know a reliable way to produce anything that could be called a "trained rationalist", a label which sets impossibly high standards (in the view of a layperson) and is thus pretty much unusable. (A large part of becoming an aspiring rationalist involves learning how any agent's rationality is necessarily limited; laypeople have overoptimistic intuitions about that.)

An AGI that can reason about its own capabilities to decide how to spend resources might be more capable than one that can't reason about itself, because it knows better how to approach solving a given problem. It's plausible that a sufficiently complex neural net finds that this is a useful sub-feature and implements it.

I wouldn't expect Google Translate to suddenly develop self-consciousness, but self-consciousness is a tool that helps humans to reason better. Self-consciousness allows us to reflect on our own actions and think about how we should best approach a given problem.
