Will advanced AI let some small group of people or AI systems take over the world?

AI X-risk folks and others have accrued lots of arguments about this over the years, but I think this debate has been disappointing in terms of anyone changing anyone else’s mind, or much being resolved. I still have hopes for sorting this out though, and I thought a written summary of the evidence we have so far (which often seems to live in personal conversations) would be a good start, for me at least.

To that end, I started a collection of reasons to expect discontinuous progress near the development of AGI.

I do think the world could be taken over without a step change in anything, but it seems less likely, and we can talk about the arguments around that another time.

Paul Christiano had basically the same idea at the same time, so for a slightly different take, here is his account of reasons to expect slow or fast take-off.

Please tell us in the comments or feedback box if your favorite argument for AI Foom is missing, or isn’t represented well. Or, if you want to represent it well yourself, write it up as a short essay and send it to me here, and we will gladly consider posting it as a guest blog post.

I’m also pretty curious to hear which arguments people actually find compelling, even if they are already listed. I don’t actually find any of the ones I have that compelling yet, and I think a lot of people who have thought about it do expect ‘local takeoff’ with at least substantial probability, so I am probably missing things.

Crossposted from AI Impacts.


tl;dr: I think a discontinuity matters differently depending on whether the underlying curve is linear, exponential, or hyperbolic.

Some thoughts copied over from an earlier conversation about this.

Reading this and some of Paul's related thoughts, I experienced some confusion about why we care about discontinuity in the first place. Here are my thoughts on that.

Caring about discontinuity is similar but not identical to "why do we care about takeoff speed?". ESRogs mentions in this comment in the related Paul thread that we might care about:

  • How much of a lead will one team need to have over others in order to have a decisive strategic advantage...
      • in wall clock time? in economic doublings time?
  • How much time do we have to solve hard alignment problems...
      • in wall clock time? in economic doublings time?

A related issue is how surprising an advance will be, in a fashion that disrupts our ability to manage it.

A problem the AI safety crowd has faced is that people don't intuitively understand exponentials. In most of our lives, things move fairly linearly. The "one grain of rice, doubled per day" parable works because the King has no conception of how bad things are going to get and how fast.

So if there's a discontinuity in a linear curve, the average human will be surprised, but only once, and then the curve goes back to its usual progress and things are mostly okay.

If a curve that looked linear-ish turns out to be exponential, that human may find themselves surprised if they're not paying attention, once the curve gets past the point where it visibly bends upward. They'd be even more surprised if the curve was hyperbolic. (If you are paying attention, I'm not sure whether it matters whether the curve is exponential or hyperbolic.)
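To make the linear/exponential/hyperbolic distinction concrete, here's a minimal Python sketch. The three curves are hypothetical, picked only so that they all start at the same value with the same slope. Near the start they're nearly indistinguishable; only later does the hyperbolic one run off to infinity at a finite time.

```python
import math

# Three hypothetical curves chosen so that all start at 1 with slope 1 at t = 0.

def linear(t: float) -> float:
    """Constant slope forever."""
    return 1.0 + t

def exponential(t: float) -> float:
    """Slope proportional to the current value."""
    return math.exp(t)

def hyperbolic(t: float) -> float:
    """Slope proportional to the square of the current value; blows up at t = 1."""
    if t >= 1.0:
        raise ValueError("the hyperbolic curve has already diverged at t = 1")
    return 1.0 / (1.0 - t)

for t in (0.0, 0.1, 0.5, 0.9, 0.99):
    print(f"t={t:<5} linear={linear(t):6.3f} exponential={exponential(t):6.3f} hyperbolic={hyperbolic(t):8.3f}")
```

At t = 0.1 all three are within about 0.01 of each other; by t = 0.99 the hyperbolic curve is near 100 while the exponential one is still under 3.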

If an exponential curve has a discontinuity, it's even more surprising. The average human is hit both with a curve arcing upward faster than they expected, and with that curve suddenly jumping upward.

Now, I'm not sure how relevant this is to Leaders of Industry in the AI world – presumably at least some are paying attention. I'm not actually sure how much attention I expect them to be paying (there are a lot of things to pay attention to, and if we're at the early stage of a hyperbolic curve it's a plausible mistake to assume it's linear and/or "only" exponential).

(I think it matters a lot more whether industry leaders are paying attention than whether Joe/Jane Public is. It seems like a major reason discontinuity matters (or whether DeepMind et al. think it matters, or think other organizations think it matters) has to do with arms races, and with whether it's plausible that we end up in a multipolar scenario vs. a singleton.)

Relatedly, the thing that actually prompted the above was looking at this graph, which Katja and Sarah Constantin have both referenced:


And not being sure how to think about Chess ratings and whether this discontinuity is better thought of as linear or exponential or what. (Or, put another way: insofar as being good at Chess has anything to do with fully general artificial intelligence, does that jump mean we're a little closer or a lot closer?)

At this point I think it's clear being good at Chess is fairly distinct from being good at fully general intelligence so that particular framing doesn't matter, but one might have expected it to matter before much progress on Chess was made.

This brought to mind a general confusion I've had, which is that I see people taking data of various sorts of curves, and plotting them on various sorts of graphs, and sometimes from context it's hard to tell how important a trend is.

And not being sure how to think about Chess ratings and whether this discontinuity is better thought of as linear or exponential or what.

Are you talking about the overall trend, or the discontinuity itself?

It doesn't seem like it would make sense to talk about the discontinuity itself as linear / exponential, since it's defined by just two points: the point before the jump, and the point after the jump. You could fit a line through two points, you could fit an exponential through two points, you could fit anything!
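A quick illustration of the two-points problem, as a Python sketch with made-up numbers: a line and an exponential can each be fit exactly through the same "before" and "after" points, and they only disagree away from those points.

```python
import math

def fit_line(p1, p2):
    """Return the unique line y = a + b*x through two points."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)
    a = y1 - b * x1
    return lambda x: a + b * x

def fit_exponential(p1, p2):
    """Return the unique curve y = A * exp(k*x) through two points with y > 0."""
    (x1, y1), (x2, y2) = p1, p2
    k = (math.log(y2) - math.log(y1)) / (x2 - x1)
    A = y1 * math.exp(-k * x1)
    return lambda x: A * math.exp(k * x)

# Hypothetical "before the jump" and "after the jump" points.
before, after = (0.0, 100.0), (1.0, 200.0)
line = fit_line(before, after)
expo = fit_exponential(before, after)
for x in (0.0, 0.5, 1.0, 3.0):
    print(f"x={x}: line={line(x):.1f} exponential={expo(x):.1f}")
```

Both fits hit 100 and 200 exactly; extrapolated to x = 3 they give 400 and 800, so the two points alone can't tell you which story you're in.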

(If you had a trend that switched from being linear to being exponential, that would be a different story. But this graph doesn't look like that to me.)

Have I misunderstood what you're saying?

Are you talking about the overall trend, or the discontinuity itself?

I was mostly talking about the overall trend, although I have additional thoughts on your point about the point-of-discontinuity.

(epistemic status: a bit outside my comfort zone. I feel confident but wouldn't be too surprised if someone who thinks about this more than me responded in a way that updated me considerably. But I think I may be communicating a point that has reverse inferential distance – i.e., the points I'm making are so obvious that they don't seem relevant to the discussion, and my point is that if you're not used to thinking in exponential terms they aren't obvious. So this subthread may be most useful to people who happen to feel confused, or who find things unintuitive in the way I do right now.)

You could fit a line through two points, you could fit an exponential through two points, you could fit anything!

I mean, presumably there are more data points you could (at least hypothetically) have included, in which it's not literally a single discontinuity, but a brief switch to a sharp increase in progress, followed by a return to something closer to the original curve. I'm not sure about the technical definition of discontinuity, but in a world where the graph had a point for each month instead of each year, and 2007 still showed such a sharp uptick, the point wouldn't stop being interesting.

Since the Chess graph is uniquely confusing (hence my original confusion), I'd rather answer the rest of your question using a more generic example, say, an economic growth model.

If the economy were growing linearly, and then had a brief spike, and then returned to growing linearly at roughly the same rate, that's one kind of interesting.

The fact that the economy grows exponentially is a different kind of interesting, and one that layfolk routinely make bad choices about because of poor intuitions. (This is why investing is a much better idea than it seems, and why tradeoffs that involve half-percent sacrifices to economic growth are a big deal. If you're used to thinking about it this way it may no longer seem interesting, but, like, there are whole courses explaining this concept because it's non-obvious.)
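For the "half-percent sacrifices" point, here's a rough sketch of what compounding does. The 2.0% and 2.5% annual rates and the 50-year horizon are hypothetical, chosen just to show the size of the effect.

```python
import math

def grow(initial: float, rate: float, years: int) -> float:
    """Size after `years` of constant annual growth at `rate`, compounded yearly."""
    return initial * (1.0 + rate) ** years

def doubling_time(rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + rate)

# Hypothetical rates: a half-percentage-point difference in annual growth.
slower = grow(100.0, 0.020, 50)
faster = grow(100.0, 0.025, 50)
print(f"after 50 years: {slower:.0f} vs {faster:.0f} (ratio {faster / slower:.3f})")
print(f"doubling time: {doubling_time(0.020):.1f} vs {doubling_time(0.025):.1f} years")
```

With these numbers, the half-point difference compounds into an economy roughly 28% larger after 50 years, and the doubling time drops from about 35 to about 28 years.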

If the economy is growing exponentially, and there's a discontinuity where for one year it grows much more rapidly, that's a third kind of interesting. And it's interesting in a different way depending on whether growth afterward slows back down to roughly the rate we had before the spike, or continues as if the spike had basically let you skip several years, and then keeps going at an even faster rate.
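The two post-spike cases can be sketched by comparing each path to the no-spike trend. The 3% trend, 30% spike, and 6% post-spike rate are all hypothetical. In the first case the ratio to the trend jumps once and then stays flat (you've "skipped ahead" a few years); in the second, it keeps widening.

```python
def growth_path(years, base_rate, spike_year, spike_rate, after_rate):
    """Yearly levels starting at 1.0: base_rate before spike_year,
    spike_rate during it, after_rate from then on."""
    level, levels = 1.0, [1.0]
    for year in range(1, years + 1):
        if year < spike_year:
            rate = base_rate
        elif year == spike_year:
            rate = spike_rate
        else:
            rate = after_rate
        level *= 1.0 + rate
        levels.append(level)
    return levels

# Hypothetical numbers: 3% trend growth, a 30% spike in year 5.
trend = growth_path(10, 0.03, 5, 0.03, 0.03)        # no spike at all
same_rate = growth_path(10, 0.03, 5, 0.30, 0.03)    # spike, then back to 3%
faster_rate = growth_path(10, 0.03, 5, 0.30, 0.06)  # spike, then 6% afterwards
for year in (5, 10):
    print(year, round(same_rate[year] / trend[year], 3), round(faster_rate[year] / trend[year], 3))
```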

A lot of great points!

I think we can separate the arguments into about three camps, based on their purpose (though they (EDIT: whoops, forgot a "don't") don't all cleanly sit in one camp):

  • Arguments why progress might be generally fast: Hominid variation, Brain scaling.
  • Arguments why a local advantage in AI might develop: Intelligence explosion, One algorithm, Starting high, Awesome AlphaZero.
  • Arguments why a local advantage in AI could cause a global discontinuity: Deployment scaling, Train vs. test, Payoff thresholds, Human-competition threshold, Uneven skills.

These facts need to work together to get the thesis of a single disruptive actor to go through: you need there to be jumps in AI intelligence, you need them to be fairly large even near human intelligence, and you need those increases to translate into a discontinuous impact on the world. This framework helps me evaluate arguments and counterarguments - for example, you don't just argue against Hominid variation as showing that there will be a singularity, you argue against its more limited implications as well.

Bits I didn't agree with, and therefore have lots to say about:

Intelligence Explosion:

The counterargument seems pretty wishy-washy. You say: "Positive feedback loops are common in the world, and very rarely move fast enough and far enough to become a dominant dynamic in the world." How common? How rare? How dominant? Is global warming a dominant positive feedback loop because warming leads to increased water in the atmosphere which leads to more warming, and it's going to have a big effect on the world? Or is it none of those, because Earth won't get all that much warmer, because there are other well-understood effects keeping it in homeostasis?

More precisely, I think the argument from reference class that a positive feedback loop (or rather, the behavior that we approximate as a positive feedback loop) will be limited in time and space is hardly an argument at all - it practically concedes that the feedback loop argument works for the middle of the three camps above, but merely points out that it's not also an argument that intelligence will be important. A strong argument against the intelligence feedback hypothesis has to argue that a positive feedback loop is unlikely.

One can obviously respond by emphasizing that objects in the reference class you've chosen (e.g. tipping back too far in your chair and falling) don't generally impact the world, and therefore this is a reference class argument against AI impacting the world. But AI is not drawn uniformly from this reference class - the only reason we're talking about it is because it's been selected for the possibility of impacting the world. Failure to account for this selection pressure is why the strength of the argument seemed to change upon breaking it into parts vs. keeping it as a whole.

Deployment scaling:

We agree that slow deployment speed can "smooth out" a discontinuous jump in the state of the art into a continuous change in what people actually experience. You present each section as a standalone argument, and so we also agree that fast deployment speed alone does not imply discontinuous jumps.

But I think keeping things so separate misses the point that fast deployment is among the necessary conditions for a discontinuous impact. There's also a risk, if we think of things separately, of forgetting these necessary conditions when thinking about historical examples. Like, we might look at the history of drug development, where drug deployment and adoption take a few years, and it takes more years for costs to fall enough that more people can access the treatment, and notice that even though there's an a priori argument for a discontinuous jump in best practices, people's outcomes are continuous on the scale of several years. And then, if we've forgotten about other necessary factors, we might just attribute this to some mysterious low base rate of discontinuous jumps.
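One way to see the smoothing effect is a simple partial-adjustment model, which is my assumption here rather than anything from the post: each year, what people actually use closes a fixed fraction of the remaining gap to the state of the art. A step of 10 in the (hypothetical) frontier then shows up as a series of smaller yearly steps.

```python
def experienced(frontier, adoption_rate):
    """Quality people actually get, assuming each year closes a fixed fraction
    (adoption_rate) of the remaining gap to the state of the art."""
    current, felt = frontier[0], []
    for best in frontier:
        current += adoption_rate * (best - current)
        felt.append(current)
    return felt

def largest_step(series):
    """Biggest year-over-year increase in a series."""
    return max(b - a for a, b in zip(series, series[1:]))

frontier = [0.0] * 5 + [10.0] * 10  # hypothetical: state of the art jumps by 10 in year 5
felt = experienced(frontier, adoption_rate=0.25)
print(largest_step(frontier), round(largest_step(felt), 3))
```

With a 25% adoption rate, the largest yearly change people experience is a quarter of the frontier's jump; faster deployment (a rate closer to 1) passes more of the discontinuity through.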

Payoff thresholds:

The counterargument doesn't really hold together. We start ex hypothesi with some threshold effect in usefulness (e.g. good enough boats let you reach another island). Then you say that it won't cause a discontinuity in things we care about directly; people might buy better boats, but because of this producers will spend more effort making better boats and sell them more expensively, so the "value per dollar" doesn't jump. But this just assumes without justification that the production eats up all the value - why can't the buyer and the producer both capture part of the increase in value? The only way the theoretical argument seems to work is in equilibrium - which isn't what we care about.

Nuclear weapons are a neat example, but may be a misleading one. Nuclear weapons could have had half the yield, or twice the yield, without altering much about when they were built - although if you'd disagree with this, I'd be interested in hearing about it. (Looking at your link, it seems like nuclear weapons were in fact more expensive per ton of TNT when they were first built - and yet they were built, which suggests there's something fishy about their fit to this argument.)

Awesome AlphaZero:

I think we can turn this into a more general thesis: Research is often local, and often discontinuous, and that's important in AI. Fields whose advance seems continuous on the several-year scale may look jumpy on the six-month scale, and those jumps are usually localized to one research team rather than distributed. You can draw a straight line through a plot of e.g. performance of image-recognition AI, but that doesn't mean that at the times in between the points there was a program with that intermediate skill at image-recognition. This is important to AI if the scale of the jumps, and the time between them, allows one team to jump through some region (not necessarily a discontinuity) of large gain in effect and gain a global advantage.
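A toy version of the "straight line through jumpy progress" point, assuming (hypothetically) that progress arrives as one discrete jump every six months. Sampled once a year, the staircase lies exactly on a straight line, even though at most in-between times nobody had the intermediate skill the line suggests.

```python
import math

def staircase(t: float, jump: float = 1.0, interval: float = 0.5) -> float:
    """Hypothetical progress that only arrives as discrete jumps every `interval` years."""
    return jump * math.floor(t / interval)

def yearly_trend_line(t: float, jump: float = 1.0, interval: float = 0.5) -> float:
    """The straight line you get by plotting the staircase once a year."""
    return jump * t / interval

for t in (0.0, 0.25, 0.49, 1.0, 1.25, 2.0):
    print(t, staircase(t), yearly_trend_line(t))
```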

The missing argument about strategy:

There's one possible factor contributing to the likelihood of discontinuity that I didn't see, and that's the strategic one. If people think that there is some level of advantage in AI that will allow them to have an important global impact, then they might not release their intermediate work to the public (so that other groups don't know their status, and so their work can't be copied), creating an apparent discontinuity when they decide to go public, even if 90% of their AI research would have gotten them 90% of the taking-over-the-world power.
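The strategic point can be sketched as a toy model with made-up numbers: a lab's private capability improves steadily, but it publishes nothing until it reaches a chosen threshold (here 90% of its final level). Outsiders then see a single large jump even though no single year of research was large.

```python
def public_view(private, release_threshold):
    """What outsiders see if a lab publishes nothing until its private capability
    reaches release_threshold, and publishes its full state from then on."""
    released, seen = False, []
    for level in private:
        if level >= release_threshold:
            released = True
        seen.append(level if released else 0.0)
    return seen

def largest_step(series):
    """Biggest step-to-step increase in a series."""
    return max(b - a for a, b in zip(series, series[1:]))

private = [float(year) for year in range(11)]      # hypothetical steady progress 0..10
seen = public_view(private, release_threshold=9.0)  # go public at 90% of the final level
print(largest_step(private), largest_step(seen))
```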

Thanks for your thoughts!

I don't quite follow you on the intelligence explosion issue. For instance, why does a strong argument against the intelligence explosion hypothesis need to show that a feedback loop is unlikely? Couldn't we believe that it is likely, but not likely to be very rapid for a while? For instance, there is probably a feedback loop in intelligence already, where humans with better thoughts and equipment are effectively smarter, and can then devise better thoughts and equipment. But this has been true for a while, and is a fairly slow process (at least for now, relative to our ability to deal with things).

Yeah, upon rereading that response, I think I created a few non sequiturs in revision. I'm not even 100% sure what I meant by some bits. I think the arguments that now seem confusing were me saying that by putting an intelligence feedback loop in the reference class of "feedback loops in general" and then using that to forecast low impact, the thing doing most of the work is simply how low-impact most stuff is.

A nuclear bomb (or a raindrop forming, or tipping back a little too far in your chair) can be modeled as a feedback loop through several orders of magnitude of power output, and then eventually that model breaks down and the explosion dissipates, and the world might be a little scarred and radioactive, but it is overall not much different. But if your AI increased by several orders of magnitude in intelligence (let's just pretend that's meaningful for a second), I would expect that to be a much bigger deal, just because the thing that's increasing is different. That is, I was thinking that the implicit model used by the reference class argument from the original link seems to predict local advantages in AI, but predict *against* those local advantages being important to the world at large, which I think is putting the most weight on the weakest link.

Part of this picture I had comes from what I'm imagining as prototypical reference class members - note that I only imagined self-sustaining feedback, not "subcritical" feedback. In retrospect, this seems to be begging the question somewhat - subcritical feedback speeds up progress, but doesn't necessarily concentrate it, unless there is some specific threshold effect for getting that feedback. Another feature of my prototypes was that they're out-of-equilibrium rather than in-equilibrium (an example of feedback in equilibrium is global warming, where there's lots of feedback effects but they're more or less canceling each other out), but this seems justified.
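The subcritical vs. self-sustaining distinction can be sketched by treating feedback as a single reinvestment factor r (a big simplification, and my assumption rather than the commenter's model): each round's gain produces r times as much gain in the next round. Below 1, the total converges to initial / (1 - r), so progress speeds up by a bounded factor; at or above 1, it keeps growing until some external limit binds, like the bomb dissipating.

```python
def feedback_total(initial_gain, reinvestment, rounds, cap=1e9):
    """Total improvement when each round's gain yields `reinvestment` times as much
    gain in the next round, stopping at an external limit `cap`."""
    total, gain = 0.0, initial_gain
    for _ in range(rounds):
        total += gain
        if total >= cap:
            return cap  # the model has "broken down": some outside limit binds
        gain *= reinvestment
    return total

print(feedback_total(1.0, 0.5, 100))   # subcritical: converges to 1 / (1 - 0.5) = 2
print(feedback_total(1.0, 0.9, 1000))  # subcritical: converges to 1 / (1 - 0.9) = 10
print(feedback_total(1.0, 1.5, 100))   # self-sustaining: runs until it hits the cap
```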

I would agree that one can imagine some kind of feedback loop in "effective smartness" of humans, but I am not sure how natural it is to divorce this from the economic / technological revolution that has radically reshaped our planet, since so much of our effective smartness enhancement is also economy / technology. But this is ye olde reference class ping pong.

This project (best read in the bolded link, not just in this post) seemed and still seems really valuable to me. My intuitions around "Might AI have discontinuous progress?" became a lot clearer once I saw Katja framing them in terms of concrete questions like "How many past technologies had discontinuities equal to ten years of past progress?". I understand AI Impacts is working on an updated version of this, which I'm looking forward to.
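For readers who want the "years of past progress" measure spelled out, here's a simplified sketch. It assumes the previous trend is either linear (measure the jump in units) or exponential (measure it as a ratio); AI Impacts' actual methodology may handle trend fitting differently, and the numbers below are hypothetical.

```python
import math

def years_of_progress(before, after, trend_per_year, exponential=False):
    """Express a jump as years of progress at the previous trend rate.

    Linear trend: trend_per_year is units gained per year.
    Exponential trend: trend_per_year is the fractional annual growth rate,
    and the jump is measured as a ratio (i.e. in log space).
    """
    if exponential:
        return math.log(after / before) / math.log(1.0 + trend_per_year)
    return (after - before) / trend_per_year

# Hypothetical numbers, not taken from any real dataset.
print(years_of_progress(50.0, 150.0, 10.0))                # linear trend: 10 years
print(years_of_progress(1.0, 2.0, 0.07, exponential=True))  # a doubling at 7%/yr
```

A sudden doubling on a trend that grows 7% a year counts as a bit over ten years of past progress under this measure.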

Updated me quite strongly towards continuous takeoff (from a position of ignorance)

Seconding Rohin. 

I think this is basically the same nomination as the post "Arguments Against Fast Takeoff", it's all one conversation, but just wanted to nominate it to be clear.
