There has been considerable debate over whether development in AI will experience a discontinuity, or whether it will follow a more continuous growth curve. Given the lack of consensus and the confusing, diverse terminology, it is natural to hypothesize that much of the debate is due to simple misunderstandings. Here, I seek to dissolve some misconceptions about the continuous perspective, based mostly on how I have seen people misinterpret it in my own experience.

First, we need to know what I even mean by continuous takeoff. When I say it, I mean a scenario where the development of competent, powerful AI follows a trajectory that is roughly in line with what we would have expected by extrapolating from past progress. That is, there is no point at which a single project lunges forward in development and creates an AI that is much more competent than any other project before it. This leads to the first clarification,

Continuous doesn't necessarily mean slow

The position I am calling "continuous" has been called a number of different names over the years. Many refer to it as "slow" or "soft." I think continuous is preferable to these terms because it focuses attention on the strategically relevant part of the question. It seems to matter less what the actual clock-time is from AGI to superintelligence, and instead matters more if there are will be single projects who break previous technological trends and gain capabilities that are highly unusual relative to the past.

Moreover, there are examples of rapid technological developments that I consider to be continuous. As an example, consider GANs. In 2014, GANs were used to generate low quality black-and-white photos of human faces. By late 2018, they were used to create nearly-photorealistic images of human faces.

Yet, at no point during this development did any project leap forward by a huge margin. Instead, every paper built upon the last one by making minor improvements and increasing the compute involved. Since these minor improvements nonetheless happened rapidly, the result is that the GANs followed a fast development relative to the lifetimes of humans.

Extrapolating from this progress, we can assume that GAN video generation will follow a similar trajectory, starting with simple low resolution clips, and gradually transitioning to the creation of HD videos. What would be unusual is if someone right now in late 2019 produces some HD videos using GANs.

Large power differentials can still happen in a continuous takeoff

Power differentials between nations, communities, and people are not unusual in the course of history. Therefore, the existence of a deep power differential caused by AI would not automatically imply that a discontinuity has occurred.

In a continuous takeoff, a single nation or corporation might still pull ahead in AI development by a big margin and use this to their strategic advantage. To see how, consider how technology in the industrial revolution was used by western European nations to conquer much of the world.

Nations rich enough to manufacture rifles maintained a large strategic advantage over those unable to. Despite this, the rifle did not experience any surprising developments which catapulted it to extreme usefulness, as far as I can tell. Instead, sharpshooting became gradually more accurate, with each decade producing slightly better rifles.

See also: Soft takeoff can still lead to decisive strategic advantage

Continuous takeoff doesn't require believing that ems will come first

This misconception seems to mostly be a historical remnant of the Hanson-Yudkowsky AI-Foom debate. In the old days, there weren't many people actively criticizing foom. So, if you disagreed with foom, it was probably because you were sympathetic to Hanson's views.

There are now many people who disagree with foom who don't take Hanson's side. Paul Christiano and AI Impacts appear somewhat at the forefront of this new view.

Recursive self-improvement is compatible with continuous takeoff

In my experience, recursive self improvement is one of the main reasons cited for why we should expect a discontinuity. The validity of this argument is far from simple, but needless to say: folks who subscribe to continuous takeoff aren't simply ignoring it.

Consider I.J. Good's initial elaboration of recursive self improvement,

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion’, and the intelligence of man would be left far behind.

The obvious interpretation from the continuous perspective is that by the time we have an ultraintelligent machine, we'll already have a not-quite-ultraintelligent machine. Therefore, the advantage that an ultraintelligent machine will have over the collective of humanity + machines will be modest.

It is sometimes argued that even if this advantage is modest, the growth curves will be exponential, and therefore a slight advantage right now will compound to become a large advantage over a long enough period of time. However, this argument by itself is not an argument against a continuous takeoff.

Exponential growth curves are common for macroeconomic growth, and therefore this argument should apply equally to any system which experiences a positive feedback loop. Furthermore, large strategic advantages do not automatically constitute a discontinuity since they can still happen even if no project surges forward suddenly.

Continuous takeoff is relevant to AI alignment

The misconception here is something along the lines of, "Well, we might not be able to agree about AI takeoff, but at least we can agree that AI safety is extremely valuable in either case." Unfortunately, the usefulness of many approaches to AI alignment appear to hinge quite a bit on continuous takeoff.

Consider the question of whether an AGI would defect during testing. The argument goes that an AI will have an instrumental reason to pretend to be aligned while weak, and then enter a treacherous turn when it is safe from modification. If this phenomenon ever occurs, there are two distinct approaches we can take to minimize potential harm.

First, we could apply extreme caution and try to ensure that no system will ever lie about its intentions. Second, we could more-or-less deal with systems which defect as they arise. For instance, during deployment we could notice that some systems are optimizing something different than what we intended during training, and therefore we shut them down.

The first approach is preferred if you think that there will be a rapid capability gain relative the rest of civilization. If we deploy an AI and it suddenly catapults to exceptional competence, then we don't really have a choice other than to get its values right the first time.

On the other hand, under a continuous takeoff, the second approach seems more promising. Each individual system won't by themselves carry more power than the sum of projects before it. Instead, AIs will only be slightly better than the ones that came before it, including any AIs we are using to monitor the newer ones. Therefore, to the extent that the second approach carries a risk, it will probably look less like a sudden world domination and will look more like a bad product rollout, in line with say, the release of Windows Vista.

Now, obviously there are important differences between current technological products and future AGIs. Still, the general strategy of "dealing with things as they come up" is much more viable under continuous takeoff. Therefore, if a continuous takeoff is more likely, we should focus our attention on questions which fundamentally can't be solved as they come up. This is a departure from the way that many have framed AI alignment in the past.

New Comment
38 comments, sorted by Click to highlight new comments since:
Second, we could more-or-less deal with systems which defect as they arise. For instance, during deployment we could notice that some systems are optimizing something different than what we intended during training, and therefore we shut them down.
Each individual system won’t by themselves carry more power than the sum of projects before it. Instead, AIs will only be slightly better than the ones that came before it, including any AIs we are using to monitor the newer ones.

If the sum of projects from before carry more power than the individual system, such that it can't win by defection, there's no reason for it to defect. It might just join the ranks of "projects from before", and subtly try to alter future systems to be similarly defective, waiting for a future opportunity to strike. If the way we build these things systematically renders them misaligned, we'll sooner or later end up with a majority of them being misaligned, at which point we can't trivially use them to shut down defectors.

(I agree that continuous takeoff does give us more warning, because some systems will presumably defect early, especially weaker ones. And IDA is kind of similar to this strategy, and could plausibly work. I just wanted to point out that a naive implementation of this doesn't solve the problem of treacherous turns.)

Expanding on that a little, even if we know our AIs are misaligned that doesn't necessarily save us. We might reach a state of knowledge when it is easy to create AIs that (i) misaligned (ii) superhuman and (iii) non-singular (i.e. a single such AI is not stronger than the sum total of humanity and aligned AIs) but hard/impossible to create aligned superhuman AIs. Since misaligned AIs that can't take over still mostly follow human instructions, there will be tremendous economic incentives to deploy more such systems. This is effectively a tragedy of the commons: for every individual actor, deploying more AIs only increases global risk a little but brings in tremendous revenue. However, collectively, risk accumulates rapidly. At some point the total power of misaligned AIs crosses some (hard to predict in advance) threshold and there is a phase transition (a cascade of failures) from a human-controlled world to a coalition-of-misaligned-AI-controlled world. Alternatively, the AIs might find a way to manipulate our entire culture into gradually changing its values into something the AIs prefer (like with Murder Gandhi).

We might reach a state of knowledge when it is easy to create AIs that (i) misaligned (ii) superhuman and (iii) non-singular (i.e. a single such AI is not stronger than the sum total of humanity and aligned AIs) but hard/impossible to create aligned superhuman AIs.

My intuition is that it'd probably be pretty easy to create an aligned superhuman AI if we knew how to create non-singular, mis-aligned superhuman AIs, and had cheap, robust methods to tell if a particular AI was misaligned. However, it seems pretty plausible that we'll end up in a state where we know how to create non-singular, superhuman AIs; strongly suspect that most/all of them are mis-aligned; but don't have great methods to tell whether any particular AI is aligned or mis-aligned. Does that sound right to you?

My intuition is that it'd probably be pretty easy to create an aligned superhuman AI if we knew how to create non-singular, mis-aligned superhuman AIs, and had cheap, robust methods to tell if a particular AI was misaligned.

This sounds different from how I model the situation; my views agree here with Nate's (emphasis added):

I would rephrase 3 as "There are many intuitively small mistakes one can make early in the design process that cause resultant systems to be extremely difficult to align with operators’ intentions.” I’d compare these mistakes to the “small” decision in the early 1970s to use null-terminated instead of length-prefixed strings in the C programming language, which continues to be a major source of software vulnerabilities decades later.
I’d also clarify that I expect any large software product to exhibit plenty of actually-trivial flaws, and that I don’t expect that AGI code needs to be literally bug-free or literally proven-safe in order to be worth running. Furthermore, if an AGI design has an actually-serious flaw, the likeliest consequence that I expect is not catastrophe; it’s just that the system doesn’t work. Another likely consequence is that the system is misaligned, but in an obvious ways that makes it easy for developers to recognize that deployment is a very bad idea. The end goal is to prevent global catastrophes, but if a safety-conscious AGI team asked how we’d expect their project to fail, the two likeliest scenarios we’d point to are "your team runs into a capabilities roadblock and can't achieve AGI" or "your team runs into an alignment roadblock and can easily tell that the system is currently misaligned, but can’t figure out how to achieve alignment in any reasonable amount of time."

My current model of 'the default outcome if the first project to develop AGI is highly safety-conscious, is focusing on alignment, and has a multi-year lead over less safety-conscious competitors' is that the project still fails, because their systems keep failing their tests but they don't know how to fix the deep underlying problems (and may need to toss out years of work and start from scratch in order to have a real chance at fixing them). Then they either (a) lose their lead, and some other project destroys the world; (b) decide they have to ignore some of their tests, and move ahead anyway; or (c) continue applying local patches without understanding or fixing the underlying generator of the test failures, until they or their system find a loophole in the tests and sneak by.

I don't think any of this is inevitable or impossible to avoid; it's just the default way I currently visualize things going wrong for AGI developers with a strong interest in safety and alignment.

Possibly you'd want to rule out (c) with your stipulation that the tests are "robust"? But I'm not sure you can get tests that robust. Even in the best-case scenario where developers are in a great position to build aligned AGI and successfully do so, I'm not imagining post-hoc tests that are robust to a superintelligence trying to game them. I'm imagining that the developers have a prior confidence from their knowledge of how the system works that every part of the system either lacks the optimization power to game any relevant tests, or will definitely not apply any optimization to trying to game them.

Possibly you'd want to rule out (c) with your stipulation that the tests are "robust"? But I'm not sure you can get tests that robust.

That sounds right. I was thinking about an infinitely robust misalignment-oracle to clarify my thinking, but I agree that we'll need to be very careful with any real-world-tests.

If I imagine writing code and using the misalignment-oracle on it, I think I mostly agree with Nate's point. If we have the code and compute to train a superhuman version of GPT-2, and the oracle tells us that any agent coming out from that training process is likely to be misaligned, we haven't learned much new, and it's not clear how to design a safe agent from there.

I imagine a misalignment-oracle to be more useful if we use it during the training process, though. Concretely, it seems like a misalignment-oracle would be extremely useful to achieve inner alignment in IDA: as soon as the AI becomes misaligned, we can either rewind the training process and figure out what we did wrong, or directly use the oracle as a training signal that severely punish any step that makes the agent misaligned. Coupled with the ability to iterate on designs, since we won't accidentally blow up the world on the way, I'd guess that something like this is more likely to work than not. This idea is extremely sensitive to (c), though.

Has the "alignment roadblock" scenario been argued for anywhere?

Like Lanrian, I think it sounds implausible. My intuition is that understanding human values is a hard problem, but taking over the world is a harder problem. For example, the AI which can talk its way out of a box probably has a very deep understanding of humans--a deeper understanding than most humans have of humans! In order to have such a deep understanding, it must have lower-level building blocks for making sense of the world which work extremely well, and could be used for a value learning system.

BTW, coincidentally, I quoted this same passage in a post I wrote recently which discussed this scenario (among others). Is there a particular subscenario of this I outlined which seems especially plausible to you?

My intuition is that understanding human values is a hard problem, but taking over the world is a harder problem.

Especially because taking over the world requires you to be much better than other agents who want to stop you from taking over the world, which could very well include other AIs.

ETA: That said, upon reflection, there have been instances of people taking over large parts of the world without being superhuman. All world leaders qualify, and it isn't that unusual. However, what would be unusual is if someone wanted to take over the world and everyone else didn't want that yet it still happened.

In a scenario where multiple AIs compete for power the AIs who makes fast decisions without checking back with humans have an advantage in the power competition and are going to get more power over time.

Additionally, AGI differ fundamentally from humans because the can spin up multiple copies of themselves when they get more resources while human beings can't similarly scale their power when they have access to more food.

The best human hacker can't run a cyber war alone but if he could spin of 100,000 copies of themselves he could find enough 0 days to hack into all important computer systems.

In a scenario where multiple AIs compete for power the AIs who makes fast decisions without checking back with humans have an advantage in the power competition and are going to get more power over time.

Agreed this is a risk, but I wouldn't call this an alignment roadblock.

It might just join the ranks of "projects from before", and subtly try to alter future systems to be similarly defective, waiting for a future opportunity to strike.

Admittedly, I did not explain this point well enough. What I meant to say was that before we have the first successful defection, we'll have some failed defection. If the system could indefinitely hide its own private intentions to later defect, then I would already consider that to be a 'successful defection.'

Knowing about a failed defection, we'll learn from our mistake and patch that for future systems. To be clear, I'm definitely not endorsing this as a normative standard for safety.

I agree with the rest of your comment.

It is sometimes argued that even if this advantage is modest, the growth curves will be exponential, and therefore a slight advantage right now will compound to become a large advantage over a long enough period of time. However, this argument by itself is not an argument against a continuous takeoff.

I'm not sure this is an accurate characterization of the point; my understanding is that the concern largely comes from the possibility that the growth will be faster than exponential, rather than merely exponential.

I'm not sure this is an accurate characterization of the point; my understanding is that the concern largely comes from the possibility that the growth will be faster than exponential

Sure, if someone was arguing that, then they have a valid understanding of the difference between continuous vs. discontinuous takeoff. I would just question the assumption why we should expect growth to be faster than exponential for any sustained period of time.

To be clear, intelligence explosion via recursive self-improvement has been distinguished from merely exponential growth at least as far back as Yudkowsky's "Three Major Singularity Schools". I couldn't remember the particular link when I wrote the comment above, but, well, now I remember it.

Anyway, I don't have a particular argument one way or the other; I'm just registering my surprise that you encountered people here arguing for merely exponential growth base on intelligence explosion arguments.

Empirically, most systems with a feedback loop don't grow hyperbolically. I would need strong theoretical reasons in order to understand why this particular distinction is important.

I don't really want to go trying to defend here a position I don't necessarily hold, but I do have to nitpick and point out that there's quite a bit of room inbetween exponential and hyperbolic.

the general strategy of "dealing with things as they come up" is much more viable under continuous takeoff. Therefore, if a continuous takeoff is more likely, we should focus our attention on questions which fundamentally can't be solved as they come up.

Can you give some examples of AI alignment failure modes which would be definitely (or probably) easy to solve if we had a reproducible demonstration of that failure mode sitting in front of us? It seems to me that none of the current ongoing work is in that category.

When I imagine that type of iterative debugging, the example in my mind is a bad reward function that the programmers are repeatedly patching, which would be a bad situation because it would probably amount to a "nearest unblocked strategy" loop.

[-]Eli TyreΩ0110
Yet, at no point during this development did any project leap forward by a huge margin. Instead, every paper built upon the last one by making minor improvements and increasing the compute involved. Since these minor improvements nonetheless happened rapidly, the result is that the GANs followed a fast development relative to the lifetimes of humans.

Does anyone have time series data on the effectiveness of Go-playing AI? Does that similarly follow a gradual trend?

AlphaGo seems much closer to "one project leaps forward by a huge margin." But maybe I'm mistaken about how big an improvement AlpahGo was over previous Go AIs.

Miles Brundage argues that "it’s an impressive achievement, but considering it in this larger context should cause us to at least slightly decrease our assessment of its size/suddenness/significance in isolation".

In the wake of AlphaGo’s victory against Fan Hui, much was made of the purported suddenness of this victory relative to expected computer Go progress. In particular, people at DeepMind and elsewhere have made comments to the effect that experts didn’t think this would happen for another decade or more. One person who said such a thing is Remi Coulom, designer of CrazyStone, in a piece in Wired magazine. However, I’m aware of no rigorous effort to elicit expert opinion on the future of computer Go, and it was hardly unanimous that this milestone was that long off. I and others, well before AlphaGo’s victory was announced, said on Twitter and elsewhere that Coulom’s pessimism wasn’t justified. Alex Champandard noted that at a gathering of game AI experts a year or so ago, it was generally agreed that Go AI progress could be accelerated by a concerted effort by Google or others. At AAAI last year [2015], I also asked Michael Bowling, who knows a thing or two about game AI milestones (having developed the AI that essentially solved limit heads-up Texas Hold Em), how long it would take before superhuman Go AI existed, and he gave it a maximum of five years. So, again, this victory being sudden was not unanimously agreed upon, and claims that it was long off are arguably based on cherry-picked and unscientific expert polls. [...]
Hiroshi Yamashita extrapolated the trend of computer Go progress as of 2011 into the future and predicted a crossover point to superhuman Go in 4 years, which was one year off. In recent years, there was a slowdown in the trend (based on highest KGS rank achieved) that probably would have lead Yamashita or others to adjust their calculations if they had redone them, say, a year ago, but in the weeks leading up to AlphaGo’s victory, again, there was another burst of rapid computer Go progress. I haven’t done a close look at what such forecasts would have looked like at various points in time, but I doubt they would have suggested 10 years or more to a crossover point, especially taking into account developments in the last year. Perhaps AlphaGo’s victory was a few years ahead of schedule based on reported performance, but it should always have been possible to anticipate some improvement beyond the (small team/data/hardware-based) trend based on significant new effort, data, and hardware being thrown at the problem. Whether AlphaGo deviated from the appropriately-adjusted trend isn’t obvious, especially since there isn’t really much effort going into rigorously modeling such trends today. Until that changes and there are regular forecasts made of possible ranges of future progress in different domains given different effort/data/hardware levels, “breakthroughs” may seem more surprising than they really should be.
AlphaGo seems much closer to "one project leaps forward by a huge margin."

I don't have the data on hand, but my impression was that AlphaGo indeed represented a discontinuity in the domain of Go. It's difficult to say why this happened, but my best guess is that DeepMind invested a lot more money into solving Go than any competing actor at the time. Therefore, the discontinuity may have followed straightforwardly from a background discontinuity in attention paid to the task.

If this hypothesis is true, I don't find it compelling that AlphaGo is evidence for a discontinuity for AGI, since such funding gaps are likely to be much smaller for economically useful systems.

The following is mostly a nitpick / my own thinking through of a scenario:

If this hypothesis is true, I don't find it compelling that AlphaGo is evidence for a discontinuity for AGI, since such funding gaps are likely to be much smaller for economically useful systems.

If there is no fire alarm for general intelligence, it's not implausible that that there will be a similar funding gap for useful systems. Currently, there are very few groups explicitly aiming at AGI, and of those groups Deep Mind is by far the best funded.

If we are much nearer to AGI than most of us suspect, we might see the kind of funding differential exhibited in the Go example for AGI, because the landscape of people developing AGI will look a lot closer to that of Alpha Go (only one group trying seriously), vs. the one for GANs (many groups making small iterative improvements on each-other's work).

Overall, I find this story to be pretty implausible, though. It would mean that there is a capability cliff very nearby in ML design space, somehow, and that cliff is so sharp to be basically undetectable right until someone's gotten to the top of it.

Still, the general strategy of "dealing with things as they come up" is much more viable under continuous takeoff.

Agreed. This is why I'd like to see MIRI folks argue more for their views on takeoff speeds. If they’re right, more researchers should want to work for them.

Summary of my response: chimps are nearly useless because they aren’t optimized to be useful, not because evolution was trying to make something useful and wasn’t able to succeed until it got to humans.

So far as I can tell, the best one-line summary for why we should expect a continuous and not a fast takeoff comes from the interview Paul Christiano gave on the 80k podcast: 'I think if you optimize AI systems for reasoning, it appears much, much earlier.' Which is to say, the equivalent of the 'chimp' milestone on the road to human-level AI does not have approximately the economic utility of a chimp, but a decent fraction of the utility of something that is 'human-level'. This strikes me as an important argument that he's repeated here, and discussed here last april but other than that it seems to have gone largely unnoticed and I'm wondering why.

I have a theory about why this didn't get discussed earlier - there is a much more famous bad argument against AGI being an existential risk, the 'intelligence isn't a superpower' argument that sounds similar. From Chollet vs Yudkowsky:

Intelligence is not a superpower; exceptional intelligence does not, on its own, confer you with proportionally exceptional power over your circumstances.
…said the Homo sapiens, surrounded by countless powerful artifacts whose abilities, let alone mechanisms, would be utterly incomprehensible to the organisms of any less intelligent Earthly species.

I worry that in arguing against the claim that general intelligence isn't a meaningful concept or can't be used to compare different animals, some people have been implicitly assuming that evolution has been putting a decent amount of effort into optimizing for general intelligence. Alternatively, that arguing for one sounds like another, or that a lot of people have been arguing for both together and haven't distinguished between them.

Claiming that you can meaningfully compare evolved minds on the generality of their intelligence needs to be distinguished from claiming that evolution has been optimizing for general intelligence reasonably hard before humans came about.

That part of the interview with Paul was super interesting to me, because the following were previously claims I'd heard from Nate and Eliezer in their explanations of how they think about fast takeoff:

[E]volution [hasn't] been putting a decent amount of effort into optimizing for general intelligence. [...]

'I think if you optimize AI systems for reasoning, it appears much, much earlier.'

Ditto things along the lines of this Paul quote from the same 80K interview:

It’s totally conceivable from our current perspective, I think, that an intelligence that was as smart as a crow, but was actually designed for doing science, actually designed for doing engineering, for advancing technologies rapidly as possible -- it is quite conceivable that such a brain would actually outcompete humans pretty badly at those tasks.


I think that’s another important thing to have in mind, and then when we talk about when stuff goes crazy, I would guess humans are an upper bound for when stuff goes crazy. That is we know that if we had cheap simulated humans, that technological progress would be much, much faster than it is today. But probably stuff goes crazy somewhat before you actually get to humans.

This is part of why I don't talk about "human-level" AI when I write things for MIRI.

If you think humans, corvids, etc. aren't well-optimized for economically/pragmatically interesting feats, this predicts that timelines may be shorter and that "human-level" may be an especially bad way of thinking about the relevant threshold(s).

There still remains the question of whether the technological path to "optimizing messy physical environments" (or "science AI", or whatever we want to call it) looks like a small number of "we didn't know how to do this at all, and now we do know how to do this and can suddenly take much better advantage of available compute" events, vs. looking like a large number of individually low-impact events spread out over time.

If no one event is impactful enough, then a series of numerous S-curves ends up looking like a smooth slope when you zoom out; and large historical changes are usually made of many small changes that add up to one big effect. We don't invent nuclear weapons, get hit by a super-asteroid, etc. every other day.

MIRI thinks that the fact Evolution hasn't been putting much effort into optimizing for general intelligence is a reason to expect discontinuous progress? Apparently, Paul's point is that once we realize evolution has been putting little effort into optimizing for general intelligence, we realize we can't tell much about the likely course of AGI development from evolutionary history, which leaves us in the default position of ignorance. Then, he further argues that the default case is that progress is continuous.

So far as I can tell, Paul's point is that absent specific reasons to think otherwise, the prima facie case that any time we are trying hard to optimize for some criteria, we should expect the 'many small changes that add up to one big effect' situation.

Then he goes on to argue that the specific arguments that AGI is a rare case where this isn't true (like nuclear weapons) are either wrong or aren't strong enough to make discontinuous progress plausible.

From what you just wrote, it seems like the folks at MIRI agree that we should have the prima facie expectation of continuous progress, and I've read elsewhere that Eliezer thinks the case for recursive self-improvement leading to a discontinuity is weaker or less central than it first seemed. So, are MIRI's main reasons for disagreeing with Paul down to other arguments (hence the switch from the intelligence explosion hypothesis to the general idea of rapid capability gain)?

I would think the most likely place to disagree with Paul (if not on the intelligence explosion hypothesis) would be if you expected the right combination of breakthroughs exceeds to a 'generality threshold' (or 'secret sauce' as Paul calls it) that leads to a big jump in capability, but inadequate achievement on any one of the breakthroughs won't do.

Stuart Russell gives a list of the elements he thinks will be necessary for the 'secret sauce' of general intelligence in Human Compatible: human-like language comprehension, cumulative learning, discovering new action sets and managing its own mental activity. (I would add that somebody making that list 30 years ago would have added perception and object recognition, and somebody making it 60 years ago would have also added efficient logical reasoning from known facts). Let's go with Russell's list, so we can be a bit more concrete. Perhaps this is your disagreement:

An AI with (e.g.) good perception and object recognition, language comprehension, cumulative learning capability and ability to discover new action sets but a merely adequate or bad ability to manage its mental activity would be (Paul thinks) reasonably capable compared to an AI that is good at all of these things, but (MIRI thinks) it would be much less capable. MIRI has conceptual arguments (to do with the nature of general intelligence) and empirical arguments (comparing human/chimp brains and pragmatic capabilities) in favour of this hypothesis, and Paul thinks the conceptual arguments are too murky and unclear to be persuasive and that the empirical arguments don't show what MIRI thinks they show. Am I on the right track here?

"Continuous" vs "gradual":

I’ve also seen people internally use the word gradual and I prefer it to continuous because 1) in maths, a discontinuity can be an insignificant jump and 2) a fast takeoff is about fast changes in the growth rate, whereas continuity is about jumps in the function value (you can have either without the other). I don’t see a natural way to say non-gradual or a non-graduality though, which is why I do often say discontinuous instead.

These all seem basically accurate.

From one of the linked articles, Christiano talking about takeoff speeds:

I believe that before we have incredibly powerful AI, we will have AI which is merely very powerful. This won’t be enough to create 100% GDP growth, but it will be enough to lead to (say) 50% GDP growth. I think the likely gap between these events is years rather than months or decades.
In particular, this means that incredibly powerful AI will emerge in a world where crazy stuff is already happening (and probably everyone is already freaking out). If true, I think it’s an important fact about the strategic situation.

and in your post:

Still, the general strategy of "dealing with things as they come up" is much more viable under continuous takeoff. Therefore, if a continuous takeoff is more likely, we should focus our attention on questions which fundamentally can't be solved as they come up.

I agree that the continuous/slow takeoff is more likely than fast takeoff, though I have low confidence in that belief (and in most of my beliefs about AGI timelines) but the world of a continuous/slow takeoff, badly managed still seems like an extreme danger and a case where it would be too late to deal with many problems in e.g. the same year that they arise. Are you imagining something like this?

I agree that large power differentials between are possible between countries, like in the industrial revolution. I think it’s worth distinguishing concentration of power among countries from concentration among AI systems. I.e.

1) each country has at most a couple of AI systems and one country has significantly better ones or

2) each country’s economy uses many systems with a range of abilities and one country has significantly better ones on average.

In 2), the countries likely want to trade and negotiate (in addition to potentially conquering each other). Systems within conquered countries still have influence. That seems more like what happened in the industrial revolution. I feel like people sometimes argue for concentration of power among AI systems by saying that we’ve seen concentration among countries or companies. But those seem pretty different. (I’m not saying that’s what you’re arguing).

I'm finding it hard to see how we could get (1) without some discontinuity?

When I think about why (1) would be true, the argument that comes to mind is that single AI systems will be extremely expensive to deploy, which means that only a few very rich entities could own them. However, this would contradict the general trend of ML being hard to train and easy to deploy. Unlike, say nukes, once you've trained your AI you can create a lot of copies and distribute them widely.

Re whether ML is easy to deploy: most compute these days goes into deployment. And there are a lot of other deployment challenges that you don't have during training where you train a single model under lab conditions.

I agree with this, but when I said deployment I meant deployment of a single system, not several.

Fair - I'd probably count "making lots of copies of a trained system" as a single system here.

I'm confused about why (1) and (2) are separate scenarios then. Perhaps because in (2) there's a lot of different types of AIs?

Yes. To the extent that the system in question is an agent, I'd roughly think of many copies of it as a single distributed agent.

I don’t see the GAN example as evidence for continuous-but-quick takeoff.

When a metric suddenly becomes a target, fast progress can follow. But we already target the most important metrics (e.g. general intelligence). Face generation became a target in 2014 - and the number of papers on GANs quickly grew from a few to thousands per year. Compute budgets also jumped. There were low-hanging fruits for face-generation that people previously did not care about. I.e. we could have generated way better faces in 2014 than the one in the example if we had cared about it for some time.

It's worth noting that I wasn't using it as evidence "for" continuous takeoff. It was instead an example of something which experienced a continuous takeoff that nonetheless was quick relative to the lifespan of a human.

It's hard to argue that it wasn't continuous under my definition, since the papers got gradually and predictably better. Perhaps there was an initial discontinuity in 2014 when it first became a target? Regardless, I'm not arguing that this is a good model for AGI development.

Yes - the part that I was doubting is that it provides evidence for relatively quick takeoff.