[ Question ]

Any rebuttals of Christiano and AI Impacts on takeoff speeds?

by SoerenMind, 21st Apr 2019, 1 min read, 20 comments



14 months ago, Paul Christiano and AI Impacts both published forceful and well-received take-downs of many arguments for fast (discontinuous) takeoff. I haven’t seen any rebuttals that are written by established researchers, longer than comments, or otherwise convincing. The longer there is no response, the less weight I put on the outside view that proponents of fast takeoff may be right.

Where are the rebuttals? Did I miss them? Is the debate decided? Did nobody have time or motivation to write something? Is the topic too hard to explain?

Why rebuttals would be useful:

- Give the community a sense of the extent of expert disagreement to form outside views.

- Prioritization in AI policy, and to a lesser extent safety, depends on the likelihood of discontinuous progress. We may have more leverage in such cases, but this could be overwhelmed if the probability is low.

- Motivate more people to work on MIRI’s research, which seems more important to solve early if there is fast takeoff.


7 Answers

MIRI folks are the most prominent proponents of fast takeoff, and we unfortunately haven't had time to write up a thorough response. Oli already quoted the quick comments I posted from Nate and Eliezer last year, and I'll chime in with some of the factors that I think are leading to disagreements about takeoff:

  • Some MIRI people (Nate is one) suspect we might already be in hardware overhang mode, or closer to that point than some other researchers in the field believe.
  • MIRI folks tend to have different views from Paul about AGI, some of which imply that AGI is more likely to be novel and dependent on new insights. (Unfair caricature: Imagine two people in the early 20th century who don't have a technical understanding of nuclear physics yet, trying to argue about how powerful a nuclear-chain-reaction-based bomb might be. If one side were to model that kind of bomb as "sort of like TNT 3.0" while the other is modeling it as "sort of like a small Sun", they're likely to disagree about whether nuclear weapons are going to be a small vs. large improvement over TNT. Note that I'm just using nuclear weapons as an analogy, not giving an outside-view argument "sometimes technologies are discontinuous, ergo AGI will be discontinuous".)

This list isn't at all intended to be sufficiently-detailed or exhaustive.

I'm hoping we have time to write up more thoughts on this before too long, because this is an important issue (even given that we're trying to minimize the researcher time we put into things other than object-level deconfusion research). I don't want MIRI to be a blocker on other researchers making progress on these issues, though — it would be bad if people put a pause on hashing out takeoff issues for themselves (or put a pause on alignment research that's related to takeoff views) until Eliezer had time to put out a blog post. I primarily wanted to make sure people know that the lack of a substantive response doesn't mean that Nate+Eliezer+Benya+etc. agree with Paul on takeoff issues now, or that we don't think this disagreement matters. Our tardiness is because of opportunity costs and because our views have a lot of pieces to articulate.

I’m reluctant to reply because it sounds like you’re looking for rebuttals by explicit proponents of hard takeoff who have thought a great deal about takeoff speeds, and neither of that applies to me. But I could sketch some intuitions why reading the pieces by AI Impacts and by Christiano hasn't felt wholly convincing to me. (I’ve never run these intuitions past anyone and don’t know if they’re similar to cruxes held by proponents of hard takeoff who are more confident in hard takeoff than I am – therefore I hope people don't update much further against hard takeoff in case they find the sketch below unconvincing.) I found that it’s easiest for me to explain something if I can gesture towards some loosely related “themes” rather than go through a structured argument, so here are some of these themes and maybe people see underlying connections between them:

Culture overhang

Shulman and Sandberg have argued that one way to get hard takeoff is via hardware overhang: a new algorithmic insight can be used immediately to its full potential, because much more hardware is available than would be needed to surpass state-of-the-art performance with the new algorithms. I think there’s a similar dynamic at work with culture: If you placed an AGI into the stone age, it would be inefficient at taking over the world even with appropriately crafted output channels, because stone age tools (which include stone age humans the AGI could manipulate) are neither very useful nor reliable. It would be easier for an AGI to achieve influence in 1995, when the environment contained a greater variety of increasingly far-reaching tools. But with the internet being new, particular strategies to attain power (or even just rapidly acquire knowledge) were not yet available. Today, it is arguably easier than ever for an AGI to quickly and more-or-less single-handedly transform the world.

Snapshot intelligence versus intelligence as learning potential

There’s a sense in which cavemen are similarly intelligent as modern-day humans. If we time-traveled back into the stone age, found the couples with the best predictors for having gifted children, gave these couples access to 21st century nutrition and childbearing assistance, and then took their newborns back into today’s world where they’d grow up in a loving foster family with access to high-quality personalized education, there’s a good chance some of those babies would grow up to be relatively ordinary people of close to average intelligence. Those former(?) cavemen and cavewomen would presumably be capable of dealing with many if not most aspects of contemporary life and modern technology.

However, there’s also a sense in which cavemen are very unintelligent compared to modern-day humans. Culture, education, possibly even things like the Flynn effect, etc. – these really do change the way people think and act in the world. Cavemen are incredibly uneducated and untrained concerning knowledge and skills that are useful in modern, tool-rich environments.

We can think of this difference as the difference between the snapshot of someone’s intelligence at the peak of their development and their (initial) learning potential. Cavemen and modern-day humans might be relatively close to each other in terms of the latter, but when considering their abilities at the peak of their personal development, the modern humans are much better at achieving goals in tool-rich environments. I sometimes get the impression that proponents of soft takeoff underappreciate this difference when addressing comparisons between, for instance, early humans and chimpanzees (this is just a vague general impression which doesn’t apply to the arguments presented by AI Impacts or by Paul Christiano).

How to make use of culture: The importance of distinguishing good ideas from bad ones

Both for productive engineers and creative geniuses, it holds that they could only have developed their full potential because they picked up useful pieces of insight from other people. But some people cannot tell the difference between high-quality information and low-quality information, or might make wrong use even of high-quality information, reasoning themselves into biased conclusions. An AI system capable of absorbing the entire internet but terrible at telling good ideas from bad ideas won't make too much of a splash (at least not in terms of being able to take over the world). But what about an AI system just slightly above some cleverness threshold for adopting an increasingly efficient information diet? Couldn’t it absorb the internet in a highly systematic way rather than just soaking in everything indiscriminately, learning many essential meta-skills on its way, improving how it goes about the task of further learning?

Small differences in learning potential have compounded benefits over time

If the child in the chair next to me in fifth grade was slightly more intellectually curious, somewhat more productive, and marginally better disposed to adopt a truth-seeking approach and self-image than I was, this could initially mean they score 100% and I score 95% on fifth-grade tests – no big difference. But as time goes on, their productivity gets them to read more books, their intellectual curiosity and good judgment get them to read more unusually useful books, and their cleverness gets them to integrate all this knowledge in better and increasingly more creative ways. I’ll reach a point where I’m just sort of skimming things because I’m not motivated enough to understand complicated ideas deeply, whereas they find it rewarding to comprehend everything that gives them a better sense of where to go next on their intellectual journey. By the time we graduate university, my intellectual skills are mostly useless, while they have technical expertise in several topics, can match or even exceed my thinking even in areas I specialized in, and get hired by some leading AI company. The point being: an initially small difference in dispositions becomes almost incomprehensibly vast over time.

Knowing how to learn strategically: A candidate for secret sauce??

(I realized that in this title/paragraph, the word "knowing" is meant both in the sense of "knowing how to do x" and "being capable of executing x very well." It might be useful to try to disentangle this some more.) The standard AI foom narrative sounds a bit unrealistic when discussed in terms of some AI system inspecting itself and remodeling its inner architecture in a very deliberate way driven by architectural self-understanding. But what about the framing of being good at learning how to learn? There’s at least a plausible-sounding story we can tell where such an ability might qualify as the "secret sauce" that gives rise to a discontinuity in the returns of increased AI capabilities. In humans – and admittedly this might be too anthropomorphic – I'd think about it in this way: If my 12-year-old self had been brain-uploaded into a suitable virtual reality, copied, and given the task of devouring the entire internet in 1,000 years of subjective time (with no aging) to acquire enough knowledge and skill to produce novel intellectual contributions useful to the world, the result probably wouldn’t be much of a success. If we imagined the same with my 19-year-old self, there’s a high chance the result wouldn’t be useful either – but also some chance it would be extremely useful. Assuming, for the sake of the comparison, that a copy clan of 19-year-olds can produce highly beneficial research outputs this way, and a copy clan of 12-year-olds can’t, what does the landscape look like in between? I don’t find it evident that the in-between is gradual. I think it’s at least plausible that there’s a jump once the copies reach a level of intellectual maturity to make plans which are flexible enough at the meta-level and divide labor sensibly enough to stay open to reassessing their approach as time goes on and they learn new things. Maybe all of that is gradual, and there are degrees of dividing labor sensibly or of staying open to reassessing one’s approach – but that doesn’t seem evident to me. Maybe this works more as an on/off thing.

How could natural selection produce on/off abilities?

It makes sense to be somewhat suspicious about any hypotheses according to which the evolution of general intelligence made a radical jump in Homo sapiens, creating thinking that is "discontinuous" from what came before. If knowing how to learn is an on/off ability that plays a vital role in the ways I described above, how could it evolve?
We're certainly also talking culture, not just genes. And via the Baldwin effect, natural selection can move individuals closer towards picking up surprisingly complex strategies via learning from their environment. At this point, at the latest, my thinking becomes highly speculative. But here's one hypothesis: In its generalization, this effect is about learning how to learn. And maybe there is something like a "broad basin of attraction" (inspired by Christiano's broad basin of attraction for corrigibility) for robustly good reasoning / knowing how to learn. Picking up some of the right ideas early on, combined with being good at picking up things in general, produces in people an increasingly better sense of how to order and structure other ideas, and over time, the best human learners start to increasingly resemble each other, having homed in on the best general strategies.

The mediocre success of self-improvement literature

For most people, the returns of self-improvement literature (by which I mean not just productivity advice, but also information on "how to be more rational," etc.) might be somewhat useful, but rarely life-changing. People don’t tend to "go foom" from reading self-improvement advice. Why is that, and how does it square with my hypothesis above, that “knowing how to learn” could be a highly valuable skill with potentially huge compounding benefits? Maybe the answer is that the bottleneck is rarely knowledge about self-improvement, but rather the ability to make the best use of such knowledge? This would support the hypothesis mentioned above: If the critical skill is finding useful information in a massive sea of both useful and not-so-useful information, that doesn’t necessarily mean that people will get better at that skill if we gave them curated access to highly useful information (even if it's information about how to find useful information, i.e., good self-improvement advice). Maybe humans don’t tend to go foom after receiving humanity's best self-improvement advice because too much of that is too obvious for people who were already unusually gifted and then grew up in modern society where they could observe and learn from other people and their habits. However, now imagine someone who had never read any self-improvement advice, and could never observe others. For that person, we might have more reason to expect them to go foom – at least compared to their previous baseline – after reading curated advice on self-improvement (or, if it is true that self-improvement literature is often somewhat redundant, even just from joining an environment where they can observe and learn from other people and from society). And maybe that’s the situation in which the first AI system above a certain critical capabilities threshold finds itself. 
The threshold I mean is (something like) the ability to figure out how to learn quickly enough to then approach the information on the internet like the hypothetical 19-year-olds (as opposed to the 12-year-olds) from the thought experiment above.

---

Hard takeoff without a discontinuity

(This argument is separate from all the other arguments above.) Here’s something I never really understood about the framing of the hard vs. soft takeoff discussion. Let’s imagine a graph with inputs such as algorithmic insights and compute/hardware on the x-axis, and general intelligence (it doesn’t matter for my purposes whether we use learning potential or snapshot intelligence) on the y-axis. Typically, the framing is that proponents of hard takeoff believe that this graph contains a discontinuity where the growth mode changes, and suddenly the returns (for inputs such as compute) are vastly higher than the outside view would have predicted, meaning that the graph makes a jump upwards in the y-axis. But what about hard takeoff without such a discontinuity? If our graph starts to be steep enough at the point where AI systems reach human-level research capabilities and beyond, then that could in itself allow for some hard (or "quasi-hard") takeoff. After all, we are not going to be sampling points (in the sense of deploying cutting-edge AI systems) from that curve every day – that simply wouldn't work logistically even granted all the pressures to be cutting-edge competitive. If we assume that we only sample points from the curve every two months, for instance, is it possible that for whatever increase in compute and algorithmic insights we’d get in those two months, the differential on the y-axis (some measure of general intelligence) could be vast enough to allow for attaining a decisive strategic advantage (DSA) from being first? I don’t have strong intuitions about what the offense-defense balance will shift to once we are close to AGI, but it at least seems plausible that it turns more towards offense, in which case arguably a lower differential is needed for attaining a DSA. 
In addition, based on the classical arguments put forward by researchers such as Bostrom and Yudkowsky, it also seems at least plausible to me that we are potentially dealing with a curve that is very steep around the human level. So, if one AGI project is two months ahead of another project, and we for the sake of argument assume that there are no inherent discontinuities in the graph in question, it’s still not evident to me that this couldn’t lead to something that very much looks like hard takeoff, just without an underlying discontinuity in the graph.
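The sampling argument above can be made concrete with a toy model (the curve shape and numbers here are my own illustrative assumptions, not from the post): even if capability grows perfectly smoothly, deploying new systems only at discrete intervals means the gap between consecutive deployments keeps widening once the curve gets steep.

```python
import math

# Toy sketch of "hard takeoff without a discontinuity": a smooth,
# steep capability curve sampled only at deployment time. The growth
# rate k and the two-month deployment cadence are illustrative guesses.
def capability(t_years, k=6.0):
    """A smooth (exponential) capability curve in arbitrary units."""
    return math.exp(k * t_years)

# One deployment every two months (t in years): n/6 for n = 0..3.
deployments = [capability(n / 6) for n in range(4)]
gaps = [b - a for a, b in zip(deployments, deployments[1:])]
# Each deployment-to-deployment gap is larger than the last, even
# though the underlying curve has no jump anywhere.
```

Nothing here argues that the real curve is exponential; the point is only that "no discontinuity in the graph" is compatible with "the next system deployed is vastly ahead of the previous one."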

Robby made this post with short perspectives from Nate and Eliezer: https://www.lesswrong.com/posts/X5zmEvFQunxiEcxHn

Copied here to make it easier to read (full text of the post):

This isn't a proper response to Paul Christiano or Katja Grace's recent writings about takeoff speed, but I wanted to cross-post Eliezer's first quick comments on Katja's piece someplace more linkable than Twitter:

There's a lot of steps in this argument that need to be spelled out in more detail. Hopefully I get a chance to write that up soon. But it already raises the level of debate by a lot, for which I am grateful.
E.g. it is not intuitive to me that "But evolution wasn't trying to optimize for STEM ability" is a rejoinder to "Gosh hominids sure got better at that quickly." I can imagine one detailed argument that this might be trying to gesture at, but I don't know if I'm imagining right.
Similarly it's hard to pin down which arguments say "Average tech progress rates tell us something about an underlying step of inputs and returns with this type signature" and which say "I want to put the larger process in this reference class and demand big proof burdens."

I also wanted to caveat: Nate's experience is that the label "discontinuity" is usually assigned to misinterpretations of his position on AGI, so I don't want to endorse this particular framing of what the key question is. Quoting Nate from a conversation I recently had with him (not responding to these particular posts):

On my model, the key point is not "some AI systems will undergo discontinuous leaps in their intelligence as they learn," but rather, "different people will try to build AI systems in different ways, and each will have some path of construction and some path of learning that can be modeled relatively well by some curve, and some of those curves will be very, very steep early on (e.g., when the system is first coming online, in the same way that the curve 'how good is Google’s search engine' was super steep in the region between 'it doesn’t work' and 'it works at least a little'), and sometimes a new system will blow past the entire edifice of human knowledge in an afternoon shortly after it finishes coming online." Like, no one is saying that Alpha Zero had massive discontinuities in its learning curve, but it also wasn't just AlphaGo Lee Sedol but with marginally more training: the architecture was pulled apart, restructured, and put back together, and the reassembled system was on a qualitatively steeper learning curve.
My point here isn't to throw "AGI will undergo discontinuous leaps as they learn" under the bus. Self-rewriting systems likely will (on my models) gain intelligence in leaps and bounds. What I’m trying to say is that I don’t think this disagreement is the central disagreement. I think the key disagreement is instead about where the main force of improvement in early human-designed AGI systems comes from — is it from existing systems progressing up their improvement curves, or from new systems coming online on qualitatively steeper improvement curves?

Katja replied on Facebook: "FWIW, whenever I am talking about discontinuities, I am usually thinking of e.g. one system doing much better than a previous system, not discontinuities in the training of one particular system—if a discontinuity in training one system does not make the new system discontinuously better than the previous system, then I don't see why it would be important, and if it does, then it seems more relevant to talk about that."

The AISafety.com Reading Group discussed this article when it was published. My slides are here: https://www.dropbox.com/s/t0k6wn4q90emwf2/Takeoff_Speeds.pptx?dl=0

There is a recording of my presentation here: https://youtu.be/7ogJuXNmAIw

My notes from the discussion are reproduced below:

We liked the article quite a lot. There was a surprising number of new insights for an article purporting to just collect standard arguments.

The definition of fast takeoff seemed somewhat non-standard, conflating 3 things: speed as measured in clock time, continuity/smoothness around the threshold where AGI reaches human baseline, and locality. These 3 questions are closely related, but not identical, and some precision would be appreciated. In fairness, the article was posted on Paul Christiano's "popular" blog, not his "formal" blog.

The degree to which we can build universal / general AIs right now was a point of contention. Our (limited) understanding is that most AI researchers would disagree with Paul Christiano about whether we can build a universal or general AI right now. Paul Christiano's argument seems to rest on our ability to trade off universality against other factors, but if (as we believe) universality is still mysterious, this tradeoff is not possible.

There was some confusion about the relationship between "Universality" and "Generality". Possibly, a "village idiot" is above the level of generality (passes Turing test, can make coffee) whereas he would not be at the "Universality" level (unable to self-improve to Superintelligence, even given infinite time). It is unclear if Paul Christiano would agree to this.

The comparison between humans and chimpanzees was discussed, and related to the argument from human variation, which seems to be stronger. The difference between a village idiot and Einstein is also large, and the counter-argument about what evolution cares about seems not to hold here.

Paul Christiano asked for a canonical example of a key insight enabling an unsolvable problem to be solved. An example would be my matrix multiplication example (https://youtu.be/5DDdBHsDI-Y). Here, a series of 4 key insights turns the problem from requiring a decade, to a year, to a day, to a second. While the example is neither canonical nor precisely what Paul Christiano asks for, it does point to a way to get intuition about the "key insight": grab a paper and a pen, and try to do matrix multiplication faster than O(n^3). It is possible, but far from trivial.
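To make that concrete, here is a minimal sketch of one such key insight, Strassen's 1969 trick (the base case only; the recursive blocked version and the later insights in the video are not shown):

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 scalar multiplications
    instead of the naive 8 -- Strassen's key insight. Applied
    recursively to matrix blocks, it gives O(n^log2(7)) ~ O(n^2.807)."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The insight is far from obvious from pen and paper alone, which is the point: a single non-trivial idea changes the asymptotics of a problem people had assumed was settled.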

For the deployment lag ("Sonic Boom") argument, a factor that can complicate the tradeoff is secrecy. If deployment causes you to lose the advantages of secrecy, the tradeoffs described could look much worse.

A number of the arguments for a fast takeoff did seem to aggregate, in one specific way: If our prior is for a "quite fast" takeoff, the arguments push us towards expecting a "very fast" takeoff. This is my personal interpretation, and I have not really formalized it. I should get around to that some day.

[edit: no longer endorse the original phrasing of my opening paragraph, but still seems useful to link to past discussion]

Some previous discussion about this topic was at:

https://www.lesswrong.com/posts/AfGmsjGPXN97kNp57/arguments-about-fast-takeoff#phQ3sZj7RmCDTjfvn

One key thing is that AFAICT, when Paul says 'slow takeoff' what he actually means is 'even faster takeoff, but without a sharp discontinuity', or something like that. So be careful about how you interpret the debate.

(I also think there's been fairly continuous debate throughout many other threads. Importantly, I don't think this is a single concrete disagreement, it's more like a bunch of subtle disagreements interwoven with each other. Many posts and threads (in LW and in other channels) seem to me to be about disentangling those disagreements.

I think the discussion of Paul's Research Agenda FAQ (NOT written by Paul), including the comment by Eliezer, is one of the more accessible instances of that, although I'm not sure if it directly bears on your question.)

When an intelligence builds another intelligence in a single direct step, the output intelligence is a function f(i, r) of the input intelligence i and the resources used r. This function is clearly increasing in both i and r. Set r to be a reasonably large level of resources, e.g. a large number of flops and 20 years to think about it. A low input intelligence, e.g. a dog, would be unable to make something smarter than itself: f(dog, r) < dog. A team of experts (by the assumption that ASI is made) can make something smarter than themselves: f(experts, r) > experts. So there must be a fixed point x with f(x, r) = x. The question then becomes: how powerful is a pre-fixed-point AI? Clearly less good at AI research than a team of experts. As there is no reason to think that AI research is uniquely hard for AI, and there are some reasons to think it might be easier, or more prioritized, if it can't beat our AI researchers, it can't beat our other researchers. It is unlikely to make any major science or technology breakthroughs.

I reckon that the slope of f with respect to input intelligence is large (>10), because on an absolute scale the difference between an IQ 90 and an IQ 120 human is quite small, but I would expect any attempt at AI made by the latter to be much better. In a world where the limiting factor is researcher talent, not compute, the AI can get the compute it needs in hours (seconds? milliseconds??). As the lumpiness of innovation puts the first post-fixed-point AI a non-exponentially-tiny distance ahead (most innovations are at least 0.1% better than the state of the art in a fast-moving field), a handful of cycles of recursive self-improvement (<1 day) is enough to get the AI into the seriously overpowered range.
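The last step of this argument, where a 0.1% lead compounds into a decisive one, can be sketched numerically. The numbers below are my own illustrative guesses, not from the comment above: in particular, the assumption that each cycle multiplies the lead by a fixed factor g is a toy simplification.

```python
# Toy model of post-fixed-point compounding: d is the fractional lead
# of the first post-fixed-point AI over the fixed point, and each
# self-improvement cycle is assumed to multiply that lead by g > 1.
def cycles_until_overpowered(d0=0.001, g=2.0, target=1.0):
    """Count cycles until an initial 0.1% lead grows past 100%."""
    d, cycles = d0, 0
    while d < target:
        d *= g
        cycles += 1
    return cycles

print(cycles_until_overpowered())  # -> 10
```

With these toy numbers, ten doublings take the lead from 0.1% to over 100%, which is the sense in which "a handful of cycles" could suffice; of course, the real crux is whether anything like a constant multiplier g > 1 holds near the fixed point.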

The question of economic doubling times would depend on how fast an economy can grow when tech breakthroughs are limited by human researchers. If we happen to have cracked self replication at about this point, it could be very fast.

AFAICT Paul's definition of slow (I prefer gradual) takeoff basically implies that local takeoff and immediate unipolar outcomes are pretty unlikely. Many people still seem to put stock in local takeoff. E.g. Scott Garrabrant. Zvi and Eliezer have said they would like to write rebuttals. So I'm surprised by the scarcity of disagreement that's written up.