Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is the first in a series of pieces taking a stab at dealing with a conundrum:

  • I believe this could be the most important century ever for humanity, via the development of advanced AI systems that could dramatically speed up scientific and technological advancement, getting us more quickly than most people imagine to a deeply unfamiliar future.
  • But when it comes to what actions we can take to help such a development go well instead of poorly, it’s hard to say much (with a few exceptions). This is because many actions that would be helpful under one theory of how things will play out would be harmful under another (for example, see my discussion of the “caution” frame vs. the “competition” frame).

It seems to me that in order to take more productive actions (including making more grants), we need more clarity on some crucial questions, such as “How serious is the threat of a world run by misaligned AI?” But it’s hard to answer questions like this when we’re talking about a development (transformative AI) that may take place some indeterminate number of decades from now.

This piece introduces one possible framework for dealing with this conundrum. The framework is AI strategy nearcasting: trying to answer key strategic questions about transformative AI, under the assumption that key events (e.g., the development of transformative AI) will happen in a world that is otherwise relatively similar to today's. One (but not the only) version of this assumption would be “Transformative AI will be developed soon, using methods like what AI labs focus on today.”

The term is inspired by nowcasting. For example, the FiveThirtyEight Now-Cast projects “who would win the election if it were held today,” which is easier than projecting who will win the election when it is actually held. I think imagining transformative AI being developed today is a bit much, but imagining it being developed “in a world otherwise relatively similar to today’s” seems worth grappling with.

Some potential benefits of nearcasting, and reservations

A few benefits of nearcasting (all of which are speculative):

As with nowcasting, nearcasting can serve as a jumping-off point. If we have an idea of what the best actions to take would be if transformative AI were developed in a world otherwise similar to today’s, we can then start asking “Are there particular ways in which we expect the future to be different from the nearer term, that should change our picture of which actions would be most helpful?”

Nearcasting can also focus our attention on the scenarios that are not only easiest to imagine concretely, but also arguably “highest-stakes.” Worlds in which transformative AI is developed especially soon are worlds in which the “nearcasting” assumptions are especially likely to hold - and these are also worlds in which we will have especially little time to react to crucial developments as they unfold. They are thus worlds in which it will be especially valuable to have thought matters through in advance. (They are also likely to be worlds in which transformative AI most “takes the world by surprise,” such that the efforts of people paying attention today are most likely to be disproportionately helpful.)

Nearcasting might give us a sort of feedback loop for learning: if we do nearcasting now and in the future, we can see how the conclusions (in terms of which actions would be most helpful today) change over time, and perhaps learn something from this.

  • I could easily imagine that we will learn more from repeated attempts to ask “What should we do if transformative AI is just around the corner?”, whose conclusions will change as the world changes, than from a persistent effort to refine our answer to “What should we do under our best-guess picture of the world, in which transformative AI is (for example) 30 years away from today?”
  • To be clear, I don’t think this is a “great” or even “good” feedback loop for learning, but in the space of transformative AI strategy, any such feedback loops are hard to come by, so I see some appeal here.

A major reservation about nearcasting (as an activity in general, not relative to forecasting) is that people are arguably quite bad at reasoning about hypothetical future scenarios,[1] and much of the most impressive human knowledge to date has arguably relied heavily on empirical observation and experimentation. In terms of how much we can trust its conclusions, AI strategy nearcasting seems destined to be vastly inferior to, e.g., good work in the natural sciences.

I think this reservation is valid as stated, and I expect any nearcast to look pretty silly in at least some respects with the benefit of hindsight. But I still think nearcasting (and, more generally, analyzing hypothetical future scenarios with transformative AI[2]) is probably under-invested in today:

  • The potential stakes are enormous. Even a small amount of additional clarity about how to make the best of the most important century would be immensely valuable.
  • I don’t think the case for “people are bad at reasoning about hypothetical future scenarios” or “people are bad at reasoning about the long-run future” is all that strong (more). Moreover (as also noted at that link), past attempts at futurism didn’t make use of many of what appear to be “forecasting best practices” today, such as assigning quantitative probabilities to predictions. The science of forecasting is young, and there’s probably a lot of room to improve on past attempts.
  • Finally, I am broadly against putting too much weight on “the track record of a particular kind of thinking,” mostly because I think such track records tend to be inconclusive.
    • I think that for almost any rules you can come up with about “what good intellectual inquiry looks like,” there’s been some good intellectual inquiry that breaks those rules.
    • I also think that (partly due to low-hanging-fruit dynamics) a lot of the most valuable innovations have historically come from something more like “Work on a very neglected question and try to ‘just be reasonable,’ without sweating whether one’s methodology has a good track record” than like “Use a super-established methodology that looks like how past innovations have happened.” (More)

I think part of the challenge of this kind of work is having reasonable judgment about which aspects of a hypothetical scenario are too specific to place big bets on, vs. which aspects represent relatively robust themes that would apply to many possible futures.

  • For example, I think it is reasonable to predict that the global average temperature will rise over the coming several decades, even as it is very difficult to predict whether it will be raining on some day even a few weeks away.
  • I have some views on what sorts of aspects of the future I’d generally expect to be tractable vs. intractable to predict,[3] but I make no claim to have well-grounded takes here. I think that if there were more efforts at analyzing hypothetical future scenarios, we’d probably collectively learn more about such things.

The nearcast I’ll be discussing

Starting here, and continuing into future pieces, I’m going to lay out a “nearcast” that shares many (not all) assumptions with this piece by Ajeya Cotra (which I would highly recommend reading in full if you’re interested in the rest of this series), hereafter abbreviated as “Takeover Analysis.”

I will generally use the present tense (“in my scenario, the alignment problem is difficult”) when describing a nearcasting scenario, and since the whole story should be taken in the spirit of speculation, I’m going to give fewer caveats than I ordinarily would - I will often say something like “the alignment problem is difficult” when I mean something like “the alignment problem is difficult in my most easily-accessible picture of how things would go” (not something like “I know with confidence that the alignment problem will be difficult”).

The key properties of the scenario I’ll be considering are:

  1. Magma. For concreteness, we’re focused on a particular AI project or company, which I’ll call Magma (following Takeover Analysis).
  2. Transformative AI is knowably near, but not yet here. Magma’s leadership has good reason to believe that it can develop something like PASTA (a Process for Automating Scientific and Technological Advancement) soon, say within a year. (I also assume that it hasn’t unwittingly done so yet!)
  3. Human feedback on diverse tasks. The path to transformative AI revolves around what Takeover Analysis calls “human feedback on diverse tasks” (HFDT). To oversimplify, you can think of this as training an AI by giving it a thumbs up when we approve of what it has done and a thumbs down when we disapprove, and repeating this enough times that it becomes very good at behaving in ways that get a thumbs up across a broad set of diverse tasks.
  4. Some (but not unlimited) deliberate delay is possible. Magma has some - but not unlimited - ability to delay development and/or deployment in order to take “extra” measures to reduce the risk of misaligned AI.
    1. That is, it’s not the case that “any delay in deploying a given type of AI system will simply mean a competitor does so instead.”
    2. This could be because Magma has significant advantages over its competitors (funding, algorithms, talent, etc.) and/or because its closest competitors have a similar level of caution to Magma.
    3. I’ll be imagining that Magma can reasonably expect to deploy a given AI system 6-24 months later than it would otherwise for purposes of safety, without worrying that a competitor would deploy a similar (less safe) system in that time.
  5. Otherwise, changes compared to today’s world are fairly minimal.
    1. Deep learning has evolved in many ways (for example, there might be new architectures that improve on the Transformer), but the fundamental paradigm is still basically that of models gaining capabilities via “trial and error.”[4] More on this at Takeover Analysis.
    2. We still can’t say much about what’s going on inside a given AI model (most of what we know is that it learns from trial and error on tasks, and performs well on the tasks).
    3. AI systems have advanced to the point where there are many new commercial applications, and perhaps substantial economic impact, but we haven’t yet reached the point where AIs can fully automate scientific and technological advancement, where economic growth is “explosive” (greater than 30% per year, which would very roughly be 10x the current rate of global economic growth), or where AI systems can facilitate anything like a decisive strategic advantage.
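As a rough illustration of the HFDT setup in item 3 (and of the point in footnote 4 that each “trial” carries information about how to adjust the model), here is a toy Python sketch. It is not how Magma or any real lab trains models: the tasks, the simulated rater (`human_feedback`), the hidden `PREFERRED` answers, and the REINFORCE-style update rule are all hypothetical stand-ins chosen for simplicity, and real HFDT need not use this particular reinforcement-learning formulation.

```python
import math
import random

# Hypothetical toy setup: each "task" is an integer id, and the model picks one of
# a few candidate responses. A simulated rater gives a thumbs up (+1) when the
# response matches a hidden preferred answer, and a thumbs down (-1) otherwise.
NUM_TASKS = 5
NUM_RESPONSES = 3
PREFERRED = {task: task % NUM_RESPONSES for task in range(NUM_TASKS)}  # hidden from the model


def human_feedback(task, response):
    """Hypothetical stand-in for a human rater: +1 for approval, -1 for disapproval."""
    return 1.0 if response == PREFERRED[task] else -1.0


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=5000, lr=0.1, seed=0):
    rng = random.Random(seed)
    # One row of logits per task: the model's learned tendencies, initially uniform.
    logits = [[0.0] * NUM_RESPONSES for _ in range(NUM_TASKS)]
    for _ in range(steps):
        task = rng.randrange(NUM_TASKS)  # a diverse stream of tasks
        probs = softmax(logits[task])
        response = rng.choices(range(NUM_RESPONSES), weights=probs)[0]  # the "trial"
        reward = human_feedback(task, response)  # thumbs up or thumbs down
        # REINFORCE-style update: the gradient of log p(response) with respect to
        # the logits is (one_hot(response) - probs), scaled by the feedback signal.
        # So each trial says which direction to nudge the model, not just pass/fail.
        for r in range(NUM_RESPONSES):
            grad = (1.0 if r == response else 0.0) - probs[r]
            logits[task][r] += lr * reward * grad
    return logits


def approval_rate(logits):
    """Fraction of tasks on which the model's most likely response gets a thumbs up."""
    hits = 0
    for task in range(NUM_TASKS):
        probs = softmax(logits[task])
        best = max(range(NUM_RESPONSES), key=lambda r: probs[r])
        hits += human_feedback(task, best) > 0
    return hits / NUM_TASKS
```

With all-zero logits, `approval_rate` is 0.4 (ties go to the first response, which happens to be preferred on 2 of the 5 toy tasks); after `train()` it rises to 1.0 on this toy problem. The point of the sketch is only that repeated approval and disapproval, turned into gradient updates, can shape behavior without anyone inspecting why a given response is preferred.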

I note that the combination of “Changes compared to today’s world are fairly minimal” with “Transformative AI is knowably near” implies that a fast takeoff is approaching: a very rapid transition from a world like today’s to a radically different world. This is (in my view) a key way in which reality is likely to differ from the scenario I’m describing.[5]

I still think this scenario is worth contemplating, for the following reasons:

  • I think this sort of fast takeoff is easier to concretely picture than most alternatives, precisely because it allows us to focus on a particular crucial period of time in which most things resemble how they are today. Generally, I think it is easiest to think about a relatively small number of relatively high-stakes developments (e.g., “We have a very powerful AI system that we can deploy or not”) - and then generalize this sort of analysis to scenarios that may revolve more around large numbers of individually low-stakes developments (e.g., “For the 20th time this year, we’re deciding whether to ship a system that is a small amount more capable, and more dangerous, than the ones already out there”), rather than starting by trying to discuss the latter.
  • Worlds in which transformative AI is developed sooner likely involve faster takeoffs, and scenarios generally more like the above, than other worlds. And these are also particularly “high-stakes” worlds, in the sense that thinking about them today is particularly likely to be helpful.
  • After examining this scenario, we can later ask how reality might diverge from it (as I plan to do in a future post). While I think there are likely to be important divergences, I think this scenario gives us a lot to work with and think about productively.

Next in series

  • The next piece I’d recommend reading is Takeover Analysis, which the rest of my series builds on. It illustrates why this nearcast involves a major risk of an AI takeover in the absence of specific preventative measures, a risk that is not straightforward to diagnose or correct. I recommend reading the full post for a detailed picture of why this is. (A footnote addresses how this analysis differs from previous “alignment problem” discussions.[6])
  • I will soon put up a piece laying out my understanding of what the best available measures would be for Magma to reduce the risk of misalignment in powerful systems it’s developing, and of how likely these measures would be to work.
  • After that, I will go beyond questions about the “alignment problem” (what technical measures can reduce risk of misalignment) and discuss the “deployment problem”: the question of what Magma (and a hypothetical agency, IAIA) should be seeking to use aligned AI systems to do (and whether/when to deploy them), under conditions of uncertainty about how safe they are and how close others are to deploying powerful AI of their own.
  • I’ll discuss which actions (including, but not limited to, AI alignment research) seem most helpful today if we expect the nearcast discussed here, and how we might expect matters to look systematically different if we relax the “nearcast” condition (imagining that transformative AI will take longer to develop, and be less likely to simply come from current techniques, than in this scenario).

Footnotes

  1. Though this is something I often hear claimed without much evidence; see my discussion of the track record of futurists here.

  2. E.g., see this piece, this piece and Age of Em 

  3. For example, I generally feel better about predictions about technological developments (what will be possible at a particular price) than predictions about what products will be widely used, how lifestyles will change, etc.

    Because of this, I think “Extreme technology X will be available, and it seems clear and obvious that the minimum economic consequences of this technology or any technology that can do similar things would be enormous” is in many ways a better form of prediction than “Moderate technology Y will be available, and it will be superior enough to its alternatives that it will be in wide use.” This is some of why I tend to feel better about predictions about transformative AI (which I’d say are going out on a massive limb re: what will be possible, but less of one re: how significant such a thing would be) than about predictions of self-driving cars (which run into lots of questions about how technology will interact with the regulatory and economic environment). 

  4. Some readers have objected to the “trial and error” term, because they interpret it as meaning something like: “trying every possible action until something works, with no system for ‘learning’ from mistakes.” This is not how AI systems are generally trained - their training uses stochastic gradient descent, in which each “trial” comes with information about how to adjust an AI system to do better on similar inputs in the future. But I use the “trial and error” term because I think it is intuitive, and because I think the term generally is inclusive of this sort of thing (e.g., I think Wikipedia’s characterization of the term is quite good, and includes cases where each “trial” comes with information about how to do better in the future). 

  5. To be clear, I expect that the transition to transformative AI will be much faster than the vast majority of people imagine, but I don’t expect it will be quite as fast as implied here. 

  6. To my knowledge, most existing discussion of the alignment problem - see links at the bottom of this section for examples - focuses on abstract discussions of some of the challenges presented by aligning any AI system (things like “It’s hard to formalize the values we most care about”). Takeover Analysis instead takes a nearcasting frame, and walks mechanically through what it might look like to develop powerful AI with today’s methods. It discusses mechanically how this could lead to an AI takeover - as well as why the most obvious methods for diagnosing and preventing this problem seem like they wouldn’t work. 

Comments

Would someone be able to clarify the difference between the term HFDT as used here and in the original "Takeover" post, and RLHF? 

My understanding is that HFDT doesn't assume an RL model. 

just noting that the hyperlink for "nowcasting" is broken

Fixed, thanks!
