There's no consensus definition of "compute overhang" or "hardware overhang." See e.g. 1 2 3 4 5, and I've recently seen researchers use several different definitions. And I asked some friends what "hardware overhang" means and they had different responses (a plurality said it means sufficient hardware for human-level AI already exists, which is not a useful concept). If you say "compute overhang" without clarification, many people will misunderstand you.

Instead of tabooing it, we could declare a canonical definition. I think the best candidate is something like: there is a compute overhang to the extent that the largest training runs could quickly be scaled up. But I think it's probably better for people to avoid the term or define it whenever they use it.
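To make that candidate definition a bit more concrete, here's a minimal sketch (both numbers are made up purely for illustration, not estimates of the actual frontier):

```python
# Minimal sketch of the proposed definition: "compute overhang" as the factor
# by which the largest training run to date could quickly be scaled up.
# Both numbers below are hypothetical, purely for illustration.

largest_run_flop = 1e25         # hypothetical compute of the largest training run so far
quickly_attainable_flop = 1e27  # hypothetical compute an actor could quickly bring to bear

overhang_factor = quickly_attainable_flop / largest_run_flop
print(f"compute overhang factor: {overhang_factor:.0f}x")  # -> 100x under these made-up numbers
```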

P.S. You should also define "takeoff speed," "transformative AI," and "warning shot" whenever you're using them in a precise way; their usage varies a lot too. (Using those terms to vaguely gesture at "the speed of progress around human-level AI," "powerful AI," and "legible scary AI event," respectively, seems ok).

And I asked some friends what "hardware overhang" means and they had different responses (a plurality said it means sufficient hardware for human-level AI already exists, which is not a useful concept).

It's not a useful concept if we can't talk about the probability of "finding" particularly efficient AGI architectures through new insights. However, it seems intelligible and strategically important to talk about something like "the possibility that we're one/a few easy-to-find insight(s) away from suddenly being able to build AGI with a much smaller compute budget than the largest training runs to date." That's a contender for the concept of "compute overhang." (See also my other comment.) 

Maybe what you don't like about this definition is that it's inherently fuzzy: even if we knew everything about all possible AGI architectures, we'd still have uncertainty about how long it'll take AI researchers to come up with the respective insights. I agree that this makes the concept harder to reason about (and arguably less helpful).

I like pointing out this confusion.  Here's a grab-bag of some of the things I use it for, to try to pull them apart:

  • whether actors/institutions far from the compute frontier can produce breakthroughs in AI/AGI tech (juxtaposing "only the top 100 labs" vs "a couple hackers in a garage")
  • once a sufficient AI/AGI capability is reached, how quickly it will be optimized to use much less compute
  • the amount of "optimization pressure" (in terms of research effort) pursuing AI/AGI tech, and the likelihood that low-hanging fruit has been missed
  • how far AI/AGI research/products are from the highest-marginal-value use of compute, and how things would change if AI/AGI became the most profitable marginal use of compute
  • the legibility of AI/AGI research progress (e.g. in a high-overhang world, small/illegible labs can make lots of progress)
  • the likelihood that compute-control interventions change the trajectory of AI/AGI research
  • the asymmetry between compute-to-build (~=training) and compute-to-run (~=inference) for advanced AI/AGI technology (see the rough sketch below)

Probably also others I'm forgetting.
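For the last item (build vs. run), here's a rough sketch of the asymmetry, using the standard back-of-the-envelope approximations (training FLOPs ≈ 6 × params × tokens, inference ≈ 2 × params per token) and made-up numbers:

```python
# Rough sketch of the build-vs-run asymmetry, using the standard approximations
# (training FLOPs ~ 6 * params * tokens, inference FLOPs ~ 2 * params per token).
# All numbers are illustrative, not estimates of any real model.

params = 1e12           # hypothetical parameter count
training_tokens = 2e13  # hypothetical number of training tokens

train_flop = 6 * params * training_tokens  # one-time cost to build
flop_per_generated_token = 2 * params      # recurring cost to run

tokens_runnable_for_train_cost = train_flop / flop_per_generated_token
print(f"training compute buys ~{tokens_runnable_for_train_cost:.1e} tokens of inference")
# -> 6.0e+13 tokens here, i.e. ~3x the training token count: the compute that
#    trains a model once can run it for a lot of inference afterwards.
```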

I agree that it seems best for people to define the concept whenever they use it.

Instead of tabooing it, we could declare a canonical definition. I think the best candidate is something like: there is a compute overhang to the extent that the largest training runs could quickly be scaled up.

This proposal has little to do with hard vs. soft takeoff, which (IIRC) was the context in which Bostrom used "hardware overhang" in Superintelligence.

One thing that made the discussion confusing is that Bostrom originally discussed hard vs. soft takeoff as having relevance only after we build AGI, whereas Paul Christiano's view on soft takeoff introduced the idea that "takeoff" already starts before AGI.

This made me think that it could be useful to distinguish between "post-AGI" and "pre-AGI" compute overhangs. It could go as follows:

Pre-AGI compute overhang:

There's a pre-AGI compute overhang to the degree that the following could happen: we invent an algorithm that gets us to AGI before we've scaled training runs up to the biggest sizes attainable (on some short timescale).

So, on this definition, there are two ways in which we might already be in a pre-AGI compute overhang:

(1) Timelines are very short and we could get AGI with "current algorithms" (not necessarily GPT_n with zero tweaks, but obvious things to try that require no special insight) with less scaling effort than a Manhattan project.

(2) We couldn't get AGI with current algorithms via any less-than-maximal scaling effort (and maybe not even with a maximal one – that part isn't relevant for the claim), but there are highly significant algorithmic insights waiting for us (that we have a realistic chance of discovering). Once we incorporate these insights, we'll be in the same situation as described in (1).


Post-AGI compute overhang:

Once we build AGI with realistic means, using far-from-optimal algorithms, how much room is there for it to improve its algorithms during "takeoff"/intelligence explosion? "Post-AGI compute overhang" here describes the gap between compute used to build AGI in the first place vs. more efficient designs that AI-aided progress could quickly discover.

[Edit: Correction: "Post-AGI compute overhang" here describes the gap in "intelligence" of the first AGI vs. the "intelligence" of a more efficient design (using the same amount of training compute as that first AGI) that AI-aided progress could quickly discover.]

On that definition, it's actually quite straightforward that shorter timelines imply a smaller compute overhang (so maybe that's what Sam Altman meant here).

Yeah, these seem like useful concepts in some contexts too.

I don't understand this sentence:

"Post-AGI compute overhang" here describes the gap between compute used to build AGI in the first place vs. more efficient designs that AI-aided progress could quickly discover.

It's the gap between the training compute of 'the first AGI' and what?

I don't understand this sentence:

Oh, yeah, I butchered that entire description. 

It's the gap between the training compute of 'the first AGI' and what?

What I had in mind was something like the gap between how much "intelligence" humans get from the compute they first build AGI with vs. how much "intelligence" AGI will get out of that same compute once it has iterated on software improvements a few times.

So, the "gap" is a gap of intelligence rather than compute, but it's "intelligence per specified quantity of compute." (And that specified quantity is how much compute we used to build AGI in the first place.)


whereas Paul Christiano's view on soft takeoff introduced the idea that "takeoff" already starts before AGI.

I came up with a cool term I hope to see used for this: "Singularity Criticality". In my mind I'm seeing plutonium start to glow as it edges over the line to a critical mass.

What causes this is that AGI is not really a singleton; it is an integrated set of separate components that individually handle different elements of the AGI's cognition. Note that even "AGI from scaled-up LLMs" will still have multiple components: multiple buffers, specialized vision and motion-planning modules, long-term memory storage, tool modules, and so on.

As a result, long before we know how to build the integrated system, we will have separate "AGI-grade" components, and this is the present reality. We have many RL agents that are superhuman in ability and thus AGI-grade.

Using those components, we can automate/accelerate some of the tasks needed to reach AGI, so progress accelerates even without AGI existing. The existence of pre-AGI proof-of-concept modules also increases human effort, financial investment, and the production of compute hardware.

Anyway, Singularity Criticality is an empirical reality; it's observable.

(a plurality said it means sufficient hardware for human-level AI already exists, which is not a useful concept)

 

That seems like a useful concept to me. What's your argument it isn't?

Briefly: with arbitrarily good methods, we could train human-level AI with very little hardware. Claims about hardware are only meaningful relative to a given level of algorithmic progress.

Or: nothing depends on whether sufficient hardware for human-level AI already exists given arbitrarily good methods.

(Also note that what's relevant for forecasting or decision-making is how much hardware is actually being used and how much a lab could use if it wanted to, not the global supply of hardware.)