Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

First things first: this post is about timelines (i.e. what year we first get transformative AGI), not takeoff speeds (i.e. foom vs gradual takeoff).

Claim: timelines are mostly not strategically relevant to alignment research, i.e. deciding what to work on. Why? Because at any given time, it would take ~18 months to take whatever our current best idea is, implement it, do some basic tests, and deploy it. (Really it probably takes less than 6 months, but planning fallacy and all that.) If AGI takeoff is more than ~18 months out, then we should be thinking “long-term” in terms of research; we should mainly build better foundational understanding, run whatever experiments best improve our understanding, and search for better ideas. (Note that this does not necessarily mean a focus on conceptual work; a case can be made that experiments and engineering feedback are the best ways to improve our foundational understanding.)

What about strategic decisions outside of object-level research? Recruitment and training strategies for new researchers might depend on how soon our investments need to pay off; do we look for a brilliant young person who will need three or five years of technical study before they’re likely to make any important progress, or a more experienced person who can make progress right now but is probably already near their skill ceiling? How much should existing technical researchers invest in mentoring new people? Those are questions which depend on timelines, but the relevant timescale is ~5 years or less. If AGI is more than ~5 years out, then we should probably be thinking “long-term” in terms of training; we should mainly make big investments in recruitment and mentorship.

General point: timelines are usually only decision-relevant if they’re very short. Like 18 months, or maybe 5 years for relatively long-term investments. The difference between e.g. 10 years vs 30 years vs 100 years may matter a lot for our chances of survival (and the difference may therefore be highly salient), but it doesn’t matter for most actual strategic decisions.
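The decision rule in this post can be sketched as a toy function. This is only an illustration of the claim, not a real planning tool; the ~18-month deployment horizon is the post's own estimate, and the threshold parameter name is hypothetical.

```python
# Toy model of the post's claim: a timeline estimate only changes the
# strategic decision when it falls below the relevant action horizon.

def research_mode(years_to_agi: float, deploy_horizon: float = 1.5) -> str:
    """Return the research mode implied by a timeline estimate.

    deploy_horizon is the ~18 months (1.5 years) needed to implement,
    test, and deploy the current best idea.
    """
    if years_to_agi <= deploy_horizon:
        return "deploy current best idea"
    return "build foundational understanding"

# The difference between 10, 30, and 100 years never changes the output:
assert research_mode(10) == research_mode(30) == research_mode(100)
assert research_mode(1.0) == "deploy current best idea"
```

The point of the sketch is that the function is a step function: almost all of the variation in timeline estimates lives on the flat part, where it is not decision-relevant.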

Meta note: there are a lot of obvious objections which I expect to address in the comments; please check whether anyone has posted your objection already.


I think timelines (as in, <10 years vs 10-30 years) are very correlated with the answer to "will the first dangerous models look like current models", which I think matters more for research directions than you allow in the second paragraph.
  
For example, interpretability in transformers might completely fail on some other architectures, for reasons that have nothing to do with deception. The only insight from the 2022 Anthropic interpretability papers I see having a chance of generalizing to non-transformers is the superposition hypothesis / SoLU discussion.

Yup, I definitely agree that something like "will roughly the current architectures take off first" is a highly relevant question. Indeed, I think that gathering arguments and evidence relevant to that question (and the more general question of "what kind of architecture will take off first?" or "what properties will the first architecture to take off have?") is the main way that work on timelines actually provides value.

But it is a separate question from timelines, and I think most people trying to do timelines estimates would do more useful work if they instead explicitly focused on what architecture will take off first, or on what properties the first architecture to take off will have.

I think that gathering arguments and evidence relevant to that question (and the more general question of "what kind of architecture will take off first?" or "what properties will the first architecture to take off have?") is the main way that work on timelines actually provides value.

Uh, I feel the need to off-topically note this is also the primary way to accidentally feed the AI industry capability insights. Those won't even have the format of illegible arcane theoretical results, they'd be just straightforward easy-to-check suggestions on how to improve extant architecture. If they're also backed by empirical evidence, that's your flashy demos stand-in right there.

Not saying it shouldn't be done, but here be dragons.

I think timelines are a useful input to what architecture takes off first. If the timelines are short, I expect AGI to look like something like DL/Transformers/etc. If timelines are longer there might be time for not-yet-invented architectures to take off first. There can be multiple routes to AGI, and "how fast do we go down each route" informs which one happens first.

Correlationally this seems true, but causally it's "which architecture takes off first?" which influences timelines, not vice versa.

Though I could imagine a different argument which says that timeline until the current architecture takes off (assuming it's not superseded by some other architecture) is a key causal input to "which architecture takes off first?". That argument I'd probably buy.

I definitely endorse the argument you'd buy, but I also endorse a broader one. My claim is that there is information which goes into timelines which is not just downstream of which architecture I think gets there first.

For example, if you told me that humanity loses the ability to make chips "tomorrow until forever" my timeline gets a lot longer in a way that isn't just downstream of which architecture I think is going to happen first. That then changes which architectures I think are going to get there first (strongly away from DL) primarily by making my estimated timeline long enough for capabilities folks to discover some theoretically-more-efficient but far-from-implementable-today architectures.

I think that gathering arguments and evidence relevant to that question . . . is the main way that work on timelines actually provides value.

I think policy people find timelines work quite decision-relevant for them; I believe work on timelines mainly provides value by informing their prioritization.

Relatedly, I sense some readers of this post will unintentionally do a motte-and-bailey with a motte of "timelines are mostly not strategically relevant to alignment research" and a bailey of "timelines are mostly not strategically relevant."

What are the main strategic decisions policy people face right now, and how are timelines relevant to those decisions?

Things like "buy all the chips/chip companies" still seem like they only depend on timelines on a very short timescale, like <5 years. Buy all the chips, and the chip companies will (1) raise prices (which I'd guess happens on a timescale of months) and (2) increase production (which I'd guess happens on a timescale of ~2 years). Buy the chip companies, and new companies will enter the market on a somewhat slower timescale, but I'd still guess it's on the order of ~5 years. (Yes, I've heard people argue that replacing the full stack of Taiwan semi could take decades, but I don't expect that the full stack would actually be bought in a "buy the chip companies" scenario, and "decades" seems unrealistically long anyway.)

None of this sounds like it depends on the difference between e.g. 30 years vs 100 years, though the most ambitious versions of such strategies could maybe be slightly more appealing on 10-year vs 30-year timelines. But really, we'd have to get down to ~5 years before something like "buy the chip companies" starts to sound like a sufficiently clearly good idea that I'd expect anyone to seriously consider it.

Individual contributions to some tech breakthrough can be roughly sorted temporally, in order of abstraction, with the earliest exploratory contributions coming from scientist types and the final contributions (which get most of the reward) coming from research-engineer types. Case example: by the time the Wright brothers were experimenting with early flyer designs, the basic principles of aerodynamics had already been worked out decades earlier, and that window of influence was long closed.

So if AGI is close (< 10 years), nearly all of the key early exploratory theoretical work is already done - things like Bayesian statistics, optimization, circuit theory, key neuroscience, etc. In that case it's probably not wise to start spinning up a new research career that won't produce output for another 5 years, since the window of influence closes well before the winning team actually writes their first line of code.

If AGI is imminent (< 5 years), the best strategy is perhaps to rapidly transform into more of a research engineer, and/or network into relevance somehow.

Agreed with the sentiment, though I would make a weaker claim, that AGI timelines are not uniquely strategically relevant, and the marginal hour of forecasting work at this point is better used on other questions.

My guess is that the timelines question has been investigated and discussed so heavily because for many people it is a crux for whether or not to work on AI safety at all - and there are many more such people than there are alignment researchers deciding what approach to prioritize. Most people in the world are not convinced that AGI safety is a pressing problem, and building very robust and legible models showing that AGI could happen soon is, empirically, a good way to convince them.

In my view, there are alignment strategies that are unlikely to pay off without significant time investment, but which have large expected payoffs. For example, work on defining agency seems to fit this category.

There are also alignment strategies that have incremental payoffs, but still seem unsatisfactory. For example, we could focus on developing better AI boxing techniques that just might buy us a few weeks. Or we could discover likely takeover scenarios, and build warnings for them. 

There's an analogy for this in self-driving cars. If you want to ship an impressive demo right away, you might rely on a lot of messy case handling, special road markings, mapping, and sensor arrays. If you want to solve self-driving in the general case, you'd probably be developing really good end-to-end ML models.

In my view, there are alignment strategies that are unlikely to pay off without significant time investment, but which have large expected payoffs. For example, work on defining agency seems to fit this category.

Yup, that's a place where I mostly disagree, and it is a crux. In general, I expect the foundational progress which matters mostly comes from solving convergent subproblems ( = subproblems which are a bottleneck for lots of different approaches). Every time progress is made on one of those subproblems, it opens up a bunch of new strategies, and therefore likely yields incremental progress. For instance, my work on abstraction was originally driven by thinking about agent foundations, but the Natural Abstraction Hypothesis is potentially relevant to more incremental strategies (like interpretability tools or retargeting the search).

Insofar as work on e.g. defining agency doesn't address convergent subproblems, I'm skeptical that the work is on the right path at all; such work is unlikely to generalize robustly. After all, if a piece of work doesn't address a shared bottleneck of a bunch of different strategy-variations, then it's not going to be useful for very many strategy-variations.

at any given time, it would take ~18 months to take whatever our current best idea is, implement it, do some basic tests, and deploy it

I'd be interested in your reasoning on that.

That's just generally how long relatively-complicated software engineering projects take. As a general rule, if a software project takes longer than 18 months, it's because the engineers ran into unsolved fundamental research problems. (Or because of managerial incompetence/organizational dysfunction, but I'm assuming a reasonably competent team.)

As a general rule, if a software project takes longer than 18 months, it's because the engineers ran into unsolved fundamental research problems

Shouldn't 18 months be an upper bound, rather than your estimate, of the length of a software project, then?

It is an upper bound. That's why I said in the OP:

(Really it probably takes less than 6 months, but planning fallacy and all that.)

Ah, I see. Thanks.

Same - also interested if John was assuming that the fraction of deployment labor that is automated changes negligibly over time pre-AGI.

That's an interesting and potentially relevant question, but a separate question from timelines, and mostly not causally downstream of timelines.

Not sure if it was clear, but the reason I asked was because it seems like if you think the fraction changes significantly before AGI, then the claim that Thane quotes in the top-level comment wouldn't be true.

Oh, I see. Certainly if the time required to implement our current best idea goes down, then the timescale at which we care about timelines becomes even shorter.

Why? Because at any given time, it would take ~18 months to take whatever our current best idea is, implement it, do some basic tests, and deploy it.

I don't understand how the existence of this gap makes you think that timelines don't matter. 

If your timelines are short, shouldn't an active priority be to eg. reduce that gap? 

Funding a group of engineers to actively take our best current alignment idea and integrate it into prod models might be a waste of time on long timelines, but very important on short timelines.

It's not that timelines don't matter at all, it's that they only matter at very short timescales, i.e. nontrivial probability of takeoff on the order of 18 months. The difference between 10 years and 100 years does not particularly matter. (And "difference between 10 years and 100 years" is much closer to what most timeline estimates look at; most people estimating timelines are very confident that takeoff won't happen in the next 18 months.)

To identify the crux here, would you care about timelines if it took five years to bring our best alignment idea to production? 

In that case, I would care about timelines insofar as there was significant uncertainty about the probability of takeoff on the order of 5 years. So I'd probably care a little about the difference between 10 years and 100 years, but still mostly not care about the difference between 30 years and 100 years.

It seems like your model is that we should be working in one of two modes:

  • Developing better alignment ideas
  • Implementing our current best alignment idea

However, in my model, there are a lot of alignment ideas which are only worth developing given certain timelines. [edit: Therefore, "you should be developing better alignment ideas anyway" is a very vague and questionably actionable strategy.]

Do you believe this is the crux? 


Don't timelines change your views on takeoff speeds? If not, what's an example piece of evidence that updates your timelines but not your takeoff speeds?

That's not how the causal arrow works. There are some interesting latent factors which influence both timelines and takeoff speeds, like e.g. "what properties will the first architecture to take off have?". But then the right move is to ask directly about those latent factors, or directly about takeoff speeds. Timelines are correlated with takeoff speeds, but not really causally upstream.

I agree. I think of this as timelines not being particularly actionable: even in the case of a very short timeline of 5 years, I do not believe that the chain of reasoning would be "3.5 years ago I predicted 5 years, and I also predicted 1.5 years to implement the best current idea, so it is time to implement the best current idea now."

Reasoning directly from the amount of time feels like a self-fulfilling prophecy this way. On the other hand, it feels like the model which generated the amount of time should somehow be strategically relevant. On the other other hand my model has quite collapsed in on itself post-Gato, so my instinct is probably truthfully the reverse: a better sense of what is strategically relevant to alignment causes more accurate timelines through a better generative model.

I do not believe that the chain of reasoning would be "3.5 years ago I predicted 5 years, and I also predicted 1.5 years to implement the best current idea, so it is time to implement the best current idea now."

Why not? 

Chiefly because this is walking face-first into a race-to-the-bottom condition on purpose. There is a complete lack of causal information here.

I should probably clarify that I don't believe this would be the chain of reasoning among alignment motivated people, but I can totally accept it from people who are alignment-aware-but-not-motivated. For example, this sort of seems like the thinking among people who started OpenAI initially.

A similar chain of reasoning an alignment-motivated person might follow is: "3.5 years ago I predicted 5 years based on X and Y, and I observe X and Y are on track. Since I also predicted 1.5 years to implement the best current idea, it is time to implement the best current idea now."

The important detail is that this chain of reasoning rests on the factors X and Y, which I claim are also candidates for being strategically relevant.
