Five Hinge Questions That Decide Whether AGI Is Five Years Away or Twenty
For people who care about falsifiable stakes rather than vibes.

TL;DR

All timeline arguments ultimately turn on five quantitative pivots. Pick optimistic answers to three of them and your median forecast collapses into the 2026–2029 range; pick pessimistic answers to any two and you drift past 2040. The pivots (I think) are:

1. Which empirical curve matters (hardware spend, algorithmic efficiency, or revenue).
2. Whether software‑only recursive self‑improvement (RSI) can accelerate capabilities faster than hardware can be installed.
3. How sharply compute translates into economic value once broad “agentic” reliability is reached.
4. Whether automating half of essential tasks ignites runaway growth, or whether Baumol's cost disease keeps aggregate productivity anchored until all bottlenecks fall (a toy model of this bottleneck arithmetic closes this section).
5. How much alignment fear, regulation, and supply‑chain friction slow scale‑up.

The rest of this post traces how the canonical short‑timeline narrative, AI 2027, and the long‑timeline essays by Ege Erdil and by Zhengdong Wang + Arjun Ramani diverge on each hinge, and it proposes concrete bets that will force regular public updates.

Shared premises

* Six doublings in frontier training compute between GPT‑2 (2019) and GPT‑4 (2023)
* GPT‑4‑level systems demonstrably replace some cognitive tasks
* Alignment is non‑trivial; nobody claims a free deployment lunch

Agreement in the forecasting/timelines community ends at the tempo question.

Hinge #1: Which curve do we extrapolate?

The first divide concerns what exactly we should project into the future. Short‑timeline advocates emphasise frontier training compute and algorithmic efficiency, or even an amalgamation of all benchmarks treated as a single "intelligence" trend to extrapolate. They point to six straight doublings in effective training FLOP between GPT‑2 and GPT‑4, and they cite scaling‑law papers showing a 1.6x yearly reduction in the compute required to reach any fixed loss; the sketch below works through how these two trends compound. This is the engine behind the claim in AI 2027 that "CapEx gr
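Here is a minimal sketch of that compounding, using only the numbers quoted above. It assumes, for illustration, that the six doublings are raw hardware compute and that the 1.6x/year algorithmic gain stacks multiplicatively on top; if the "effective FLOP" figure already folds in algorithmic progress, the product below double‑counts.

```python
# Minimal sketch of how the two Hinge #1 trends compound, using only
# numbers quoted above: six doublings of frontier training compute over
# the four years from GPT-2 (2019) to GPT-4 (2023), plus a 1.6x yearly
# reduction in the compute needed to reach any fixed loss.
# Assumption: the six doublings are raw hardware compute, so the
# algorithmic trend can be layered on top without double-counting.

HARDWARE_DOUBLINGS = 6          # GPT-2 (2019) -> GPT-4 (2023), per the post
SPAN_YEARS = 4                  # 2019 -> 2023
ALGO_EFFICIENCY_PER_YEAR = 1.6  # cited yearly compute-to-fixed-loss reduction

# Raw training compute grew 2^6 = 64x over the span.
hardware_growth = 2 ** HARDWARE_DOUBLINGS

# Algorithmic progress multiplies *effective* compute on top of that.
algo_growth = ALGO_EFFICIENCY_PER_YEAR ** SPAN_YEARS  # ~6.6x

effective_growth = hardware_growth * algo_growth      # ~419x
annual_rate = effective_growth ** (1 / SPAN_YEARS)    # ~4.5x per year

print(f"hardware compute growth: {hardware_growth}x")
print(f"algorithmic efficiency growth: {algo_growth:.1f}x")
print(f"effective compute growth: {effective_growth:.0f}x over {SPAN_YEARS} years")
print(f"implied effective-compute trend: {annual_rate:.2f}x per year")
```

The point of the arithmetic is that which curve you pick matters enormously: a ~4.5x/year effective‑compute trend and a 2x/year hardware‑spend trend reach any fixed capability threshold on very different dates.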
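Hinge #4 turns on the same kind of arithmetic, so here is the bottleneck toy model referenced in the list above. It assumes, purely for illustration, a fixed‑proportions (Leontief) aggregate over tasks, the starkest version of the Baumol story and essentially Amdahl's law; none of the cited essays commits to this exact functional form.

```python
# Toy model of Hinge #4, the Baumol bottleneck. Assume (hypothetically)
# that aggregate output requires all tasks in fixed proportions (a
# Leontief production function), so total time is the sum of per-task
# times. Automating a fraction f of tasks with speedup s then yields
# Amdahl's-law arithmetic:
#
#     aggregate_speedup = 1 / ((1 - f) + f / s)

def aggregate_speedup(f: float, s: float) -> float:
    """Aggregate productivity gain when a fraction f of essential tasks
    is sped up by factor s and the rest are unchanged (Leontief case)."""
    return 1.0 / ((1.0 - f) + f / s)

# Automate half of essential tasks at ever-larger speedups:
for s in (2, 10, 100, 1e6):
    print(f"s = {s:>9,.0f}x  ->  aggregate gain {aggregate_speedup(0.5, s):.3f}x")

# The gain approaches, but never exceeds, 1 / (1 - 0.5) = 2x: no runaway
# growth until the remaining bottleneck tasks also fall. Short-timeline
# models escape this by assuming the automated share f itself rises
# quickly toward 1.
```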