This is a replication and distillation of the extensive analysis in Ajeya Cotra’s draft report on AI timelines. In this work, I aim to make the argument as concise and clear as possible, while keeping the core considerations of the original report.
If the following assumptions are true:
Then AI labs will be able to develop human-level AI once an amount of compute comparable to what nature used to produce human brains becomes available, or earlier (if labs are much more efficient than that).
This gives us upper bounds on how soon it will be possible to develop human-level AI, by forecasting when the compute used on the biggest training runs will match the estimated amount of compute needed to create adult human brains.
In this work, I re-derive simple estimates for compute requirements (using estimates of how many FLOPs the human brain uses) and forecasts (by extrapolating current trends in compute price reduction and spending increases).
The graph below shows how much compute will be available during this century for different levels of compute price reduction and different maximum amounts of money that will eventually be spent on compute.
In the next sections, I will explain how these values were estimated. You can make your own estimates using a small Colab notebook.
Future compute spent is estimated by breaking down this quantity in the following way:
Compute spent = fraction of GDP spent on the biggest AI training run (in 2020 dollars) x FLOPs per fraction of GDP (in FLOP per 2020 dollars)
FLOPs are hardware FLOPs and do not take into account algorithmic progress. See the “simplifications” section for a justification of this choice.
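As a quick sanity check on this decomposition, here is a minimal sketch that multiplies the two factors using the 2020 values quoted in this post (about $10M of spend and about 10^17 FLOP/$). The function name `compute_spent` is made up for illustration and is not taken from the Colab notebook.

```python
def compute_spent(spend_2020_usd: float, flop_per_2020_usd: float) -> float:
    """Hardware FLOP bought by a training run: dollars times FLOP per dollar."""
    return spend_2020_usd * flop_per_2020_usd


# 2020 values quoted in the post: ~$10M for GPT-3, ~1e17 FLOP/$.
flop_2020 = compute_spent(spend_2020_usd=1e7, flop_per_2020_usd=1e17)
print(f"{flop_2020:.0e} FLOP")
```

With these inputs the product is about 10^24 FLOP, which is the order of magnitude of the biggest 2020 training run.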
The 2020 spend on the biggest training run was around $10M (for GPT-3), and according to Epoch it currently seems to triple every three years. (This trend is slightly broken by speculation about the current price of Gemini and GPT-4, but this doesn’t change the conclusion much.)
This trend can’t go on forever, and I model future compute spend as an exponential decay: $S = M - C e^{-(y - y_0)/\lambda}$
The only missing parameter, given the initial constraints, is the maximum amount M which will eventually be spent on compute for a single AI training run.
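To make the "only missing parameter" claim concrete, here is a rough sketch of how C and λ can be pinned down from the initial constraints once M is chosen. It rests on assumptions that the text does not state: S and M are taken in log10 dollars (read literally in dollars, matching the initial slope gives a λ of tens of thousands of years), the 2020 spend is $10M, and the initial growth rate is "triple every three years". The function name, the default values, and the $100B ceiling in the usage lines are illustrative, not the notebook's.

```python
import math


def make_spend_model(max_spend_usd, spend_0_usd=1e7, year_0=2020,
                     growth_per_year=3 ** (1 / 3)):
    """Return spend(year) following log10 S = log10 M - C * exp(-(y - y0) / lam).

    ASSUMPTION: the model is applied to log10 dollars, not raw dollars.
    C and lam are fixed by two constraints at year_0:
      - spend(year_0) equals spend_0_usd
      - the initial growth rate equals growth_per_year
    """
    if max_spend_usd <= spend_0_usd:
        raise ValueError("max_spend_usd must exceed spend_0_usd")
    log_m = math.log10(max_spend_usd)
    c = log_m - math.log10(spend_0_usd)      # from spend(year_0) = spend_0_usd
    lam = c / math.log10(growth_per_year)    # from initial slope c / lam (OOM per year)

    def spend(year):
        return 10 ** (log_m - c * math.exp(-(year - year_0) / lam))

    return spend


# Illustrative ceiling of $100B (hypothetical choice of M).
spend = make_spend_model(max_spend_usd=1e11)
for year in (2020, 2030, 2050, 2100):
    print(year, f"${spend(year):.2e}")
```

Under these assumptions, spend starts at $10M, initially grows at the observed rate, and flattens out as it approaches M.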
I give results for two scenarios:
In principle, this spend could be much higher if spread out over many years, but given capital costs, a spend below $100B seems likely if human-level AI hasn’t been reached or isn’t about to be reached.
Compute cost was around 10^17 FLOP/$ in 2020 according to the bio anchors report (Epoch estimates 10^18 FLOP/$ in 2023 using a similar method but without accounting for hardware utilization, so I stick with the former). According to Epoch, FLOP/s/$ is increasing by a factor of 1.32 every year (the bio anchors report gives the same number). Because the unit here is 2020 dollars, the price drop should take economic growth into account.
I model compute cost reduction as an exponential decay: $\frac{d(\mathrm{FLOP}/\$)}{dy} = m + C e^{-(y - y_0)/\gamma}$. (A sigmoid would probably be a more realistic fit, but the exponential decay is simpler and has one fewer hard-to-guess parameter.)
Assuming that increases in FLOP/s/$ and FLOPs/$ are roughly proportional (which is inaccurate, but good enough given the other uncertainties at play), I give results for the following scenarios:
Here, I give two separate estimates:
After consulting with experts, Joe Carlsmith estimates that the brain uses roughly between 10^14 and 10^16 FLOP/s. Given that there are about 10^9 seconds in a (30-year) human life, human brains develop using (at most) 10^23-10^25 FLOPs. I give results for this range, as well as the surrounding ranges (algorithms 100x more efficient than the human brain, and 100x less efficient than the human brain).
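The arithmetic behind this lifetime range can be checked in a few lines. This is only a sketch of the multiplication described above; the variable names are mine, and the 100x factors are the ones the post uses for the surrounding ranges.

```python
# A 30-year life in seconds: about 9.5e8, which the post rounds to 1e9.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
lifetime_seconds = 30 * SECONDS_PER_YEAR

# Carlsmith's range for the brain, in FLOP/s.
brain_flop_per_s_low, brain_flop_per_s_high = 1e14, 1e16

# Central lifetime range, using the rounded 1e9 seconds.
lifetime_low = brain_flop_per_s_low * 1e9     # ~1e23 FLOP
lifetime_high = brain_flop_per_s_high * 1e9   # ~1e25 FLOP

# Surrounding ranges: algorithms 100x more or 100x less efficient than the brain.
widened_low = lifetime_low / 100              # ~1e21 FLOP
widened_high = lifetime_high * 100            # ~1e27 FLOP

print(f"seconds in 30 years: {lifetime_seconds:.2e}")
print(f"central range: {lifetime_low:.0e} to {lifetime_high:.0e} FLOP")
print(f"widened range: {widened_low:.0e} to {widened_high:.0e} FLOP")
```

So the full span considered for the lifetime estimate runs from roughly 10^21 to 10^27 FLOP.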
Maybe we’ll be able to find something as efficient as the brain’s genome without much compute, but maybe we won’t. But doing as well as evolution, a much simpler program, should be possible.
I give results for very simple estimates of how much compute evolution requires, with ranges spanning 4 OOMs: 2 for uncertainties about how much compute the brain uses, and 2 for uncertainties about how efficient AI labs will be at emulating brains, centered at “as efficient”.
It’s likely AI labs will find procedures much more efficient than full evolution of mammals, which should increase our credence in scenarios with smaller requirements.
The main simplification is that I present many scenarios and ranges, and I don’t attempt to put probabilities on them. For the sake of expressing upper bounds on timelines, I think the mapping from scenarios to ranges of compute requirements is better than a raw probability distribution over years.
For the sake of simplicity, I dropped many considerations from the original report which seemed either fuzzy, weak, or unimportant to me:
As a consequence, this work doesn’t rely at all on the current deep learning paradigm beyond supposing that GPU-compute will be the bottleneck.
Thanks to Ajeya Cotra for writing the original report on bio anchors and for giving some feedback on this distillation.
This work is a small side-project and doesn’t represent the views or priorities of my employer.
(using estimates of how many FLOPs the human brain uses)
Which estimates in particular? What number do you use, 10^15?
Joe Carlsmith estimates that the brain uses roughly between 10^14 and 10^16 FLOP/s
Cool, thanks. One of the cruxes between me & Ajeya is that I think the performance of the GPT series so far is evidence that AGIs will need less flops than 10^15, maybe more like 10^13 or so. (GPT-3 is 10^11)
Note that I don't include a term for "compute inefficiency relative to the brain" (which kicks in after the 10^15 estimate in Ajeya's report). This is both because I include this inefficiency in the graph (there are ranges for 1% & 100x) and because I ignore algorithmic efficiency improvements. The original report downweights the compute efficiency of human-made intelligence by looking at how impressive current algorithms look compared to the brain, while I make the assumption that human-made intelligence and human brains will probably look about as impressive when we have the bare metal FLOPs available. So if you think that current algorithms are impressive, it matters much less for my estimation than for Ajeya's!
This is why my graph already starts at 10^24 FLOPs, right in the middle of the "lifetime anchor" range! (Note: GPT-3 is actually 2x less than 10^24 FLOPs, and PaLM is 2x more than that, but I have ~1 OOM uncertainties around the estimate for the lifetime compute requirements anyway.)