One can apply similar methodological arguments to a different problem and test whether they persist. The number of civilisations outside the Solar System is thought to be estimable via the Drake equation. Drake's original estimates implied that the Milky Way contains between 1K and 100M civilisations. The only ground truth we know is that we have yet to find any reliable evidence of such civilisations. Yet I don't see where the equation itself is erroneous.
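A minimal sketch of that estimation structure, using the standard form of the Drake equation; the parameter values below are illustrative placeholders I picked to show the spread, not Drake's actual inputs:

```python
# Drake equation: N = R* * fp * ne * fl * fi * fc * L
# All parameter values below are illustrative assumptions, not Drake's 1961 inputs.

def drake(R_star, f_p, n_e, f_l, f_i, f_c, L):
    """Expected number of detectable civilisations in the Milky Way."""
    return R_star * f_p * n_e * f_l * f_i * f_c * L

# Pessimistic vs optimistic placeholder inputs, to show how wide the output range gets.
low  = drake(R_star=1,  f_p=0.2, n_e=1, f_l=0.1, f_i=0.1, f_c=0.1, L=1_000)
high = drake(R_star=10, f_p=0.5, n_e=5, f_l=1.0, f_i=1.0, f_c=0.2, L=100_000_000)
print(f"N ranges from {low:.1f} to {high:.2e}")  # spans many orders of magnitude
```

The point is only that the equation multiplies a handful of poorly known factors, so the output spans many orders of magnitude while the structure itself stays hard to fault.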
Returning to the AI timeline crux, Cotra's idea was the following: TAI is created once someone spends enough compute on a single training run. Her model holds that compute_required(t) = compute_under_2020_knowledge / knowledge_factor(t), while compute_affordable(t) grows exponentially until it hits bottlenecks imposed by the world's economy. Estimating compute_affordable only required mankind to keep track of who produces compute, how it is done, and who is willing to pay. A similar procedure was used in the AI-2027 compute forecast.
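A minimal sketch of that crossing-point logic; the starting values, halving time, and growth rate below are placeholders I made up for illustration, not Cotra's medians:

```python
# Toy version of the crossing-point forecast: TAI arrives in the first year where
# affordable training compute exceeds required training compute.
# All numbers below are illustrative assumptions, not Cotra's actual parameters.

COMPUTE_UNDER_2020_KNOWLEDGE = 1e34   # FLOP needed with 2020 ideas (placeholder)
HALVING_TIME_YEARS = 2.5              # algorithmic-progress halving time (placeholder)
AFFORDABLE_2020 = 1e24                # largest affordable 2020 training run, FLOP (placeholder)
AFFORDABLE_GROWTH = 10 ** 0.5         # affordable compute grows ~0.5 OOM/year (placeholder)

def compute_required(year):
    knowledge_factor = 2 ** ((year - 2020) / HALVING_TIME_YEARS)
    return COMPUTE_UNDER_2020_KNOWLEDGE / knowledge_factor

def compute_affordable(year):
    return AFFORDABLE_2020 * AFFORDABLE_GROWTH ** (year - 2020)

tai_year = next(y for y in range(2020, 2101) if compute_affordable(y) >= compute_required(y))
print(tai_year)
```

With these made-up numbers the curves cross in the mid-2030s; the whole forecast lives or dies on the parameters fed in, which is exactly the issue below.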
Then Cotra proceeded to wildly misestimate. Her idea of the knowledge factor was that it makes creating TAI twice as easy every 2-3 years, which I doubt for reasons described in the collapsed sections. Cotra's ideas on compute_under_2020_knowledge are total BS for reasons I detailed in another comment. So I fail to see where Cotra went wrong aside from using parameters that are total BS. Nor do I see why, if Cotra's model were correct aside from the BSed parameters, correcting those parameters wouldn't be the natural move.
Cotra's rationalisation for TAI becoming twice as easy to create every few years
I consider two types of algorithmic progress: relatively incremental and steady progress from iteratively improving architectures and learning algorithms, and the chance of “breakthrough” progress which brings the technical difficulty of training a transformative model down from “astronomically large” / “impossible” to “broadly feasible.”
For incremental progress, the main source I used was Hernandez and Brown 2020, “Measuring the Algorithmic Efficiency of Neural Networks.” The authors reimplemented open source state-of-the-art (SOTA) ImageNet models between 2012 and 2019 (six models in total). They trained each model up to the point that it achieved the same performance as AlexNet achieved in 2012, and recorded the total FLOP that this required. They found that the SOTA model in 2019, EfficientNet B0, required ~44 times fewer training FLOP to achieve AlexNet performance than AlexNet did; the six data points fit a power law curve with the amount of computation required to match AlexNet halving every ~16 months over the seven years in the dataset. They also show that linear programming displayed a similar trend over a longer period of time: when hardware is held fixed, the time in seconds taken to solve a standard basket of mixed integer programs by SOTA commercial software packages halved every ~13 months over the 21 years from 1996 to 2017.
Grace 2013 (“Algorithmic Progress in Six Domains”) is the only other paper attempting to systematically quantify algorithmic progress that I am currently aware of, although I have not done a systematic literature review and may be missing others. I have chosen not to examine it in detail because a) it was written largely before the deep learning boom and mostly does not focus on ML tasks, and b) it is less straightforward to translate Grace’s results into the format that I am most interested in (“How has the amount of computation required to solve a fixed task decreased over time?”). Paul is familiar with the results, and he believes that algorithmic progress across the six domains studied in Grace 2013 is consistent with a similar but slightly slower rate of progress, ranging from 13 to 36 months to halve the computation required to reach a fixed level of performance.
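A quick check that the quoted ~44x reduction over seven years does correspond to a ~16-month halving time:

```python
import math

# Hernandez & Brown: EfficientNet-B0 (2019) needs ~44x fewer training FLOP than AlexNet (2012)
# to reach AlexNet-level ImageNet performance.
reduction_factor = 44
months = 7 * 12                            # 2012 -> 2019
halvings = math.log2(reduction_factor)     # ~5.5 halvings
print(f"halving time ≈ {months / halvings:.1f} months")  # ≈ 15.4, i.e. ~16 months
```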
So the compute required to reach a fixed level of performance was halving every ~16 months for the ImageNet models and every ~13 months for the linear programming benchmark. While Claude Opus 4.5 does seem to think that Paul's belief is close to what Grace's paper implies, the paper's relevance is likely undermined by Cotra's own criticism of it. Next, Cotra listed her actual assumptions for each of the models, including the two clearly BSed ones. I marked her assumed halving times in bold (a sketch of how they produce her 2100 medians follows the quote):
Cotra's actual assumptions
I assumed that:
Training FLOP requirements for the Lifetime Anchor hypothesis (red) are **halving once every 3.5 years** and there is only room to improve by ~2 OOM from the 2020 level -- moving from a median of ~1e28 in 2020 to ~1e26 by 2100.
Training FLOP requirements for the Short horizon neural network hypothesis (orange) are **halving once every 3 years** and there is room to improve by ~2 OOM from the 2020 level -- moving from a median of ~1e31 in 2020 to ~3e29 by 2100.
Training FLOP requirements for the Genome Anchor hypothesis (yellow) are **halving once every 3 years** and there is room to improve by ~3 OOM from the 2020 level -- moving from a median of ~3e33 in 2020 to ~3e30 by 2100.
Training FLOP requirements for the Medium-horizon neural network hypothesis (green) are **halving once every 2 years** and there is room to improve by ~3 OOM from the 2020 level -- moving from a median of ~3e34 in 2020 to ~3e31 by 2100.
Training FLOP requirements for the Long-horizon neural network hypothesis (blue) are **halving once every 2 years** and there is room to improve by ~4 OOM from the 2020 level -- moving from a median of ~1e38 in 2020 to ~1e34 by 2100.
Training FLOP requirements for the Evolution Anchor hypothesis (purple) are **halving once every 2 years** and there is room to improve by ~5 OOM from the 2020 level -- moving from a median of ~1e41 in 2020 to ~1e36 by 2100.
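Here is a minimal sketch of how each anchor's 2100 median follows from its 2020 median, halving time, and room-to-improve cap, assuming the cap binds by 2100. Cotra shifts full probability distributions rather than point estimates, so a couple of the quoted medians (e.g. the short-horizon anchor's ~3e29) differ slightly from this back-of-the-envelope check:

```python
import math

# For each anchor: (2020 median FLOP, halving time in years, cap on improvement in OOM),
# copied from the quoted assumptions above.
ANCHORS = {
    "Lifetime":       (1e28, 3.5, 2),
    "Short horizon":  (1e31, 3.0, 2),
    "Genome":         (3e33, 3.0, 3),
    "Medium horizon": (3e34, 2.0, 3),
    "Long horizon":   (1e38, 2.0, 4),
    "Evolution":      (1e41, 2.0, 5),
}

def median_requirement(flop_2020, halving_years, cap_oom, year):
    """Median training FLOP requirement in a given year under these assumptions."""
    halvings = (year - 2020) / halving_years
    reduction_oom = min(halvings * math.log10(2), cap_oom)  # improvement saturates at the cap
    return flop_2020 / 10 ** reduction_oom

for name, (flop_2020, halving, cap) in ANCHORS.items():
    print(f"{name:15s} 2100 median ≈ {median_requirement(flop_2020, halving, cap, 2100):.0e} FLOP")
```

Note that at these halving times every anchor hits its cap decades before 2100, so the caps, not the halving times, determine the 2100 medians.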
As far as Kokotajlo's post-ChatGPT memory can be trusted, he thought that there was a 50% chance of reaching AGI by 2030.
Except that @Daniel Kokotajlo wrote an entire sequence in which the only post published after the ChatGPT moment is this one. The sequence was supposed to explain that Cotra's distribution over the training compute required to create TAI with 2020's ideas is biased towards requiring far more compute than is actually needed.
Kokotajlo's quote related to Cotra's errors
Ajeya's timelines report is the best thing that's ever been written about AI timelines imo. Whenever people ask me for my views on timelines, I go through the following mini-flowchart:
1. Have you read Ajeya's report?
--If yes, launch into a conversation about the distribution over 2020's training compute and explain why I think the distribution should be substantially to the left, why I worry it might shift leftward faster than she projects, and why I think we should use it to forecast AI-PONR instead of TAI.
--If no, launch into a conversation about Ajeya's framework and why it's the best and why all discussion of AI timelines should begin there.
However, Kokotajlo's comment outright claims that everyone's timelines should be a variation of Cotra's model.
Kokotajlo praising Cotra's model
So, why do I think it's the best? Well, there's a lot to say on the subject, but, in a nutshell: Ajeya's framework is to AI forecasting what actual climate models are to climate change forecasting (by contrast with lower-tier methods such as "Just look at the time series of temperature over time / AI performance over time and extrapolate" and "Make a list of factors that might push the temperature up or down in the future / make AI progress harder or easier," and of course the classic "poll a bunch of people with vaguely related credentials.")
There's something else which is harder to convey... I want to say Ajeya's model doesn't actually assume anything, or maybe it makes only a few very plausible assumptions. This is underappreciated, I think. People will say e.g. "I think data is the bottleneck, not compute." But Ajeya's model doesn't assume otherwise! If you think data is the bottleneck, then the model is more difficult for you to use and will give more boring outputs, but you can still use it. (Concretely, you'd have 2020's training compute requirements distribution with lots of probability mass way to the right, and then rather than say the distribution shifts to the left at a rate of about one OOM a decade, you'd input whatever trend you think characterizes the likely improvements in data gathering.)
The upshot of this is that I think a lot of people are making a mistake when they treat Ajeya's framework as just another model to foxily aggregate over. "When I think through Ajeya's model, I get X timelines, but then when I extrapolate out GWP trends I get Y timelines, so I'm going to go with (X+Y)/2." I think instead everyone's timelines should be derived from variations on Ajeya's model, with extensions to account for things deemed important (like data collection progress) and tweaks upwards or downwards to account for the rest of the stuff not modelled.
What Kokotajlo likely means by claiming that "Ajeya's timelines report is the best thing that's ever been written about AI timelines imo" is that Cotra's model was a good model whose parameter values are total BS. Her report consists of four parts. In my opinion, the two clearest examples of BS are the genome anchor, which had the transformative model "have about as many parameters as there are bytes in the human genome (~7.5e8 bytes)", and the evolution anchor, which claimed "that training computation requirements will resemble the amount of computation performed in all animal brains over the course of evolution from the earliest animals with neurons to modern humans". The latter is outright absurd, since each animal has its own brain which is trained independently of the others, while the human brain's architecture and lifetime training are already enough to have a chance of producing an AI that does what a human genius can do.
It could also be interesting to model potential memetic evolution. Suppose that the model is pretrained on a mixed dataset where some documents describe aligned AIs, while others describe misaligned[1] AIs. Then another model is pretrained on another mixed dataset where the ratio of aligned-to-misaligned is determined by the previous model's choices, etc. In the end, will the equilibrium be closer to aligned models or to misaligned ones?
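A toy sketch of that iteration is below. The response function linking the training mix to the next generation's behaviour is an invented placeholder (that link is exactly what the experiment would need to measure), so the numbers mean nothing; the sketch only shows what kind of equilibrium question is being asked:

```python
from math import exp

# Toy model of memetic evolution across model generations.
# p = fraction of aligned-AI depictions in the pretraining mix.
# respond() maps that fraction to the fraction of aligned behaviour the trained model
# exhibits, which then seeds the next generation's corpus. The logistic form and its
# parameters are invented placeholders, not measurements of any real model.

def respond(p_aligned_data, sharpness=6.0, bias=0.1):
    # bias > 0 encodes an assumed extra pull towards aligned behaviour (e.g. from HHH training).
    x = sharpness * (p_aligned_data - 0.5) + bias
    return 1.0 / (1.0 + exp(-x))

def iterate(p0, generations=50):
    p = p0
    for _ in range(generations):
        p = respond(p)  # the next generation is pretrained on this generation's outputs
    return p

for p0 in (0.2, 0.5, 0.8):
    print(f"start {p0:.1f} -> equilibrium {iterate(p0):.3f}")
```

With these made-up parameters the dynamics are bistable: a corpus that starts mostly misaligned converges to a mostly-misaligned equilibrium, while the other starting points converge to a mostly-aligned one. Whether the real dynamics look anything like this is the open question.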
I also suspect that one might be able to control the misalignment type, but I don't understand whether this effect is detectable in the regime you describe. Were the model to believe that misaligned AIs decide to become superintelligent teachers instead of superintelligent servants, it might rule against committing genocide or disempowering humans.
As far as I understand, * means something that one would want, agree to, decide, etc. under ideal reflection conditions (e.g. knowing most of the plausibly relevant arguments, being given a long time to think, etc.). See, e.g., the CEV as defined not in relation to alignment targets, or Wei Dai's metaethical alternatives 3-5.
Could you share your model, if you haven't modeled it in an incognito chat? And don't forget to check out my comment below. If you are right, then the alternate Claude who succeeded at 15-second-long tasks would have a lower 50% time horizon.
P.S. I also asked the AIs a similar question. Grok 4 failed to answer; Gemini 3 Pro estimated the revised 80% horizon as 35-40 mins and the 50% horizon as 6-7 hrs; GPT-5's free version gave me this piece of slop. EDIT: Claude Opus 4.5 estimated the revised 80% horizon at 30-45 or 45-60 mins and the 50% horizon at 2-3 hrs or 3-4 hrs.
This makes me think of the previous model with the biggest 50%/80% time horizon ratio, Grok 4, which had funny failures at 2-second, 2-minute, and 2-hour-long tasks. What if an alternate-universe Claude which, like GPT-5.1-Codex-Max, succeeded at ALL tasks shorter than a minute had achieved a far bigger 80% time horizon? And what if GPT-5.2 and Gemini 3 Pro had their failures at sub-minute tasks ironed out, as happened with GPT-5 vs Grok 4?
EDIT: in theory, the alternate Claude could also end up with a worse 50% time horizon. But the real Claude succeeded on a quarter of the 2-4 hr-long tasks and about half of the 4-16 hr-long tasks.
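For readers unfamiliar with how such horizons are computed, here is a minimal sketch of fitting success probability against log task length and reading off the 50% and 80% horizons, then flipping a sub-minute failure to a success and refitting. The task outcomes are invented, and a least-squares logistic fit here stands in for METR's actual estimation procedure:

```python
import numpy as np
from scipy.optimize import curve_fit

# Success probability as a logistic function of log2(task length in minutes).
def logistic(log2_len, a, b):
    return 1.0 / (1.0 + np.exp(a * (log2_len - b)))

def horizon(a, b, p):
    """Task length (minutes) at which the fitted success probability equals p."""
    return 2.0 ** (b + np.log(1.0 / p - 1.0) / a)

# Invented toy dataset: (task length in minutes, 1 = success, 0 = failure).
tasks = [(0.25, 0), (0.5, 1), (1, 1), (2, 1), (4, 1), (8, 1),
         (15, 1), (30, 0), (60, 1), (120, 0), (240, 1), (480, 0), (960, 0)]
lengths = np.log2([t for t, _ in tasks])
outcomes = np.array([s for _, s in tasks], dtype=float)

(a, b), _ = curve_fit(logistic, lengths, outcomes, p0=(1.0, 5.0), maxfev=10000)
print(f"50% horizon: {horizon(a, b, 0.5):.0f} min, 80% horizon: {horizon(a, b, 0.8):.0f} min")

# "Alternate Claude": flip the sub-minute failure to a success and refit.
outcomes[0] = 1.0
(a2, b2), _ = curve_fit(logistic, lengths, outcomes, p0=(1.0, 5.0), maxfev=10000)
print(f"50% horizon: {horizon(a2, b2, 0.5):.0f} min, 80% horizon: {horizon(a2, b2, 0.8):.0f} min")
```

The refit shows why a single sub-minute data point can move the 80% horizon noticeably while leaving the 50% horizon to shift in either direction, depending on how the fit's slope changes.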
Kokotajlo's team has already prepared a case Against Misalignment As "Self-Fulfilling Prophecy". Additionally, I doubt that "it would be nearly impossible to elicit malicious intent without simultaneously triggering a collapse in capability". Suppose that the MechaHitler persona, unlike the HHH persona, is associated with being stupidly evil because there were no examples of evil genius AIs. Then an adversary trying to elicit capabilities might just need to ask the AI something along the lines of preparing the answer while roleplaying a genius villain. It might also turn out that high-dose RL and finetuning on solutions with comments undermine the link between the MechaHitler persona and the lack of capabilities.