I'm trying to look at how increasing model time horizons amplifies AI researcher productivity. For example, if a researcher had a programming agent that could reliably complete programming tasks of up to a week in length, could they simply automate thousands of experiments in parallel using these agents? That is, come up with a bunch of possibly interesting ideas and have the agent iterate over variations of each? Or are experiments overwhelmingly compute-constrained rather than programming-time constrained?
Someone approaches you with a question:
"I have read everything I could find that rationalists have written on AI safety. I came across many interesting ideas, I studied them carefully until I understood them well, and I am convinced that many are correct. Now I'm ready to see how all the pieces fit together to show that an AI moratorium is the correct course of action. To be clear, I don't mean a document written for the layperson, or any other kind of introductory document. I'm ready for the real stuff now. Show me your actual argument in all its glory. Don't hold back."
After some careful consideration, you:
(a) helpfully provide a link to A List of Lethalities
(b) suggest that he read the Sequences
(c) patiently explain that if he was smart enough to understand the argument then he would have already figured it out for himself
(d) leave him on read
(e) explain that the real argument was written once, but it has since been taken down, and unfortunately nobody's gotten around to rehosting it since
(f) provide a link to a page which presents a sound argument[0] in favour of an AI moratorium
===
Hopefully, the best response here is obvious. But currently no such page exists.
It's a stretch to expect to be taken seriously without such a page.
[0] By this I mean an argument whose premises are all correct and which collectively entail the conclusion that an AI moratorium should be implemented.
How good is the argument for an AI moratorium? Tools exist which would help us get to the bottom of this question. Obviously, the argument first needs to be laid out clearly. Once we have the argument laid out clearly, we can subject it to the tools of analytic philosophy.
But I've looked far and wide and, surprisingly, have not found any serious attempt at laying the argument out in a way that makes it readily amenable to analysis.
Here’s an off-the-cuff attempt:
P1. ASI may not be far off
P2. ASI would be capable of exterminating humanity
P3. We do not know how to create an aligned ASI
P4. If we create ASI before knowing how to align ASI, the ASI will ~certainly be unaligned
P5. Unaligned ASI would decide to exterminate humanity
P6. Humanity being exterminated by ASI would be a bad thing
C. Humanity should implement a moratorium on AI research until we know how to create an aligned ASI
My off-the-cuff formulation of the argument is obviously far too minimal to be helpful. Each premise has a wide literature associated with it and should itself have an argument presented for it (and the phrasing and structure can certainly be refined).
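To illustrate the kind of layout that would make the argument mechanically checkable, here is a minimal propositional sketch in Lean. It is purely illustrative: every proposition name is a placeholder of mine, and the entailment only goes through once a bridging premise (P7 below, my addition, not part of the list above) is made explicit.

```lean
/-- A minimal propositional sketch of the argument above.
    Each premise is an opaque proposition; P7 is a bridging premise
    that the list above leaves implicit. -/
theorem moratorium_argument
    (ASISoon CapableOfExtermination AlignmentUnsolved UnalignedIfBuilt
     WouldExterminate ExterminationBad MoratoriumWarranted : Prop)
    (p1 : ASISoon)
    (p2 : CapableOfExtermination)  -- does no logical work in this minimal rendering
    (p3 : AlignmentUnsolved)
    (p4 : AlignmentUnsolved → UnalignedIfBuilt)
    (p5 : UnalignedIfBuilt → WouldExterminate)
    (p6 : ExterminationBad)
    -- P7 (my addition): if ASI may come soon, would exterminate us,
    -- and that would be bad, then a moratorium is warranted.
    (p7 : ASISoon → WouldExterminate → ExterminationBad → MoratoriumWarranted) :
    MoratoriumWarranted :=
  p7 p1 (p5 (p4 p3)) p6
```

Even this toy rendering surfaces useful facts: P2 does no inferential work as stated (it presumably belongs inside the case for P5), and the normative jump from “this would be bad” to “therefore a moratorium” needs its own premise.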
If we had a canonical formulation of the argument for an AI moratorium, the quality of discourse would immediately, immensely improve.
Instead of constantly talking past each other, retreading old ground, and spending large amounts of mental effort just trying to figure out what exactly the argument for a moratorium even is, one could simply say “my issue is with P6”. Their interlocutor would respond “What's your issue with the argument for P6?”, they would answer “Subpremise 4, because it's question-begging”, and now the two are in the perfect position for a genuinely productive conversation!
I’m shocked that this project has not already been carried out. I’m happy to lead such a project if anyone wants to fund it.
With pre-RLVR models we went from a 36-second 50% time horizon to a 29-minute horizon.
Between GPT-4 and Claude 3.5 Sonnet (new) we went from 5 minutes to 29 minutes.
I've looked carefully at the graph, but I see no sign of a plateau, nor even of a slowdown.
I'll do some calculation to ensure I'm not missing anything obvious or deceiving myself.
I don't see any sign of a plateau here. Things were a little behind-trend right after GPT-4, but of course there will be short behind-trend periods just as there will be short above-trend periods, even assuming the trend is projectable.
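Here is the back-of-the-envelope check I have in mind: under a clean exponential, the implied doubling time over a span is elapsed time × ln 2 / ln(end ÷ start). A quick sketch, where the horizon figures are the ones mentioned above but the elapsed times are my own rough assumptions rather than METR's numbers:

```python
import math

def doubling_time_months(h_start_min, h_end_min, elapsed_months):
    """Implied doubling time of the 50% time horizon, assuming clean exponential growth."""
    return elapsed_months * math.log(2) / math.log(h_end_min / h_start_min)

# GPT-4 -> Claude 3.5 Sonnet (new): 5 min -> 29 min.
# The ~19-month gap (March 2023 to October 2024) is my rough assumption.
print(round(doubling_time_months(5, 29, 19), 1))  # ~7.5 months per doubling

# Pre-RLVR span: 36 s -> 29 min. The elapsed time is a placeholder, since it
# depends on which model the 36-second figure refers to; substitute real dates.
assumed_elapsed = 48  # months (placeholder)
print(round(doubling_time_months(36 / 60, 29, assumed_elapsed), 1))
```

If the per-segment doubling times land in the same ballpark as the long-run figure, that is a decent sanity check against a hidden slowdown.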
I'm not sure why you are starting from GPT-4 and ending at GPT-4o. Starting with GPT-3.5 and ending with Claude 3.5 Sonnet (new) seems more reasonable, since these were all post-RLHF, non-reasoning models.
AFAIK the Claude 3.5 models were not trained on data from reasoning models?
I don't think there was a plateau. Is there a reason you're ignoring Claude models?
Greenblatt's predictions don't seem pertinent.
There’s a high bar to clear here: LLM capabilities have so far progressed at a hyper-exponential rate with no signs of a slowdown [1].
So, an argument for the claim that we’re about to plateau has to be more convincing than induction from this strong pattern we’ve observed since at least the release of GPT-2 in February 2019.
Your argument does not pass this high bar. You have made the same kind of argument that has been made again and again (and proven wrong again and again) throughout the past seven years we have been scaling up GPTs.
One can’t simply point out that the things LLMs currently cannot do are hard in a way that the things they currently can do are not. Of course the things they cannot do are different from the things they can; that has been true at every stage of the capability gains we have observed so far, so it cannot be used as evidence that the observed pattern is unlikely to continue.
So, you would need to go further. You would need to demonstrate that they’re different in a way that meaningfully departs from how past, successfully gained capabilities differed from earlier ones.
To make this more concrete, claims based on supposed architectural limitations are not an exception to this rule: many such claims have been made in the past and proven incorrect. The base rate here is unfavourable to the pessimist.
Even solid proofs of fundamental limitations are not, in themselves, sufficient: these tend to be arguments that LLMs cannot do X by means Y, rather than arguments that LLMs cannot do X at all.
To be convincing, you have to make an argument that fundamentally differentiates your objection from past failed objections.
[1] Based on METR's research: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
Could there be an observation bias at play here? Could it be that most extremely beautiful women do live glamorous lives but you are not a part of those scenes?
The common theme here is that the capabilities frontier is more jagged than expected. So the way in which people modeled takeoff in the pre-LLM era was too simplistic.
Takeoff used to be seen as equivalent to the time between AGI and ASI.
In reality, we got programs which are not AGI but which have capabilities that most people in the past would have assumed to entail AGI.
So we have pretty-general intelligence that is better than most humans in some areas and is amplifying programming and mathematics productivity. Takeoff has, I think, begun, but under quite different conditions than people used to model.
It's not clear to me that this is a strong enough theory to inform how we think about LLM psychosis. The gap between the two phenomena is just too big.
In fact, I'd probably characterise the yes-man situation as some form of delusion less extreme than psychosis.
The CEO's ideas begin grounded in reality (they must have been, in order for him to amass his yes-men in the first place). And even once he is surrounded by yes-men, his ideas remain constrained by the scope of his role as CEO, and hard data on profit, growth, market share, etc. keep him tethered.
LLM psychosis is different: people jump to theories almost arbitrarily disconnected from reality, which the LLM immediately amplifies: it affirms their ideas, supplies additional evidence, lists 50 different reasons they're clearly right, etc.
I think the better parallel is getting caught up in a conspiracy theory, whose believers manage to contort any evidence so that it confirms the theory, dismiss anyone who disagrees as brainwashed, have arguments which seem perfectly logical to anyone who doesn't happen to have specialised knowledge, etc.
This parallel seems potentially more informative.
Is this feeling reasonable?
A selfish person will take the gamble of a 5% risk of death for a 95% chance of immortal utopia.
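Spelled out as a quick expected-value check (the U terms are placeholders I'm introducing here, not part of the original claim): the selfish agent takes the gamble whenever 0.95·U(immortal utopia) + 0.05·U(death) > U(status quo), which holds for anyone who values immortal utopia vastly more than the status quo and does not treat death as infinitely bad.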
A person who tries to avoid moral shortcomings such as selfishness will reject the "doom" framing because it's just a primitive intelligence (humanity) being replaced with a much cleverer and more interesting one (ASI).
It seems that you have to really thread the needle to get from "5% p(doom)" to "we must pause, now!". You have to reason in a way that is not self-interested, yet is also deeply chauvinistic on behalf of the human species.
This is of course a natural way for a subagent of an instrumentally convergent intelligence, such as humanity, to behave. But unless we're taking the hypocritical position that tiling the universe with primitive desires is OK as long as they're our primitive desires, it seems that so-called doom is preferable to merely human flourishing.
So it seems that 5% is really too low a risk from a moral perspective, and an acceptable risk from a selfish perspective.