In the Sable story (IABIED), an AI obtains dangerous capabilities such as self-exfiltration, virus design, persuasion, and AI research, and uses a combination of them to eventually conduct a successful takeover of humanity. Some have criticised this as implying that the AI acquires these capabilities suddenly, with little warning; the argument goes that since we have seen gradual progress in the current LLM paradigm, we should expect this to be unlikely (here is Will MacAskill making a version of this argument)[1]. I think this suffers from a confusion between underlying variables (e.g. AI intelligence increasing gradually) and the actual capabilities we care about (can it outwit humanity? can it do AI research?). Intuitively, even if an AI system gains one IQ point each day, we should still expect a relatively sudden period in which it shifts from “it sometimes outwits us” to “it can reliably outwit us”. The relevant question is “can it outwit us?”, not “how smart is it?”.
Despite intelligence rising gradually, relevant capabilities can appear suddenly.
Claude Code
There has recently been major excitement about Claude Code’s ability to automate much of software development. Cursor CEO Michael Truell reports that it was able to write a browser autonomously. There is, however, no sign that Opus 4.5 represents a paradigm shift; it is simply iterative progress over previous versions. So why does it feel like the system suddenly got much more capable despite incremental improvements? Even though an underlying variable (coding skill) went up gradually, the capability we actually care about (“can it automate software engineering?”) emerged much more suddenly once a critical threshold was crossed. Until recently, early coding agents produced cascading errors and didn’t speed up experienced developers; this changed abruptly once a threshold in reliability and skill was passed.
GPT-1 released June 2018. GitHub Copilot launched June 2021. Opus 4.5 shipped November 2025. The flip from being somewhat useful (Claude 3.7, February 2025) to revolutionizing coding (Opus 4.5, November 2025) took nine months. This is a common pattern, and it suggests that dangerous capabilities might similarly emerge suddenly from gradual progress.
Horses and Chess
Many of you have probably seen Andy Jones’ intriguing post on horses and automobiles, ostensibly about job loss from AI. Steam engines were invented in 1700, and 200 years of steady improvement in engine technology followed. For the first 120 years, horses didn’t notice; then, between 1930 and 1950, 90% of US horses disappeared. As Andy Jones puts it, progress was steady, equivalence was sudden.
Computer chess has been tracked since 1985. For 40 years, engines improved by 50 Elo per year. In 2000, a grandmaster expected to win 90% of games against a computer. Ten years later, that same grandmaster would lose 90%.
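As a back-of-the-envelope illustration (my framing, not Andy Jones’): under the standard Elo model, a player rated $D$ points above their opponent has expected score

$$E(D) = \frac{1}{1 + 10^{-D/400}}, \qquad E(+400) \approx 0.91, \quad E(0) = 0.5, \quad E(-400) \approx 0.09.$$

The rating gap moves linearly at roughly 50 Elo per year, but the win probability is a logistic function of that gap, so almost all of the change in who actually wins is packed into the comparatively narrow band around parity.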
Andy Jones notes that at Anthropic, LLMs started answering the questions new hires would otherwise have asked him: "In December, it was some of those questions. Six months later, 80% of the questions I'd been asked had disappeared."
Again, while progress on engine technology or chess engines was gradual, the capability we actually care about (can this replace horses? can it beat the best humans at chess?) emerged suddenly.
What makes an AI dangerous?
It appears clear that an AI system with capabilities such as self-exfiltration, virus design, persuasion, and AI research could be existentially dangerous. If each of them arrives relatively suddenly from gradual progress, we could find, with very little warning, that the puzzle is complete and a critical mass of such capabilities has been reached: we inadvertently add the last puzzle piece and switch from a passively safe regime to a dangerous one.
None of the capabilities listed above appear wildly superhuman: all sit at the top of human performance, perhaps sped up or slightly above top humans, and all are strongly correlated with the overall intelligence of the system. It seems natural to think of dangerousness as the general ability of the AI to outwit humanity, which is analogous to the Elo rating of a chess player: we would expect a pretty sudden S-curve in win likelihood as the Elo of the AI player increases.
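To make the analogy concrete, here is a minimal toy sketch of that S-curve: a hypothetical AI rating climbing at a constant rate against a fixed rating for its human opposition. All numbers are invented for illustration; the point is the shape of the curve, not the timeline.

```python
# Toy model: a steadily improving "AI rating" against a fixed "humanity rating".
# The ratings and growth rate are made up; the takeaway is that a linear gain in
# the underlying variable produces an S-shaped jump in the probability we care
# about ("can it outwit us?").

def win_probability(ai_rating: float, human_rating: float) -> float:
    """Elo-style expected score for the AI against humanity."""
    return 1.0 / (1.0 + 10 ** ((human_rating - ai_rating) / 400))

HUMAN_RATING = 2000   # hypothetical, held fixed
AI_START = 1200       # hypothetical starting rating
GAIN_PER_YEAR = 100   # hypothetical, perfectly steady improvement

for year in range(21):
    ai_rating = AI_START + GAIN_PER_YEAR * year
    p = win_probability(ai_rating, HUMAN_RATING)
    print(f"year {year:2d}: rating {ai_rating}, P(outwits us) = {p:.2f}")
```

Even though the rating gain is identical every year, the probability sits near 0.01 for the first few years, crosses from roughly 0.1 to 0.9 in the middle eight years, and then saturates: the same gradual-variable, sudden-capability pattern as above.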
What are some caveats?
These capabilities might not emerge at around the same time or in the same model, which could give us time to study them in isolation before AI systems become dangerous. Because they are all tied to intelligence and none appears wildly superhuman, it is plausible that they will arrive close together; but even if they don’t, it might not matter much.
For one, there are probably many dangerous capabilities we are not tracking, so we wouldn’t notice them appearing in the first place. One example might be the ability to hypnotize humans, or something similarly strange; models have already developed decent capabilities at jailbreaking other models without explicit training.
Many labs, including Anthropic and OpenAI, are very focused on automated AI research capabilities, i.e. recursive self-improvement (RSI). This acts as an amplifier, since the system can improve its own shortcomings: once we reach dangerous levels of AI research capability, we might suddenly get a system with a wide range of dangerous capabilities in other areas.
Even if one crucial dangerous capability lags far behind the others, it’s unclear whether that helps. It would let us study the other dangerous capabilities in isolation, but it also means that once the last puzzle piece falls into place, the other capabilities are already highly developed. For example, if it takes much longer than expected to reach automated AI research, we get more time to study the emergence of other dangerous capabilities such as persuasion; but by the time automated AI research does arrive, persuasion abilities will have had that much more time to strengthen.
It’s also unclear what we would actually do with the information that our AI systems have increasingly dangerous capabilities, say, superhuman persuasion.
Conclusion
The same iterative progress that crossed the coding threshold will cross thresholds for persuasion, for autonomous AI research, for bio-research. While models’ intelligence or scientific knowledge might increase gradually, we should expect the relevant capabilities to emerge relatively suddenly. Things can move very fast from “it can do this with some probability” to “it can do this reliably”.
What early warning signs should we look for? Perhaps a substantial AI research paper written by an AI. Most AI research could then be done by AIs 9-12 months later. That could be this year.
[1] Will MacAskill does acknowledge that things might go very fast (months to years), so this doesn’t necessarily refute the point I am trying to make.