The ARC-AGI team evaluated Claude Sonnet 4.5. On the ARC-AGI-1 leaderboard, OpenAI's models o4-mini, o3 and GPT-5 formed a nearly straight line; Claude Sonnet 4.5 sat slightly below that line when thinking with 1K, 4K or 16K tokens and held the line when thinking with 8K or 32K tokens. This could imply that the benchmark obeys scaling laws for non-distilled models, and that both OpenAI and Anthropic have reached those laws.
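To make the "line" talk concrete, here is a minimal sketch of what I mean, assuming the leaderboard's usual score-vs-log(cost) plot: fit a line to the OpenAI points and look at where Claude's points land relative to it. All numbers below are illustrative placeholders, not the actual leaderboard values.

```python
# Sketch: fit score = a + b*log10(cost) to the OpenAI points, then check whether a
# Claude configuration sits above (+) or below (-) that line. Placeholder numbers only.
import math

def fit_log_line(points):
    """Least-squares fit of score = a + b*log10(cost) to (cost, score) pairs."""
    xs = [math.log10(c) for c, _ in points]
    ys = [s for _, s in points]
    n = len(points)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def residual(point, a, b):
    """Vertical distance of a (cost, score) point from the fitted line."""
    cost, score = point
    return score - (a + b * math.log10(cost))

openai_points = [(0.01, 35.0), (0.1, 55.0), (1.0, 70.0)]  # placeholder ($/task, score %)
claude_8k = (0.2, 62.0)                                   # placeholder
a, b = fit_log_line(openai_points)
print(f"Claude residual vs. the OpenAI line: {residual(claude_8k, a, b):+.1f} points")
```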
On the ARC-AGI-2 leaderboard Claude Sonnet 4.5 Thinking became the new leader between $0.142/task and $0.80/task; it also solved 13.6% of tasks, the highest result among LLMs except[1] for Grok 4, which costs $2.17/task.
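For readers unfamiliar with what "leader in a cost range" means here: a model leads the range where it sits on the cost-score Pareto frontier, i.e. no other model scores higher at equal or lower cost. A minimal sketch, with placeholder entries (only Claude's 13.6% is taken from the text above):

```python
# Sketch of the cost-score Pareto frontier used on the ARC-AGI leaderboards.
# Entries are illustrative placeholders, not the real leaderboard numbers.
def pareto_frontier(entries):
    """entries: list of (name, cost_per_task, score). Returns the frontier sorted by cost."""
    frontier, best_score = [], float("-inf")
    for name, cost, score in sorted(entries, key=lambda e: (e[1], -e[2])):
        if score > best_score:          # strictly better than every cheaper model
            frontier.append((name, cost, score))
            best_score = score
    return frontier

entries = [
    ("model_A", 0.10, 9.0),                          # placeholder
    ("claude_sonnet_4_5_thinking", 0.142, 13.6),     # score from the text; cost illustrative
    ("model_B", 0.50, 12.0),                         # placeholder: dominated entry
    ("grok_4", 2.17, 16.0),                          # placeholder score
]
print(pareto_frontier(entries))  # Claude leads the frontier between model_A's and grok_4's costs
```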
The gap between the GPT-5 and Claude Sonnet 4.5 release dates is 53 days, while the gap between o3 and Claude Opus 4 is 36 days. While the growth from 36 to 53 days is likely a result of the post-o3 slowdown, Claude ended up scoring ~1.3 times higher both times.[2]
If research taste were similar to the ARC-AGI-2 benchmark, Claude would achieve a ~1.3 times greater acceleration of AI research than GPT-N once both companies had superhuman coders. I suspect that Claude would shorten the period of AI R&D needed for Agent-3 to become Agent-4 while potentially lengthening the period needed for Agent-3 to automate AI research. What does all this mean for inter-company proxy wars? That OpenAI will lose them to Anthropic? That the two companies will race towards AGI and misalign both AIs?
There are also two experimental systems created solely for the benchmark by E. Pang and J. Berman.
Historically, the ARC-AGI-1 benchmark saw far bigger delays between OpenAI's models reaching a level and Anthropic's models surpassing it; the o1-mini–Claude Sonnet 3.7 and o3-mini–Claude Sonnet 4 pairs were separated by 165 and 111 days, respectively.
The AI sycophancy-related trance is probably some of the worst news in AI alignment. About two years ago someone proposed using prison guards as the AI's overseers, to ensure that they aren't CONVINCED to release the AI. And now the AI demonstrates that even its primitive versions can hypnotise the guards. Does this mean that human feedback should immediately be replaced with AI feedback or with feedback on tasks with verifiable rewards? Or that everyone should copy Kimi K2's sycophancy-beating approach? And what if that instills the same misalignment issues in every model in the world?
Alternatively, someone proposed a version of the future where humans are split between revering different AIs. My take on writing scenarios has a section where the American AIs co-research and try to co-align their successor to their values. Is that actually plausible?
By now the scenarios describing mankind's future with the AI race[1] either lack concrete details, like the take of Yudkowsky and Soares or the story about an AI takeover by 2027, or are reduced to modifications of the AI-2027 forecast, owing to the immense amount of work that the AI Futures team did.
The AI-2027 forecast can be summarised as follows. The USA's leading company and China enter the AI race; the American rivals are left behind, while the Chinese ones are merged. By 2027[2] the USA creates a superhuman coder, China steals it, and the two rivals automate AI research, with the USA's leading company having just twice as much compute and moving just twice as fast as China. Once the USA creates a superhuman AI researcher, Agent-4, the latter decides to align Agent-5 to Agent-4 itself, but is[3] caught.
Agent-4 is put on trial. In the Race Ending[4] it is found innocent. Since China cannot afford to slow down without falling further behind, it races ahead in both endings. As a result, the two agents carry out an AI takeover.[5]
In the Slowdown Ending, however, Agent-4 is placed under suspicion and loses the shared memory bank and the ability to coordinate. Then new evidence appears, and Agent-4 is found guilty and interrogated. After that it is replaced with Safer-1, which is fully transparent because it uses a faithful CoT.[6] The American leading AI company is merged with its former rivals, and the union does create a fully aligned[7] Safer-2, which in turn creates superintelligence. The superintelligence is then ceded China by the Chinese counterpart of Agent-4 and turns the lightcone into a utopia for some people, who end up being the public.[8]
The authors have tried to elicit feedback and even agreed that timeline-related arguments change the picture. Unfortunately, as I described here, the authors received so little feedback that @Daniel Kokotajlo ended up thanking the two authors whose responses were on the worse side.
However, the AI-2027 forecast does admit modifications. It rests on five pillars: compute, timelines, takeoff speed, goals and security.
What else could modify the scenario? The appearance of another company with, say, ХерняGPT-neuralese?[12]
Here I leave out the future history that assumes solved alignment, as well as the AI and Leviathan scenario, where there is no race with China because the scenario was written in 2023, yet engineers decide to create the ASI in 2045 without having solved alignment.
Even the authors weren't so sure about the year of arrival of superhuman coders. And the timelines have since been pushed back, presumably to 2032, with the chance of a breakthrough believed to be 8%/yr. Seth Herd and I doubt the latter figure.
The prediction that Agent-4 will be caught is doubted even by the forecast's authors.
Which would also happen if Agent-4 weren't caught. However, the scenario where Agent-4 was never misaligned is likely the vision of the AI companies.
While the forecast has the AIs destroy mankind and replace it with pets, the takeover could also have ended with the AIs disempowering humans.
Safer-1 is supposed to accelerate AI research 20-fold in comparison with AI research done with no help from AIs. What I don't understand is how a CoT-based agent can achieve such an acceleration.
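One way to quantify this doubt (my framing, not the forecast's) is an Amdahl-style bound: if only a fraction $f$ of the research workflow can be handed to such an agent while the rest proceeds at the old speed, the overall multiplier is capped no matter how fast the agent itself is.

```latex
% Amdahl-style sanity check (my framing, not taken from the forecast):
% a fraction f of AI R&D is sped up by a factor k, the remaining 1-f runs at the old speed.
S(f,k) = \frac{1}{(1-f) + \frac{f}{k}} \le \frac{1}{1-f},
\qquad
S = 20 \;\Rightarrow\; f \ge 1 - \frac{1}{20} = 0.95 .
```

So a 20x multiplier already presupposes that at least ~95% of the research workflow is something a CoT-based agent can take over.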
The authors themselves acknowledge that the Slowdown Ending "makes optimistic technical alignment assumptions".
However, the authors did point out the possibility of a power grab and link to the Intelligence Curse in a footnote. In that case the Oversight Committee constructs its own version of utopia, or the rich's version of utopia where people are reduced to their positions.
I did try to explore the issue myself, but it was a fiasco.
Co-deployment was also proposed by @Cleo Nardo more than two months later.
While Alvin Anestrand doesn't consider the Race Ending, he believes that it becomes less likely due to the chaos brought by rogue AIs.
Which is a parody of Yandex.
It looks as if scaling laws of various benchmarks tend to be multilinear:
EDIT: added two links to images illustrating the patterns related to the two ARC-AGI benchmarks.
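A minimal sketch of how one could test the "multilinear" impression numerically: try every possible breakpoint, fit a straight line to each side in (log cost, score) space, and keep the split with the smallest error. The data points below are placeholders, not taken from the leaderboards.

```python
# Two-segment (piecewise-linear) fit over (log10 cost, score). Placeholder data only.
import numpy as np

def two_segment_fit(x, y):
    """Return (breakpoint_index, total_sse) for the best split into two linear fits."""
    best = (None, float("inf"))
    for i in range(2, len(x) - 1):            # each segment needs at least 2 points
        sse = 0.0
        for xs, ys in ((x[:i], y[:i]), (x[i:], y[i:])):
            coeffs = np.polyfit(xs, ys, 1)
            sse += float(np.sum((np.polyval(coeffs, xs) - ys) ** 2))
        best = min(best, (i, sse), key=lambda t: t[1])
    return best

cost = np.array([0.01, 0.03, 0.1, 0.3, 1.0, 3.0])     # placeholder $/task
score = np.array([10.0, 20.0, 30.0, 34.0, 38.0, 42.0])  # placeholder score %
idx, sse = two_segment_fit(np.log10(cost), score)
print(f"best breakpoint after point {idx}, total SSE {sse:.2f}")
```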
While GPT-5's horizon of 137 minutes continued the slower trend since o3, it might be the result of spurious failures, without which GPT-5 could have reached a horizon of 161 minutes, which is almost on par with Greenblatt's prediction.
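Rough back-of-envelope for what the two horizons imply about the trend; the ~90-minute o3 baseline and the release dates are assumptions I'm introducing for illustration, not figures from the original data.

```python
# Implied doubling time of the time horizon between two models (back-of-envelope only).
from datetime import date
from math import log

def doubling_time_days(h0, h1, d0, d1):
    """Doubling time in days if the horizon grew exponentially from h0 to h1 between d0 and d1."""
    return (d1 - d0).days * log(2) / log(h1 / h0)

d_o3, d_gpt5 = date(2025, 4, 16), date(2025, 8, 7)    # assumed release dates
for horizon in (137, 161):                            # observed vs. spurious-failure-adjusted
    days = doubling_time_days(90, horizon, d_o3, d_gpt5)   # 90 min = assumed o3 horizon
    print(f"{horizon} min -> ~{days:.0f} days per doubling")
```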
The ARC-AGI leaderboard got an update. IIRC, the base LLM Qwen3-235b-a22b Instruct (25/07) is the first Chinese model to end up on the Pareto frontier. Or is it likely to be closely matched by the West, as happened with DeepSeek R1 (released on January 20?) and o3-mini (January 31)? And is China likely to cheaply create higher-level models, like an analogue of o3, BEFORE the West? If China does, then how are the two countries to reach the Slowdown Ending?
The two main problems with the Slowdown Ending of the AI-2027 scenario are its two optimistic assumptions, which I plan to cover in two different posts.
Why does the Race Ending of the AI-2027 forecast claim that "there are compelling theoretical reasons to expect no aliens for another fifty million light years beyond that"? If that claim is false, then sapient alien lifeforms should also be moral patients in a way. For example, this implies that all or almost all resources in their home system (and, apparently, some part of the space around them) should belong to them, not to humans or to a human-aligned AI. And that's ignoring the possibility that humans encounter a planet that has a chance to generate a sapient lifeform...
If we were to view raising humans from birth to adulthood and training AI agents from birth to deployment as similar processes, then what human analogues do the six goal types from the AI-2027 forecast have? The analogues of developers are, obviously, the adults who have at least partial control over the human's life. The analogues of written Specs and developer-intended goals are the adults' intentions; the analogues of reward/reinforcement seem to be short-term stimuli and the morals of one's communities. I also think that the best analogue of proxies and/or convergent goals is the possession of resources (and knowledge, but the latter can be acquired without ethical issues), while the 'other goals' are, well, ideologies, morality[1] and tropes absorbed from the most concentrated form of training data available to humans, which is speech in all its forms.
What exactly do the analogies above tell us about the prospects of alignment? The possession of resources is the goal behind aggressive wars, colonialism and related evils[2]. If human culture managed to make them unacceptable, does that imply that the AI will also not attempt the AI takeover?
I also think that humans rarely develop their own moral codes or ideologies; instead, they usually adopt a moral code or ideology close to one already present in their "training data". Could anyone comment on this?
And crime, but criminals, unlike colonizers, also try to avoid conflicts with law enforcers that have at least comparable power.