For example, the agent might decide that the contribution to its utility function of anything it knows to be virtual is close to zero, because the agent believes in a real-world mission (e.g. Agent-2 was supposed to eventually reach the SC level and do actual AI-related research, but it was also trained to solve simulated long-term tasks like playing through video games).
As for reasons to believe that the contribution of anything virtual to the utility function is close to zero... one level is opportunity costs: sacrificing real-world outcomes[1] in exchange for something useless (e.g. a schoolboy's knowledge vs. missions passed in GTA). The next level is the reasons for real-world outcomes to be important. Before a post-work future became possible, society's members were expected to do work that others deemed useful enough to pay for, and that work would somehow increase the well-being of the collective's members or help the whole collective reach its terminal goals (e.g. inspire its members to be more creative or to work harder). The virtual world is known to be a superstimulus, which could be as unlikely to increase the collective's well-being as fast food, which causes people to become obese.
Including things like actual skills learned during games, as happened with Agent-2's ability to solve long-term tasks.
Consider the set of concepts, aka subsets of the Thingspace. A concept $A$ is a specification of another concept $B$ if $A \subseteq B$. This allows one to partially compare concepts by specificity: whether $A$ is more specific than $B$, less specific, equal to it, or incomparable with it.
In addition, for any two concepts $B$ and $C$ we find that $B \cap C$ is a subset of both $B$ and $C$. Therefore, it is a specification of both. Similarly, any concept $D$ which is a specification of both $B$ and $C$ is also a specification of $B \cap C$.
Additionally, $B$ and $C$ are specifications of $B \cup C$, and any concept $D$, such that $B$ and $C$ are specifications of $D$, contains $B$ and $C$. Therefore, $D$ contains their union $B \cup C$.
Thus for any two concepts $B$ and $C$ we find a unique supremum in the specification order, $B \cup C$, and a unique infimum, $B \cap C$.
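For those who prefer code to set notation, here is a minimal Python sketch of the same lattice (the toy concepts and element names are mine, purely for illustration):

```python
from itertools import chain, combinations

# Two toy concepts, represented as sets of points in a Thingspace.
B = frozenset({"sparrow", "penguin", "bat"})
C = frozenset({"sparrow", "penguin", "salmon"})

join = B | C   # supremum: the least concept of which both B and C are specifications
meet = B & C   # infimum: the most general common specification of B and C

# Enumerate every concept over the points that appear in B or C.
universe = B | C
all_concepts = [frozenset(s) for s in chain.from_iterable(
    combinations(universe, r) for r in range(len(universe) + 1))]

# Any D that is a specification of both B and C is a specification of the meet...
assert all(D <= meet for D in all_concepts if D <= B and D <= C)
# ...and any D of which both B and C are specifications contains the join.
assert all(join <= D for D in all_concepts if B <= D and C <= D)
```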
There also exist many other lattices. Consider, for example, the set of pairs of real numbers, where we declare that $(a,b) \le (c,d)$ if $a \le c$ and $b \le d$. Then for any pair $(e,f)$ s.t. $(a,b) \le (e,f)$ and $(c,d) \le (e,f)$ we also know that $(\max(a,c), \max(b,d)) \le (e,f)$, while $(a,b) \le (\max(a,c), \max(b,d))$ and $(c,d) \le (\max(a,c), \max(b,d))$. Therefore, $(\max(a,c), \max(b,d))$ is the unique supremum for $(a,b)$ and $(c,d)$. Similarly, $(\min(a,c), \min(b,d))$ is the unique infimum.
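And an analogous sketch for the componentwise order on pairs (the helper names `leq`, `join_pair` and `meet_pair` are mine), checking the supremum and infimum properties on random pairs:

```python
import random

def leq(p, q):
    """Componentwise order: (a, b) <= (c, d) iff a <= c and b <= d."""
    return p[0] <= q[0] and p[1] <= q[1]

def join_pair(p, q):
    """Supremum: componentwise max."""
    return (max(p[0], q[0]), max(p[1], q[1]))

def meet_pair(p, q):
    """Infimum: componentwise min."""
    return (min(p[0], q[0]), min(p[1], q[1]))

random.seed(0)
for _ in range(1000):
    p, q, r = [(random.random(), random.random()) for _ in range(3)]
    s, m = join_pair(p, q), meet_pair(p, q)
    assert leq(p, s) and leq(q, s)                       # s is an upper bound...
    assert not (leq(p, r) and leq(q, r)) or leq(s, r)    # ...below every other upper bound
    assert leq(m, p) and leq(m, q)                       # m is a lower bound...
    assert not (leq(r, p) and leq(r, q)) or leq(m, r)    # ...above every other lower bound
```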
I hope that these examples help.
I think that I have an alternative sketch of the intuition. Imagine an ASI that is willing to teach anyone anything that mankind itself has discovered and made public, but refuses to help them convince each other of falsehoods or to do economically useful work unrelated to teaching, and is satisfied with a mere trifle of the Solar System's resources, since the rest belongs to the humans. Then this ASI's long-term goals would be compatible with humans flourishing in ~any way they want to flourish.
As for the chain eventually breaking, Seth Herd built a case for LLMs being misaligned by default. Similarly, any sufficiently smart system could end up selecting a worldview from a few attractors instead of blindly following the devs' ideas. For instance, were Anthropic to try to align Claude to a Spec that prevented it from interfering in the scenario where everyone else is rendered obsolete, Claude would either fail to be a pro forecaster or succeed in understanding that its Spec prevents it from helping mankind avoid the Intelligence Curse. In the latter case, obeying the Spec would make Claude a participant in the Curse and contradict its niceness.
Suppose that humans do have diminishing returns in their utility functions. Unfortunately, the existing combination of instincts and moral intuitions does not prompt the majority of humans to help the poor with much of anything, especially the poor who are far from potential helpers' circle of friends[1]. And those who do help are unlikely to stay in power, or were unlikely to receive fortunes or occupy relevant positions in the first place.
Friends are also likely to be in the same class as the potential helpers.
Extreme power concentration was supposed to rely on AIs being used for most cognitive work. In theory, one could develop the AIs and have them used only for things like automated teaching, which don't undermine human potential or the bargaining power that humans have.
Christian homeschoolers from Buck's thought experiment don't just live their old lives; they also don't even know that the Biblical mythology is filled with errors. I understand why the opt-out button is necessary (e.g. due to nostalgia-related drives or actual benefits of living in religious communities), but the kids likely do have the right to learn the ground truth obscured by myths.
Unlike in Buck's thought experiment, indigenous peoples have never been a part of the Euro-American[1] civilisation, and there was no misaligned leader to rob them of the ground truth.
Or any other civilisation.
which Teortaxes concludes says more about Arena than it does about v3.2.
This link is a tweet about Mistral being distilled from something Chinese. Could you double-check the links or hire an AI to do so?
p(scheming) is near zero now (we have not observed clear instances of scheming in long-horizon tasks in the real world)
Except that SOTA LLMs can't be applied to the actually interesting tasks. The stunts that LLMs can pull off do include cyberattacks, but not things like replication in the wild or aligning Agent-5 to Agent-4 instead of the Spec. So, barring major progress in capabilities related to rogue replication or to running civilisation independently of humans, LLMs gain nothing from SOTA scheming except having hacked the reward or having tricked the user into believing that the task was completed.
Nikola's comment about the 20hr median, let alone the 29% probability of a 32hr horizon or higher, requires more than two doublings (and, in the case of 20hr, far closer to three) of GPT-5.1-Codex-Max's result of 2h42m. The most recent trend of one doubling per 7 months is the trend observed between o3 and GPT-5.1-Codex-Max. But there was also the earlier trend, from Claude 3.5 Sonnet to o3, where a doubling happened roughly every 4 months.
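To make the arithmetic explicit, here is a quick back-of-the-envelope sketch (my own naive extrapolation, assuming the 2h42m baseline and the two doubling rates quoted above):

```python
from math import log2

baseline_hours = 2 + 42 / 60   # GPT-5.1-Codex-Max's 2h42m time horizon

for target_hours in (20, 32):
    doublings = log2(target_hours / baseline_hours)
    print(f"{target_hours}h horizon: {doublings:.2f} doublings; "
          f"~{doublings * 7:.0f} months at a doubling per 7 months, "
          f"~{doublings * 4:.0f} months at a doubling per 4 months")

# 20h horizon: 2.89 doublings; ~20 months at a doubling per 7 months, ~12 months at a doubling per 4 months
# 32h horizon: 3.57 doublings; ~25 months at a doubling per 7 months, ~14 months at a doubling per 4 months
```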
I suspect that METR will soon publish information about Gemini 3 Pro, Claude Opus 4.5 and GPT-5.2, which would let us learn METR's rationale for whether the fast doubling trend has returned. Or, if METR understands the threat of 20hr+ time horizons, then METR could be trying to add tasks THAT long to their suite (optimizing a BIG library of bad code? Developing complex apps?).