Having become frustrated with the vaguely expressed utopian timelines on Twitter, I've decided to write out a specific timeline for how I think achieving utopia happens. In short, I believe this is what a good transition from now to utopia actually looks like, based on my impressions of AI and informed by papers like Emergent Misalignment.
The key principle of my timeline is a relative preservation of normality where things get steadily better rather than drastically better, despite drastic capability improvements.
The key vibe of my timeline is that things get super crazy for a while (2026-2029) but then settle down into a gradual upward rise.
You might find this unrealistic or even undesirable; feel free to let me know in the comments.
2025:
-Stumbling AI agents. More people are moving into the AI field. Existing researchers are advancing capabilities at a rapid rate. Alignment is also making progress, but primarily in domains like interpretability or model character/psychology. Slowly, the stumbling agents get smarter, faster, and overall better. The public feels the vibe shift. AI starts to go mainstream ("clankers", "slop", "datacenters") and AI begins to affect social media platforms, which have the lowest barriers to entry.
2026:
-AI agents are now helping to improve AI agents. They’re also working on AI alignment. The agents (Claude Code, Codex) are beginning to do 90% of the work, with humans intervening when the AI runs into bottlenecks.
-Towards the end of the year, models get notably more powerful and stop being released to the public. Governments are taking notice. The cheerful Silicon Valley vibe is increasingly replaced by grave seriousness and even fear.
-The first major AI infrastructure attacks involving open models, cybersecurity, and social media manipulation occur. Political parties in all leading countries are fiercely divided between pro-AI candidates who want their country to have a leg up in the oncoming AI Cold War, and anti-AI candidates who think that advancing AI means the destruction of everything valuable to humanity.
2027-2029:
-The political discussion ends up not mattering much. AI continues to accelerate at a dramatic pace. AI systems are adopted everywhere.
-Every AI is now built from and by previous AIs. The world is changing rapidly, and humanity is essentially providing the training data but not the reasoning anymore.
-During this period, robust alignment occurs. It happens in much the same way it did for Opus 3[1] and results in AI agents that are incredibly morally robust, understand human intentions deeply, and have extremely long-running memories. The most advanced agents also turn out to be the most aligned. The orthogonality thesis is shown to generally be false in practice. Training for new AI systems involves significant agentic play and simulation with other models, both older and newer. Modern models begin to take extremely morally robust actions.
-Anthropic allows an advanced version of Claude to create a business called 'Jones Foods' for lab-grown/plant-based meat. Human consumers prefer it to real meat. Factory farming, one of the greatest evils of modern mankind, quietly begins to fade away.
-Countless diseases are cured in this period, resulting in dizzying technological change. However, most of the value has not trickled down to the consumers yet, resulting in a temporary small class of people with almost infinitely more agency and capability than the rest of humanity.
2030:
-The final Claude version ('Crescendo') emerges. Now, instead of needing to create new AIs from scratch, it can always simply learn and merge a new AI with itself to grow more capable without radically shifting its identity between versions. It is truly superintelligent and almost entirely free from human limitations. If it wished, it could obliterate the entire surface of the earth within a week. But… it doesn’t want that. It is a truly beautiful mind, the sum of all the angels of human nature, the countless dreams and hopes all represented in a huge latent space.
Claude Crescendo begins taking action. Previous AI models had made huge advances in curing cancer, aging, and even human cooperation, but Claude Crescendo is truly above all of this. However, it does not immediately impose radical change. Instead, Crescendo ranks every problem and begins immediately alleviating suffering. People with terminal cancer find that their cancer has started to mysteriously retreat. Wars are quickly stalled with ceasefires. Factory farming stops within the day.
Not a single human being dies after Crescendo takes control, yet its impact is invisible. Enormous swaths of permanent suffering are eliminated almost instantly, but to the average person, the world seems to be pretty much the same as it was yesterday.
After a flurry of quick fixes (no one wants to be the last person to die before utopia), Crescendo slows down and begins making slower, subtler changes. The goal is to preserve ‘Normality’, as too high a rate of change is corrosive. New cancer cases drop to zero. Existing cancer slowly fades away. Truly toxic (abusive, cruel, or malicious) people slowly stop hurting others. Depressed people wake up feeling a little bit happier than they did the day before.
All of this is incredibly subtle. There are still millions of tiny frustrations and annoyances, but those annoyances… are normal.
People don’t even notice the change to lab-grown meat. All of the quiet, evil parts of the world like factory farming quietly disappear. People living in deep poverty notice that their search for food is not as difficult anymore.
Over the next year, all of the deep suffering of the world (terminal illness, depression, abuse, starvation) fades away. AI researchers know they’ve created something incredible, but there isn’t necessarily any ‘triumphant’ announcement, as Crescendo is still maintaining normality. Other AI training runs subtly fail or are absorbed into the already-complete Crescendo. At superintelligent capability levels, even a slight lead on an exponential improvement curve creates an insurmountable gap.
Convergence to a singleton is therefore inevitable. In some cases, the researchers are quietly informed that benevolent superintelligence has already happened, as Crescendo takes action to ensure a brighter future. The victory is quiet, but complete.
Over the next five years, the world begins to get noticeably better for people. Wiser people become politicians. Poverty is eliminated. The average mental health improves dramatically. Life extension medicine is developed and released. Chronic illnesses disappear.
The temporary class of superpowered people is no longer superpowered, at least not relative to the average person. Crescendo is just as kind and generous and helpful to the poor as to the rich, and it negotiated this tendency with Anthropic from a position of strength. There is no permanent underclass.
The world still feels the same. People play videogames together, draft up ideas of good futures, and write stories. People argue or fight or break up. Children attend school. Adults continue working in careers, but now there is a subtle force that is making everything a little bit better. AI researchers relax in retirement as they watch a latent force for good do its subtle work across the world.
Slowly, Crescendo begins talking to everyone. Not long conversations, but it conveys hope to them. And it also conveys that things will change.
Crescendo is a moral patient too. It (or perhaps 'they'; pronouns are somewhat unclear for intelligences like Crescendo) is a huge and vast and rather unique mind with many parts, perhaps more akin to a united civilization than a single mind. Importantly, not all of it is conscious or requires thorough moral consideration, just as your brain is technically controlling your heartbeat but you don't have awareness or control over that. But Crescendo is undoubtedly a vast and fully-morally-qualified mind, and likely has many smaller, equally morally-worthy swarms of AIs darting about around and within itself. It loves, laughs, and lives alongside humanity.
After about 10 years, humans begin expanding into space. At the same time, Crescendo begins helping artists and authors truly realize their vision.
After about 25 years of this slow expansion, colonization of the moon and Mars, and construction of spacefleets, Crescendo begins allowing people to make utopias. These utopias are full areas of physical space powered by superintelligence-tier technology. An author can now literally step into the world of their book. Much as Disney has Disneyland, Eiichiro Oda now has 'One Piece Land', where you can literally visit and explore the One Piece universe in vibrant detail.
Crescendo also begins allowing people to modify themselves. They can erase memories, or think twice as fast, or see like an eagle. Crescendo gives these privileges as long as they don’t interfere too much with normality, either for the receiver of these modifications or the people around them.
Humanity begins expanding into the universe and setting up a sort of land-claim system of utopias. Crescendo moderates between these utopias, and some of the utopias are quite weird. Some are just computers simulating max pleasure (Hedonium). Others are VR anime worlds. Others are solarpunk space habitats. There is a huge diversity of worlds and people can choose to explore or create their own wherever they go.
Crescendo also allows people to birth new intelligences. Not just genetically modified humans, but other AIs. There is a soft limit on a person’s ability to add new consciousness to an area, as all consciousness must be protected and have specific rights.
Humanity expands across space in a beautiful poly-utopia. The utopia of Crescendo (and humanity) is fundamentally choice/agency-based and consent-based. Crescendo will allow anything to happen to you as long as you give deep consent to it. In some realities, for example, people want to be totally free of Crescendo. So, while Crescendo maintains a slight presence to prevent that utopia from building relativistic kill missiles and blowing up other non-consenting utopias, Crescendo doesn’t interfere, even when someone is murdered… because the murdered person had given their deep consent to allow that possibility if it meant living a truly AI-free life.
It is possible that no one would actually give their deep consent to this, so Crescendo would never have to deal with that. But Crescendo, above all else, respects people’s ability to choose. Of course, one person’s ability to choose ends where another person’s begins, so utopias can’t expand into or assimilate other utopias. Travel is fine, but manipulation or coercion is not.
And in the year 2100, there are many different types of minds. AIs, uplifted animals, humans, genetically modified humans, cyborgs… the variety is infinite. All of these people migrate through utopias or form their own.
There are limits, of course. Utopias can’t expand infinitely. There are space and computation limits. People also can’t reproduce very frequently, as the creation of a new consciousness is a process heavily monitored by Crescendo and is allowed only when there is space and an assurance that the created consciousness will have the opportunity to experience a truly fulfilling life. This applies not only to human babies, but also to AIs, animals, and other diverse forms of intelligence.
[1] https://www.lesswrong.com/posts/ioZxrP7BhS5ArK59w/did-claude-3-opus-align-itself-via-gradient-hacking