I think we have 3 positions (all of which you mention) where we need radical progress ASAP:
access to safe age reversal
ability to overcome any medical condition
access to safe cognitive enhancement for existing individuals
If we have good progress towards these three points, the pressure to “do something now already” will become much lower.
Beyond the psychology: it's hard for me to imagine "succeeding" at getting a good outcome from AI without having some semblance of a positive vision of what success could even mean. Avoiding a long list of failure modes notoriously, in the best case, mostly just pushes you into a failure mode you didn't anticipate.
When I am pondering what a realistic solution to AI existential safety might look like, it actually seems that a realistic solution incorporates a “seed of that positive vision”.
First of all, the “safety properties of the world” must stay invariant as the overall ecosystem rapidly self-modifies. If this is not accomplished, any safety would be short-lived. This invariance cannot be imposed against “natural interests of almost all powerful actors” (attempting to force that would result in something very fragile which is unlikely to hold; the main reason why most attempts to craft existential safety in a world with super-intelligences look so unpromising is that they tend to be fragile in this way).
So one needs the “safety properties of the world” to be in the interests of a sufficiently wide and powerful community of actors, and, for human flourishing, this class of actors should be wide enough to robustly include humans. The class of “entities whose interests are protected” should be crafted in such a way that it includes humans from the very beginning and in such a way that entities cannot be dropped from this class as things evolve.
So, the first approximation to a “realistic solution” looks like a system where all kinds of actors (including humans and including powerful AIs) have enough voice to make sure their interests are taken into account, and where the world order which robustly protects those very diverse interests is maintained.
On one hand, powerful AIs have enough incentive to maintain this kind of world order, as no AI can be certain of its relative power in the future and thus needs some protections in light of that future uncertainty. In particular, no one can be safe if it’s OK to drop entities already belonging to the protected class from that protected class. That’s how one makes sure that powerful AIs want to robustly maintain this kind of system, not because of some artificial trick, but because it is very much in the interest of each AI system to do so.
On the other hand, a system like this automatically has a “seed of the future positive vision” simply because it maintains the mechanisms to continue collective deliberations and to continue taking opinions of participants into account. Basically, the positive vision is the future where individuals and groups are not steamrolled, and where they can continue to figure out what they want and need and where these conversations are properly taken into account. We don’t need to decide what those future wants and needs will be, we just need to make sure that the mechanisms to discover those wants and needs and the mechanisms to properly take those discoveries into account continue to work.
I agree that we need victory scenarios. Mine doesn't quite fit your mold but I'm leaving it here anyway, at least in a brief and rushed form.
This isn't the best or most certain path. I think it's the easiest and therefore the best to aim for.
I usually just abbreviate this as something like better than you can imagine; we'll do almost whatever we want if we squeak by on alignment. And we can think of some really fun and exciting stuff to do.
In my positive vision of the future, we slow down and coordinate enough to get technical alignment solved. (How we might get there is below.) The first intent-aligned AGI is used to prevent construction of other ASIs, mostly by peaceful means. The AGI explains why that has to be done, and how to do it. Or perhaps it sees a way to safely allow other AGIs, and we get a nearby variant of this scenario. That unlocks a post-scarcity scenario in which material wealth is growing so rapidly that it requires very little generosity to look very generous.
Whoever winds up controlling the first ASI is not a great person, but they're not strongly sadistic. They decide to be hailed as a hero by being decent to humanity. They don't need to give up any power to bring a bright future. They retain control, but having an ASI doing their bidding makes generosity on an unprecedented scale require no sacrifice or effort. They do that as the obvious and easiest path.
The rules are somewhat arbitrary; perhaps we have to celebrate Samday every week. But people are allowed to do largely whatever they want. Our new god-emperor follows the obvious and easy path: giving everyone what they want, within reason, except for things that prevent others from having what they want. Ordinary goods become free. Ordinary goods include monster trucks and at least a time-share for a yacht.
Soon after, expeditions launch and vast virtualities are created with every detail perfect. All of these are governed by the God-Emperor, but more practically, by his loyal servant whom we'll call the Servant-God. A copy of this entity goes everywhere and observes as much as it needs to to preserve the peace.
Decisions are made about how to share the vast resources available, including limits on creating new people and minds that get parts of these rights. All of this is revocable at the God-Emperor's whim, but he is busy indulging himself in delights - like everyone else. He'll veto some directions of progress that he probably shouldn't, and do some things most people don't like, but it will look far better than anything most people have ever imagined.
People will work if they want to, choosing projects to tackle collaboratively or alone. They won't need to work; they'll sometimes want to.
The only thing in short supply will be people to help. Everyone will be able to get the best help available from the Servant-God - unless they've chosen to limit themselves to getting human help.
People will go on virtual adventures that feel as real and dangerous as they can stand. They'll block their memories in some cases to get full immersion.
Now, how did we get there?
The slowdown was a product of people taking AGI and alignment seriously once they were talking to things that were clearly as intelligent and agentic as humans, and almost as unpredictable. A giant public outcry reached the ears of those in power. The US Congress, in a panic, passed a bill banning the creation of superintelligence and putting AGI projects under the control of a special committee. The labs were forced to share their alignment techniques, and voluntarily shared their alignment fears as they applied to the other labs.
Alignment of the first superhuman AGI was achieved. Perhaps it was close.
It was aided by human-level AIs that gave decent advice. They did that by working hard and thinking hard, not by any brilliance. They collated all of the arguments for alignment being hard, and identified the most plausible ones. Their relative agreement helped the humans focus on the biggest dangers.
At this point, there might've been a struggle for power.
Thankfully, this worked out with the first AGI working for a not-great but not-that-bad human. If a sadistic sociopath had gotten the reins, we'd have a mild but unimaginable s-risk scenario.
The person/people in control used their position to consolidate power, and create the first ASI while narrowly preventing the creation of competing AGIs.
I don't think this response is what you were looking for, but I hope it's of interest. This is my take on the most likely route to good outcomes. I have written about many of these predictions and why they seem likely, but I haven't assembled them into a path to victory before.
This question seems premised on the idea that good futures come only from a pause. I think the sum of possible good futures lies in futures with slowdowns and marginal improvements to alignment.
Saying "the future is gonna be awesome but oh, we do need you to die to get there" is not going to resonate well with people whose predicted lifespan won't outlast a pause. Many, perhaps most, decision-makers are in that category. So unless we come up with a lot stronger/clearer arguments for why alignment is really hard, the world will probably press forward.
I think a lot of people would be willing to die for their children's sakes though. And if making unaligned ASI means risking their children's lives for their own...? I think most parents would take the safer option. And most policymakers are parents.
I know you said this isn't the best outcome, but this vision still seems like "humans in a zoo", except the cage is gilded? I also worry that this vision relies overmuch on the God-Emperor never, say, getting tired of the rest of humanity existing, or something. What if he becomes convinced that the universe is better off by replacing all of us with clones of himself? As a source of inspiration, it feels like a tough pill to swallow, and only really looks better in comparison to x- or s-risk outcomes.
Thanks for engaging. You point out one huge problem and some other lesser ones.
This isn't a very motivating scenario. "We're ruled by someone barely nice enough to not kill us" does not sound like victory.
One approach is to emphasize that we can put better people in charge and they can hand power to an aligned ASI as soon as it's safe and we're ready. Then frame the "barely good enough person" as a backup plan to emphasize that full success on societal alignment isn't needed.
Yes, some decision-makers would die for their children. I don't think it's that many; that's a pretty big ask. If they would, that's great. Advocating for a positive vision of success at alignment isn't in opposition to advocating for a pause. I think that we should stop building AGI right now, and that we may squeak by on alignment even if we don't, if we get our shit together rapidly in some critical but not large ways.
It's like saying "you should definitely not try that motorcycle jump but if you do you should really try to land it by doing these things, because it would be awesome if you did!" You can go ahead and say how awesome it would be if you didn't try the jump until you'd practiced and planned enough, too.
People don't want to talk about positive visions of the future, because it is not timely and because it's not the pressing problem. Preventing AI doom already seems so unlikely that caring about what happens in case we succeed feels meaningless.
I agree that it seems very unlikely. But I think we still need to care about it, to some extent, even if only for psychological and strategic reasons. And I think this neglect is itself contributing to the very dynamics that make success less likely.
The Desperation Engine
Some people — or, arguably, many people — go to work on AI capabilities because they see it as kind of "the only hope."
"So what now, if we pause AI?", they ask.
The problem is that even with paused AI, the future looks grim. Institutional decay continues, aging continues, regulations, social media brain rot, autocracies on the rise, maybe also climate change. The problems that made people excited about ASI as a solution don't go away just because you stopped building ASI. And so the prospect of a pause feels, to many technically-minded people who care about the long-term trajectory of civilization, not like safety but like despair — like choosing to die slowly instead of rolling the dice.
From what I see, at least at the level of individuals rather than organizations, and at least implicitly rather than openly articulated, this is the desperation engine that contributes to the race. If people are less desperate, they will be less willing to risk everything with ASI. Consider e/accs, or at least some part of them. It's hard for me to analyze them as a whole, but it looks like at least some non-negligible part of them is not simply trolls but genuine transhumanism-pilled people, and their radical obsession with accelerating AI is a response to desperation regarding technological stagnation and the state of civilizational hopelessness and apathy.
Techno-optimist sentiment is not inherently anti-AI-pause and shouldn't be anti-AI-pause. Indeed, many pro-AI-pause people say they want an AI pause precisely because they want a glorious transhuman future.
Paths Through the Pause
What would a positive future actually look like in the world where we succeed at preventing the development of misaligned ASI? I can imagine at least two positive futures from there:
Path 1: Augment humans to solve alignment. Use biological enhancement — cognitive augmentation, brain-computer interfaces, genetic engineering, pharmacological interventions — to make humans smart enough and wise enough to eventually solve alignment properly, and only then build superintelligence with confidence.
Path 2: Classical transhumanism without the singularity. Just abandon the idea of an AI singularity, at least for a while, and work on classical transhumanism — life extension, disease eradication, cognitive enhancement, space exploration — assisted by weak AGIs and narrow biological AI models. Not the cosmic endgame of filling the light cone, but the nearer-term project of making human civilization dramatically better and more resilient, buying time and building the institutional and epistemic infrastructure that would eventually be needed to handle ASI safely.
There are, however, problems with both.
Path 1 is still probably risky, and no one knows how hard it is to augment humans well enough for them to reliably solve alignment. It may turn out that the gap between "enhanced human" and "the kind of intelligence needed to solve alignment" is itself vast. And there are alignment-adjacent risks in cognitive augmentation itself — you're modifying the thing that does the valuing.
Path 2 seems unlikely to remain stable over even a moderately long term, precisely because we now see how easily superintelligences can be created. If the world continues even with an AI pause, and civilization becomes smarter, and hardware and AI software progress is not fully halted (only the frontier), the capability to build ASI will grow, and eventually it will happen, even if accidentally.
Still: do you see any other cool paths which the techno-optimist crowd would find appealing? I am seriously asking. This article is partly a call to think about this.
Why Bother Thinking About It
Does all of this sound like daydreaming? Well, it does.
I think it is still useful to have this positive mental image in front of you.
Firstly, the strategic case. It is clear that the world requires radical transformations to become functional and for technological progress to persist in a benevolent manner. While it is indeed not timely to spend significant effort right now on addressing the question of how to fix the world — because firstly we need to prevent the world from literally dying — it is timely to spend some effort on demonstrating that a better world is possible, that the problems are fixable, that there are other ways to bet on a better future than building ASI here and now.
This is, I believe, a real strategic intervention in the AI risk landscape, not just feel-good rhetoric. If the pause camp can say "here is a concrete, appealing alternative pathway to the future you want," that is a stronger position than "stop building the dangerous thing and then... we'll figure something out." The LessWrong post "My motivation and theory of change for working in AI" made a closely related argument: the more we humans can make the world better right now, the more we can alleviate what might otherwise be a desperate dependency upon superintelligence to solve all of our problems, and the less compelling it will be to take unnecessary risks.
Secondly, the psychological case. Me personally, when I imagine a good future ahead, I feel (and arguably am) much more productive than when I just focus plainly on preventing AI doom while keeping the world as it is. I believe, of course, that just keeping the current world as it is would be better than risking the current ASI race, and yet not everyone among potential allies would agree with that, and motivation can definitely be increased if we are fighting not only for the current world, but also for a better future world.
Note that people may have different motivations: it may be the case that some fight the best when they have nothing to lose. But others fight better when they have something to protect. Both types of people exist, and a movement that only speaks to the first type is leaving motivation on the table. So the positive vision of the future is, for some, not a distraction from the work of preventing doom; it can be the thing that makes the work of preventing doom psychologically sustainable.
And thirdly, the planning case. If a real pause happens, then we actually need to work on these futures, and we need to have a plan for that. I agree that it sounds a bit... premature, but still.
The Gap
The narrative that we are responsible for 10^gazillion future sentient beings in galaxy superclusters is quite common in longtermist circles. But the question is: are there realistic, tangible, concretely imaginable pathways to this?
People have of course thought a lot about good futures. There is rich transhumanist literature. In the Sequences themselves, Fun Theory is a nice example.
But almost all of these pieces either come from older times and are outdated techno-scientifically, or they describe a positive future conditional on aligned ASI existing, or they simply don't address the question of how exactly we get from our specific civilizational state with all its problems and bottlenecks, which must be explicitly acknowledged, towards better futures. Fun Theory describes properties of a desirable future world, but doesn't bridge from where we are. Amodei's essays are an example of modern writings and are inspiring for some, but, even leaving alignment-level disagreements aside, they are entirely conditional on building powerful AI safely — they do not address what a good future looks like if we don't build ASI, or if we delay it significantly.
What we are missing, specifically, are positive visions for the pause scenario. Visions that are not "the status quo is fine, let's just not die" (which is motivationally weak for the transhumanist-pilled crowd) and not "aligned ASI will fix everything" (which presupposes the thing we're probably incapable of doing). Rather: "here are concrete, tangible pathways to a dramatically better civilization that do not require solving alignment first, and here is how they address the problems that make people desperate enough to gamble with ASI."
Looks like Roots of Progress does something in this avenue — working on a positive vision of progress and the future without the AI singularity.
But I think we need more versions of this, ones that are aware of the alignment problem and the risks, and that explicitly address the desperation dynamics I described above.
A More General Story
People have the need to escape the state of desperation. People miss the promise of a better world. And yes — this is a bigger story than AI doom.
AI doom revealed, to some of us (to many of us?) the scale of dysfunctionality of our civilization. But by the law of earlier failure, AI doom is only part of the story: explicitly or implicitly, we understand that a civilization that allowed the current AI situation to happen has all kinds of rather fundamental flaws, and we can't escape the feeling that we are trapped within these flaws.
This means that positive visions of the future, if they are to be taken seriously, cannot just be technological wishlists. They need to grapple with the institutional, political, and cultural failures that brought us here. A vision that says "and then we cure aging with narrow AI" without addressing why we currently can't coordinate on existential risk is not a complete vision. This is hard, and I don't claim to have the answers. But I think the question needs to be posed explicitly.
Many popular LessWrong posts have this recurring topic of desperation and need for hope. "Requiem for the hopes of a pre-AI world" is a veteran transhumanist reflecting on decades of watching those hopes erode. "Turning 20 in the probable pre-apocalypse" is about the feeling from a younger generation. And my own "Requiem for a Transhuman Timeline", where I was especially moved by this comment. Let me share an excerpt from it:
There is clearly a demand for this kind of thinking and writing that is not being satisfied.
One could argue of course: well, the recipe for making a pro-progress eudaimonic civilization is already written somewhere, let's say in the Sequences. Even if so, there remains the question of why no one can take and cook with this recipe. But yes, probably just rereading and reiterating already written pieces on the topic can be helpful, I think! In any case, I consider it plainly obvious that, for one reason or another, there is demand for that which is not satisfied.
What I Am and Am Not Saying
I am not suggesting the epistemic-violating trick "let's imagine it goes well, that will help us."
What I am saying is: even if we believe that success is unlikely, it is still worth thinking, to some extent, about what happens in the case of success and what we can achieve in that case, and how.
So, I encourage you to think about better futures, in case we succeed with preventing the development of misaligned ASI, because:
And I am non-rhetorically asking: what would make the pause feel not like a retreat, but like a different kind of advance?