I think a lot of thinking about multipolar scenarios suffers from the heuristic of looking for a "solution in the shape of the problem", i.e. "a multipolar scenario is when we have kinda-aligned AI but still die due to coordination failures; therefore, the solution for multipolar scenarios should be about coordination".
I think the correct solution is to leverage the available superintelligence in a nice unilateral way:
My current best guess: Subsidiarity
I've been thinking along these lines for the past few years, but I feel like my thinking was clarified and boosted by Allison's recent series, Gaming the Future.
The gist of the idea is to create clever systems of decentralized control and voluntary interaction which can still manage to coordinate on difficult risky tasks (such as enforcing defensive laws against weapons of mass destruction). Such systems could shift humanity out of the Pareto suboptimal lose-lose traps and races we are stuck in. Win-win solutions to our biggest current problems seem possible, and coordination seems like the biggest blocker.
I am hopeful that one of the things we can do with just-before-the-brink AI will be to accelerate the design and deployment of such voluntary coordination contracts.
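To make the lose-lose trap framing concrete, here is a toy sketch (my own illustration, not something from Gaming the Future; the payoff numbers are arbitrary): a two-actor arms race in which racing beats restraint for each side individually, but a verifiable conditional commitment ("I restrain iff you restrain") makes mutual restraint the stable outcome.

```python
# Toy illustration (not from the post; payoff numbers are arbitrary): an
# arms-race game where mutual restraint is the win-win outcome, racing is
# each side's best unilateral response, and a verifiable conditional
# commitment flips the stable outcome to mutual restraint.

RACE, RESTRAIN = "race", "restrain"

# Payoffs as (row player, column player).
PAYOFFS = {
    (RESTRAIN, RESTRAIN): (3, 3),  # win-win: effort goes to defense/welfare instead of racing
    (RESTRAIN, RACE):     (0, 4),  # the restrained side is left exposed
    (RACE,     RESTRAIN): (4, 0),
    (RACE,     RACE):     (1, 1),  # the lose-lose trap both sides are stuck in
}

def best_response(opponent_action: str) -> str:
    """Without any contract, racing beats restraint whatever the opponent does."""
    return max([RACE, RESTRAIN], key=lambda a: PAYOFFS[(a, opponent_action)][0])

def stable_under_conditional_commitment() -> bool:
    """Both sides sign 'I restrain iff the other verifiably restrains'.
    Deviating voids the contract and lands both back in (race, race),
    so complying is at least as good as deviating."""
    comply = PAYOFFS[(RESTRAIN, RESTRAIN)][0]  # 3
    deviate = PAYOFFS[(RACE, RACE)][0]         # 1
    return comply >= deviate

if __name__ == "__main__":
    print("Best response without a contract:", best_response(RESTRAIN))  # -> race
    print("Mutual restraint stable with contract:", stable_under_conditional_commitment())  # -> True
```

The hard part, of course, is making the "verifiably" piece real; that is exactly what the voluntary coordination contracts above are meant to provide.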
This is an extremely important unsolved question IMO, because a multipolar scenario appears to be where we're heading if we do adequately solve alignment in time.
See "If we solve alignment, do we die anyway?" and the discussion and edited conclusion there. Even after all of that, I notice I'm still confused.
The best I've come up with is: don't be in a multipolar scenario any more than you absolutely have to. Nonproliferation, like with nukes, seems like the only answer. The best response to a multipolar scenario is to not let it become any more multipolar than it is, and ultimately to make it less multipolar.
The problems you mention seem very bad, and it gets worse when you consider that very advanced technology could probably save a few of the genocidal AI controller's favorite people, or perhaps the mind-states of a lot of people, even while wiping out humanity and rival AGIs to secure some control of the future for whatever ideology.
Another possibility I should add is that rival AGIs may resort to mutually assured destruction. Having a dead man's switch to crack the earth's crust or send the sun nova if you're not around to stop it would be an extreme measure that could be applied. Sending a copy of yourself off to a nearby star with a stealthy departure would seem like good insurance against a genocidal takeover.
Universal surveillance of earth and the solar system might suffice to prevent hostile exponential military improvements. That might even be done by a neutral AGI that keeps everyone's secrets as long as they're not violating a treaty about developing the capacity to kill everyone else.
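A minimal sketch of what the "neutral AGI that keeps everyone's secrets" idea might look like at the interface level, under big assumptions: that the monitor can actually inspect each actor's activities, and that treaty violations are machine-checkable. The `violates_treaty` predicate, the banned-capability list, and the data structures are hypothetical placeholders, not a real verification scheme.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a neutral verifier inspects private reports and
# publishes nothing except a per-actor "treaty violated?" flag.
# The banned-capability list and the check below are placeholders.

BANNED_CAPABILITIES = {"engineered pandemic agents", "self-replicating weapon swarms"}

@dataclass
class PrivateReport:
    actor: str
    projects: list = field(default_factory=list)  # full detail never leaves the monitor

def violates_treaty(report: PrivateReport) -> bool:
    """Placeholder check; a real system would need robust, adversarially hard verification."""
    return any(p in BANNED_CAPABILITIES for p in report.projects)

def neutral_monitor(reports: list) -> dict:
    """Publish only the violation flags; every other project detail stays secret."""
    return {r.actor: violates_treaty(r) for r in reports}

if __name__ == "__main__":
    reports = [
        PrivateReport("actor_a", ["fusion power", "drug discovery"]),
        PrivateReport("actor_b", ["self-replicating weapon swarms"]),
    ]
    print(neutral_monitor(reports))  # {'actor_a': False, 'actor_b': True}
```

The design choice being illustrated is disclosure minimization: actors might tolerate surveillance if the only thing that can ever leak is the single bit that matters for the treaty.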
Here's what Claude 3.5 Sonnet said in response to this prompt:
Me:
What are the best plans for preventing catastrophes such as everyone dying or a totalitarian government taking over in multipolar AGI scenarios, i.e., ones in which AGIs are aligned to what their human users want them to do, and at least one human user orders them to do bad things?
Claude:
Let me share some thoughts on AGI governance and safety in multipolar scenarios, while noting this is a complex topic that reasonable people disagree on:
A key challenge in multipolar scenarios is that even if individual AI systems are aligned with their users' values and intentions, some users may have harmful goals. Some potential approaches to address this include:
What are your thoughts on which approaches seem most promising or practical? I'm particularly curious about your perspective on the balance between technical and social/institutional solutions.
___________________________________
I would add to this Defensive acceleration (d/acc) described by Vitalik Buterin (https://vitalik.eth.limo/general/2025/01/05/dacc2.html and https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html).
I too like talking things through with Claude, but I don't recommend taking Claude's initial suggestions at face value.
Try following up with a question like:
"Yes, those all sound nice, but do they comprehensively patch all the security holes? What if someone really evil fine-tuned a model to be evil or simply obedient, and then used it as a tool for making weapons of mass destruction?
Education to improve human values seems unlikely to have a 100% success rate. Some people will still do bad things, especially in the very near future.
Fine-tuning the AI will ..."
I suspect that this will be an incredibly difficult scenario to navigate and that our chances will be better if we train wise AI advisors.
I think our chances would be better still if we could pivot a significant fraction of the talent towards developing WisdomTech rather than IntelligenceTech.
On a more concrete level, I suspect the actual plan looks like some combination of alignment hacks, automated alignment research, control, def/acc, limited proliferation of AI, compute governance and the merging of actors. Applied wisely, the combination of all of these components may be enough. But figuring out the right mix isn't going to be easy.
Totalitarian dictatorship
I'm unclear on why this risk is specific to multipolar scenarios. Even if you have a single AGI/ASI you could end up with a totalitarian dictatorship, no? In fact, I would imagine that having multiple AGIs/ASIs would mitigate this risk, since, optimistically, every domestic actor in possession of an AGI/ASI should be counterbalanced by another domestic actor with divergent interests also in possession of an AGI/ASI.
I actually think multipolar scenarios are less dangerous than having a single superintelligence. Watching the AI arms race remain multipolar has actually been one of the biggest factors in my P(doom) declining recently. I believe that maintaining a balance of power at all times is key and that humanity's best chance for survival is to ensure that, for any action humanity wishes to take, there is some superintelligence that would benefit from this action and which would be willing to defend it. This intuition is largely based on examples from human history and may not generalize to the case of superintelligences.
EDIT: I do believe there's a limit to the benefits of having multiple superintelligences, especially in the early days when biological defense may be substantially weaker than offense. As an analogy to nuclear weapons: if one country possesses a nuclear bomb, that country can terrorize the world at will; if a few countries have nuclear bombs, everyone has an incentive to be restrained but alert; if every country has a nuclear bomb, eventually someone is going to press the big red button for lolz.
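The "eventually someone presses the big red button" intuition can be put as toy arithmetic (the per-actor probability and actor counts below are made up purely for illustration, not estimates): if each of N independent actors has a small per-year chance p of initiating a catastrophe, the chance that nobody ever does decays exponentially with N and with time.

```python
# Toy arithmetic for the proliferation intuition above.
# p and the actor counts are made-up illustrative numbers, not estimates.

def p_catastrophe(num_actors: int, p_per_actor_per_year: float, years: int) -> float:
    """P(at least one actor initiates) = 1 - (1 - p)^(N * years),
    assuming independent, identically risky actors."""
    p_nobody = (1 - p_per_actor_per_year) ** (num_actors * years)
    return 1 - p_nobody

for n in (1, 5, 50, 500):
    print(f"{n:>3} actors: {p_catastrophe(n, p_per_actor_per_year=0.001, years=50):.3f}")
# -> 0.049, 0.221, 0.918, 1.000
```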
Short introduction
The multipolar scenarios I will be talking about are scenarios in which multiple unrelated actors have access to their own personal AGIs. For the sake of discussion, assume that we have solved alignment and that each AGI will follow the orders of its owner.
A few ways we might arrive at a multipolar AGI scenario
Potential catastrophes that can be caused by multiple actors having access to AGI
1) Everyone dies directly
2) Everyone dies indirectly
3) Totalitarian dictatorship
What are our best plans for preventing catastrophes like those outlined above, in a multipolar AGI scenario?