Short introduction

The multipolar scenarios I will be talking about are scenarios in which multiple unrelated actors have access to their own personal AGIs. For the sake of discussion, assume that we have solved alignment and that each AGI will follow the orders of its owner.

A few ways we might arrive at a multipolar AGI scenario

  • The gap between the leading AI capabilities labs is not as big as we think. Multiple AI labs create AGI roughly simultaneously.
  • The gap between the leading AI capabilities labs is quite big, but due to poor security measures, the leading lab is constantly getting its techniques leaked to the competitors, thus narrowing the gap. Multiple AI labs create AGI roughly simultaneously.
  • The first lab to create AGI does so earlier than the others. However, due to either indecisiveness or slow takeoff, other labs create their own AGIs before the first lab has a considerable impact on the world.

Potential catastrophes that can be caused by multiple actors having access to AGI

1) Everyone dies directly

  • A person who is angry at the world (for whatever reason) controls one of the AGIs. He cares for nothing and seeks revenge. He orders the AGI to kill humanity.
  • A fundamentalist of some sort controls one of the AGIs. His beliefs tell him that killing humanity is the right thing to do. He orders the AGI to kill humanity.
  • A negative utilitarian controls one of the AGIs. His beliefs tell him that painlessly killing humanity is the right thing to do. He orders the AGI to kill humanity.
  • A mentally ill person controls one of the AGIs. He is not a bad person per se, but due to the nature of his illness, he orders the AGI to kill humanity.

2) Everyone dies indirectly

  • The leadership of a country controls one of the AGIs. It uses its AGI to inflict military defeat upon its adversaries. The adversaries use their own AGIs to counteract that. A global war ensues. Humanity is killed as an unintended consequence.
  • A generic "evil" actor controls one of the AGIs. He tries to take an action that a "good" actor who controls another AGI deems very harmful. A struggle between the two actors ensues. Humanity is killed as an unintended consequence.

3) Totalitarian dictatorship

  • The leadership of a country controls one of the AGIs. It uses the AGI to take over the world and impose its rule.
  • A military group controls one of the AGIs. It uses the AGI to take over the world and impose its rule.
  • The power-hungry leadership of an AGI lab controls one of the AGIs. It uses the AGI to take over the world and impose its rule.

What are our best plans for preventing catastrophes like those outlined above, in a multipolar AGI scenario?


4 Answers

quetzal_rainbow


I think a lot of thinking around multipolar scenarios suffers from the heuristic of "a solution in the shape of the problem", i.e. "a multipolar scenario is when we have kinda-aligned AI but still die due to coordination failures; therefore, the solution for multipolar scenarios should be about coordination".

I think the correct solution is to leverage the available superintelligence in a nice unilateral way:

  1. D/acc - use superintelligence to put in place as much defence as you can, from formal software verification to spreading biodefence nanotech;
  2. Running away - if you set up a Moon/Mars/Jovian colony of nanotech-upgraded humans/uploads and pour the available resources into defence, then even if Earth explodes, humanity survives as a species.

Nathan Helm-Burger


My current best guess: Subsidiarity

I've been thinking along these lines for the past few years, but I feel like my thinking was clarified and boosted by Allison's recent series: Gaming the Future.

The gist of the idea is to create clever systems of decentralized control and voluntary interaction which can still manage to coordinate on difficult risky tasks (such as enforcing defensive laws against weapons of mass destruction). Such systems could shift humanity out of the Pareto suboptimal lose-lose traps and races we are stuck in. Win-win solutions to our biggest current problems seem possible, and coordination seems like the biggest blocker.

I am hopeful that one of the things we can do with just-before-the-brink AI will be to accelerate the design and deployment of such voluntary coordination contracts.

Seth Herd


This is an extremely important unsolved question IMO, because a multipolar scenario appears to be where we're heading if we can adequately solve alignment in time.

See "If we solve alignment, do we die anyway?" and the discussion and edited conclusion. Even after all of that, I notice I'm still confused.

The best I've come up with is: don't be in a multipolar scenario any more than you absolutely have to. Nonproliferation, like with nukes, seems like the only answer. The best solution to a multipolar scenario is to not let it become any more multipolar than it is, and ultimately to make it less multipolar.

The problems you mention seem very bad, and it gets worse when you consider that very advanced technology could probably save a few of the genocidal AI controller's favorite people, or maybe the mind-states of a lot of people, even while wiping out humanity and rival AGIs, to provide some control of the future for whatever ideology.

Another possibility I should add is that rival AGIs may resort to mutually assured destruction. Having a dead man's switch that cracks the Earth's crust or sends the sun nova if you're not around to stop it would be an extreme measure that could be applied. Sending a copy of yourself off to a nearby star with a stealthy departure would seem like good insurance against a genocidal takeover.

Universal surveillance of Earth and the solar system might suffice to prevent hostile exponential military improvements. That might even be done by a neutral AGI that keeps everyone's secrets as long as they're not violating a treaty about developing the capacity to kill everyone else.

sweenesm


Here's what Claude 3.5 Sonnet said in response to this prompt:

Me:

What are the best plans for preventing catastrophes such as everyone dying or a totalitarian government taking over in multipolar AGI scenarios, i.e., ones in which AGIs are aligned to what their human users want to do, and at least one human user orders them to do bad things?

Claude:

Let me share some thoughts on AGI governance and safety in multipolar scenarios, while noting this is a complex topic that reasonable people disagree on:

A key challenge in multipolar scenarios is that even if individual AI systems are aligned with their users' values and intentions, some users may have harmful goals. Some potential approaches to address this include:

  1. International coordination and governance frameworks that place limits on allowable AI system capabilities and uses, similar to arms control treaties. However, enforcement would be challenging.
  2. Technical measures like capability limitations built into AI systems' training that prevent them from being used for certain catastrophic actions, regardless of user intent. The challenge is defining and implementing such limitations without preventing beneficial uses.
  3. Focused work on improving human values and cooperation through better institutions, education, and mechanisms for resolving conflicts peacefully. This could reduce the likelihood of actors wanting to cause catastrophic harm.
  4. Developing AI systems with some form of common good or humanitarian principles that apply regardless of user instructions, though this involves difficult philosophical and technical questions about how to define and implement such principles.

What are your thoughts on which approaches seem most promising or practical? I'm particularly curious about your perspective on the balance between technical and social/institutional solutions.

___________________________________

I would add to this Defensive acceleration (d/acc) described by Vitalik Buterin (https://vitalik.eth.limo/general/2025/01/05/dacc2.html and https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html).

I too like talking things through with Claude, but I don't recommend taking Claude's initial suggestions at face value.

Try following up with a question like:

"Yes, those all sound nice, but do they comprehensively patch all the security holes? What if someone really evil fine-tuned a model to be evil or simply obedient, and then used it as a tool for making weapons of mass destruction?

Education to improve human values seems unlikely to have a 100% success rate. Some people will still do bad things, especially in the very near future.

Fine-tuning the AI will ...

3 comments

I suspect that this will be an incredibly difficult scenario to navigate and that our chances will be better if we train wise AI advisors.

I think our chances would be better still if we could pivot a significant fraction of the talent towards developing WisdomTech rather than IntelligenceTech.

On a more concrete level, I suspect the actual plan looks like some combination of alignment hacks, automated alignment research, control, def/acc, limited proliferation of AI, compute governance and the merging of actors. Applied wisely, the combination of all of these components may be enough. But figuring out the right mix isn't going to be easy.

Totalitarian dictatorship

I'm unclear on why this risk is specific to multipolar scenarios. Even if you have a single AGI/ASI, you could end up with a totalitarian dictatorship, no? In fact, I would imagine that having multiple AGI/ASIs would mitigate this risk, as, optimistically, every domestic actor in possession of an AGI/ASI should be counterbalanced by another domestic actor with divergent interests also in possession of an AGI/ASI.

I actually think multipolar scenarios are less dangerous than having a single superintelligence. Watching the AI arms race remain multipolar has actually been one of the biggest factors in my P(doom) declining recently. I believe that maintaining a balance of power at all times is key and that humanity's best chance for survival is to ensure that, for any action humanity wishes to take, there is some superintelligence that would benefit from this action and which would be willing to defend it. This intuition is largely based on examples from human history and may not generalize to the case of superintelligences. 

EDIT: I do believe there's a limit to the benefits of having multiple superintelligences, especially in the early days when biological defense may be substantially weaker than offense. As an analogy to nuclear weapons: if one country possesses a nuclear bomb, then that country can terrorize the world at will; if a few countries have nuclear bombs, then everyone has an incentive to be restrained but alert; if every country has a nuclear bomb, then eventually someone is going to press the big red button for lolz.