Sometimes when I look at proposed solutions to AI risk (a.k.a. pivotal acts), they seem to be "cheating". For example, the only way I could see CAIS preventing AI risk is for it to be used against other AI researchers. This is as opposed to a pure aligned utility maximizer, which could simply pull the plug on younger AGIs without touching their creators.

There are no rules, though, for what counts as an x-risk solution. Instead, I propose a scale based on how much I dislike the side effects. If you have similar preferences, hopefully it will be useful to you as well.

Pivotal Act severity scale

  • Level 1 (best): The solution directly counteracts any malicious actions taken by an evil AI, without affecting any other part of the world. For example, if the evil AI creates nanobots, the solution destroys the nanobots directly without removing the evil AI. (This is desirable because it minimally restricts humans from pursuing AI research, if only for their own enjoyment.) Prototypical example: aligned AIXI
  • Level 2: The solution only shuts down AI projects that are on the verge of singularity, and it must shut down every such project that would otherwise result in an unaligned singularity AI. Prototypical example: Nanobots that unplug AGIs before they are turned on
  • Level 3: The solution is allowed to directly attack the economy in some narrow way. Prototypical example: burning all the GPUs
  • Level 4: The solution grants authority or the power to intimidate to a group of humans who will attack or illegalize AI research to some extent OR it lobbies (potentially by offering goods or services) groups who already have that power to do so. Prototypical example: Use AI to become dictator of the world
  • Level 5 (worst): The solution causes existential catastrophe instead of preventing it, but it at least prevents the far worse s-risk. Prototypical example: Paperclip maximizer

This scale can also be interpolated subjectively and intuitively. For example, I'd say that a solution guaranteed to only destroy unaligned AI is a 1.5.

Here are some examples of where I would rank various solutions:

  • Aligned MIRI-style AGI: 1
  • Center for Human Compatible AI (CHAI): 1
  • Imitative Amplification: 1 - 2 (depending on alignment tax)
  • Burn all the GPUs: 3
  • Comprehensive AI services: 3.5 - 4
  • OpenAI: Historically and currently at 4, long run 3.5 - 4 (but maybe higher if they get lucky?)
  • Post-human & e/acc: 4.5 - 5

If you like this scale, feel free to start using it to rank solutions! I also feel like other, unrelated scales could be useful in identifying new research directions. For example, if we create 5 orthogonal scales and notice a hole in the 5D graph, that might indicate a previously unknown alignment or x-risk solution!
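As a rough illustration of the hole-finding idea (not part of the original proposal), here is a minimal sketch. Only the severity scores come from the list above; the other scale names ("alignment_tax", "time_to_deploy") and all remaining numbers are made-up placeholders. It scores a few solutions on three assumed-orthogonal 1-5 scales and lists the grid cells that no known solution occupies.

```python
# Sketch: place solutions in a multi-scale grid and look for unoccupied cells ("holes").
# Scale names besides severity, and all non-severity scores, are hypothetical placeholders.
from itertools import product

# Each solution is scored on several scales (1 = best, 5 = worst).
solutions = {
    "Aligned MIRI-style AGI":    {"severity": 1.0, "alignment_tax": 4.0, "time_to_deploy": 5.0},
    "Imitative Amplification":   {"severity": 1.5, "alignment_tax": 3.0, "time_to_deploy": 4.0},
    "Burn all the GPUs":         {"severity": 3.0, "alignment_tax": 1.0, "time_to_deploy": 2.0},
    "Comprehensive AI services": {"severity": 3.5, "alignment_tax": 2.0, "time_to_deploy": 3.0},
}

scales = ["severity", "alignment_tax", "time_to_deploy"]

def occupied_cells(solutions, scales):
    """Map each solution to the integer grid cell (1-5 on each scale) it falls into."""
    return {tuple(round(scores[s]) for s in scales) for scores in solutions.values()}

def holes(solutions, scales):
    """Return every grid cell that no known solution occupies."""
    taken = occupied_cells(solutions, scales)
    return [cell for cell in product(range(1, 6), repeat=len(scales)) if cell not in taken]

empty = holes(solutions, scales)
print(f"{len(empty)} of {5 ** len(scales)} cells are unoccupied, e.g. {empty[:3]}")
# An unoccupied cell near (1, 1, 1) would suggest searching for a solution that is
# minimally invasive, cheap to align, and quick to deploy -- a possible research direction.
```

With only a handful of solutions and coarse 1-5 scales, most cells will be empty, so in practice you'd care about holes that sit near clusters of existing solutions rather than every empty cell.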
