AI Regulation May Be More Important Than AI Alignment For Existential Safety
Summary: Aligning a single powerful AI is not enough: we're only safe if no one, ever, can build an unaligned powerful AI. Yudkowsky tried to solve this with the pivotal act: the first aligned AI does something (such as melting all GPUs) that ensures no unaligned AI can ever be built, by anyone. However, the labs do not currently appear to be aiming to implement a pivotal act. That means that aligning an AGI, while creating lots of value, would not reduce existential risk. Instead, global hardware/data regulation is what's needed to reduce existential risk. Therefore, those aiming to reduce AI existential risk should focus on AI Regulation, rather than on AI Alignment.

Epistemic status: I’ve been thinking about this for a few years, while working professionally on x-risk reduction. I think I know most of the literature on the topic. I have also discussed the topic with a fair number of experts (some of whom seemed to agree, while others did not).

Thanks to David Krueger, Matthijs Maas, Roman Yampolskiy, Tim Bakker, Ruben Dieleman, and Alex van der Meer for helpful conversations, comments, and/or feedback. These people do not necessarily share the views expressed in this post.

This post is mostly about AI x-risk caused by a takeover. It may or may not be valid for other types of AI x-risk. This post is mostly about the ‘end game’ of AI existential risk, not about intermediate states.

AI existential risk is an evolutionary problem

As Eliezer Yudkowsky and others have pointed out: even if there are safe AIs, those are irrelevant, since they will not prevent others from building dangerous AIs. Examples of safe AIs could be oracles or satisficers, insofar as it turns out to be possible to combine these AI types with high intelligence. But, as Yudkowsky would put it: “if all you need is an object that doesn't do dangerous things, you could try a sponge”. Even if a limited AI would be safe, it would not reduce AI existential risk. This is
UN control: a Baruch plan for AGI.