~ AGI safety is at LEAST as hard as a protocol which prevents tyranny ~

When we want to keep ourselves safe from AGI, one of the key criteria is "can we turn it off if it goes berserk?" That is the same requirement whenever we put a person in charge of an entire government: "can we depose this person without them using the military to stop us?"

AGI has extra risks and problems, being super-intelligent, while most dictators are morons. Yet, if someone told me "we solved AGI safety!" then I would happily announce "then you've also solved protecting-governments-from-dictators!" You might be able to avoid all dictators in a way which does NOT cure us of AGI risks… though, if you can reliably prevent AGI-pocalypse, then you've definitely handled dictators by the same method.

So, does that mean we'll soon find a panacea for AGI-death threats? Considering that we haven't stopped dictators in the last few… centuries? Millennia? Yeah, we might be screwed. Considering that dictators have nukes, and can engineer super-viruses… Oh, and that would imply: "Dictators are the existential risk of 'a berserk machine we can't turn off'… meaning that we need to fight those AGI overlords today."

11 comments

AGI safety has the benefit that people get to decide the code for the AGI (or for the system that makes the AGI), whereas tyranny has the problem that the "code" for a tyrant or dictator was created by a combo of evolution and self-promotion, which is relatively outside of deliberate control.

Sometimes, solving a more general problem is easier than solving a partial problem (1, 2). If you build a Friendly superhuman AI, I would expect that some time later all dictators will be removed from power... for example by an army of robots that will unseen infiltrate the country, and at the same moment destroy its biggest weapons and arrest the dictator and the other important people of the regime.

(What exactly does "solving" the AGI mean: a general theory published in a scientific paper, an exact design that solves all the technical issues, having actually built the machine? Mere theory will not depose dictators.)

I'm not sure that the implication holds.

Dictators gain their power by leverage over human agents. A dictator that kills all other humans has no power, and then lives the remainder of their shortened life in squalor. A superintelligent AI that merely has the power of a human dictator for eternity and relies on humans to do 99% of what it wants is probably in the best few percent of outcomes from powerful AI. Succeeding in limiting it to that would be an enormous success in AGI safety even if it wasn't the best possible success.

This is probably another example of the over-broadness of the term "AGI safety", where one person can use it to mean mostly "we get lots of good things and few bad things" and another to mean mostly "AGI doesn't literally kill everyone and everything".

What does the "O(x)" notation in the title mean?

The only thing I'm familiar with is the mathematical meaning: a class of functions defined in terms of being bounded by the argument, but that clearly isn't intended here. Other things that came to mind were P(x) for probability and U(x) for utility, but it doesn't seem to mean either of those.

big O of generating a solution, I think

There are solutions to the AI safety problem that don't help with dictators, beyond the obvious "friendly superintelligence retiring all dictators".

Suppose RLHF or something just worked wonderfully. You just program a particular algorithm, and everything is fine, the AI does exactly what you wanted. The existence and human findability of such an algorithm wouldn't stop dictators (until a friendly AI does). So we gain no evidence that such an algorithm doesn't exist by observing dictators.

There are various reasons dictators aren't as much of an X-risk: they don't make stuff themselves, they hire people to do it, and very few people are capable and willing enough to make the super-plague.
