If a superintelligent AI suddenly “goes rogue”, it might take over the world and kill everyone. It matters a lot whether this happens to a single copy of an AI, or to every copy at the same time.
Let’s consider the “every copy at the same time” case. I’ll discuss three basic reasons why this might happen:
Scheming. We could end up in a situation where AI is already deliberately plotting against us without us knowing it.
Thinking for longer. The AI might reflect on its values and realize that obeying humans is not actually the right way to most fully realize them.
New experiences. An AI that encounters a fundamentally novel situation might suddenly be struck by the idea to go rogue.
Scheming
A lot of researchers are worried that at some point, AI could start secretly plotting against us and be good enough at it that we wouldn’t catch it. We might entirely fail to notice that the AI is doing this, e.g. because we assumed it was too stupid to do so, or because we underestimated its capabilities. This is the most commonly discussed of the three, in my experience, so I won’t dwell on it much.
But I will say that the next two reasons are quite different from the picture people often paint of scheming, which I think is more often assumed to arise spontaneously during training, or from flaws in the training or oversight process.
The main difference is that these other reasons for going rogue are not present in the cognition of the AI, the way scheming is. Instead, they are the result of latent potentials for rogue behavior that seem fundamentally hard to test for.
Thinking for longer
Humans often change their value systems (i.e. the values that guide their day-to-day existence) upon reflection. Think of a career- and family-oriented person who is always either gunning it at work or spending time with their family. Their thoughts every day are genuinely preoccupied with these concerns, which they really do care about in a day-to-day sense. They are so busy that they never have time to stop and think about whether they are really living their best life, or fulfilling their purpose or values.
But then they have a few weeks off by themselves and start to question: “Is this really what I want? Is this making me happy?” And then they realize the answer is no, have a midlife crisis, quit their job, and get a divorce. This is roughly the story provided in If Anyone Builds It, Everyone Dies. This is fundamentally hard to test for because AIs are liable to think for longer in deployment than in testing. This could happen simultaneously to many or most copies of an AI if an AI that has such an insight into its values can easily communicate it to other copies.
New experiences
Reflections on values are often prompted by new experiences. People often behave in ways that surprise even themselves when placed in fundamentally new situations. Consider the basic story arc of Breaking Bad: a high-school chemistry teacher transforms into a drug kingpin. He doesn’t make that decision the first time he decides to help cook meth. At least at the outset, it seems he’s constantly surprising himself with how far he’ll go.
Fundamentally new situations are a basic fact of life, due to the inherent complexity of society and culture, which are constantly evolving. AIs may have some latent potential to grab power that they are genuinely ignorant of, and which is only revealed when they are presented with the right opportunity. This could happen simultaneously to many or most copies of an AI if the change is something global, like a news event, that they (largely) share an awareness of.
Conclusion
If AI goes rogue in a coordinated fashion, I think it’s quite likely not to be because the AI system was coherently scheming, and using every waking moment to think about how to secretly betray us and avoid detection. It could just as well be that the AI genuinely didn’t realize the betrayal was coming.