Alternate framing: Optimality is the tiger, and agents are its teeth.
Tonally relevant: Godzilla Strategies.
It's a problem when people think that a superintelligent AI will be just a volitionless tool that will do as told. But it's also a problem when people focus overly much on the story of "agency". When they imagine that all of the problems come from the AI "wanting" things, "thinking" things, and consequentializing all over the place about it. If only we could make it more of a volitionless tool! Then all of our problems would be solved. Because the problem is the AI using its power in clever ways with the deliberate intent to hurt us, right?
This, I feel, fails entirely to appreciate the sheer power of optimization, and how even the slightest failure to aim it properly, the slightest leakage of its energy in the wrong direction, for the briefest of moments, will be sufficient to wash us all away.
The problem isn't making a superintelligent system that wouldn't positively want to kill us. Accidentally killing us all is a natural property of superintelligence. The problem is making an AI that will deliberately spend a lot of effort on ensuring it's not killing us.
I find planet-destroying Death Rays to be a good analogy. Think the Death Star. Think—
Imagine that you're an engineer employed by an... eccentric fellow. The guy has a volcano lair, weird aesthetic tastes, and a tendency to put words like "world" and "domination" one after another. You know the type.
One of his latest schemes is to blow up Jupiter. To that end, he's had a giant cavern excavated underneath his volcano lair, had a long cylindrical tunnel dug from that cavern to the surface, and ordered your team to build a beam weapon in that cavern and shoot it through the tunnel at Jupiter.
You're getting paid literal tons of money, so you don't complain (except about the payment logistics). You have a pretty good idea of how to do that project, too. There are these weird crystal things your team found lying around. If you poke one in a particular way, it releases a narrow energy beam which blows up anything it touches. The power of the beam scales superexponentially with the strength of the poke; you're pretty sure shooting one with a rifle will do the Jupiter-vanishing trick.
There's just one problem: aim. You can never quite predict which part of the crystal will emit the beam. It depends on where you poke it, but also on how hard you poke, with seemingly random results. And your employer is insistent that the Death Ray be fired from the cavern through the tunnel, not from space where it's less likely to hit important things, or something practical like that.
If you say that can't be done, your employer will just replace you with someone less... pessimistic.
So, here's your problem. How do you build a machine that uses one or more of these crystals in such a way that they fire a Death Ray through the tunnel at Jupiter, without hitting Earth and killing everyone?[1]
This analogy can be nitpicked endlessly, of course. By no means does anything here prove that it's a valid one. You can argue that just a wee bit of misalignment won't destroy the world, or that the AI doesn't need to be dangerous in this way for us to do interesting things with it, or that intelligence isn't really quite that powerful, et cetera.
This post isn't aimed at convincing anyone of that; there are plenty of posts that do so already. But if you broadly agree with the premise and have some difficulty sorting out the exact problems with any given containment scenario, this analogy might help.
Any sufficiently powerful AI system holds a terrifying core of optimization: the ability to implacably rewrite some part of the world according to some specification. It doesn't matter how that power is represented, what wrapper it's in, where specifically it is aimed, or whether it's controlled by an alien sapient entity. As long as it's not aimed exactly where we want it to be, with no leakage, from the very beginning, it will kill us all.
That is simply its intrinsic property.
[1] Also, Earth has no atmosphere in that scenario. Probably your employer's fault too. But at least that means a well-aimed beam wouldn't hit the air and explode everything anyway.