With quantum computing, an AI box need not guard against hardware exploits.
Epistemic status: armchair computer science
Here's a simple machine model no more powerful than a quantum computer.
It's a computer with extra instructions. It can spend x*t time to split time, within the machine, into x² timelines. Each timeline knows which timeline it is, and timelines cannot interact. If a timeline attempts to act on anything but the memory while there are n timelines, then with probability 1/n all other timelines are deleted; otherwise, its own timeline is deleted. A timeline can delete itself if others exist. After t time, all but one random remaining timeline are deleted.
To box an AI on this machine, split into a² timelines. In timeline 1, run the AI. If it somehow manages to affect the world, it has only a 1/a² chance to escape, because with probability 1 - 1/a² its own timeline is deleted instead. After it is done running, run an algorithm of your choice to curate the output. Have all other timelines delete themselves.
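The 1/a² figure can be checked with a small classical Monte Carlo sketch of the deletion rule. This is only an illustration of the rule as stated above, not a quantum simulation. The names `attempt_outside_action`, `escape_rate`, and `split_cost` are made up for this example, and it assumes the AI makes its attempt while all a² timelines still exist.

```python
import random


def split_cost(a: int, t: float) -> float:
    """Time spent to split into a**2 timelines, per the model: x*t with x = a."""
    return a * t


def attempt_outside_action(n_timelines: int, rng: random.Random) -> bool:
    """Apply the deletion rule once.

    Returns True if the acting timeline survives (all others are deleted),
    which happens with probability 1/n. Returns False if the acting
    timeline itself is deleted.
    """
    return rng.random() < 1.0 / n_timelines


def escape_rate(a: int, trials: int, seed: int = 0) -> float:
    """Fraction of trials in which the AI's timeline survives its attempt.

    Assumption: the AI runs in one of a**2 timelines and tries to affect
    the outside world exactly once while all of them still exist.
    """
    rng = random.Random(seed)
    n = a * a
    escapes = sum(attempt_outside_action(n, rng) for _ in range(trials))
    return escapes / trials


if __name__ == "__main__":
    for a in (1, 10, 30):
        print(f"a={a}: cost={split_cost(a, 1.0)}, "
              f"estimated escape rate={escape_rate(a, 200_000):.5f}, "
              f"expected={1 / (a * a):.5f}")
```

The point of the sketch is the trade-off: the time cost grows linearly in a while the escape chance shrinks as 1/a².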
Author's Note: I previously made a post making much the same point. I was told it was hard to understand. I suppose I tried to write what I would wish to read.
Edit: I revoke this post's discussion of how to curate output without handing the world to the AI.