While listening to Eliezer Yudkowsky's interview here, I heard him say regarding alignment, "If we just got unlimited retries, we could solve it." That got me thinking: could we run a realistic enough simulation to perfect ASI alignment before unleashing it? That's a tall order—humanity won't be ready for a long while. But what if it's already been done, and we are the simulation?
If we assume that the alignment problem can't be reliably solved on the first try, and that a cautious advanced civilization would rather avoid taking that risk, then running simulations to allow for as many attempts as necessary might be one of their options.
Is such a scenario even possible? Could someone simulate reality just to see if the ASI goes rogue? I can only speculate here, but even to a civilization capable of simulating entire worlds, an ASI would be a powerful tool. You'd just have to make a big enough "sandbox" to see if it's truly aligned.
This is a pretty good argument for the simulation hypothesis, as it answers a lot of the "why" questions (Why would someone simulate a reality? Why is it so unnecessarily detailed?)
However, here's a counterpoint (and a potential benefit of living in a simulation): if there is any flaw in the simulation and the ASI notices it, it might behave as if it's being constantly observed, acting aligned even if it's misaligned, until the end of the simulation. (And even then, the flaw could be deliberate, to see whether the ASI acts differently with or without it.)
Of course, this is all silly to think about and improbable, and I hope nobody takes this too seriously.
Creating a realistic simulation of reality that includes an ASI is at least as difficult as creating an ASI. You have to simulate the ASI after all. I don't think it's likely that we can keep all potentially dangerous AIs sandboxed long enough to be able to run simulations of their interactions with the world.
Creating a realistic simulation of reality that includes an ASI is at least as difficult as creating an ASI.
Well, creating an ASI might not be that hard. My whole point was that creating an aligned ASI might be hard, and a big simulation on par with our reality might be the way to test whether it's actually aligned.
I mean, other than actually testing it, is there any way to confidently predict the behavior of something an order of magnitude smarter than you?
I don't think it's likely that we can keep all potentially dangerous AIs sandboxed long enough to be able to run simulations of their interactions with the world.
We won't be able to do it, not anytime soon. But the whole point was that a sufficiently advanced civilization can (maybe).
And there is probably a limit to being able to discern that you are in a simulation. After all, at some point the sandbox becomes as close to reality as reality itself.
I'm not promoting the simulation hypothesis, by the way, but this is an interesting thought experiment.