Posts

Sorted by New

Wiki Contributions

Comments

"Instead of building in a shutdown button, build in a shutdown timer."

-> Isn't that a form of corrigibility with an added constraint? I'm not sure what would prevent you from convincing humans that it's a bad thing to respect the timer, for example. Is it because we'll formally verify we avoid deception instance? It's not clear to me but maybe I've misunderstood.