As a non-native English speaker, I was surprised to learn that "self-conscious" normally means "shy", "embarrassed", "uncomfortable", ... I blame LessWrong for giving me the wrong idea of this word's meaning.
If there are side effects that someone can observe, then the virtual machine is potentially escapable.
An unfriendly AI might not have the goal of getting out. A psychopath who would prefer a dead person to a live one, and who would prefer to stay in a locked room rather than get out, is not particularly friendly.
Since you would eventually let out an AI that hasn't halted after a certain finite amount of time, I see no reason why an unfriendly AI would halt instead of waiting for you to believe it is friendly.
I'm curious what the "ejector seats" are that you mention in this post and in the Day 1 post, which can help with time sinks and planning. While the other concepts seem familiar, I don't think I had heard about ejector seats before. My guess is that they are something like TAPs with the action of "abandoning the current project/activity". Looking forward to your Day 10 post on planning, which will hopefully have an in-depth explanation and best practices for building them.
Thanks for the sequence focusing on instrumental, everyday rationality.