Thanks, that is a good explanation.
Regarding problem 5, one approach I thought of is what I call "epistemic boxing". Namely, we put the AGI in a virtual world (the "box") and program it to maximize expected utility over a "hard-coded" (stochastic) model of the box rather than over a Solomonoff measure. This assumes the utility function is given explicitly in terms of the box's degrees of freedom.
Such an AGI can still recursively self-improve and become superintelligent. However, it will never escape the box, since that possibility is a non sequitur in its epistemology. In particular, the box can have external inputs, but the AGI will model them as e.g. random noise and won't attempt to continue whatever pattern they contain (it will always consider any such pattern "accidental").
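To make the idea a bit more concrete, here is a minimal toy sketch in Python. Everything in it is an illustrative assumption rather than a real design: the five-state box, the 0.8 success probability, and the names `box_model`, `utility`, `expected_utility`, and `choose_action` are all made up for this example. It only shows the one-step decision rule (maximize expected utility under a fixed, hard-coded model) and says nothing about self-improvement.

```python
# Toy sketch of "epistemic boxing" (all names and numbers are hypothetical).
# The box is a line of states 0..4; the agent can step left (-1) or right (+1).

ACTIONS = (-1, +1)
MAX_STATE = 4


def box_model(state, action):
    """Hard-coded stochastic model of the box: {next_state: probability}.

    Assumption: a move succeeds with probability 0.8, otherwise the state
    is unchanged. The model is fixed and is never updated from observations.
    """
    intended = min(max(state + action, 0), MAX_STATE)
    dist = {}
    dist[intended] = dist.get(intended, 0.0) + 0.8
    dist[state] = dist.get(state, 0.0) + 0.2
    return dist


def utility(state):
    """Utility given explicitly in terms of the box's degrees of freedom."""
    return float(state)


def expected_utility(state, action):
    """Expectation of utility under the hard-coded model (no Solomonoff prior)."""
    return sum(p * utility(s) for s, p in box_model(state, action).items())


def choose_action(state, external_input):
    """Pick the action with the highest expected utility inside the box.

    external_input is accepted but deliberately unused: the model treats it
    as noise independent of the box dynamics, so no pattern in it can change
    the agent's predictions or its choice.
    """
    return max(ACTIONS, key=lambda a: expected_utility(state, a))
```

For example, `choose_action(2, [0, 1, 0, 1])` and `choose_action(2, "anything at all")` return the same action, because the input never enters the expectation. Whether this property survives in an agent capable of rewriting itself is exactly the part the sketch does not address.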
Regarding question 2, I think there is a non-negligible probability that it is unsolvable. That is not to say we shouldn't look for solutions, but IMO we should be prepared for the possibility that there are none.
Benja, Eliezer, and I have published a new technical report, in collaboration with Stuart Armstrong of the Future of Humanity Institute. This paper introduces Corrigibility, a subfield of Friendly AI research. The abstract is reproduced below:
We're excited to publish a paper on corrigibility, as it promises to be an important part of the FAI problem. This is true even without making strong assumptions about the possibility of an intelligence explosion. Here's an excerpt from the introduction:
(See the paper for references.)
This paper includes a description of Stuart Armstrong's utility indifference technique, previously discussed on LessWrong, and a discussion of some potential concerns. Many open questions remain even in our small toy scenario, and many more stand between us and a formal description of what it even means for a system to exhibit corrigible behavior.