The distinction I have in mind is that a self-modifying AI can come up with a new thinking algorithm and simply decide to trust it, whereas a non-self-modifying AI could come up with a new algorithm, but would be unable to trust it without sufficient justification.
Likewise, if an AI's decision-making algorithm is immutably hard-coded as "think about the alternatives and select the one that's rated the highest", then the AI would not be able to simply "write a new AI … and then just hand off all its tasks to it"; in order to do that, it would somehow have to make it so that the highest-rated alternative is always the one that the new AI would pick. (Of course, this is no benefit unless the rating system is also immutably hard-coded.)
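To make that concrete, here is a rough toy sketch in Python of what I mean. All the names (make_agent, rate, successor_proposal) are made up for illustration, and a closure is of course not really immutable; it only stands in for "hard-coded". The point is just that a would-be successor can suggest an action, but the suggestion is one more alternative that gets rated like any other.

```python
from typing import Callable, Iterable, TypeVar

A = TypeVar("A")


def make_agent(rate: Callable[[A], float]) -> Callable[[Iterable[A]], A]:
    """Build a decision procedure whose rating function is fixed at construction.

    Hypothetical helper: the closure stands in for an "immutably hard-coded"
    rule and rating system.
    """
    def decide(alternatives: Iterable[A]) -> A:
        # The fixed rule: look at every alternative, pick the top-rated one.
        return max(alternatives, key=rate)
    return decide


def rate(x: int) -> float:
    # Toy rating system (an assumption): prefer values close to 10.
    return -abs(x - 10)


def successor_proposal(options: list) -> int:
    # A "new AI" with some other policy; here it always proposes the minimum.
    return min(options)


decide = make_agent(rate)
options = [3, 9, 14]

# The successor's proposal is just added to the alternatives being rated.
choice = decide(options + [successor_proposal(options)])
```

Here the successor proposes 3, but the agent still picks 9, because 9 is what the fixed rating prefers. The successor only "takes over" in cases where its pick happens to be the highest-rated alternative anyway.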
I guess my idea in a nutshell is that instead of starting with a flexible system and trying to figure out how to make it safe, we should start with a safe system and try to figure out how to make it flexible. My main reason for believing this is that a safe but inflexible system is probably much easier to understand than a flexible but unsafe one, so if we take this approach, the development process will be easier to understand and will therefore go better.
Likewise, if an AI's decision-making algorithm is immutably hard-coded as "think about the alternatives and select the one that's rated the highest", then the AI would not be able to simply "write a new AI … and then just hand off all its tasks to it"; in order to do that, it would somehow have to make it so that the highest-rated alternative is always the one that the new AI would pick.
You are basically saying that the AI should be unable to learn to trust that a process which was effective in the past will also be effective in the future. I think that would restrict intelligence a lot.
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.