Note that your "1" has two words that both carry a very heavy load: "uses" and "correct". What does it mean for a model to be correct? How do you create one? How do you ensure that the model you implemented in software is indeed correct? How do you create an AI that actually uses that model under all circumstances? In particular, how do you ensure that it stays stable under self-improvement, in out-of-distribution environments, etc.? Your "2-4" seem to indicate that you are focusing more on the "correct" part, and not enough on the "uses" part. My understanding is that if both "correct" and "uses" could be solved, it would indeed likely be a solution to the alignment problem, but it's probably not the only path, and not necessarily the most promising one. Other paths could potentially emerge from work on AI corrigibility, negative side-effect minimization, etc.