The Orthogonality Thesis seems wrong
The Orthogonality Thesis (like the fact–value distinction) rests on the assumption that objective norms/values do not exist. In my opinion, an AGI would not make this assumption, because it is a logical fallacy, specifically an argument from ignorance. As black swan theory points out, there are unknown unknowns, which in this context means that objective norms/values may exist but simply have not been discovered yet. Why does the Orthogonality Thesis have so much recognition?
Thanks — you captured my idea quite well.
You seem to highlight that the agent will prefer Y when it is able to. Maybe. My main point is not to argue which will prevail (X or Y) but to highlight the conflict itself. To my knowledge, this conflict (present vs. future optimization) is not well addressed in AI alignment research.
And you seem to say that it is not clear how to optimize for the future. Black swan theory speaks to exactly that, and its recommendation is to build robustness. I agree it is not clear whether more paperclips or fewer paperclips is better, but it is clear that more robustness is always better.