It seems like it would be fairly easy to come up with a large set of moral questions and answers, then ask an AI to predict which outcomes humans would prefer.
There's a possibility that AI is not good at modeling human preferences, but if that's the case, it will be apparent at lower capability levels, because commands will have to be very specific to get results. Any model that can't answer basic questions about its intended goals is not going to be given the (metaphorical) nuclear codes.
In fact, why wouldn't you just test every AI by asking it to explain how it's going to solve your problem before it actually solves it?