High-level interpretability: detecting an AI's objectives