World-Model Interpretability Is All We Need