Black-box interpretability methodology blueprint: Probing runaway optimisation in LLMs — LessWrong