The risk-reward tradeoff of interpretability research — LessWrong