x
The risk-reward tradeoff of interpretability research — LessWrong