Solving Reward Specification through Interpretability for Wisdom — LessWrong