Early Experiments in Reward Model Interpretation Using Sparse Autoencoders — LessWrong