Training Process Transparency through Gradient Interpretability: Early experiments on toy language models — LessWrong