x
Circuit discovery through chain of thought using policy gradients — LessWrong