Measuring Learned Optimization in Small Transformer Models — LessWrong