interpreting GPT: the logit lens — LessWrong