interpreting GPT: the logit lens — LessWrong