Exploring the petertodd / Leilan duality in GPT-2 and GPT-J — LessWrong