An exploration of GPT-2's embedding weights — LessWrong