Finding the uncertainty vector in GPT2-scale transformers
In this post I explore a phenomenon in LLMs where training naturally consolidates information into a highly interpretable structure in the residual stream, via a positive feedback loop seeded by a small variation at initialization. I start with a toy example and work up to GPT2 scale, showing animations along the way.