Most of the residual stream forgets within a token. A compact subspace doesn't.
Preface

This is a preliminary writeup for an experiment on residual stream geometry. The research direction seems pretty underexplored, so I’m posting early to collect objections, research intuitions, and connections to problems other people are thinking about before I invest in the larger run. The case for skimming this post:...
Jun 630