x
Most of the residual stream forgets within a token. A compact subspace doesn't. — LessWrong