The Geometry of LLM Logits (an analytical outer bound)
1 Preliminaries

Symbol                                                    Meaning
$d$                                                       width of the residual stream (e.g. 768 in GPT-2-small)
$L$                                                       number of Transformer blocks
$V$                                                       vocabulary size, so logits live in $\mathbb{R}^V$
$h^{(\ell)}$                                              residual-stream vector entering block $\ell$
$r^{(\ell)}$                                              the update written by block $\ell$
$W_U \in \mathbb{R}^{V \times d},\ b \in \mathbb{R}^V$    ...
May 30, 2025