Activation Plateaus: Where and How They Emerge
By design, LLMs perform nonlinear mappings from their inputs (text sequences) to their outputs (next-token predictions). These nonlinearities are built into the model architecture, but how models use them to represent information is still an open question. By studying different aspects of this nonlinear mapping, we can...
Oct 17, 2025