In January 2023, beren and Eric Winsor cataloged basic distributional properties of weights, activations, and gradients in GPT-2 models, providing a systematic view of model internals (thanks to Ryan Greenblatt for the pointer). This post extends their investigation in two directions.
First, examining their characterization of transformer activations as "nearly Gaussian with outliers," I conducted detailed distributional analyses of post-residual activations. My findings align with observations made in comments and unpublished work: the distributions are better described by heavier-tailed distributions, chiefly the logistic distribution. What can appear as outliers or artifacts manifests in my analysis as consistent mixture distributions, with minor modes appearing systematically on both sides of the primary distribution.
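As a minimal sketch of the kind of comparison involved (not the original analysis code; the activation array here is synthetic, and the variable names are illustrative), one can fit both a Gaussian and a logistic distribution to a sample of activations and compare log-likelihoods:

```python
import numpy as np
from scipy import stats

# Hypothetical 1-D array of post-residual activations at one layer,
# e.g. as collected via a forward hook on a GPT-2 model.
# Synthetic stand-in data for illustration only.
acts = np.random.default_rng(0).logistic(loc=0.0, scale=1.0, size=50_000)

# Fit both candidate distributions by maximum likelihood.
mu, sigma = stats.norm.fit(acts)
loc, scale = stats.logistic.fit(acts)

# Compare fits via total log-likelihood (higher is better);
# a heavier-tailed sample will favor the logistic fit.
ll_norm = stats.norm.logpdf(acts, mu, sigma).sum()
ll_logistic = stats.logistic.logpdf(acts, loc, scale).sum()
print(f"Gaussian log-likelihood: {ll_norm:.1f}")
print(f"Logistic log-likelihood: {ll_logistic:.1f}")
```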
Second, prompted by Buck...