Spectral Taxonomy of QK Circuits in Transformer Models
Precomputed figures and the code used to generate them are in this GitHub repository. To run the analysis on other models, edit the models.py file.
TL;DR: The Empirical Spectral Distribution (ESD) of the correlation matrix of W_QK in dense (non-MoE) text-to-text models exhibits certain trends. These trends can be used as...
Oct 17, 2025
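
For readers unfamiliar with the term, the ESD mentioned in the TL;DR can be sketched roughly as follows. This is a minimal illustration, not the repository's code. It assumes that W_QK is formed as W_Q W_K^T for a single attention head and that the "correlation matrix" is W_QK^T W_QK / n, which is a common convention in random-matrix analyses of weights; the post's exact definition may differ. The function name `qk_esd` and the random inputs are hypothetical.

```python
import numpy as np

def qk_esd(w_q, w_k, bins=50):
    """Histogram estimate of the ESD for one attention head.

    w_q, w_k: arrays of shape (d_model, d_head). Assumed layout, not
    taken from the post or from models.py.
    """
    # Combined QK matrix, shape (d_model, d_model). Assumption: W_QK = W_Q W_K^T.
    w_qk = w_q @ w_k.T
    n = w_qk.shape[0]
    # Assumed "correlation matrix": X = W_QK^T W_QK / n (symmetric, PSD).
    corr = w_qk.T @ w_qk / n
    # eigvalsh is for symmetric matrices and returns eigenvalues in ascending order.
    eigvals = np.linalg.eigvalsh(corr)
    # The ESD is the empirical density of these eigenvalues.
    density, edges = np.histogram(eigvals, bins=bins, density=True)
    return eigvals, density, edges

# Usage with random stand-in weights (real weights would come from a model checkpoint).
rng = np.random.default_rng(0)
w_q = rng.standard_normal((64, 16))
w_k = rng.standard_normal((64, 16))
eigvals, density, edges = qk_esd(w_q, w_k, bins=20)
```

Because W_QK has rank at most d_head under this construction, most eigenvalues sit near zero, so in practice one might restrict the histogram to the nonzero part of the spectrum or plot it on a log scale.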