What Transformers Learn When They Solve Majority!
Mar 31
Full post with figures, code, and experimental details: brokttv.github.io/Majority

Merrill et al. (2022) prove that a 1-layer transformer with saturated attention can recognize MAJORITY, placing such models strictly above AC⁰ in circuit complexity. Their proof is a hand-crafted existence result, one explicit weight configuration constructed to establish expressivity: set...
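To give intuition for why saturated attention suffices, here is a minimal sketch (not Merrill et al.'s actual construction): when all attention scores tie, saturated attention averages uniformly over every position, so a single head can compute the fraction of 1-tokens, and a threshold at 1/2 then decides MAJORITY.

```python
import numpy as np

def saturated_attention_majority(bits):
    """Decide MAJORITY with one saturated-attention head (illustrative sketch).

    Saturated attention places equal weight on all positions attaining the
    maximal score; with identical scores that means uniform averaging.
    The head's pooled output is the fraction of ones, and thresholding
    at 1/2 decides strict majority.
    """
    x = np.asarray(bits, dtype=float)            # token values: 0 or 1
    n = len(x)
    weights = np.full(n, 1.0 / n)                # uniform saturated attention (all scores tie)
    pooled = weights @ x                         # attention output = fraction of ones
    return bool(pooled > 0.5)                    # strictly more ones than zeros

print(saturated_attention_majority([1, 0, 1, 1]))  # True
print(saturated_attention_majority([0, 0, 1]))     # False
```

The interesting part of the formal result is precisely that this averaging behavior, which no AC⁰ circuit family can replicate for counting, falls out of saturated softmax attention.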