Transformer Modular Addition Through A Signal Processing Lens
Hello, my name is Benjamin "Frye" Kelley. This post covers some independent research I've been doing that expands on "Progress Measures for Grokking via Mechanistic Interpretability" by Neel Nanda et al. I've been trying to understand how sinusoids move through a transformer, allowing it to grok modular addition....