Computation in Superposition: Two Handcrafted Models
by RGRGRG and Kyle Ray
Many interpretability researchers (ourselves included) believe that neural networks store knowledge in superposition—that is, networks encode more facts than they have individual components. A natural extension of this idea is that networks also perform computation on knowledge that lives in superposition. Despite the centrality of this concept, there are few...
Apr 30