Kyle Ray — LessWrong

14

2mo

Computation in Superposition: Two Handcrafted Models

by RGRGRG and Kyle Ray

Many interpretability researchers (ourselves included) believe that neural networks store knowledge in superposition—that is, networks encode more facts than they have individual components. A natural extension of this idea is that networks also perform computation on knowledge that lives in superposition. Despite the centrality of this concept, there are few...