Toy Models of Superposition: Simplified by Hand
Introduction
In Anthropic's paper "Toy Models of Superposition", the authors illustrate how neural networks can represent more features than they have dimensions by embedding them in so-called "superposition". This lets the model represent more information, at the cost of interference between features, and leads to polysemantic neurons (neurons that correspond to multiple features)....
Sep 29, 2024