Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition
This work represents preliminary progress toward removing attention head superposition. We are excited by this approach but acknowledge it currently has various limitations. In the short term, we will be working on adjacent problems and are excited to collaborate with anyone thinking about similar things! Produced as part of the ML Alignment...
