Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition
This work represents preliminary progress toward removing attention head superposition. We are excited by this approach but acknowledge that it currently has various limitations. In the short term, we will be working on adjacent problems and are excited to collaborate with anyone thinking about similar things! Produced as part of the ML Alignment...
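For readers unfamiliar with the term in the title, here is a minimal illustrative sketch of what a gated attention block could look like. The post does not spell out the architecture in this section, so this is an assumption rather than the authors' exact method: it takes a "gated attention block" to mean standard multi-head self-attention in which each head's output is scaled by a learned, input-dependent sigmoid gate, the intuition being that gates let individual heads switch off when unneeded and so encourage cleanly separable head-level roles. All module and parameter names below are hypothetical.

```python
import torch
import torch.nn as nn


class GatedMultiHeadAttention(nn.Module):
    """Multi-head self-attention with a per-head, per-token sigmoid gate (assumed form)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Hypothetical gate: one scalar per head per token, computed from the residual stream.
        self.gate = nn.Linear(d_model, n_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq, d_model = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(t: torch.Tensor) -> torch.Tensor:
            # (batch, seq, d_model) -> (batch, heads, seq, d_head)
            return t.view(batch, seq, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        head_out = attn @ v  # (batch, heads, seq, d_head)

        # Sigmoid gate in [0, 1]: each head can be switched off per token,
        # which is the intuition behind separating superposed head behaviours.
        g = torch.sigmoid(self.gate(x)).transpose(1, 2).unsqueeze(-1)  # (batch, heads, seq, 1)
        head_out = head_out * g

        head_out = head_out.transpose(1, 2).reshape(batch, seq, d_model)
        return self.out(head_out)


if __name__ == "__main__":
    block = GatedMultiHeadAttention(d_model=64, n_heads=4)
    x = torch.randn(2, 10, 64)
    print(block(x).shape)  # torch.Size([2, 10, 64])
```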
Thank you for the comment! Yep, that is correct. I think variants of this approach could still be useful for resolving other forms of superposition within a single attention layer, but not currently across different layers.