Mitali M — LessWrong

SAE Feature Matchmaking (Layer-to-Layer)

Last week I read Mechanistic Permutability: Match Features Across Layers, an interesting paper on matching features detected with Sparse Autoencoders across multiple layers of a Transformer neural network. In this paper, the authors study the problem of aligning SAE-extracted features across multiple layers in the neural network without having...

Feb 10 · 9

Paying attention to Attention Sinks

I recently read Spectral Filters, Dark Signals, and Attention Sinks, an interesting paper on discovering where excess attention in transformers is dumped. The researchers found that transformers contain a "Dark Subspace" to store information that isn't intended for the output layer. The attention sink concept is a specific manifestation of this,...

Jan 23 · 11

Thoughts and experiences on using AI for learning

As this is my first post on LessWrong, I begin by introducing myself to establish my background. I am finishing the last year of my undergraduate studies, pursuing a degree in Computer Science. My primary concern, and the thesis of this post, is the potential for epistemic failure...

Nov 17, 2025 · 6