Epistemic status: I've worked on this project for ~20h, in my free time and using only a Colab notebook. Executive summary: I trained a minimalistic implementation of Mamba (details below) on the modular addition task. I found that: 1. This non-transformer-based model can also exhibit grokking (i.e., the model learns...
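For readers unfamiliar with the task, here is a minimal sketch of the modular addition dataset as it is usually set up in grokking experiments (the modulus p = 113 is a common choice in the literature, not necessarily the one used in this post):

```python
# Hypothetical sketch of the modular addition task (not the post's exact setup):
# the model sees pairs (a, b) and must predict (a + b) mod p.
p = 113  # a prime modulus, a common choice in grokking experiments

def make_dataset(p):
    """Enumerate all p * p input pairs with their target (a + b) mod p."""
    return [((a, b), (a + b) % p) for a in range(p) for b in range(p)]

data = make_dataset(p)
# Grokking setups typically train on a random fraction of these pairs
# and track generalisation on the held-out remainder, watching for the
# delayed jump in test accuracy long after train accuracy saturates.
```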
Summary: In this post I want to briefly share some results I obtained after experimenting with an equivalent version of the simple neural networks that the authors used here to study how superposition and polysemantic neurons come about in neural networks trained with gradient descent. The take-home message is...
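As context, the "simple neural networks" studied in the toy-models-of-superposition line of work compress n sparse features through m < n hidden units and reconstruct them through a ReLU. A minimal forward-pass sketch (the sizes and random weights here are illustrative assumptions, not the configuration used in this post):

```python
import numpy as np

# Hypothetical sketch of the toy-model family used to study superposition
# (not necessarily the exact variant in this post): n sparse features are
# squeezed through m < n hidden units, so training pressure can push the
# model to store several features per neuron (superposition).
rng = np.random.default_rng(0)
n, m = 5, 2                      # 5 features, 2 hidden dimensions
W = rng.normal(size=(m, n))     # in the real setup, W is learned

def forward(x, W, b=0.0):
    """x -> ReLU(W.T @ (W @ x) + b): compress to m dims, then reconstruct."""
    h = W @ x                    # hidden activations, shape (m,)
    return np.maximum(W.T @ h + b, 0.0)

x = np.zeros(n)
x[0] = 1.0                       # a sparse, one-hot input feature
out = forward(x, W)              # reconstruction, shape (n,)
```

Training minimizes the reconstruction error over sparse inputs; how the columns of W arrange themselves then reveals which features share neurons.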
I want to thank @Ryan Kidd, @eggsyntax and Jeremy Dolan for useful discussions and for pointing me to several relevant resources (mentioned in this post) that I used to connect my own ideas with those of others. Executive summary: Designing an AI that aligns with human goals...
Disclaimer: I am currently a Postdoctoral Fellow in Computational Neuroscience, learning about Mechanistic Interpretability and AI Safety in general. This post and the accompanying paper are part of my current pivot towards these topics, so I apologise in advance if I'm not using the appropriate terminology or...