Model multitasking: Can a model learn two different tasks simultaneously through Grokking?
The work discussed in this blog post was inspired by "Progress Measures for Grokking via Mechanistic Interpretability" [1]. I encourage you to read the paper in its entirety and to check out the list of resources at the end of this post for more in-depth background information.

TL;DR If you...