Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences
This work was done as part of the MATS 7 extension. We'd like to thank Cameron Holmes and Fabien Roger for their useful feedback.

Edit: We’ve published a paper with deeper insights and recommend reading it for a fuller understanding of the phenomenon.

TL;DR

Claim: Narrow finetunes leave clearly readable...