Learning to Interpret Weight Differences in Language Models
This post is about our recent paper Learning to Interpret Weight Differences in Language Models (Goel et al., Oct. 2025). We introduce a method for training a LoRA adapter that gives a finetuned model the ability to accurately describe the effects of its finetuning....
Oct 23, 2025