LESSWRONG

avichal — LessWrong

90

1

1

9mo

Learning to Interpret Weight Differences in Language Models

Paper | GitHub | Demo Notebook

This post is about our recent paper Learning to Interpret Weight Differences in Language Models (Goel et al., Oct. 2025). We introduce a method for training a LoRA adapter that gives a finetuned model the ability to accurately describe the effects of its finetuning....

Oct 23, 2025 • 90