SVD on Weight Differences for Model Auditing
TLDR: We introduce SVD rank truncation, a method for auditing fine-tuned models that applies singular value decomposition (SVD) to the weight difference matrices and truncates each one to rank 1. As a proof of concept on AuditBench models, the method achieves strong results on SDF-trained models (85-98% success rate) but remains...
May 614
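
To make the core operation in the TLDR concrete, here is a minimal sketch of rank-1 truncation of a single weight difference matrix. It assumes the difference is taken as fine-tuned minus base weights and that the truncated difference is added back onto the base weights; the post summary does not confirm either detail. The function name `truncate_weight_diff` and the toy matrices are hypothetical, chosen only for illustration.

```python
import numpy as np


def truncate_weight_diff(w_base, w_finetuned, rank=1):
    """Return the rank-`rank` approximation of (w_finetuned - w_base).

    Assumption: the weight difference is fine-tuned minus base.
    Also returns the full singular value spectrum of the difference.
    """
    delta = w_finetuned - w_base
    # Thin SVD: delta = u @ diag(s) @ vt, with s sorted in descending order.
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    # Keep only the top `rank` singular directions.
    delta_r = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    return delta_r, s


# Toy example: a base matrix plus a mostly rank-1 update with a little noise.
rng = np.random.default_rng(0)
w_base = rng.normal(size=(8, 6))
update = 0.5 * np.outer(rng.normal(size=8), rng.normal(size=6))
w_ft = w_base + update + 0.01 * rng.normal(size=(8, 6))

delta_r, s = truncate_weight_diff(w_base, w_ft, rank=1)

# Assumption: the audited model uses the base weights plus the truncated difference.
w_trunc = w_base + delta_r

# Fraction of the squared Frobenius norm of the difference kept by rank 1.
energy_kept = s[0] ** 2 / np.sum(s ** 2)
print(f"fraction of squared norm kept at rank 1: {energy_kept:.4f}")
```

By the Eckart-Young theorem, the truncated SVD is the best rank-1 approximation of the difference in Frobenius norm, so `energy_kept` gives a quick read on how much of a layer's fine-tuning update survives the truncation.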