this recent paper honestly explains the phenomenon really well. tl;dr: the lm_head acts as a bottleneck, so many unrelated tokens get entangled, which results in trait transmission
https://owls.baulab.info/
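
fwiw here's a toy sketch of the bottleneck idea, not the paper's actual setup. It assumes a random unembedding matrix with far more tokens than hidden dimensions, so the rows can't all be orthogonal, and pushing the hidden state toward one token also raises the logits of some unrelated ones. The function name `entangled_tokens` and all the sizes are made up for illustration.

```python
import numpy as np

def entangled_tokens(vocab_size=50, hidden_dim=8, target=0, top_k=3, seed=0):
    """Toy model of an lm_head: vocab_size unit vectors in hidden_dim dimensions.

    Hypothetical illustration only; the sizes and random weights are assumptions.
    """
    rng = np.random.default_rng(seed)
    # Random unembedding matrix, one row per token, normalised to unit length.
    W = rng.standard_normal((vocab_size, hidden_dim))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    # Point the hidden state straight at the target token's row.
    h = W[target]
    # Logits for every token; the target gets 1.0, the rest get their cosine with it.
    logits = W @ h
    # Other tokens ranked by how much they were boosted along with the target.
    order = np.argsort(-logits)
    others = order[order != target][:top_k]
    return W, logits, others

if __name__ == "__main__":
    W, logits, others = entangled_tokens()
    print("target logit:", logits[0])
    for tok in others:
        print("entangled token", tok, "logit", logits[tok])
```

the tokens it prints are the ones that ride along with the target purely because of geometry, which is the rough intuition behind the entanglement claim.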