Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data
ajobi, 18d

This recent paper explains the phenomenon pretty well. TL;DR: the lm_head acts as a bottleneck, so many unrelated tokens end up entangled, and that entanglement results in trait transmission.
https://owls.baulab.info/
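
To make the bottleneck intuition concrete, here is a toy sketch. It is not the linked paper's method, just the generic low-rank argument. The matrix `W` is a random stand-in for an lm_head, and the sizes and `target` token id are made up for illustration. Because logits are `W @ h` and the hidden size is smaller than the vocabulary, there is no hidden-state change that moves one token's logit without moving others.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_dim = 50, 8  # toy sizes, assumed for illustration
W = rng.normal(size=(vocab_size, hidden_dim))  # hypothetical stand-in for lm_head weights

target = 3  # arbitrary token id whose logit we try to raise

# 1) Logits are W @ h, so every reachable logit change lies in the
#    column space of W, which has dimension at most hidden_dim.
rank = np.linalg.matrix_rank(W)

# 2) Look for the hidden-state change that best approximates
#    "raise ONLY the target logit by 1" in the least-squares sense.
wanted = np.zeros(vocab_size)
wanted[target] = 1.0
dh, *_ = np.linalg.lstsq(W, wanted, rcond=None)
achieved = W @ dh
residual = np.linalg.norm(achieved - wanted)

# 3) Count how many other tokens' logits moved as a side effect.
others = np.delete(achieved, target)
n_moved = int(np.sum(np.abs(others) > 1e-9))

print(f"rank of W: {rank} (vocab size {vocab_size})")
print(f"residual of best isolated update: {residual:.3f}")
print(f"other tokens whose logits moved: {n_moved} of {vocab_size - 1}")
```

A nonzero residual means the isolated update is unreachable: pushing up one token necessarily drags other, unrelated tokens along with it. That is the kind of entanglement the comment is pointing at, though whether a real model's entangled tokens carry a trait is an empirical question this toy does not address.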
