LLM Self-Reference Language in Multilingual vs English-Centric Models
My first post explored "self-talk" induction in small base LLMs. After further reflection, I decided I should first better understand how LLMs represent "self" mechanistically before examining induced "self-talk". How do language models process questions about themselves? I've started by analyzing attention entropy patterns[1] across self-referent prompts ('Who...
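As a concrete starting point, attention entropy here means the Shannon entropy of each query position's attention distribution: low entropy means a head focuses on a few tokens, high entropy means it spreads attention broadly. A minimal NumPy sketch of the metric (the `attention_entropy` and `softmax` helpers are illustrative names; in a real run you would take the weights from the model itself, e.g. via `output_attentions=True` in Hugging Face transformers):

```python
import numpy as np

def attention_entropy(attn: np.ndarray) -> np.ndarray:
    """Shannon entropy (in nats) of each row of an attention weight tensor.

    attn has shape (..., seq_q, seq_k); each row sums to 1 (post-softmax).
    Low entropy = focused attention; high entropy = diffuse attention.
    """
    p = np.clip(attn, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=-1)

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy stand-in for real attention weights: 2 heads over a 5-token prompt.
rng = np.random.default_rng(0)
attn = softmax(rng.normal(size=(2, 5, 5)))
ent = attention_entropy(attn)
print(ent.shape)  # one entropy value per head, per query position: (2, 5)
```

Comparing these per-head entropy profiles between self-referent prompts and matched neutral controls is one simple way to quantify whether "self" questions change how attention is allocated.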
Oct 22, 2025