Exploring belief states in LLM chains of thought
sbill · 8d · 20

Good work! I'm curious why there's a sudden dip for Gemma 2 9B at the last token position, and why probes trained on Qwen don't seem to show any relationship.

Quite a bit of the literature indicates that the intermediate activations output by the MLP block are a sum of several different features in superposition, where each feature corresponds to some direction (vector) in activation space. I'd be curious whether you could train an SAE or apply SNMF on these activations and check whether one of the recovered features is strongly associated with answering correctly.
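In case it's useful, here's a minimal sketch of the kind of SAE I have in mind (the architecture, dictionary size, and L1 coefficient are illustrative assumptions on my part, not anything from the post):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: decomposes activations into an overcomplete
    dictionary of features, each a direction in activation space."""
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        # Decoder columns are the learned feature directions
        self.decoder = nn.Linear(n_features, d_model, bias=False)

    def forward(self, x):
        # Nonnegative feature activations (sparsity enforced via L1 below)
        f = torch.relu(self.encoder(x))
        # Reconstruct the activation as a weighted sum of feature directions
        x_hat = self.decoder(f)
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coeff=1e-3):
    # Reconstruction error plus L1 penalty encouraging sparse codes
    return ((x - x_hat) ** 2).mean() + l1_coeff * f.abs().mean()
```

Given a trained SAE, you could then compare each feature's mean activation on correct vs. incorrect rollouts to find candidate "answers correctly" features.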
