Ilia Shirokov
Comments

Sleep peacefully: no hidden reasoning detected in LLMs. Well, at least in small ones.
Ilia Shirokov · 6mo

Actually, I hadn't seen this article; thank you very much! It looks very interesting, as do the references it cites. However, I suspect the distribution from which "filler tokens" (or extra tokens) are drawn matters, and so does their sequence: not just "…", "abcd", or "<pause>", since something more sophisticated might be more useful to the model. It would be very interesting to determine which filler sequences are best suited to hiding computation on specific tasks (this is one of the directions we are working on) and which circuits, if any, are responsible for it.
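
For concreteness, here is a minimal sketch of the kind of comparison this question suggests: measuring whether different filler sequences inserted before the answer cue shift a small model's prediction on a toy task. The model ("gpt2"), the prompt, and the candidate fillers are all illustrative assumptions, not the actual experimental setup.

```python
# Minimal sketch (assumptions: "gpt2" as the small model, a toy arithmetic
# prompt, and a hand-picked set of candidate fillers). It measures how the
# log-probability of the correct answer token changes when different filler
# sequences are placed between the question and the answer cue.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

fillers = ["", " ...", " abcd", " <pause>"]  # candidate filler sequences
target = " 42"
target_id = tokenizer(target, add_special_tokens=False).input_ids[0]

for filler in fillers:
    prompt = f"Q: What is 6 times 7?{filler} A:"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits
    # Log-probability of the first answer token given the filler-padded prompt.
    log_probs = torch.log_softmax(logits[0, -1], dim=-1)
    print(f"filler={filler!r:<12} logp({target!r}) = {log_probs[target_id].item():.3f}")
```

A real version would sweep many tasks and many filler distributions and compare against a no-filler baseline, but even this toy loop illustrates the measurement being proposed.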

Posts

Steganography via internal activations is already possible in small language models — a potential first step toward persistent hidden reasoning. (2mo)

Sleep peacefully: no hidden reasoning detected in LLMs. Well, at least in small ones. (6mo)