LESSWRONG

Ilya Nachevsky

Posts


Steganography via internal activations is already possible in small language models — a potential first step toward persistent hidden reasoning. (score 6, posted 1mo ago, 0 comments)
Sleep peacefully: no hidden reasoning detected in LLMs. Well, at least in small ones. (score 17, posted 5mo ago, 2 comments)