Can LLMs learn Steganographic Reasoning via RL?
by Robert McCarthy, Vasil Georgiev, Steven Basart, and David Lindner
TLDR: We show that Qwen-2.5-3B-Instruct can learn to encode load-bearing reasoning tokens when RL incentivizes this in a simple reasoning task. However, this result was difficult to achieve: our attempts to learn encoded reasoning via RL were generally bottlenecked either by exploration difficulties or by hacking of the CoT monitor....
Apr 11, 2025