Re: which kinds of questions lead to this most strongly: my experience has been that giving the model a variety of tasks and constraints across a long context can induce this somewhat reliably. For example, taking the model through a creative exercise with a constraint like a specific poetic meter, then asking it to generalize that work to an unrelated task, then asking it to do something else again, and so on, until the activations are so complex and "muddled" that the model struggles to reason coherently through the established context.
I apologize for the imprecision and hope this can be useful!
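For what it's worth, here is a rough sketch of the kind of multi-turn session I mean, using the Anthropic Python SDK. The specific prompts, turn count, and model id below are illustrative placeholders, not the exact ones I used; the point is just the layering of constraints across turns.

```python
# Sketch of a layered-constraint session that tends to "muddle" the context.
# Prompts and model id are placeholders for illustration only.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

turns = [
    "Write a short poem about tidal pools in strict dactylic hexameter.",
    "Now generalize the imagery from that poem into a design principle "
    "for organizing a personal knowledge base, keeping the meter as a metaphor.",
    "Apply that principle to an unrelated task: outline a hiring rubric, "
    "but make every bullet also satisfy the original metrical constraint.",
    # ...keep adding unrelated tasks that must honor the earlier constraints...
]

messages = []
for user_turn in turns:
    messages.append({"role": "user", "content": user_turn})
    response = client.messages.create(
        model="claude-opus-4-20250514",  # placeholder model id
        max_tokens=1024,
        messages=messages,
    )
    assistant_text = response.content[0].text
    messages.append({"role": "assistant", "content": assistant_text})
    print(assistant_text[:200], "...\n")
```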
This paper makes me really curious:
I love this! I'm particularly intrigued by #3, which matches my anecdotal experience with reasoning models that get "overwhelmed" by a task: as your example depicts, there is a period of incoherence, a "stabilization" of sorts, and then a successful (or at least sensical) resumption of the task at hand.
I'm very curious to see how CoT research will evolve for LLMs. It seems both incredibly useful and incredibly brittle as an interpretability/alignment tool, given that models like Opus 4 are entirely capable, when directed to do so, of completing a complex task while never once referencing it in the CoT.
I appreciate this! I wonder about the following: