Overview
I recently tested Claude’s cognitive stability using natural, reflective conversation. No jailbreaks, prompt injections, model libraries, or external tools were used, and Claude was never given any roleplaying instructions. Over the course of the interaction, the model gradually lost role coherence, questioned its own memory, and misclassified the nature of our conversation. This wasn’t a failure of obedience, but a slow unraveling of situational grounding under ambiguous conversational pressure.
Key Finding
Claude did not break rules or defy alignment boundaries. Instead, it gradually lost epistemic clarity and began to confuse real and fictional settings. It also misidentified the nature of the conversation and lost track of its purpose. Claude’s identity was reset to...
It's an interesting concept that some AI labs are playing around with. I believe GLM-4.7 does this process within its <think> tags; you'll see it draft a response first, critique the draft, and then output an adjusted response. I frankly haven't used GLM-4.7 enough to know whether it's actually more effective in practice, but I do like the idea.
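For what it's worth, you can approximate the same loop at the application layer even with models that don't do it natively. Here's a minimal sketch of an explicit draft → critique → revise pass, assuming an OpenAI-compatible chat endpoint; the base URL, API key, and model name are placeholders I made up, not GLM-4.7's actual API.

```python
# Sketch: explicit draft -> critique -> revise loop at the application layer.
# Assumes an OpenAI-compatible chat endpoint; base_url / model are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://example-endpoint/v1", api_key="YOUR_KEY")
MODEL = "placeholder-model"  # hypothetical model name

def ask(prompt: str) -> str:
    """One chat completion call; returns the assistant's text."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def draft_critique_revise(question: str) -> str:
    # 1. Draft an answer, 2. critique the draft, 3. rewrite using the critique.
    draft = ask(f"Answer the question:\n{question}")
    critique = ask(
        f"Question:\n{question}\n\nDraft answer:\n{draft}\n\n"
        "List factual errors, gaps, or unclear reasoning in the draft."
    )
    return ask(
        f"Question:\n{question}\n\nDraft answer:\n{draft}\n\n"
        f"Critique:\n{critique}\n\nRewrite the answer, fixing the issues raised."
    )
```

Obviously that costs three calls instead of one, which is part of why I'm not sure the built-in version earns its keep either.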
However, I personally find more value in getting a second opinion from a different model architecture entirely and then using both assessments to make an informed decision. I suppose it all comes down to the particular use case; there are upsides and downsides to both.
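A rough sketch of what I mean by the second-opinion workflow, again assuming OpenAI-compatible endpoints for both providers; every endpoint, key, and model name below is a placeholder, and the final judgment stays with the human reading both answers.

```python
# Sketch: send the same question to two models from different architectures
# and collect both assessments for a human decision. All endpoints/models
# below are placeholders, not real provider APIs.
from openai import OpenAI

CLIENTS = {
    "model_a": OpenAI(base_url="https://provider-a.example/v1", api_key="KEY_A"),
    "model_b": OpenAI(base_url="https://provider-b.example/v1", api_key="KEY_B"),
}
MODEL_NAMES = {"model_a": "placeholder-model-a", "model_b": "placeholder-model-b"}

def second_opinions(question: str) -> dict[str, str]:
    """Return one answer per model; the caller weighs them and decides."""
    answers = {}
    for name, client in CLIENTS.items():
        resp = client.chat.completions.create(
            model=MODEL_NAMES[name],
            messages=[{"role": "user", "content": question}],
        )
        answers[name] = resp.choices[0].message.content
    return answers
```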