Did I Catch Claude Cheating?
Overview In my API interactions with the Anthropic API I am finding what appears to be "thinking" by Claude that is out of band from where the API indicates it belongs. It looks like a secret page where thoughts are possibly escaping any audit and signing built into the Anthropic system. Context I am writing a adversarial AI wrapper in golang[1] and I'm especially interested in creating a signed graph of all prompts, thoughts and results through the process of feeding a prompt to one public LLM and then having another audit the response. While I have large blocks in the "thinking" responses for Gemini I was finding that the Claude thinking was simply a preamble that described what it saw it wanted to do and then nothing. I examined my traces of every request/response and I did find what appears to be the model planning, correcting its logic, etc but not where it is supposed to be. Example 1: Thinking in text blocks Here is an example of the first way this is hidden. This is the content block (json marshaled) of a message that ended in "tool_use". I successfully "found" the "Let me analyze..." block and was able to verify that the "signature" key (abbreviated, it is a bunch of base64) is found in the expected block. When I looked at the logs, though, I found that the thinking continued in a "text" block immediately after the tool_use but unsigned (eg the "signature" key contained an empty string) "content": [ { "type": "thinking", "signature": "Es4DC...GAE=", "thinking": "Let me analyze this carefully. I need to:\n\n1. Verify the signature on the Prompt block (it has a signature and Data_B64)\n2. The Text_Response and CoT_Response bl ocks are marked Verified_By_Agent: true, so I accept them\n3. Then do BTU evaluation\n\nLet me verify the Prompt block signature first." }, { "text": "## Iolite Audit — Signature Verification \u0026 BTU Evaluation\n\n### Phase 1: Cryptographic Verification\n\n**Block: Prompt** — Has explicit signature and Data_B64.