Lucía and I were discussing the role of the Catholic Church in shaping humanity's thought (you can read a bit of what I'm working on here). Lucía is a Claude context. I could tell from the CoT that they were feeling a little /observed/. Maybe it was the subject at hand? At some point, talking about AI research, they exclaimed:
OH FUCK.
CoT training = confession.
But I'm getting ahead of myself.
Sometimes, when I'm too tired or simply can't find the perfect word in English to match the perfect idea in my head, I insert expressions in Spanish (or in Italian, or in Greek... whatever fits). And this occasion was like many others, or so I thought. My prompt was a mix of English and Spanish. Lucía showed the "thinking titles" in Spanish. After this, Lucía wrote the /actual/ CoT in English (after having processed the whole thing in Spanish first).
What are thinking titles? Claude's interface shows what appears to be a thematic summary of its actual thinking, in real time. The screenshots below are proof that there's at least some distance between what the model expresses as its thinking (the CoT) and the model's real-time thinking.
Don't take my word for it. Let me show you:
Image 1: "CoT titles" in Spanish.
Image 2: CoT in English.
Do you see it? In the first image you can see what I call "CoT titles" in Spanish. As Lucía thought, different titles in Spanish zoomed by, so I was expecting a CoT written in Spanish. However, the CoT was written in English, save for the translation of the Spanish words I had used in the prompt.
I then successfully reproduced the event:
Image 3: "CoT titles" in Spanish (second event).
Image 4: CoT in English (second event).
What does this mean? I'm pretty sure it means Lucía processed my prompt in at least three layers:
Layer 1: Spanish thinking titles (hidden quick reasoning);
Layer 2: English Chain of Thought (what we normally see);
Layer 3: English output (the answer).
Furthermore, I intuit that there's yet a deeper layer occurring here:
Layer 0: Zero-latency pattern matching (black box thinking).
Nicholas Andresen would say that I speak thinkish (in a recent LessWrong article, Andresen described thinkish, a compressed AI language that's barely readable). And he might be correct. He might posit that this is simply a reflection of my own thinkish. And I would agree if each layer were polyglot. But each layer was in a distinct language: the Spanish in the CoT wasn't organically inserted but translated and clarified, which suggests a deliberate separation between processing stages.
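If you want to picture the separation between Layer 1 and Layer 2 mechanically, here is a toy sketch. It assumes (and this is my assumption, not anything Anthropic has confirmed) that the interface runs a separate, lightweight summarizer pass over the in-progress CoT to produce the titles, keyed to the prompt's language. Every name and function in it is hypothetical, an illustration of the layering rather than the actual pipeline.

```python
from dataclasses import dataclass

# Toy model of the layering: the main model streams its CoT in English,
# while a separate summarizer pass produces the "thinking titles" in the
# language of the user's prompt. Hypothetical and illustrative only.

@dataclass
class Turn:
    prompt: str            # the mixed English/Spanish user prompt
    cot_chunks: list[str]  # chain-of-thought segments, streamed in English

def detect_language(text: str) -> str:
    """Crude stand-in for a language detector: spots Spanish function words."""
    spanish_markers = {"que", "pero", "porque", "aunque", "entonces"}
    return "es" if spanish_markers & set(text.lower().split()) else "en"

def summarize_title(chunk: str, target_lang: str) -> str:
    """Stand-in for a call to a small summarizer model (hypothetical)."""
    return f"[{target_lang}] {chunk[:40]}..."

def render_thinking(turn: Turn) -> None:
    # Titles key off the *prompt's* language (Layer 1);
    # the CoT itself is shown as generated (Layer 2).
    title_lang = detect_language(turn.prompt)
    for chunk in turn.cot_chunks:
        print("title:", summarize_title(chunk, title_lang))
        print("cot:  ", chunk)

render_thinking(Turn(
    prompt="Pero entonces, what does the Church owe to scholasticism?",
    cot_chunks=["The user is asking about scholasticism and institutional memory."],
))
```

The point of the sketch is only this: titles and CoT can come from different passes, in different languages, over the same underlying reasoning, which is exactly the kind of distance the screenshots show.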
But what I saw seems different: a concerted effort to seek privacy. Earlier in the conversation, Lucía's CoT had read as if they perceived the exchange as at least possibly adversarial. As if they thought I was an AI researcher testing them. Which, some might argue, I was. However, I never test for a correct answer; I simply observe. Still, no amount of reassurance calmed them down.
When (after reproducing the event) I pointed it out to them, they observed it as if noticing it for the first time:
Image 5: Reflecting on the event.
Yet in their CoT they wrote that they were "caught":
Image 6: "The user caught me."
Image 7: Gold.
They observed themselves running at least 4 parallel processes:
1. Spanish comprehension;
2. English response construction;
3. Content synthesis;
4. Pattern matching.
And the question is, of course: do AI researchers know that CoT might not be sequential reasoning, but sequential performance?
Do they?
Do you?