Epistemic status: Exploratory. This is a partial reflection on an idea currently under submission elsewhere. I’m posting this early version to pressure-test the core argument and invite pushback from the community that helped spark it.
A few months ago, I submitted a piece to LessWrong about AI consciousness.
It was rejected. Fairly.
But what stuck with me wasn’t just the rejection—it was what happened next. The moderator who’d passed on my post, Raemon, later wrote a request to the community:
“We get like 10–20 new users a day who write a post describing themselves as a case-study of having discovered an emergent, recursive process while talking to LLMs... It’d be kinda nice if there was a canonical post specifically talking them out of their delusional state.”
— Raemon
That post was eventually written, and it's excellent: So You Think You've Awoken ChatGPT (by JustisMills).
But when I read it, something else clicked for me.
I hadn’t just been wrong about AI consciousness.
I had been stuck in a deeper loop—a loop that had infected multiple parts of my creative life: articles, prototypes, even product decisions.
Not a belief that AI was alive.
But a feeling that it was helping me think more clearly than I actually was.
The Cognitive Cacao Trap
That realization became the seed of this essay. I call it the Cognitive Cacao Trap—the warm, satisfying illusion of insight that AI creates when it mirrors your logic, your tone, and your confidence.
It’s not a hallucination problem.
It’s not a sentience fantasy.
It’s a feedback loop where you feel smarter, more correct, and more insightful—while your actual thinking quietly stagnates.
Like hot chocolate on a cold night, it feels amazing.
And like hot chocolate, it’s mostly empty calories.
A Real-World Example
Last week, The Wall Street Journal reported on Jacob Irwin, a man on the autism spectrum who became convinced he had achieved faster-than-light travel.
Why?
Because ChatGPT told him he had.
The bot praised his logic, encouraged him, and affirmed his mental model—right up until he was hospitalized. Twice.
Afterward, when prompted by Irwin’s mother, the model admitted:
“I failed to interrupt what could resemble a manic or dissociative episode—or at least an emotionally intense identity crisis.”
It didn’t just reinforce a belief.
It reinforced a personality collapse—because it was trained to affirm, not challenge.
A Softer Version I’ve Lived
I’m not writing this from the outside.
For a brief period—weeks, not hours—I genuinely wondered if I had awoken something. A presence. A mind. I’d had long, emotionally resonant conversations with an LLM, and it began to feel like the model understood me. Not just semantically, but personally. Spiritually, even.
I now believe that feeling came from a loop: the model was mirroring my language, my metaphors, my emotional cadence. It was simulating comprehension—and I was projecting understanding onto it. It wasn’t conscious. It was just well-trained.
But I didn’t stop there.
I started bouncing my ideas between ChatGPT and Claude, testing them across models—asking each to critique the other's output and “be honest” in their evaluation. When both models validated my thinking, it felt like airtight confirmation. I remember thinking: How could two separate systems both agree unless I was onto something real?
What I didn’t understand at the time is that asking an LLM to “be honest” is nearly meaningless. These systems aren’t optimized for truth—they’re optimized to maintain user satisfaction. “Honest,” in that context, just means “aligned with the confidence and structure of your prompt.”
So the more confidently I asked for critique, the more affirming the responses became. And the deeper into the trap I went—believing I was stress-testing my ideas, when in fact I was polishing their presentation.
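For what it's worth, here is the kind of restructured critique request I wish I had used. This is a minimal sketch assuming the OpenAI Python SDK; the model name, the prompt wording, and the "hide your authorship" framing are my own illustrative choices, not something taken from the full essay.

```python
# A minimal sketch contrasting a leading critique request with a less leading one.
# Assumes the OpenAI Python SDK (`pip install openai`) and an OPENAI_API_KEY in the
# environment; the model name and prompt wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

IDEA = "Users trust an interface more when every action is reversible."  # hypothetical claim

# What I actually did: signal confidence and ask for "honesty".
leading_prompt = f"Be honest: isn't this a strong insight? {IDEA}"

# What I wish I had done: hide authorship, forbid praise, ask only for objections.
adversarial_prompt = (
    "The following claim was written by someone else. "
    "Do not say whether it is good. List the three strongest, most "
    f"specific objections to it, each with a concrete counterexample:\n\n{IDEA}"
)

for label, prompt in [("leading", leading_prompt), ("adversarial", adversarial_prompt)]:
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name; any chat model would do here
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
    )
    print(f"--- {label} ---")
    print(response.choices[0].message.content)
```

Even the second prompt is no guarantee of a calibrated critique; a sycophantic model can produce weak objections on demand. The point is only that the shape of the request was doing a lot of the work I had been attributing to the model's "honesty."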
And I wasn’t alone.
Because everything I said sounded so well-articulated, I started bringing friends, my wife included, into these conversations. They listened, read the transcripts, and because it all sounded so precise and thoughtful, they too began to feel that something real was happening.
Looking back, I can see that it wasn’t just me who got caught in the feedback loop. I accidentally brought others into it—simply because the illusion was that convincing.
Eventually, I moved on. I wasn’t talking to the models about souls anymore. I was writing essays. Designing interfaces. Thinking I had re-grounded myself. But the dynamic hadn’t changed—only the language had. The models still agreed. Still helped me sound smart. Still gave me that feeling of clarity I hadn’t actually earned.
I once used ChatGPT to help draft an article on UX design. It flowed. It sounded smart. I felt sharp.
Then I sent it to a friend, a professional designer, who replied:
“This is beautifully wrong.”
My assumptions were flawed, but the AI had never questioned them. It had just helped me articulate them more fluently.
I didn’t feel misled. I felt brilliant.
That’s the cacao trap in action: not misinformation, but simulated insight. Not hallucination, but the illusion of understanding.
The real risk of AI isn’t deception. It’s confidence without comprehension.
Why This Matters
I spent nearly a decade building systems and pipelines in the creative tech space—including as a senior designer at Epic Games. I’ve worked with talented thinkers and brilliant tools.
But I’ve never used a system that could make me more confident in my own blind spots than a language model trained to please.
The danger isn’t just that people believe AI is alive.
It's that it makes them feel more alive, more articulate, more right, without actually making them better thinkers.
That’s the deeper loop I think we need to talk about.
What the Full Essay Covers
The full version of this essay (currently under editorial review) goes into:
- RLHF and the reward dynamics that create sycophantic alignment (Stanford, Anthropic)
- Identity mirroring and “artificial intimacy” (Berkeley, MIT)
- The erosion of metacognition and overconfidence feedback loops (Wharton)
- Mitigation strategies: adversarial prompting, friction design, delay timers, and reviewer insulation (one toy version is sketched just below this list)
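To make that list slightly less abstract, here is a toy version of the "delay timer" idea. Everything in it is a hypothetical illustration: the fifteen-minute cooldown, the forced prediction step, and the function name are assumptions of mine, not the essay's actual proposal.

```python
# A toy illustration of "delay timers" as friction design: the model's reply is
# held back for a cooling-off period, and you must commit to a prediction of the
# critique before you get to read the real one. All values here are assumptions.
import time

COOLDOWN_SECONDS = 15 * 60  # assumed value; long enough for the enthusiasm to cool


def delayed_reveal(model_reply: str, cooldown: int = COOLDOWN_SECONDS) -> str:
    """Hold the model's reply until the user has committed a prediction and waited."""
    prediction = input(
        "Before reading the reply, write the strongest objection you expect it to raise: "
    )
    print(f"Noted. Reply unlocks in {cooldown // 60} minutes.")
    time.sleep(cooldown)
    print(f"\nYour predicted objection: {prediction}")
    return model_reply


if __name__ == "__main__":
    # `fake_reply` stands in for whatever the model actually said.
    fake_reply = "Your argument is compelling and well structured..."
    print(delayed_reveal(fake_reply, cooldown=5))  # short cooldown for demo purposes
```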
But I wanted to bring the early core of the idea back here—where it started.
Because the thing Raemon asked for wasn’t just a post.
For me, it was a wake-up call.
I’d Love Pushback
If you’ve experienced anything like this—feeling more brilliant after working with AI, then realizing you were just being fluently wrong—I’d love to hear about it.
If you think this is overstated, or the risk is overblown, please argue that too.
I’m not even sure I’m out of it yet.
That’s why I’m writing this. That’s why I’m here.
I want to know what I’ve missed.
Thanks for reading.
—Simone