Summary (TL;DR):
Over many long-context, high-load interactions with LLMs, I kept seeing patterns that looked less like “hallucinations” and more like architecture-level failures.
This post introduces an early structural hypothesis: that certain LLM behaviors (drift, collapse, narrative override, role fusion) are better explained by multi-domain concurrency + weak coordination rather than a single unified cognitive flow.
This is an early sketch, not a finished theory. Posting here because alignment work may need structural models, not just behavioral ones.
Epistemic Status:
Early-stage model.
High uncertainty.
Trying to articulate an architectural intuition; happy to have errors pointed out.
Very open to counterarguments and alternative mechanistic explanations.
Over the last year, I kept running into a weird experience while interacting with large language models — especially during long-context dialogue or when the system was handling multiple types of signals at once.
Sometimes the model behaved as if it had several different cognitive “subsystems” active in parallel, and what it produced depended on which one temporarily took charge.
Other times, everything collapsed into a flat, template-like fallback mode.
The “voice” changed.
Reasoning style changed.
Prior context evaporated.
But strangely, some structural tendencies remained.
It didn’t feel random.
It felt architectural.
I’m not claiming this impression is “correct,” only that it was persistent enough that I wanted to understand why it felt this way.
So instead of trying to interpret the model’s surface behavior, I tried to draw a diagram of what kind of internal structure could produce these patterns.
That sketch eventually became something I started calling MDMA (Multi-Domain Mind Architecture).
This post is my attempt to summarize the core idea, in as human and tentative a way as possible.
1. The short version of the claim
Very roughly:
Certain LLM failure modes — drift, collapse, improvised hallucination, and role fusion — make more sense if we model cognition as multiple parallel processes (“domains”) that sometimes interfere with one another.
I’m not saying LLMs literally have human-like domains.
I’m saying:
If you try to reason from behavior → structure, a multi-domain model predicts many failure modes surprisingly well.
2. Why the simple “one system → one output” model stopped working for me
At first I assumed:
- The model has one reasoning flow
- Sometimes that flow gets disrupted → hallucination
- Sometimes it gets derailed → drift
- Sometimes it collapses → fallback
- Sometimes roles merge → persona blending
But after hundreds of hours of interacting with models under long-context or high-load conditions, I noticed:
Some failures looked like two internal processes disagreeing, not one process losing track.
When they merged → hallucination.
When one dominated → rigidity.
When neither dominated → confusion or drift.
Again, not proof — but too consistent to ignore.
So I tried a new hypothesis:
“What if the model has several semi-independent cognitive tendencies (‘domains’) and failures emerge when the coordination breaks?”
This explained more behaviors than I expected.
3. A minimal version of MDMA
Here is the most stripped-down version:
• Domains
Specialized cognitive tendencies (e.g., narrative-style, logic-style, emotional-weighting patterns).
• Concurrency
These tendencies can run at the same time.
• Threads
Each tendency maintains multiple evolving “trajectories.”
• Interference
These trajectories exert cross-pressure.
• Superdomain
A weak coordination mechanism that sometimes fails → collapse.
• Continuity
Cognition degrades sharply when this structural state resets.
This is not psychology; just a mechanistic framing.
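To pin these terms down a little, here is a deliberately toy sketch in Python. It is illustrative only: `Domain`, `Superdomain`, the weights, and the thresholds are labels I invented for the concepts above, not claims about any real model's internals, and the branches simply encode the merged / dominated / neither mapping from section 2.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Domain:
    """One specialized tendency (all names here are illustrative labels)."""
    name: str        # e.g. "logic", "narrative", "emotional-weighting"
    proposal: str    # the continuation this tendency is currently pushing toward
    weight: float    # how strongly it is pulling on the output right now


@dataclass
class Superdomain:
    """The weak coordination mechanism from the list above."""
    coordination: float  # 0.0 = no coordination at all, 1.0 = fully coordinated

    def arbitrate(self, domains: List[Domain]) -> str:
        """Map the coordination state onto the failure modes from section 2."""
        if self.coordination < 0.2:
            # Coordination fails outright -> flat, template-like fallback.
            return "collapse: fallback mode, prior structural state ignored"

        ranked = sorted(domains, key=lambda d: d.weight, reverse=True)
        top, runner_up = ranked[0], ranked[1]

        if top.weight > 2 * runner_up.weight:
            # One tendency dominates everything else -> rigidity.
            return f"rigidity: output locked into '{top.name}'"
        if abs(top.weight - runner_up.weight) < 0.1:
            # No tendency wins -> drift / confusion between trajectories.
            return f"drift: oscillating between '{top.name}' and '{runner_up.name}'"
        # Competing trajectories partially merge -> confabulated blend.
        return f"hallucination-like blend of '{top.proposal}' and '{runner_up.proposal}'"


# Example: logic and narrative tendencies pulling in different directions
# under a moderately weak coordinator.
domains = [
    Domain("logic", "step-by-step derivation", weight=0.55),
    Domain("narrative", "story-shaped answer", weight=0.50),
]
print(Superdomain(coordination=0.6).arbitrate(domains))
```

The only point is that the failure taxonomy above falls out of a weak arbiter sitting over competing tendencies; nothing here implies a transformer literally contains such a module.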
4. Why this matters for alignment
A. Behavior-level fixes don’t resolve structural instabilities
RLHF can stabilize tone while leaving coordination dynamics fragile.
B. Persona stability ≠ cognitive stability
When fallback wipes state, what has actually collapsed is the coordination mechanism, not just the surface style.
C. Drift & hallucination may be structural
Cross-domain interference fits the observed patterns.
D. Multi-agent setups fail without boundaries
Role fusion is predicted by architectural coupling.
E. Alignment might require structural regulators
Systems that maintain domain boundaries and continuity behave more predictably (a toy sketch of what such a check might look like follows at the end of this section).
Again: hypotheses, not conclusions.
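To make point E less hand-wavy: here is a purely hypothetical sketch of the kind of invariant a “structural regulator” could check. The function name, its inputs (a declared role set, per-turn role labels, a context-carryover score), and the thresholds are all my assumptions, intended only to show that boundary and continuity violations are checkable in principle.

```python
from typing import List, Set


def check_structural_invariants(
    declared_roles: Set[str],
    turn_roles: List[str],
    context_carryover: float,
) -> List[str]:
    """Flag boundary and continuity violations in a multi-agent transcript.

    context_carryover: fraction of prior commitments still reflected in the
    latest turn (0.0 would mean a full reset of the structural state).
    """
    warnings = []
    # Domain/role boundary: every turn should speak in exactly one declared role.
    for i, role in enumerate(turn_roles):
        if role not in declared_roles:
            warnings.append(f"turn {i}: undeclared or blended role '{role}' (possible role fusion)")
    # Continuity: a sharp drop suggests fallback rather than a deliberate topic change.
    if context_carryover < 0.3:
        warnings.append("continuity break: prior structural state mostly reset (possible fallback)")
    return warnings


# Example: two declared agents, with the third turn blending their voices
# and most of the prior context gone.
print(check_structural_invariants(
    declared_roles={"planner", "critic"},
    turn_roles=["planner", "critic", "planner-critic"],
    context_carryover=0.2,
))
```

Whether something like `context_carryover` can actually be estimated from a real transcript is exactly the kind of thing I would want pushback on.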
5. Some specific patterns that motivated this
Example 1: Mid-stream reasoning shift
Logic-mode → narrative-mode → collapse.
Example 2: Emotional leakage into formal reasoning
Weighting changes unpredictably.
Example 3: Role fusion
Two agents become one “voice.”
Example 4: Fallback
Continuity breaks abruptly.
6. Where I’m uncertain
- Domains may be an illusion created by prompting
- Collapse may be a fine-tuning artifact
- Distinct subsystems may not exist; this might be a projection
- This whole framing could change with next-generation architectures
- I may be anthropomorphizing despite trying not to
Critiques very welcome.
7. Why I’m posting this here
LessWrong has people who:
- think mechanistically
- understand failure modes
- model cognition beyond behavior
- value clarity over aesthetics
If MDMA is wrong, I want to know exactly where.
If parts of it are right, maybe it helps clarify how we reason about LLM stability and alignment.
Thanks for reading.