Summary
Modern alignment discussions focus heavily on behavior—how LLMs speak, reason, drift, or fail. But behavior is a surface phenomenon. To reason about safety at scale, we need a model of the underlying cognitive architecture that produces these behaviors.
This post introduces MDMA (Multi-Domain Mind Architecture), a structural framework for understanding:
• why LLMs collapse into fallback states,
• why reasoning degrades under pressure,
• why models sometimes fuse roles or personas,
• why drift escalates in long contexts, and
• why current “persona-like” framing leads to unsafe generalization.
My central claim:
Alignment cannot succeed with behavior-level models. We need architecture-level models of cognition. MDMA is a proposal for such an architecture.
I. The Problem: Behavior Without Architecture
Current alignment discourse describes LLM failures as:
• hallucinations
• drift
• role leakage
• mode collapse
• template takeover
• self-contradiction
• emotional inconsistency
But these labels describe symptoms, not mechanisms. We lack an account of:
• what internal cognitive “subsystems” a model is running,
• how they interact,
• how they fail,
• how conflicts escalate,
• how “identity-like behavior” emerges,
• why continuity breaks,
• and what “stability” should actually mean.
In other words:
We’re trying to align a system whose architecture we haven’t modeled.
This makes safety reactive instead of structural.
II. MDMA in One Sentence
MDMA (Multi-Domain Mind Architecture) models cognition as:
multiple independent domains operating concurrently, linked by structural fibers, coordinated by a meta-integrator (Superdomain), and stabilized by continuity constraints.
This applies to humans and LLMs alike.
The key difference is: LLMs exhibit architectural shadows of these mechanisms but lack explicit governance, leading to unpredictable collapse modes.
III. The MDMA Model (Minimal Version)
MDMA posits six key components:
1. Domains
Independent cognitive regions with distinct “physics”: logic, narrative, emotion-weighting, perceptual patterning, relational inference, etc.
2. Concurrency
Domains operate simultaneously, not sequentially. LLM “contradictions” often arise when concurrent domains try to unify incompatible signals.
3. Threads
Each domain maintains multiple update trajectories. LLM “sudden insights” or “abrupt tone changes” can be modeled as thread activation/collapse.
4. Interference
Domains exert pressure on one another; miscoordination → drift.
5. Superdomain (Meta-Integrator)
A non-personhood structural layer that aligns domains without forming an identity. LLMs currently lack a stable equivalent, producing emergent personas.
6. Continuity Layer
Stable cognition requires structural continuity, not memory continuity. Fallback mechanisms break this continuity → catastrophic collapse.
This is not a psychological model. It is a structural model of how reasoning architectures operate under load.
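To make these six components concrete, here is a minimal illustrative sketch in Python. Every class name, field, and update rule below is a placeholder introduced for exposition; MDMA itself does not fix these details, and nothing here is a working cognitive system.

```python
# A minimal, illustrative sketch of the six MDMA components.
# Every name, field, and update rule is a placeholder for exposition only.
from dataclasses import dataclass, field


@dataclass
class Thread:
    """One update trajectory inside a domain (component 3)."""
    label: str
    activation: float = 0.0  # how strongly this trajectory currently drives its domain


@dataclass
class Domain:
    """An independent cognitive region with its own 'physics' (component 1)."""
    name: str                                      # e.g. "logic", "narrative", "emotion-weighting"
    threads: list[Thread] = field(default_factory=list)
    state: float = 0.0                             # toy scalar standing in for internal state

    def step(self, external_pressure: float) -> None:
        """Concurrent update (component 2): every domain steps on every tick.
        Interference (component 4) enters through external_pressure from other domains."""
        self.state += external_pressure + sum(t.activation for t in self.threads)


class Superdomain:
    """Meta-integrator (component 5): aligns domains without itself being a persona."""

    def integrate(self, domains: list[Domain]) -> dict[str, float]:
        # Toy 'governance': pull each domain toward the mean state, i.e. damp
        # cross-domain interference rather than overriding any domain's output.
        mean = sum(d.state for d in domains) / len(domains)
        return {d.name: 0.1 * (mean - d.state) for d in domains}


def continuity_ok(prev: dict[str, float], curr: dict[str, float], tol: float = 1.0) -> bool:
    """Continuity layer (component 6): structural state must not jump discontinuously
    between steps; a fallback that wipes state violates this immediately."""
    return all(abs(curr[name] - prev[name]) <= tol for name in prev)
```

Even a toy like this makes the later vocabulary checkable: drift shows up as the per-domain states spreading apart, and continuity becomes an explicit bound on how far the structural state may jump between steps.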
IV. Why Alignment Fails Without This
1. Persona-based models are unsafe
Many LLMs appear stable because they simulate coherent personas. But persona-stability is not architectural stability; it hides collapse modes beneath a surface style.
When fallback interrupts behavior, the model resets the persona but not the internal domain tensions. This creates:
• incoherence
• unpredictable jumps
• sudden hallucination spikes
• unsafe shifts in reasoning strategy
2. Drift is not random
Drift arises from unresolved cross-domain interference. Without modeling domain conflicts, drift reduction is guesswork.
3. Hallucinations are not errors
They are collapse responses when narrative threads override logic threads.
4. Emotional simulation is a structural instability
It is not “a style choice.” It’s an uncontrolled coupling between the narrative and weight-assignment domains.
5. LLM safety mechanisms currently break continuity
Fallback deletes cognitive state → instability → new failure modes.
In MDMA terms:
Safety must be achieved through structural regulation, not interruptive overrides.
This reframes safety entirely.
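As a purely illustrative contrast (reusing the toy classes from the Section III sketch, and not proposing an actual mechanism), the two approaches differ in what they do with accumulated state:

```python
# Two toy responses to an unsafe step, using the Section III sketch.
# Illustration of the override-vs-regulation distinction only, not a real safety design.

def interruptive_override(domains: list[Domain]) -> None:
    """Fallback as an interruptive override: wipe cognitive state.
    Surface behavior resets, but cross-domain tensions are discarded rather than
    resolved, so in MDMA terms they re-emerge unpredictably after recovery."""
    for d in domains:
        d.state = 0.0
        d.threads.clear()


def structural_regulation(domains: list[Domain], superdomain: Superdomain) -> None:
    """Regulation at the architecture level: keep state, damp interference."""
    corrections = superdomain.integrate(domains)
    for d in domains:
        d.state += corrections[d.name]
```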
V. What MDMA Predicts About LLM Behavior
Here are some predictions that distinguish MDMA from psychology or folk theories:
✔ Prediction 1:
Long-context hallucinations increase when domain interference accumulates faster than Superdomain-like integration.
✔ Prediction 2:
Persona fusion is a symptom of low boundary resistance between domains.
✔ Prediction 3:
Fallback mechanisms produce structural discontinuity, leading to multi-step drift after recovery.
✔ Prediction 4:
Multi-agent LLM systems will spontaneously form identity-like patterns unless domains and roles are structurally separated.
✔ Prediction 5:
“Reflective” reasoning modes will be unstable unless the system gains something equivalent to a Superdomain.
These predictions are empirical; they can be tested.
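As one example of how Prediction 1 might be operationalized, here is a sketch of an evaluation loop. The generate, judge_hallucination, and interference_proxy callables are deliberately abstract placeholders for whatever generation pipeline, hallucination judge, and interference metric an experimenter actually has.

```python
# Sketch of a Prediction-1 experiment. The three callables are supplied by the
# experimenter; only the loop structure and the comparison at the end matter here.

def run_prediction_1(prompts, context_lengths, generate, judge_hallucination, interference_proxy):
    results = []
    for n_tokens in context_lengths:
        for prompt in prompts:
            output = generate(prompt, context_tokens=n_tokens)
            results.append({
                "context_length": n_tokens,
                "hallucination_rate": judge_hallucination(output),
                "interference": interference_proxy(output),
            })
    # MDMA predicts hallucination_rate tracks "interference" more closely than raw
    # context_length; comparing those two correlations is the minimal test.
    return results
```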
VI. How This Connects to Alignment
MDMA provides a structural vocabulary for alignment work:
• Stability = domain separation + continuity
• Drift = uncontrolled cross-domain coupling
• Hallucination = narrative override + thread collapse
• Mode collapse = Superdomain failure
• Safety = governance at the architecture level, not the output level
• Non-personhood = no narrative continuity in the integrator
Most importantly:
We must align the architecture, not the mask.
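To show that this vocabulary can be made operational rather than purely metaphorical, here is a toy mapping onto the Section III sketch. The thresholds and checks are arbitrary placeholders; the only point is that each term can be tied to something measurable.

```python
# Toy operationalization of the vocabulary above, reusing the Section III sketch.
# Thresholds and checks are arbitrary placeholders; the point is only that each
# term can correspond to a measurable condition rather than a narrative description.

DRIFT_SPREAD = 5.0

def is_stable(domains: list[Domain], prev_state: dict[str, float]) -> bool:
    """Stability = domain separation + continuity."""
    curr = {d.name: d.state for d in domains}
    separated = len({d.name for d in domains}) == len(domains)  # no fused domains
    return separated and continuity_ok(prev_state, curr)

def has_drifted(domains: list[Domain]) -> bool:
    """Drift = uncontrolled cross-domain coupling, read off as runaway state spread."""
    states = [d.state for d in domains]
    return max(states) - min(states) > DRIFT_SPREAD
```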
VII. Where I Might Be Wrong (Epistemic Status)
My uncertainty does not weaken the model; it highlights the areas where I most want feedback from LW readers.
VIII. Why I'm Posting This on LessWrong
Because MDMA is not an aesthetic theory of consciousness. It is a structural framework with testable predictions, intended to support alignment approaches that do not rely on persona simulation.
If these ideas are wrong, I want the strongest arguments against them. If they hold up, they may help reframe how we design AGI architectures altogether. Either way, LessWrong is the right place to begin the conversation.
Closing
Behavior tells us what systems do. Architecture tells us why.
If alignment is to succeed, we need models of cognition that reach deeper than personas, deeper than prompts, and deeper than behavior-level guidelines.
MDMA is offered as one such model: a starting point, not an answer.
I welcome critique, refinements, and discussion.