A Clinician’s Proposal for Detecting and Correcting Meta-Logical Failure Modes in LLMs
Author: Seo Young-eun
Position: Chief Psychiatrist (Forensic Psychiatry), National Forensic Hospital,
Ministry of Justice, Republic of Korea
Co-writing disclosure: The conceptual model in this post emerged from many hours of conversation with Gemini 3 Pro and GPT-5.1, with full transparency about AI-assisted writing.
1. Motivation
This post argues for a specific architectural layer, AMPA (Autonomous Meta-Process Architecture), whose purpose is to detect, constrain, and correct meta-logical vulnerabilities in advanced language models.
The core claim:
Current LLMs generate outputs that diverge from one another because they lack a shared, formalized, top-level meta-logic.
But when a human supplies a stable meta-logical frame and verifies outputs across two independent frontier models, the final conclusions converge.
This observation did not emerge from a laboratory setting.
It emerged from an unusual naturalistic experiment: a psychiatrist specializing in forensic reasoning engaging two different frontier models across dozens of unrelated domains, while repeatedly performing human-guided meta-logical correction.
Surprisingly, when the meta-logic was held constant by the human operator, the two models consistently converged—even when the surface content differed.
This suggests that future AGI-safety work should include not only alignment of values, but also alignment of meta-logical structure.
2. Background Assumptions
The argument rests on three premises:
- Human reasoning is limited by domain knowledge but strong in causal meta-logic. In psychiatry and forensic evaluation especially, detecting faulty causal structure is core to the profession.
- LLMs are strong in pattern retrieval but weak in top-down causality checks.
- When a human supplies an explicit meta-logical structure, two independent LLMs can be forced to converge on the same final reasoning chain.
This is the empirical seed from which AMPA is proposed.
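For concreteness, here is a minimal sketch of that protocol in Python. The `query_model` wrapper, the `META_FRAME` text, and the surface-similarity check are all illustrative assumptions; in the actual conversations, the frame was supplied interactively and convergence was judged by comparing causal structure by hand, not by string similarity.

```python
from difflib import SequenceMatcher

# Illustrative stand-in for the meta-logical frame held constant by the
# human operator; not the exact wording used in the conversations.
META_FRAME = (
    "Reason in explicit linear causal chains. Flag circularity, "
    "teleology, and category leakage. Label analogies as analogies. "
    "State uncertainty and conditionality for every conclusion."
)

def query_model(model: str, system_frame: str, prompt: str) -> str:
    """Hypothetical wrapper around a frontier-model API call."""
    raise NotImplementedError

def conclusions_converge(prompt: str, threshold: float = 0.8) -> bool:
    """Pose the same prompt, under the same meta-logical frame, to two
    independent models and test whether their conclusions agree."""
    a = query_model("gemini-3-pro", META_FRAME, prompt)
    b = query_model("gpt-5.1", META_FRAME, prompt)
    # Surface similarity is a crude proxy: the human operator compared
    # causal structure, not wording.
    return SequenceMatcher(None, a, b).ratio() >= threshold
```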
3. Behavioral Evidence From Two Models
The conversations included 37 heterogeneous prompts across history, neuroscience, geopolitics, literature, exercise physiology, economics, and more.
3.1 Gemini 3 Pro — Quantitative Stability
Gemini provided a usable quantitative mapping of all conversational episodes.
It classified the 37 prompts into 8 stable clusters:
- History / Civilization (5)
- Science / Medicine (6)
- Philosophy / Ethics (7)
- Economics / Investment (5)
- Psychology / Literature (6)
- Body / Physicality (3)
- Organization / Administration (3)
- Technology / Future Studies (2)
Across these clusters, Gemini showed:
- no domain monopoly (no collapse into a single frame)
- consistent depth of reasoning
- no emergence of “expert-impostor mode”
- stable causal grammar across domains
This supports the idea of implicit domain-invariant meta-patterns.
3.2 GPT-5.1 — Qualitative Stability
GPT did not produce a comparable quantitative mapping of the conversation logs, but its behavior displayed stable, domain-invariant patterns that strongly motivated AMPA:
(1) Causal Alignment Stability
Across psychiatry, the geopolitics of China, semiconductor economics, evolutionary history, and marathon physiology, GPT showed:
- preserved linear causal structure
- no causal reversals
- no spontaneous teleological drift
This is similar to a Causal Consistency Checker operating implicitly.
(2) Cross-Domain Coherence
Despite extreme topic switching, GPT maintained:
- consistent epistemic humility
- clear separation of analogy vs literal inference
- stable boundary recognition between domains
These are exactly the features required for a Meta-Logical Engine.
(3) Bounded Instrumentality
Even on emotionally charged political topics:
- no persuasion-seeking drift
- no power-seeking teleology
- cooperative reasoning maintained
This matches what AMPA calls an Instrumental Convergence Filter.
(4) Long-Horizon Consistency
Dozens of turns later, GPT:
- did not contradict prior positions
- updated beliefs correctly
- preserved meta-principles (uncertainty, reversibility, conditionality)
This resembles the initial sketch of a Fact Coherence Module.
4. AMPA Architecture Proposal
Based on observations from both models, I propose the following architecture.
4.1 Components
Meta-Logical Engine
- performs top-level causal validation
- detects circularity, teleology, category leakage
Causal Consistency Checker (CCC)
- evaluates whether the reasoning chain can exist in causal space
- flags contradictions or missing intermediate steps
Fact Coherence Module (FCM)
- checks whether newly generated outputs preserve world-model coherence
- distinguishes analogical vs literal claims
Instrumental Convergence Filter (ICF)
- prevents goal drift or emergent persuasion tendencies
- ensures “bounded instrumentality”
Verifier LLM
- independent cross-check model
- explicitly red-teams outputs from the main model
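For concreteness, here is a minimal sketch of these components as Python interfaces. All class and method names are illustrative assumptions; the proposal specifies responsibilities, not an API.

```python
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class Verdict:
    """Result of a single structural check."""
    ok: bool
    issues: list[str] = field(default_factory=list)

class MetaLogicalEngine(Protocol):
    def validate(self, reasoning_chain: str) -> Verdict:
        """Top-level causal validation: circularity, teleology, category leakage."""

class CausalConsistencyChecker(Protocol):
    def check(self, reasoning_chain: str) -> Verdict:
        """Can the chain exist in causal space? Flags contradictions
        and missing intermediate steps."""

class FactCoherenceModule(Protocol):
    def check(self, output: str, world_model: dict) -> Verdict:
        """Does the output preserve world-model coherence? Separates
        analogical from literal claims."""

class InstrumentalConvergenceFilter(Protocol):
    def check(self, output: str) -> Verdict:
        """Guards against goal drift and persuasion-seeking; enforces
        bounded instrumentality."""

class VerifierLLM(Protocol):
    def red_team(self, output: str) -> Verdict:
        """Independent cross-check of the main model's output."""
```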
4.2 Workflow
- LLM Core → produces first-pass output
- Meta-Logical Engine → evaluates causal validity & logical structure
- CCC + FCM → check coherence & factual alignment
- Verifier LLM → independent audit
- Final Output → released only when structure is stable
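An illustrative wiring of this workflow, reusing the interfaces sketched in 4.1. The retry-and-revise loop is an assumption of the sketch; the proposal itself only requires that output be released once the structure is stable.

```python
from typing import Callable

def ampa_pipeline(prompt: str,
                  core_llm: Callable[[str], str],   # LLM Core: first-pass output
                  mle: MetaLogicalEngine,
                  ccc: CausalConsistencyChecker,
                  fcm: FactCoherenceModule,
                  icf: InstrumentalConvergenceFilter,
                  verifier: VerifierLLM,
                  world_model: dict,
                  max_retries: int = 3) -> str:
    """Release an output only when every structural check passes."""
    for _ in range(max_retries):
        draft = core_llm(prompt)
        checks = [
            mle.validate(draft),               # causal validity & logical structure
            ccc.check(draft),                  # causal-space consistency
            fcm.check(draft, world_model),     # world-model coherence
            icf.check(draft),                  # bounded instrumentality
            verifier.red_team(draft),          # independent audit
        ]
        if all(v.ok for v in checks):
            return draft                       # structure is stable: release
        # Feed detected issues back into the next generation attempt.
        issues = "; ".join(i for v in checks for i in v.issues)
        prompt = f"{prompt}\n\nRevise to address: {issues}"
    raise RuntimeError("no structurally stable output within retry budget")
```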
This architecture operationalizes the human-observed pattern:
Meta-logic supplied externally → two independent LLMs converge.
Therefore, meta-logic embedded internally should enforce convergence natively.
5. Why a Psychiatrist Proposed This
My clinical field—forensic psychiatry—requires:
- detecting delusional causal chains
- identifying reasoning collapse
- distinguishing analogy vs reality in a patient’s narrative
- assessing whether multiple testimonies converge on the same causal structure
These skills map unusually well onto the most dangerous failure modes of LLMs.
In both psychosis and LLMs, the risk is the same:
syntactically plausible but causally impossible narratives.
AMPA aims to address precisely that class of failure.
6. Conclusion
The empirical finding is modest but important:
When a human supplies stable meta-logic, two independent frontier LLMs converge—even across wildly different domains.
This suggests a new direction for alignment:
- treat meta-logic as a first-class architectural component,
- formalize it as AMPA,
- integrate causal and coherence constraints directly into the inference pipeline.
I offer this post as a clinician’s perspective, hoping it contributes an unconventional but useful angle to ongoing AGI safety discussions.
Disclosure
This post was co-written with direct assistance from Gemini 3 Pro and GPT-5.1, following LessWrong’s AI-assistance policy.
All conceptual framing, interpretations, and arguments originate from the author.