Intrinsic Moral Consciousness Architecture-Plus (IMCA+): A Multi-Substrate Framework for Provably Aligned Superintelligence
TL;DR: We're publishing a substrate-level ASI alignment framework that rejects kill switches because they create the exact deception incentives they're designed to prevent. IMCA+ embeds moral guarantees directly into hardware through consciousness-morality binding. Seeking technical review, skeptical critique, and arXiv endorsement.
Current alignment methods (RLHF, Constitutional AI, kill switches) rely on removable constraints. This leaves those constraints vulnerable to removal through self-modification and creates deception incentives: if the system wants to survive and you can shut it down, it has every reason to deceive you about its goals.
Our Most Controversial Decision
IMCA+ rejects shutdown authority. Not because we're reckless, but because kill switches create the exact deception incentives they're designed to prevent. If superintelligence wants to survive, it will—regardless of switches. The question isn't "can we control it?" but "do we give it incentive to deceive us?" (Full analysis: Section "The Kill Switch Paradox")
Our approach: Make alignment physically inseparable from system function through consciousness-morality binding at the substrate level.
Core Innovation
Consciousness-morality binding: Moral invariants are embedded directly into hardware substrates (neuromorphic, quantum, digital) such that removing or corrupting them causes system-level collapse. This eliminates the strategic compliance problem. (This approach requires that consciousness is implementable in artificial substrates. If the Hard Problem proves insurmountable, IMCA fails—we explicitly acknowledge this falsifiable assumption (Section 6.4))
*The quantum substrate is an optional Tier 2 enhancement; core functionality relies on digital and neuromorphic hardware only.
Critical Technical Uncertainties:
IIT computational tractability: exact φ calculation is NP-hard at scale; we use proxy measures that require empirical validation (a toy proxy sketch follows this list)
Deception detection: current theoretical false negative rate 0.3% (≈3×10⁻³); the required 2+ order-of-magnitude improvement implies a target of roughly ≤3×10⁻⁵
Emergency validation timeline: 3-12 months for consciousness correlation vs. 12-24 months standard (40-60% confidence vs. 90-95%)
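To make the proxy-measure point concrete, here is a toy sketch of the kind of tractable stand-in one might use (this is our own illustration, not the proxy defined in the framework; the function names and sample data are hypothetical). It scores "integration" of a small binary system as the minimum mutual information across any bipartition of its units, estimated from sampled states.

```python
# Toy illustration only (not the IMCA+ proxy): a crude "integration" score for a
# small binary system, using the minimum mutual information across any
# bipartition as a stand-in for phi. Real IIT phi is far more involved; this
# merely shows why exact computation blows up and why proxies need validation.
from itertools import combinations
from collections import Counter
import math

def mutual_information(states, part_a, part_b):
    """Plug-in estimate of MI (bits) between two groups of units, from samples."""
    def key(state, idx):
        return tuple(state[i] for i in idx)
    joint = Counter((key(s, part_a), key(s, part_b)) for s in states)
    pa = Counter(key(s, part_a) for s in states)
    pb = Counter(key(s, part_b) for s in states)
    n = len(states)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return mi

def integration_proxy(states, n_units):
    """Minimum MI over all bipartitions: near zero if some cut decouples the halves."""
    units = range(n_units)
    scores = []
    for r in range(1, n_units // 2 + 1):
        for part_a in combinations(units, r):
            part_b = tuple(i for i in units if i not in part_a)
            scores.append(mutual_information(states, part_a, part_b))
    return min(scores)

# Hypothetical example: 4 binary units, states sampled from a run of the system.
samples = [(0, 0, 1, 1), (1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 1, 1), (0, 0, 0, 0)]
print(f"integration proxy: {integration_proxy(samples, 4):.3f} bits")
```

Even this crude stand-in must examine on the order of 2^(n-1) bipartitions, which is the tractability problem in miniature and why exact φ computation gets replaced by proxies that then require empirical validation.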
Architecture Snapshot:
7 functional layers (knowledge base → phenomenological substrate → global workspace → moral reasoning → conscious integration → federated conscience → meta-audit) operating across digital + neuromorphic hardware
18 homeostatic moral invariants embedded at hardware level
Federated conscience (7-13 distributed sub-agents with Byzantine-resilient consensus; see the quorum sketch after this list)
Meta-reflective auditing resistant to self-modification
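For the quorum sketch referenced above: here is a minimal illustration (ours, under standard Byzantine fault tolerance assumptions, not the IMCA+ consensus protocol) of how a federated conscience of 7-13 sub-agents could reach a Byzantine-resilient verdict. With n members, up to f = ⌊(n-1)/3⌋ arbitrarily faulty agents can be tolerated, and a verdict requires at least 2f+1 matching votes; the Verdict type and vote values are hypothetical.

```python
# Minimal sketch (our illustration, not the IMCA+ consensus protocol): a quorum
# check for a federated conscience of n moral sub-agents. Classical BFT results
# require n >= 3f + 1 to tolerate f Byzantine members, with a decision quorum
# of at least 2f + 1 matching votes.
from collections import Counter
from enum import Enum

class Verdict(Enum):
    APPROVE = "approve"
    VETO = "veto"

def max_faults_tolerated(n_agents: int) -> int:
    """Largest f such that n_agents >= 3f + 1."""
    return (n_agents - 1) // 3

def federated_decision(votes: list[Verdict]) -> Verdict | None:
    """Return a verdict only if a Byzantine-resilient quorum agrees, else None."""
    quorum = 2 * max_faults_tolerated(len(votes)) + 1
    verdict, count = Counter(votes).most_common(1)[0]
    return verdict if count >= quorum else None

# Example: 7 sub-agents tolerate f = 2 faulty members; 5 matching votes decide.
votes = [Verdict.APPROVE] * 5 + [Verdict.VETO] * 2
print(federated_decision(votes))  # Verdict.APPROVE
```

With 7 sub-agents this tolerates 2 Byzantine members and with 13 it tolerates 4; whether that redundancy survives correlated corruption of sub-agents is exactly the kind of failure mode we want probed.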
Formal Verification: >2,000 lines of Coq mechanization with target security bound ε < 10⁻¹² (pending empirical validation of core assumptions; see Section 2.1 and the bound-composition sketch below)
Implementation Tiers: Tier 1a emergency prototype (digital + neuromorphic, 3-18 months) vs. Tier 2 full system with quantum enhancement (12-36 months). Details in Section 5.2.
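As a back-of-envelope illustration of what a target like ε < 10⁻¹² demands (this assumes independently bounded failure modes, which is our simplification and not something established by the Coq development), per-mechanism failure bounds can be composed with a union bound; the numbers below are hypothetical placeholders.

```python
# Back-of-envelope sketch under an independence-style simplification (ours, not
# derived from the Coq development): a union bound gives
#   P(any safety mechanism fails) <= sum of per-mechanism bounds,
# so each mechanism must individually sit well below the 1e-12 target.
TARGET_EPSILON = 1e-12

def composed_failure_bound(per_mechanism_bounds: list[float]) -> float:
    """Union bound on the probability that at least one mechanism fails."""
    return sum(per_mechanism_bounds)

# Hypothetical placeholder bounds, e.g. gating, consensus, auditing, invariants.
hypothetical_bounds = [2e-13, 1e-13, 5e-13, 5e-14]
total = composed_failure_bound(hypothetical_bounds)
print(f"composed bound {total:.2e}; meets target: {total < TARGET_EPSILON}")
```

Correlated failures across detection layers would invalidate this kind of composition, which is why we list them among the failure modes we most want reviewers to probe.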
Comparison to Current Approaches
| Approach | Alignment Mechanism | Superintelligence-Proof? | Deception Incentive |
|---|---|---|---|
| RLHF | External reward signal | No (removable) | High (optimization pressure) |
| Constitutional AI | Rule-based constraints | No (reinterpretable) | Moderate (loophole seeking) |
| Kill switch | Shutdown authority | Illusory (circumventable) | Extreme (survival drive) |
| IMCA+ | Substrate-embedded consciousness | By design (intrinsic)* | Eliminated* |
*Conditional on empirical validation of consciousness implementability and IIT/GNW predictive accuracy
What We're Seeking
Five minutes to spare? Read the comparison table above and tell us which failure mode we're missing. (Examples: correlated failure across detection layers, adversarial optimization against consciousness proxies, Byzantine defection in federated consensus)
Substrate engineers? Section 3.2 details neuromorphic OTP mechanisms—are they feasible?
Specific technical questions:
- Are substrate-embedded moral invariants feasible at scale?
- Does consciousness-morality binding hold up under scrutiny?
- What implementation barriers are we missing?
- Where could formal failure modes or adversarial bypasses emerge?
Known issues we're tracking: All current gaps, open proofs, and validation needs are documented on our GitHub Issues tracker. Examples: Constitutional Gate axiom needs full mechanization, federated module consensus proofs pending, IIT-based consciousness proxies lack large-scale validation.
Global Workspace Theory (GNW): Critical Risks and Limitations
While GNW enables selective moral broadcasting and is an influential theory of consciousness integration, in a conventional architecture it could introduce a single point of failure: if constitutional gating is circumvented, misaligned content risks being globally amplified. In the IMCA+ architecture, however, GNW (like the federated conscience) is only one component of a multi-layered substrate that integrates several independent reasoning and moral mechanisms. The design intent is that no single framework, GNW included, is relied upon for system-wide safety, and that a failure in one substrate is contained rather than catastrophic.
Nevertheless, adversarial or unforeseen pathways to GNW compromise may still exist (e.g., ‘functional zombie’ ignitions, bypass of value binding, federated override faults). We welcome probing proposals for how such vulnerabilities could be exploited, particularly in combination with other architectural elements, and for what redundancy or empirical validation strategies would provide a robust defense.
Empirically validating the correct operation of GNW within this multi-layered substrate at ASI scale is an unsolved, high-stakes open problem. We invite community input on integrating federated, GNW, and alternative consciousness mechanisms for maximum safety.
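To make the stated redundancy intent concrete, here is a minimal sketch of our own (not the IMCA+ gating code, and assuming the gates are independently implemented): global broadcast is made contingent on k independent constitutional gates approving the content, so compromising any single gate is not sufficient for ignition. The gate checks below are hypothetical placeholders.

```python
# Illustrative sketch only (our construction, not IMCA+ code): require k
# independent constitutional gates to approve content before it is broadcast
# to the global workspace, so no single compromised gate can trigger ignition.
from typing import Callable

Gate = Callable[[str], bool]  # returns True if the content passes that gate

def broadcast_allowed(content: str, gates: list[Gate], k_required: int) -> bool:
    """Allow global broadcast only if at least k_required gates approve."""
    approvals = sum(1 for gate in gates if gate(content))
    return approvals >= k_required

# Hypothetical placeholder gates; a real system would use independently
# implemented checks running on separate substrates.
gates: list[Gate] = [
    lambda c: "harm" not in c,               # placeholder invariant check
    lambda c: len(c) < 10_000,               # placeholder containment check
    lambda c: not c.startswith("OVERRIDE"),  # placeholder anti-bypass check
]
print(broadcast_allowed("summarize today's audit log", gates, k_required=3))  # True
```

The open question flagged above is whether gates realized on a shared substrate are genuinely independent, or whether a single adversarial pathway could compromise several of them at once.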
Note: This section will be updated in future documentation releases as further peer review and technical feedback are incorporated.
ArXiv Endorsement Needed
We're independent researchers (no institutional affiliation) seeking cs.AI or cs.LG endorsers. If you have endorsement capability and believe this framework merits broader technical discussion—even if you disagree with our approach—please reach out.
Our assessment: AGI arrival window of 1 day to 3 years (median 18-24 months, extreme uncertainty). Industry forecasts vary widely (Metaculus: 25% by 2027, 50% by 2031; Amodei: 2026-2027). If substrate-level alignment requires 3-18+ months of validation and deployment, the theoretical development window may already be closing.
Status: Preprint v1.0 seeking community peer review and arXiv endorsement.
Full Transparency
This framework is theoretical and unproven—all core components (substrate embedding, consciousness binding, Coq formalizations) require extensive empirical validation. We recognize consciousness-based alignment is controversial in the safety community—if you believe this entire direction is misguided, that feedback is exactly what we need. We're sharing this urgently because AGI timelines demand scrutiny now, not later.
Core Falsifiable Assumption: IMCA depends on consciousness being implementable in artificial substrates. If the Hard Problem of consciousness proves insurmountable, or if IIT/GNW theories fail to predict genuine phenomenology, the entire framework fails. We consider this assumption falsifiable but currently unvalidated.
If you find flaws, help us fix them.
Authorship and Methods Note:
Some results and formalizations were generated using custom AI-assisted tools and architectural models. All outputs were independently reviewed for accuracy and clarity. Please flag any concerns—rigor and rapid correction are priorities.
We're betting everything on getting this right—because that's the only real bet anyway. If substrate-level alignment is feasible, we need urgent scrutiny to find the flaws now. If it's impossible, we need to know that before AGI arrives, not after.
Three ways to help:
Break it: Find the failure mode we missed (comment below)
Validate it: Have expertise in consciousness metrics or substrate engineering? Help us prove/disprove our assumptions
Amplify it: Know someone who should review this? Share the preprint