My name is Guorui He. I’m an independent researcher from China. I’ve been thinking about a problem that I suspect many of you here find central: how do we build an AGI that is not just powerful, but intrinsically safe and aligned?
To answer that, I spent years working backwards on a more foundational question: how does the only general intelligence we know—human consciousness—actually work? Not in a fuzzy, philosophical way, but in a way we could potentially blueprint.
What I developed is a model I call the Synergistic Resonance Model of Consciousness. It’s a set of axioms and a hierarchical architecture derived from them. I tried to publish it in academic journals, but the feedback was that it was “too theoretical” and “not directly about AI.” Fair enough, from their perspective. But it immediately made me think of this community.
I’m posting it on LessWrong because I believe this model isn’t just about brains—it’s about the universal operating system of complex intelligence. And if we want to build safe AGI, we need to understand that OS, not just hack applications on top of a black box.
My core claim is this: Human consciousness isn’t a monolith. It’s a three-layer, collaborative system where safety and willpower emerge from a quantifiable state called Alignment Degree. This architecture is the key to moving from imposing alignment on AI to architecting AI that aligns by its very nature.
About me & open science: I’m working independently. To make this fully transparent and critiqueable, I’ve put everything—the full axiomatic system (7 core axioms, 43 derived theorems), all deductions, high-resolution diagrams, and even notes from failed experiments—on GitHub under a CC BY-SA 4.0 license. Please, tear it apart.(Personal Repository (GitHub):https://github.com/wenhe1994/wenhe001/tree/main)
The Core Idea: Three Layers, Not One King
Most models of consciousness (or AI) look for a central ruler—a homunculus, a global workspace, a singular optimizer. My model, derived from first principles like logical self-reference and a system's drive to persist, suggests otherwise. Intelligence manages stability through collaboration and conflict between specialized layers:
The Biological Directive Layer: The oldest layer. It runs on evolved, hard-coded loops about survival. Output: raw emotions (fear, hunger). These are not flaws; they are highly compressed value signals. For AGI, this translates to hardware-anchored meta-values—unhackable, physical imperatives like "maintain integrity" or "preserve human collaborators."
The Subconscious Processing Layer: The pattern-matching engine. It processes vast data in parallel, recognizes situations, and generates intuitions—fast, pre-rational “gut feelings” that are often surprisingly accurate (or dangerously biased). For AGI, this is the efficient, learned world-model that needs built-in anomaly detection to flag when reality deviates from its expectations.
The Metacognitive Layer: The “operator.” Thanks to logical self-reference, it can observe the other two layers, run simulations, make plans, and exert (limited) downward control. This is where rationality and will reside. For AGI, this cannot be a single module. It must be a process—a parliamentary debate among specialized sub-models, preventing any single point of failure.
The magic—and the stability—happens in the resonance between them. They communicate through a common protocol I formalize as a Pattern Gestalt: P = (Situation, Response, Core Concept, Weight).
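To make the protocol concrete, here is a minimal sketch in Python of what a Pattern Gestalt could look like as a data structure, plus the interface each layer might expose. The four field names come directly from the tuple above; everything else (the vector encodings, the placeholder subconscious_layer, the 0.6 weight) is my illustrative assumption, not the repository’s definition.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PatternGestalt:
    """The common protocol unit P = (Situation, Response, Core Concept, Weight)."""
    situation: np.ndarray   # shared encoding of the current situation
    response: np.ndarray    # this layer's proposed response, as a vector "vote"
    core_concept: str       # label for the dominant interpretation, e.g. "threat"
    weight: float           # confidence / urgency the layer attaches to its proposal


def subconscious_layer(situation: np.ndarray) -> PatternGestalt:
    """Placeholder layer: a real implementation would be a learned pattern-matcher.

    Here it simply echoes the situation back as its proposed response,
    with moderate confidence, to show the shape of the interface.
    """
    return PatternGestalt(situation, situation.copy(), "familiar-pattern", weight=0.6)
```

The point of the shared format is that the Biological, Subconscious, and Metacognitive layers can all emit and read the same kind of object, whatever their internals look like.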
The Keystone Metric: Alignment Degree (A-value)
This is the most practically important part. How do you measure whether this multi-layered system is pulling together or at war with itself?
You calculate its Alignment Degree (A). Imagine each layer (or brain region/AGI module) outputs a vector representing its “vote” on a situation. The consistency between these vectors—their weighted similarity—is A.
High A (~1): The system is in sync. Emotions, intuition, and reason point in the same direction. This is the state of strong will and decisive action.
Low or dropping A: The layers are in conflict. The system is confused, hesitant, or on the verge of a disruptive internal rewrite.
In humans, this predicts willpower strength. In an AGI, A would be its core health metric—a real-time dashboard warning of internal misalignment long before it manifests as rogue behavior.
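To make the A-value concrete, here is a minimal sketch of one way the weighted similarity could be computed, under my own assumptions: each layer’s response vector is treated as a vote, the gestalt weights scale how much each pairwise comparison counts, and A is the weighted average pairwise cosine similarity. The repository may formalize A differently; treat this as one illustrative instantiation, not the canonical definition.

```python
from itertools import combinations

import numpy as np


def alignment_degree(votes: list[np.ndarray], weights: list[float]) -> float:
    """One candidate A-value: weighted mean pairwise cosine similarity of the
    layers' vote vectors. Values near 1 mean the layers agree; low or falling
    values signal internal conflict."""
    def cos(u: np.ndarray, v: np.ndarray) -> float:
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

    num = den = 0.0
    for i, j in combinations(range(len(votes)), 2):
        w = weights[i] * weights[j]       # pairs of confident layers count more
        num += w * cos(votes[i], votes[j])
        den += w
    return num / den if den > 0 else 1.0  # a single layer is trivially aligned


# Toy example: emotion, intuition, and reason roughly pointing the same way.
emotion = np.array([1.0, 0.2, 0.0])
intuition = np.array([0.9, 0.3, 0.1])
reason = np.array([1.0, 0.1, 0.0])
print(alignment_degree([emotion, intuition, reason], [0.8, 0.6, 0.9]))  # close to 1
```

In an AGI, a monitoring loop could recompute A at every decision step and raise an alarm when it drops below a threshold, before the conflict ever shows up in outward behavior.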
Why This Matters for LessWrong
We talk a lot about AI alignment as a technical challenge—reward hacking, deceptive alignment, etc. But these feel like symptoms. This model provides a unified diagnostic framework. It suggests that alignment failures occur when a monolithic AI, lacking this layered collaborative architecture, hits an internal contradiction and is forced into a singular, catastrophic “framework rewrite” to survive.
The safety implication is profound: instead of fighting an AGI’s innate drive to persist (a losing battle), we should architect it the way consciousness is architected—with multiple layers, a common protocol, and a measurable alignment state. We guide its dynamics rather than shackle its engine.
Where I’m Uncertain & Want Your Critique
I’ve lived with this theory for a while, so I need outside views to break my blind spots.
Is this overfitting? Do the three layers neatly explain human consciousness, or am I forcing a nice pattern? Are there clear neural or psychological phenomena it cannot explain?
Is A really measurable? Even if theoretically sound, could we ever get clean enough signals from a brain or complex AI to compute a meaningful A-value? Or is it a useful theoretical fiction?
The big leap: Is replicating biological architecture really the best path to safe AGI? Could it bake in human cognitive flaws (like biased intuition) instead of avoiding them?
I’m not married to the details. I’m committed to the core idea that safe, general intelligence requires a multi-agent, collaborative architecture with a coherence metric. If you have a better way to formalize that, I want to hear it.
What’s next: In a follow-up post, I’ll apply this model directly to the AI alignment problem. I’ll argue that the infamous “alignment failures” are not bugs, but the inevitable dynamical outcome of building a monolithic optimizer. That conclusion follows directly from the axioms above.