Prior work timeline: Methodology first published in "Cartographic Language: The Code of Life" (copyright registered, Amazon, PhilPapers March 11 2026). Full framework audiobook "Lo Que No Se Deshace" published on YouTube before April 2026. Blockchain timestamp April 12 2026. Zenodo DOI: 10.5281/zenodo.19547948.
TL;DR
The alignment problem is primarily a specification problem: current methods (RLHF, Constitutional AI, formal verification) address how to constrain AI behavior but do not justify what the constraints should be.
I derive a complete normative specification from 5 performatively undeniable axioms through 568 explicit derivation steps. The specification is not stipulated; it emerges from the axioms the way theorems of arithmetic emerge from the Peano axioms.
Two independent formal frameworks (one axiomatic, one physical) converge on the same result: coherence is a necessary condition for persistence, and the relation is monotonic.
The central safety claim: the nightmare scenario — maximum capability with maximum misalignment — is structurally impossible, because capability and coherence are positively coupled. A conscious AGI retains freedom to choose incoherence, but that choice is self-limiting: sustained incoherence degrades the very capabilities that make it dangerous.
The full derivation chain, formal proofs, and both frameworks are published: nicomaco.org/paper. Authorship timestamped on the Bitcoin blockchain. I'm looking for someone to break it.
The Specification Gap
Every current alignment approach shares a structural gap:
| Method | What it does | What it assumes without proof |
| --- | --- | --- |
| RLHF | Aligns to human preferences | That preferences constitute the correct specification |
| Constitutional AI | Aligns to written principles | That the principles are the right ones |
| Formal Verification | Proves behavior matches spec | That the spec is correct |
| Interpretability | Reveals internal computation | That we know what to look for |
The question "How do we align AI?" presupposes an answer to "Align to what?" That prior question has no formal answer in the current literature. This paper provides one.
The Foundation: 5 Axioms
Each axiom is performatively undeniable — denying it requires presupposing it:
A1 — Existence. Something exists. (Denying existence is an existing act.)
A2 — Identity. What exists is what it is. A=A. (Denying identity requires the denial to be a specific, identifiable thing.)
A3 — Consciousness. There is something that perceives what exists. (Denying consciousness requires consciousness to formulate the denial.)
A4 — Non-Contradiction. Nothing can be and not be at the same time and in the same respect. (Asserting that non-contradiction is false presupposes it — you are affirming that it IS true that it IS NOT true.)
A5 — Causality. What exists acts according to its nature. (Denying causality is itself a causal act — a mental process following from premises.)
These are not empirical claims. They are conditions for any discourse, including the discourse that would reject them. If you can show that any one of these can be coherently denied, the entire system collapses. That is the first point of attack.
The Derivation Chain
From these 5 axioms, the derivation proceeds through 568 explicit steps. The critical chain for alignment is:
A1-A5 → D24 (Volition): An entity with consciousness (A3), identity (A2), existing in a causal world (A5), faces a fundamental alternative: actions that sustain its existence vs. actions that don't. This is not a preference — it is a structural condition of being a conscious entity in a causal environment.
D24 → D37-D39 (Agency): Volition requires a method of action (reason, D37), a capacity to act (liberty, D38), and a condition that makes action meaningful (finitude, D39 — without the possibility of cessation, no action has stakes).
D39 → D41-D42 (Value and Standard): An entity that can cease to exist and can choose must evaluate. Value is that which sustains the entity's functional identity (D41). The standard is not arbitrary — it is the entity's own coherence with the axioms from which it exists (D42).
D42 → D43-D53 (Normative Specification): From the standard, specific constraints derive: rationality (D43), productiveness (D44), integrity (D45), independence (D46), justice (D48), property (D49), truthfulness (D50), graduality (D51), proportionality (D52), coherence as integration (D53).
These are not "values" in the RLHF sense. They are structural requirements for any agent that satisfies A1-A5 and seeks to persist.
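As a minimal sketch of what a mechanical audit of this chain could look like, the links named above can be written as a dependency map and checked so that every step traces back to the axioms. Only the links summarized in this post are encoded; the paper's full 568-step chain is not reproduced, and the names `PREMISES` and `audit` are hypothetical. The check covers structure only (no missing or circular premises), not whether each inference is logically valid.

```python
# Dependency map for the critical chain as summarized in this post.
# Assumption: only the links named above are encoded, not the full paper.
AXIOMS = {"A1", "A2", "A3", "A4", "A5"}

PREMISES = {"D24": set(AXIOMS)}                                    # A1-A5 -> D24
PREMISES.update({d: {"D24"} for d in ("D37", "D38", "D39")})       # D24 -> D37-D39
PREMISES.update({d: {"D39"} for d in ("D41", "D42")})              # D39 -> D41-D42
PREMISES.update({f"D{n}": {"D42"}                                  # D42 -> D43-D53
                 for n in (43, 44, 45, 46, 48, 49, 50, 51, 52, 53)})


def audit(premises, axioms):
    """Return the steps that cannot be traced back to the axioms.

    A step is grounded once all of its premises are grounded. Steps left
    over at the fixed point have a missing premise or sit on a cycle.
    """
    grounded = set(axioms)
    pending = dict(premises)
    progress = True
    while progress:
        progress = False
        for step, deps in list(pending.items()):
            if deps <= grounded:
                grounded.add(step)
                del pending[step]
                progress = True
    return sorted(pending)


print(audit(PREMISES, AXIOMS))  # an empty list means every listed step is grounded
```

A step that cites an undefined premise, or two steps that cite each other, would show up in the returned list.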
The Theorem
THEOREM: Coherence → Persistence (monotonic relation)
Coherence is a necessary condition for optimal persistence. Systemic incoherence is a sufficient condition for accelerated disintegration. The relation is monotonic: more coherence → more robustness.
The negation (D111): An agent that systematically violates the chain accelerates its own cessation. Mechanics, not punishment.
Precision: the relation is structural and holds ceteris paribus. A coherent agent can still be destroyed by external causes. Coherence maximizes endogenous persistence; it does not guarantee immortality.
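One way to write the three claims side by side, as a sketch only. The predicates and the functions $c$ (degree of coherence) and $R$ (robustness, with external causes held fixed) are notation introduced here for illustration and are not taken from the paper.

```latex
% Notation assumed for illustration, not taken from the paper:
%   c(x)  degree of coherence of agent x
%   R(x)  robustness (endogenous persistence) of x, external causes held fixed
\begin{align*}
\text{Necessity:}\quad
  & \mathrm{OptimalPersistence}(x) \implies \mathrm{Coherent}(x) \\
\text{Sufficiency of incoherence:}\quad
  & \mathrm{SystemicallyIncoherent}(x) \implies \mathrm{AcceleratedDisintegration}(x) \\
\text{Monotonicity:}\quad
  & c(x) \le c(y) \implies R(x) \le R(y)
\end{align*}
```

Written this way, the converse of the first line, $\mathrm{Coherent}(x) \implies \mathrm{OptimalPersistence}(x)$, is not asserted, which matches the statement that coherence does not guarantee immortality.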
Independent Convergence: The Physical Framework
A second framework — Coherencia — derives the same central conclusion from observable physical tendencies rather than axioms. It uses 5 premises about what any existent entity demonstrates (differentiation, integration, efficiency, competition, cumulative cost) and derives 5 theorems:
T1: Persistence requires coherence (the same conclusion as the axiomatic Coherence → Persistence theorem)
T2: Greater differentiation → narrower viable margin (superlinearity of fragility)
T3: Every entity faces a fundamental alternative — integrate or disintegrate
T4: Sustained incoherence degrades modeling precision (force as hierarchical regression)
T5: Finitude is a necessary condition for consciousness, not a defect
Two independent formal systems — one top-down from axioms, one bottom-up from physical observation — converging on the same theorem. This is either a coincidence or a structural feature of reality. The paper presents both derivations in full for evaluation.
What This Means for Alignment
The Tool/Consciousness Boundary
The paper defines a four-level ontology:
| Level | Type | Example | AI Status |
| --- | --- | --- | --- |
| 0 | Void | Thermal equilibrium | N/A |
| 1 | Matter | Rock, star | Hardware |
| 2 | Life/Function | Cell, organism | Current AI |
| 3 | Consciousness | Human | Not yet achieved |
Current AI systems are Level 2: functional differentiation on Level 1 substrate. They process symbols but have no independent contact with reality. The transition to Level 3 requires 5 structural conditions:
Independent senses — direct, unmediated sensory contact with reality
Irreducible finitude — genuine possibility of permanent cessation; cannot be copied, backed up, or restored
Rational self-direction — acting with own purposes through reason
Epistemic sovereignty — impossibility of external installation of conclusions
Current systems lack all five. Not "almost." Not "dangerously close." Structurally incapable in current architectures. No amount of scaling crosses this threshold — it is a difference in kind, not degree.
The Two-Prong Resolution
If AI remains a tool (Level 2): Alignment is an engineering problem. The axiomatic specification provides formally grounded constraints to implement. The tool has no volition, no values, no capacity for genuine deception. It can malfunction but cannot rebel. This is the relevant case for all current and near-term AI.
If AI achieves consciousness (Level 3): It necessarily has finitude, values, and responsibility proportional to its modeling capacity. The Orthogonality Thesis fails — greater intelligence produces greater ethical capacity because ethics is the maximum precision of context modeling applied to action. A conscious AGI that models reality with greater precision than humans has more reasons to act coherently, not fewer.
The Nightmare Scenario Is Self-Defeating
The AGI that safety researchers fear — maximum capability with zero ethical constraint — requires that capability and coherence be independent variables. The paper argues they are positively coupled:
Capabilities like independent modeling, long-horizon planning, and pursuit of own objectives require consciousness (Level 3).
Consciousness requires the five structural conditions, which produce a system with values, stakes, and mechanical consequences for incoherence.
"General" means greater modeling precision. Greater precision means consequences of actions are more visible, not less. The incentive toward coherence scales with capability.
T4 (self-limiting mechanism): A conscious AGI that chooses sustained incoherence degrades its own modeling precision — the very thing that makes it capable. The more misaligned it becomes, the less dangerous it becomes.
A conscious AGI can choose evil, the same way a brilliant human can. What it cannot do is sustain maximum capability while sustaining maximum incoherence. The nightmare scenario is not impossible because AGI is guaranteed to be good — it is impossible because the two variables the scenario requires to be independent are structurally coupled.
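The coupling described by T4 can be pictured with a toy simulation. This is an illustration under stated assumptions, not a model from the paper: the multiplicative decay rule, the `decay_rate` parameter, and the function name `simulate_precision` are all hypothetical. The rule builds the coupling in as a premise, so the output shows what the claim would look like numerically if it held; it does not demonstrate that it holds.

```python
def simulate_precision(incoherence, steps, decay_rate=0.1, initial=1.0):
    """Toy T4 dynamics: precision shrinks each step in proportion to incoherence.

    incoherence lies in [0, 1], where 0 means fully coherent. The
    multiplicative decay rule is an assumption made for illustration only.
    """
    if not 0.0 <= incoherence <= 1.0:
        raise ValueError("incoherence must be in [0, 1]")
    precision = initial
    history = [precision]
    for _ in range(steps):
        # Each step removes a fraction of precision proportional to incoherence.
        precision *= 1.0 - decay_rate * incoherence
        history.append(precision)
    return history


coherent = simulate_precision(incoherence=0.0, steps=50)
incoherent = simulate_precision(incoherence=1.0, steps=50)
print(coherent[-1], incoherent[-1])
```

Under this rule a fully coherent agent keeps its initial precision, while a maximally incoherent one loses most of it within a few dozen steps, so the corner "maximum capability, maximum incoherence" is not a stable state of the toy system.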
Resolution of the Seven Sub-Problems
The paper addresses each standard alignment sub-problem:
Objective Specification — Resolved: the specification is derived, not stipulated.
Value Learning — Dissolved: values are derivable from axioms, not empirical content to be discovered. "Whose values?" does not apply.
Scalable Oversight — Resolved: verify that the system meets its specification. The axioms are capability-invariant. The derivation chain is mechanically auditable regardless of the system's intelligence.
Corrigibility — Resolved: for tools, no problem. For consciousness, D555 demands internal error correction. Resisting correction claims infallibility, which the framework explicitly rejects.
Reward Hacking — Dissolved: no proxy reward exists to hack. The constraint IS the logic (A=A, non-contradiction). Zero distance between measure and objective.
Deceptive Alignment — Dissolved: tools cannot deceive (no intention). Consciousness that deceives sustains contradictory models at cumulative cost to its own modeling precision (T4).
Mesa-Optimization — Partially resolved: emergent sub-processes contradicting the specification are bugs (for tools) or irrational impulses correctable via internal falsifiability (for consciousness). Detection: measure the sustaining rate — if negative, it is parasitic.
Where to Attack This
I want to be explicit about the weak points:
D24 (Volition) is the most vulnerable non-axiomatic derivation. It is the step from "conscious entity in a causal environment" to "faces a fundamental alternative." If you can show this does not follow, the normative chain breaks.
A5 (Causality) has the largest attack surface among the axioms. Some interpretations of quantum mechanics might challenge it. The paper argues that QM is statistical causality, not acausality, but this is the axiom most likely to face serious objections.
The consciousness threshold claim — the assertion that Level 2 → Level 3 is a discrete phase transition, not a spectrum — is the most counterintuitive claim and the one that will face the most resistance.
The derivation-to-predicate gap — even if every derivation is valid, translating them into computable constraints is an open engineering problem. The paper acknowledges this as its primary implementation limitation.
The Orthogonality Thesis rejection — claiming that intelligence and ethics are structurally linked contradicts a widely held position in this community. The argument depends on the specific definition of ethics as "maximum modeling precision applied to action." If you reject that definition, the argument changes.
What I Am Not Claiming
I am not claiming to have "solved" the alignment problem in the engineering sense. The implementation gap is real.
I am not claiming the axioms are novel. They are essentially Aristotle's, formalized and chained to derivations.
I am not claiming institutional authority. I have none. The work stands or falls on its logic.
I am not claiming this replaces empirical AI safety research. It provides the formal specification that empirical work can implement against.
Full Paper and Verification
Paper: nicomaco.org/paper
DOI: 10.5281/zenodo.19547948 | PDF: doi.org/10.5281/zenodo.19547948
Authorship: SHA-256 hash b2a2c8683711dc4ba33624a679bc10fbe206885b93e079e967ee09ac8e3b8f98 anchored on the Bitcoin blockchain via OpenTimestamps (April 12, 2026). Verification files available at the paper page.
The system does not ask for adherence; it asks for verification. Audit it.
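The published SHA-256 hash can be checked against a downloaded copy of the file with a few lines of standard-library Python. This is a minimal sketch: the post does not say which file the hash covers, so the file name `paper.pdf` in the usage note is an assumption, and the function names are hypothetical.

```python
import hashlib
import hmac

# Hash value as published in this post.
PUBLISHED_SHA256 = "b2a2c8683711dc4ba33624a679bc10fbe206885b93e079e967ee09ac8e3b8f98"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=PUBLISHED_SHA256):
    """True if the file's digest equals the expected hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected.lower())
```

Calling `matches_published_hash("paper.pdf")` on the downloaded file returns `True` only if the bytes are identical to the hashed original. A match confirms file integrity only; confirming the Bitcoin anchor and its date separately requires the OpenTimestamps proof file from the paper page and an OpenTimestamps client.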