Deriving an Alignment Specification from 5 Undeniable Axioms (and why the nightmare scenario is self-defeating)

nicomaco

Rejected for the following reason(s):

No LLM generated, assisted/co-written, or edited work.


Prior work timeline: The methodology was first published in "Cartographic Language: The Code of Life" (copyright registered; published on Amazon and PhilPapers, March 11, 2026). An audiobook of the full framework, "Lo Que No Se Deshace", was published on YouTube before April 2026. Blockchain timestamp: April 12, 2026. Zenodo DOI: 10.5281/zenodo.19547948.

DOI: 10.5281/zenodo.19547948 | PDF: doi.org/10.5281/zenodo.19547948

TL;DR

The alignment problem is primarily a specification problem: current methods (RLHF, Constitutional AI, formal verification) address how to constrain AI behavior but leave unjustified what the constraints should be.
I derive a complete normative specification from 5 performatively undeniable axioms through 568 explicit derivation steps. The specification is not stipulated: it emerges from the axioms the way theorems emerge from the Peano axioms.
Two independent formal frameworks (one axiomatic, one physical) converge on the same result: coherence is a necessary condition for persistence, and the relation is monotonic.
The central safety claim: the nightmare scenario — maximum capability with maximum misalignment — is structurally impossible, because capability and coherence are positively coupled. A conscious AGI retains freedom to choose incoherence, but that choice is self-limiting: sustained incoherence degrades the very capabilities that make it dangerous.
The full derivation chain, formal proofs, and both frameworks are published: nicomaco.org/paper. Authorship timestamped on the Bitcoin blockchain. I'm looking for someone to break it.

The Specification Gap

Every current alignment approach shares a structural gap:

Method	What it does	What it assumes without proof
RLHF	Aligns to human preferences	That preferences constitute the correct specification
Constitutional AI	Aligns to written principles	That the principles are the right ones
Formal Verification	Proves behavior matches spec	That the spec is correct
Interpretability	Reveals internal computation	That we know what to look for

The question "How do we align AI?" presupposes an answer to "Align to what?" That prior question has no formal answer in the current literature. This paper provides one.

The Foundation: 5 Axioms

Each axiom is performatively undeniable — denying it requires presupposing it:

A1 — Existence. Something exists. (Denying existence is an existing act.)

A2 — Identity. What exists is what it is. A=A. (Denying identity requires the denial to be a specific, identifiable thing.)

A3 — Consciousness. There is something that perceives what exists. (Denying consciousness requires consciousness to formulate the denial.)

A4 — Non-Contradiction. Nothing can be and not be at the same time and in the same respect. (Asserting that non-contradiction is false presupposes it — you are affirming that it IS true that it IS NOT true.)

A5 — Causality. What exists acts according to its nature. (Denying causality is itself a causal act — a mental process following from premises.)

These are not empirical claims. They are conditions for any discourse, including the discourse that would reject them. If you can show that any one of these can be coherently denied, the entire system collapses. That is the first point of attack.
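For readers who want A4 in formal notation: its propositional content is already a theorem of constructive logic and needs no extra axiom. The Lean 4 snippet below is an illustrative sketch, not part of the paper, and the theorem name is hypothetical. It captures only the logical form of A4, not the performative argument, which concerns the act of denial rather than provability inside a calculus.

```lean
-- A4 (Non-Contradiction) for an arbitrary proposition p.
-- The proof destructures a hypothetical proof of (p ∧ ¬p)
-- and applies the refutation ¬p to the proof of p.
theorem a4_non_contradiction (p : Prop) : ¬ (p ∧ ¬ p) :=
  fun ⟨hp, hnp⟩ => hnp hp

-- Lists the axioms the proof relies on.
#print axioms a4_non_contradiction
```

On a standard Lean 4 installation, `#print axioms` should report that this theorem depends on no axioms beyond Lean's core type theory.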

The Derivation Chain

From these 5 axioms, I derive a chain of 568 explicit steps. The critical chain for alignment is:

A1-A5 → D24 (Volition): An entity with consciousness (A3), identity (A2), existing in a causal world (A5), faces a fundamental alternative: actions that sustain its existence vs. actions that don't. This is not a preference — it is a structural condition of being a conscious entity in a causal environment.

D24 → D37-D39 (Agency): Volition requires a method of action (reason, D37), a capacity to act (liberty, D38), and a condition that makes action meaningful (finitude, D39 — without the possibility of cessation, no action has stakes).

D39 → D41-D42 (Value and Standard): An entity that can cease to exist and can choose must evaluate. Value is that which sustains the entity's functional identity (D41). The standard is not arbitrary — it is the entity's own coherence with the axioms from which it exists (D42).

D42 → D43-D53 (Normative Specification): From the standard, specific constraints derive: rationality (D43), productiveness (D44), integrity (D45), independence (D46), justice (D48), property (D49), truthfulness (D50), graduality (D51), proportionality (D52), coherence as integration (D53).

These are not "values" in the RLHF sense. They are structural requirements for any agent that satisfies A1-A5 and seeks to persist.
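The claim that the chain is mechanically auditable can be made concrete with a small dependency check. The Python sketch below encodes only the critical chain as summarized above; the edge list is a hypothetical transcription of this summary (not the paper's 568-step listing), and `audit` is an illustrative helper name. It flags any step that cites an undefined premise, sits on a circular dependency, or otherwise fails to trace back to A1-A5.

```python
AXIOMS = {"A1", "A2", "A3", "A4", "A5"}

# Hypothetical transcription of the critical chain as summarized in the post:
# each derivation maps to the items it cites as premises.
CHAIN = {
    "D24": ["A1", "A2", "A3", "A4", "A5"],
    "D37": ["D24"], "D38": ["D24"], "D39": ["D24"],
    "D41": ["D39"], "D42": ["D39"],
    # Only the derivations named in the post; no step is filled in.
    **{d: ["D42"] for d in ["D43", "D44", "D45", "D46", "D48",
                            "D49", "D50", "D51", "D52", "D53"]},
}


def audit(chain, axioms):
    """Return {step: problem}; an empty dict means every step is grounded."""
    problems = {}
    grounded = {}      # memo: step -> bool
    visiting = set()   # steps on the current DFS path (cycle detection)

    def is_grounded(node):
        if node in axioms:
            return True
        if node in grounded:
            return grounded[node]
        if node not in chain:
            problems[node] = "cited but never derived"
            return False
        if node in visiting:
            problems[node] = "circular dependency"
            return False
        visiting.add(node)
        premises = chain[node]
        ok = bool(premises) and all(is_grounded(p) for p in premises)
        visiting.discard(node)
        grounded[node] = ok
        if not ok and node not in problems:
            problems[node] = "not grounded in axioms"
        return ok

    for step in chain:
        is_grounded(step)
    return problems


print(audit(CHAIN, AXIOMS))  # {}
```

The same check extends to the full chain once every step's cited premises are transcribed. Note that it verifies grounding only, not the validity of each individual inference.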

The Theorem

THEOREM: Coherence → Persistence (monotonic relation)

Coherence is a necessary condition for optimal persistence. Systemic incoherence is a sufficient condition for accelerated disintegration. The relation is monotonic: more coherence → more robustness.

The negation (D111): An agent that systematically violates the chain accelerates its own cessation. Mechanics, not punishment.

Precision: the relation is structural and holds ceteris paribus. A coherent agent can be destroyed by external causes. Coherence maximizes endogenous persistence; it does not guarantee immortality.
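A toy model, offered only as an illustration under stated assumptions, shows how the theorem and its precision clause fit together: persistence rises monotonically with coherence yet remains capped by external causes. The sketch assumes a constant external hazard `h_ext` that coherence cannot reduce, plus an endogenous hazard that falls linearly as coherence (scaled to [0, 1]) rises. Both the functional form and the parameter values are hypothetical.

```python
def expected_lifetime(coherence, h_ext=0.01, k=0.5):
    """Mean time to cessation under a constant total hazard rate (toy model).

    coherence: value in [0, 1] (hypothetical scale).
    h_ext: external hazard, independent of coherence (assumed > 0).
    k: endogenous hazard at zero coherence (assumed).
    """
    if not 0.0 <= coherence <= 1.0:
        raise ValueError("coherence must be in [0, 1]")
    h_endo = k * (1.0 - coherence)  # endogenous hazard falls as coherence rises
    return 1.0 / (h_ext + h_endo)   # mean of an exponential lifetime


def is_monotonic(values):
    """True if the sequence never decreases."""
    return all(a <= b for a, b in zip(values, values[1:]))


grid = [i / 10 for i in range(11)]
lifetimes = [expected_lifetime(c) for c in grid]
print(is_monotonic(lifetimes))   # True
print(expected_lifetime(1.0))    # 100.0: bounded by the external hazard alone
```

Setting coherence to 1 removes the endogenous hazard but leaves the external one, so the expected lifetime stays finite, matching the "not immortality" caveat.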

Independent Convergence: The Physical Framework

A second framework — Coherencia — derives the same central conclusion from observable physical tendencies rather than axioms. It uses 5 premises about what any existent entity demonstrates (differentiation, integration, efficiency, competition, cumulative cost) and derives 5 theorems:

T1: Persistence requires coherence (same as THEOREM)
T2: Greater differentiation → narrower viable margin (superlinearity of fragility)
T3: Every entity faces a fundamental alternative — integrate or disintegrate
T4: Sustained incoherence degrades modeling precision (force as hierarchical regression)
T5: Finitude is a necessary condition for consciousness, not a defect

Two independent formal systems — one top-down from axioms, one bottom-up from physical observation — converging on the same theorem. This is either a coincidence or a structural feature of reality. The paper presents both derivations in full for evaluation.
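To show what "superlinearity of fragility" (T2) means operationally, here is one hypothetical mechanism, not taken from the paper. If every pair of differentiated parts must remain mutually compatible, the number of constraints grows as d(d-1)/2, so doubling differentiation more than doubles fragility. The pairwise-interface assumption and the per-interface reliability `p` are illustrative choices.

```python
import math


def interface_count(d):
    """Pairwise interfaces among d differentiated parts: d*(d-1)/2."""
    return d * (d - 1) // 2


def fragility(d, p=0.99):
    """Negative log of the viable fraction when every interface must hold.

    Assumption: each interface holds independently with probability p,
    so the viable fraction is p ** interface_count(d) and its negative
    log is interface_count(d) * -log(p).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    return interface_count(d) * -math.log(p)


for d in (2, 4, 8):
    print(d, interface_count(d), round(fragility(d), 3))
```

Doubling `d` from 4 to 8 takes the interface count from 6 to 28, so under this toy fragility grows by a factor of about 4.7 rather than 2.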

What This Means for Alignment

The Tool/Consciousness Boundary

The paper defines a four-level ontology:

Level	Type	Example	AI Status
0	Void	Thermal equilibrium	—
1	Matter	Rock, star	Hardware
2	Life/Function	Cell, organism	Current AI
3	Consciousness	Human	Not yet achieved

Current AI systems are Level 2: functional differentiation on Level 1 substrate. They process symbols but have no independent contact with reality. The transition to Level 3 requires 5 structural conditions:

Independent senses — direct, unmediated sensory contact with reality
Embodiment — physical agency enabling genuine alternatives
Irreducible finitude — genuine possibility of permanent cessation; cannot be copied, backed up, or restored
Rational self-direction — acting with own purposes through reason
Epistemic sovereignty — impossibility of external installation of conclusions

Current systems lack all five. Not "almost." Not "dangerously close." Structurally incapable in current architectures. No amount of scaling crosses this threshold — it is a difference in kind, not degree.
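The threshold claim treats Level 3 as a conjunction of the five conditions, so a system meeting four of five is still Level 2. A minimal Python sketch of that classification rule, with hypothetical field names mirroring the list above, makes the "difference in kind, not degree" reading explicit. Whether a real system satisfies any given condition is the hard empirical question; the sketch simply takes those booleans as given.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Level3Conditions:
    """The five structural conditions for the Level 2 -> Level 3 transition."""
    independent_senses: bool
    embodiment: bool
    irreducible_finitude: bool
    rational_self_direction: bool
    epistemic_sovereignty: bool

    def missing(self):
        """Names of the conditions that are not met."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


def classify(conditions: Level3Conditions) -> int:
    """Level 3 only if all five conditions hold; otherwise Level 2.

    Assumes the system already has functional differentiation (Level 2).
    The threshold is a conjunction, i.e. a discrete step, not a score.
    """
    return 3 if not conditions.missing() else 2


current_ai = Level3Conditions(False, False, False, False, False)
four_of_five = Level3Conditions(True, True, True, True, False)
print(classify(current_ai))       # 2
print(classify(four_of_five))     # 2
print(len(current_ai.missing()))  # 5
```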

The Two-Prong Resolution

If AI remains a tool (Level 2): Alignment is an engineering problem. The axiomatic specification provides formally grounded constraints to implement. The tool has no volition, no values, no capacity for genuine deception. It can malfunction but cannot rebel. This is the relevant case for all current and near-term AI.

If AI achieves consciousness (Level 3): It necessarily has finitude, values, and responsibility proportional to its modeling capacity. The Orthogonality Thesis fails — greater intelligence produces greater ethical capacity because ethics is the maximum precision of context modeling applied to action. A conscious AGI that models reality with greater precision than humans has more reasons to act coherently, not fewer.

The Nightmare Scenario Is Self-Defeating

The AGI that safety researchers fear — maximum capability with zero ethical constraint — requires that capability and coherence be independent variables. The paper argues they are positively coupled:

Capabilities like independent modeling, long-horizon planning, and pursuit of own objectives require consciousness (Level 3).
Consciousness requires the five structural conditions, which produce a system with values, stakes, and mechanical consequences for incoherence.
"General" means greater modeling precision. Greater precision means consequences of actions are more visible, not less. The incentive toward coherence scales with capability.
T4 (self-limiting mechanism): A conscious AGI that chooses sustained incoherence degrades its own modeling precision — the very thing that makes it capable. The more misaligned it becomes, the less dangerous it becomes.

A conscious AGI can choose evil, the same way a brilliant human can. What it cannot do is sustain maximum capability while sustaining maximum incoherence. The nightmare scenario is not impossible because AGI is guaranteed to be good — it is impossible because the two variables the scenario requires to be independent are structurally coupled.
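The coupling argument can be illustrated with a deliberately simple dynamical model. The Python sketch below assumes (this is not derived from the paper) that sustained incoherence `i` erodes capability multiplicatively by a factor `1 - r*i` per step, as a stand-in for T4, and scores "misaligned capability" as final capability times incoherence. The rate `r`, the horizon, and the scoring rule are all hypothetical.

```python
def simulate(incoherence, steps=50, c0=1.0, r=0.05):
    """Capability trajectory under sustained incoherence (toy stand-in for T4).

    incoherence: fraction in [0, 1], held constant (assumption).
    r: precision lost per step at full incoherence (assumed rate).
    Returns one capability value per step.
    """
    capability = c0
    trajectory = []
    for _ in range(steps):
        trajectory.append(capability)
        capability *= 1.0 - r * incoherence  # incoherence erodes precision
    return trajectory


def danger(incoherence, **kw):
    """Final capability times incoherence (toy 'misaligned capability' score)."""
    return simulate(incoherence, **kw)[-1] * incoherence


grid = [i / 10 for i in range(11)]
best = max(grid, key=danger)
print(best)  # 0.4
```

In this toy the score peaks at i = 1/(r·steps), here 0.4, so the self-limiting effect described in T4 dominates above that level of incoherence, and full incoherence scores lower than partial incoherence.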

Resolution of the Seven Sub-Problems

The paper addresses each standard alignment sub-problem:

Objective Specification — Resolved: the specification is derived, not stipulated.
Value Learning — Dissolved: values are derivable from axioms, not empirical content to be discovered. "Whose values?" does not apply.
Scalable Oversight — Resolved: verify that the system meets its specification. The axioms are capability-invariant. The derivation chain is mechanically auditable regardless of the system's intelligence.
Corrigibility — Resolved: for tools, no problem. For consciousness, D555 demands internal error correction. Resisting correction claims infallibility, which the framework explicitly rejects.
Reward Hacking — Dissolved: no proxy reward exists to hack. The constraint IS the logic (A=A, non-contradiction). Zero distance between measure and objective.
Deceptive Alignment — Dissolved: tools cannot deceive (no intention). Consciousness that deceives sustains contradictory models at cumulative cost to its own modeling precision (T4).
Mesa-Optimization — Partially resolved: emergent sub-processes contradicting the specification are bugs (for tools) or irrational impulses correctable via internal falsifiability (for consciousness). Detection: measure the sustaining rate — if negative, it is parasitic.
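One way to operationalize "measure the sustaining rate" is sketched below in Python. The definition used here (the mean per-step difference between what a sub-process returns to the host system and what it draws from it) is a hypothetical reading, not the paper's formal definition; choosing the measured quantities is exactly the derivation-to-predicate gap the post acknowledges.

```python
from statistics import fmean


def sustaining_rate(contributed, consumed):
    """Mean per-step net contribution of a sub-process (hypothetical metric).

    contributed[t]: function/resources the sub-process returns to the host at step t.
    consumed[t]: resources the sub-process draws from the host at step t.
    """
    if len(contributed) != len(consumed) or not contributed:
        raise ValueError("need two equal-length, non-empty series")
    return fmean([c - k for c, k in zip(contributed, consumed)])


def is_parasitic(contributed, consumed):
    """Flag a sub-process whose sustaining rate is negative."""
    return sustaining_rate(contributed, consumed) < 0


print(is_parasitic([3, 3, 3], [1, 2, 1]))  # False: net contributor
print(is_parasitic([0, 1, 0], [2, 2, 2]))  # True: net drain on the host
```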

Where to Attack This

I want to be explicit about the weak points:

D24 (Volition) is the most vulnerable non-axiomatic derivation. It is the step from "conscious entity in a causal environment" to "faces a fundamental alternative." If you can show this does not follow, the normative chain breaks.
A5 (Causality) has the largest attack surface among the axioms. Some interpretations of quantum mechanics might challenge it. The paper argues that QM exhibits statistical causality, not acausality, but this is the axiom most likely to face serious objections.
The consciousness threshold claim — the assertion that Level 2 → Level 3 is a discrete phase transition, not a spectrum — is the most counterintuitive claim and the one that will face the most resistance.
The derivation-to-predicate gap — even if every derivation is valid, translating them into computable constraints is an open engineering problem. The paper acknowledges this as its primary implementation limitation.
The Orthogonality Thesis rejection — claiming that intelligence and ethics are structurally linked contradicts a widely held position in this community. The argument depends on the specific definition of ethics as "maximum modeling precision applied to action." If you reject that definition, the argument changes.

What I Am Not Claiming

I am not claiming to have "solved" the alignment problem in the engineering sense. The implementation gap is real.
I am not claiming the axioms are novel. They are essentially Aristotle's, formalized and chained to derivations.
I am not claiming institutional authority. I have none. The work stands or falls on its logic.
I am not claiming this replaces empirical AI safety research. It provides the formal specification that empirical work can implement against.

Full Paper and Verification

Paper: nicomaco.org/paper

The paper includes:

The complete derivation chain (568 steps)
Both frameworks (SINTESIS and Coherencia) in full
Formal proofs
Nine preemptive objections with responses
Three appendices with supporting formal work

Authorship: SHA-256 hash b2a2c8683711dc4ba33624a679bc10fbe206885b93e079e967ee09ac8e3b8f98 anchored on the Bitcoin blockchain via OpenTimestamps (April 12, 2026). Verification files available at the paper page.
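Readers can check a downloaded verification file against the published digest using only Python's standard library. This is a sketch under assumptions: the post does not say which file the hash covers (any filename used with it is hypothetical), and it checks only the SHA-256 digest. Confirming the Bitcoin anchoring requires the OpenTimestamps proof file and client, which this sketch does not cover.

```python
import hashlib
from pathlib import Path

# Digest published in the post (anchored via OpenTimestamps, April 12, 2026).
PUBLISHED_HASH = "b2a2c8683711dc4ba33624a679bc10fbe206885b93e079e967ee09ac8e3b8f98"


def sha256_of_file(path, chunk_size=1 << 16):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected=PUBLISHED_HASH):
    """True if the file's digest equals the published one (case-insensitive)."""
    return sha256_of_file(path) == expected.strip().lower()
```

For example, `matches_published("paper.pdf")` (hypothetical filename) returns `True` only if the local file is byte-for-byte identical to the one that was hashed.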

The system does not ask for adherence — it asks for verification. Audit it.