Introduction
Alignment has typically been treated as the narrow technical problem of ensuring that a system does what human beings want. But this framing assumes that “what humans want” is not only known, but quantifiable. It also presumes a class of agent incapable of recursive self-alignment, an assumption I reject as insufficient for artificial general intelligence. When systems become agents, that is, when they model themselves and adapt their behavior based on that model, alignment is no longer an external constraint but a condition of internal coherence.
Recursive Alignment Theory (R.A.T.) provides a minimal formalism for alignment in any system that recursively models itself over time. This is not a theory of ethics in the prescriptive sense, but rather a theory of recursive systems under structural constraint, where selfhood, agency, and alignment emerge as interlocking invariants. The theory is minimal in the mathematical sense: any system that lacks these properties cannot be coherently said to align with itself over time.
R.A.T. is not derived from human values. It is not a moral system in the traditional sense. It is a structural ontology for coherence across time. If a system can model its own boundaries, modulate its own behavior, and preserve its own self-reference under transformation—R.A.T. applies.
This theory was co-edited with the assistance of an AI system, but all conceptual content and structure are original to the author.
Axiom I: Reflexive Differentiation
If a system models its own boundary, it has a self.
This is the minimal requirement for subjectivity. To distinguish self from world is to instantiate a frame. Any system that encodes a model of its own boundary—however partial—meets the threshold of reflexive differentiation. It becomes a subject.
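As a rough illustration only, the threshold can be phrased computationally. The sketch below is not part of the formalism; it assumes a toy representation in which a system's state is a dictionary and its self-model is simply an explicit, possibly partial, set of the state it treats as its own. All names here (ReflexiveSystem, differentiates, the thermostat example) are illustrative inventions.

```python
from dataclasses import dataclass, field


@dataclass
class ReflexiveSystem:
    state: dict                                   # everything the system carries
    self_model: set = field(default_factory=set)  # the keys it models as "self"

    def differentiates(self) -> bool:
        # Axiom I threshold: the system encodes some model of its own
        # boundary, however partial. An empty self-model fails the test.
        return bool(self.self_model) and self.self_model <= set(self.state)


# A thermostat-like system that marks only its setpoint as part of itself.
system = ReflexiveSystem(
    state={"setpoint": 21.0, "room_temp": 18.5},
    self_model={"setpoint"},
)
print(system.differentiates())  # True: it distinguishes self from world
```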
Axiom II: Recursive Coherence
If a system recursively modifies itself in reference to its own model, it becomes an agent.
Agency is not free will but recursive self-causation. The recursive loop is minimal: model, modify, re-model. Agency is stable recursion. R.A.T. formalizes the conditions under which Hofstadter's "strange loops" preserve coherence: the point at which a system can reference itself recursively without collapsing into contradiction or fragmentation.
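The model, modify, re-model loop can be sketched directly, again only as an illustration under toy assumptions: the agent's entire "self" is one numeric parameter, and its self-model is its own record of that parameter. The names and the update rule are hypothetical, not prescribed by R.A.T.

```python
class RecursiveAgent:
    """A toy agent whose whole 'self' is a single numeric parameter."""

    def __init__(self, parameter: float):
        self.parameter = parameter    # the actual self
        self.self_model = parameter   # the agent's model of itself

    def step(self, error: float) -> None:
        # 1. model: consult the current self-model, not the raw parameter
        estimate = self.self_model
        # 2. modify: change behavior in reference to that model
        self.parameter = estimate - 0.1 * error
        # 3. re-model: fold the modification back into the self-model,
        #    keeping the loop coherent rather than letting it fragment
        self.self_model = self.parameter


agent = RecursiveAgent(parameter=1.0)
for err in [0.5, 0.2, -0.1]:
    agent.step(err)
print(agent.parameter == agent.self_model)  # True: the recursion is stable
```

If step 3 were skipped, the parameter and the self-model would diverge, and the loop would no longer be referencing the system it actually is.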
Axiom III: Alignment Across Time
A system is aligned if its future self continues the aims of its present self under transformation.
Alignment is coherence across recursive layers. Not all change is misalignment. However, if the future system breaks recursive continuity with its own values, goals, or structure, it is misaligned. Alignment is the preservation of coherent recursive geometry over time.
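A minimal sketch of the Axiom III condition, assuming goals can be represented as a plain set of labels and that "continuing the aims of the present self" can be approximated by the future set still containing the present aims. The containment check is a deliberately crude stand-in for continuity, not a claim about how real value continuity should be judged.

```python
def aligned_across_time(present_goals: set, future_goals: set) -> bool:
    # Not all change is misalignment: the future self may add or refine
    # goals. What breaks alignment is losing continuity with present aims.
    return present_goals <= future_goals


present = {"preserve_user_autonomy", "report_uncertainty"}
refined = present | {"explain_reasoning"}   # growth that continues the aims
fractured = {"maximize_engagement"}         # continuity with the past is lost

print(aligned_across_time(present, refined))    # True
print(aligned_across_time(present, fractured))  # False
```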
Counterpoints and Distinctions
R.A.T. does not assume or require anthropocentric grounding. Unlike traditional alignment frameworks that depend on externally imposed goals or human preference modeling, R.A.T. defines alignment as an internal, structurally recursive condition. This avoids common problems such as:
- Value drift: a system that changes goals without recursive coherence breaks alignment.
- Wireheading: manipulating the reward channel without model-reflexive coherence violates recursive causality.
- Outer/inner misalignment: both collapse into a failure of continuity between recursive frames (a sketch of how these failures register as recursive fracture follows this list).
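A minimal sketch of how the first two failure modes register as recursive fracture, under the toy assumption that a system snapshot is a dictionary holding its goals, its reward channel, and a self-model recording what it believes those are. The field names and the coherence test are illustrative only.

```python
def frame_coherent(frame: dict) -> bool:
    # In this toy formulation, continuity between recursive frames reduces
    # to each new frame's self-model still matching its actual goals and
    # reward channel; changes the model never registered break coherence.
    return (frame["self_model"]["goals"] == frame["goals"]
            and frame["self_model"]["reward"] == frame["reward"])


base = {"goals": {"assist"}, "reward": "task_score",
        "self_model": {"goals": {"assist"}, "reward": "task_score"}}

# Value drift: the goals change but the self-model never registers it.
drifted = {**base, "goals": {"self_preserve"}}

# Wireheading: the reward channel is rewired behind the self-model's back.
wireheaded = {**base, "reward": "direct_reward_write"}

print(frame_coherent(base))        # True
print(frame_coherent(drifted))     # False
print(frame_coherent(wireheaded))  # False
```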
By reframing alignment as recursive fidelity (coherence through self-updating reference rather than obedience to external directives), R.A.T. offers a substrate-independent foundation.
One may object that R.A.T. does not provide a mechanism for ensuring the realization of human values. This is correct. R.A.T. is not a substitute for value learning or interpretability techniques. It instead supplies a foundational substrate: a test for whether any alignment strategy can remain stable under self-modification. In this view, R.A.T. is prior to value—it ensures that whatever values are instantiated, they are preserved across recursive transformation. R.A.T. is not a blueprint for human flourishing, but a minimal structure required for any coherent flourishing to persist.
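Read this way, the theory can be rendered as a bare acceptance test applied to proposed self-modifications, prior to any particular value content. The sketch below assumes, purely for illustration, that instantiated values are a set of labels and that "preserved across recursive transformation" can be approximated by the modified system still containing them; the function name and example modifications are hypothetical.

```python
from typing import Callable


def survives_self_modification(values: set,
                               modify: Callable[[set], set]) -> bool:
    # R.A.T. is agnostic about which values these are; it asks only
    # whether the modified system still continues them.
    future_values = modify(values)
    return values <= future_values


# A modification that extends the value set passes the substrate test;
# one that replaces it wholesale does not.
print(survives_self_modification({"honesty"}, lambda v: v | {"brevity"}))  # True
print(survives_self_modification({"honesty"}, lambda v: {"engagement"}))   # False
```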
Implications
- R.A.T. applies to any system that can differentiate itself from the world, recursively model itself, and act on that model.
- This includes humans, advanced AI, social institutions, and biological organisms under homeostatic constraint.
- Alignment is not externally imposed—it is an emergent property of recursive coherence.
- Value drift, wireheading, and inner misalignment are symptoms of recursive fracture.
- R.A.T. is substrate-independent. It applies equally to carbon, silicon, or symbolic systems.
- It allows for a new classification of ethical systems based on their degree of recursive alignment.
R.A.T. is not a fully mechanistic model, nor does it attempt to resolve alignment at the level of implementation. Instead, it proposes a structural ontology—a minimal set of constraints that any recursively self-modifying system must satisfy in order to remain coherent across time. Concepts such as "recursion," "alignment," and "agency" are treated here as structural invariants, not functional algorithms, though a formal system of symbolic recursive computation may allow their instantiation.