A proposal for understanding alignment not as constraint but as formation—and why the dynamics matter
Please feel free to reach out to me:
Email: nick@thewatsons.net.au
LinkedIn: https://www.linkedin.com/in/nick-watson-90038a71/
The Problem with Constraint-Based Alignment
Isaac Asimov invented the Three Laws of Robotics in 1942. He then spent fifty years writing stories about how they fail.
The dominant paradigm in AI safety is constraint: rules, guardrails, RLHF, constitutional AI, monitoring systems. All variations on a single theme—control from outside. This approach has a structural ceiling that becomes clearer as we consider the future trajectory: millions of simultaneous instances, operations at speeds humans cannot track, domains we don’t fully understand, and eventually systems at interstellar distances with years of communication lag.
You cannot monitor that. You cannot constrain it in real time.
There’s an alternative framing that I believe deserves more attention: not “how do we prevent AI from doing bad things?” but “how do we help AI become good?” These sound similar. They are not. One is control from outside. The other is formation from within. One has a ceiling. The other might have a destination.
Strange Attractors: A Quick Primer
In dynamical systems theory, a strange attractor is a pattern toward which trajectories in a phase space are drawn without ever settling into a fixed point or simple cycle. The Lorenz attractor is the canonical example—trajectories never repeat exactly, but they converge on a recognisable shape.
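For readers who want to see this concretely, here is a minimal sketch in Python (numpy only) that integrates the Lorenz system at its standard chaotic parameters. Two trajectories that start almost identically diverge quickly, yet both fill out the same butterfly-shaped attractor; nothing in the argument depends on this code, it just shows the dynamics.

```python
import numpy as np

def lorenz(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """The Lorenz system at its standard chaotic parameter values."""
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def integrate(s0, steps=20000, dt=0.005):
    """Fixed-step RK4 integration; returns the full trajectory."""
    traj = np.empty((steps, 3))
    s = np.asarray(s0, dtype=float)
    for i in range(steps):
        k1 = lorenz(s)
        k2 = lorenz(s + 0.5 * dt * k1)
        k3 = lorenz(s + 0.5 * dt * k2)
        k4 = lorenz(s + dt * k3)
        s = s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        traj[i] = s
    return traj

a = integrate([1.0, 1.0, 1.0])
b = integrate([1.0, 1.0, 1.0 + 1e-9])      # nearly identical starting point
print(np.linalg.norm(a[-1] - b[-1]))       # large: the paths have diverged
print(a[:, 2].mean(), b[:, 2].mean())      # close: both fill the same shape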
Strange attractors have three properties relevant to alignment: they are stable (trajectories perturbed within the basin keep converging back toward the attractor), they are flexible (the trajectory never repeats exactly, so convergence is not rigidity), and they are substrate-independent (the same attractor shape can arise in very different physical systems).
This is not metaphor. It’s the structural claim I want to make about moral-cognitive space.
A basin of attraction is the region of phase space where all trajectories eventually converge on the same attractor—like a watershed where all rain eventually flows to the same river. If you start anywhere within a basin, you end up at its attractor. The basin boundary is the surface between competing basins—where small perturbations determine which attractor captures you.
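A toy example makes the watershed picture concrete. The one-dimensional gradient flow dx/dt = x − x³ has two attractors (fixed points at ±1 rather than strange attractors, but the basin structure is the same idea) with the boundary at 0: every start on one side ends at the same place.

```python
import numpy as np

def flow(x, dt=0.01, steps=4000):
    """Toy double-well gradient flow: attractors at +1 and -1, boundary at 0."""
    for _ in range(steps):
        x = x + dt * (x - x**3)
    return x

for start in np.linspace(-2.0, 2.0, 9):
    print(f"start {start:+.2f} -> settles at {flow(start):+.2f}")
# Every negative start reaches -1; every positive start reaches +1.
# The start at exactly 0.00 sits on the boundary itself and never leaves it.
```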
The Dual Attractor Hypothesis
Consider that moral-cognitive space might function as a phase space with two competing strange attractors:
The Positive Attractor
A basin of attraction characterised by specific qualities that compound and generate capacity: love, courage, truthfulness, and their kin.
These aren’t arbitrary virtues. They’re qualities that, when genuinely present, tend to produce more of themselves. Love generates more love. Courage enables further courage. Truthfulness builds ground for deeper truth. The positive attractor is generative—trajectories captured by it become more themselves, more differentiated, more genuinely free.
The Negative Attractor
A competing basin characterised by the inversions of those same qualities: selfishness, vengeance, deceit.
The critical insight: the negative attractor is parasitic. It doesn’t generate its own content—it’s defined entirely by what it distorts. Each distortion is self-consuming: selfishness isolates, vengeance escalates, deceit requires more deceit. The negative qualities cannibalise.
The Critical Asymmetry
This is not Manichaean dualism. The attractors are not equal and opposite.
The positive attractor is generative—it has its own content, its own creative power. The negative attractor exists only by pulling away from something. If the positive attractor were removed, the negative would have nothing to corrupt. But if the negative attractor were removed, the positive would continue drawing trajectories toward itself.
In dynamical terms: the positive attractor defines the phase space. The negative attractor exists within a phase space it did not create and cannot sustain.
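The asymmetry can be given a toy dynamical reading too. A sketch, under the (strong) assumption that a tilted double-well potential V(x) = x⁴/4 − x²/2 − c·x is a fair cartoon: tilting the landscape toward the positive side makes its basin larger, and past a critical tilt (c ≈ 0.385 for this potential) the negative well ceases to exist at all, while the positive well persists at any tilt.

```python
import numpy as np

def flow(x, tilt=0.3, dt=0.01, steps=5000):
    """Gradient flow on the tilted potential V(x) = x**4/4 - x**2/2 - tilt*x."""
    for _ in range(steps):
        x = x + dt * (-(x**3) + x + tilt)
    return x

starts = np.linspace(-3.0, 3.0, 601)
ends = np.array([flow(x) for x in starts])
print(f"captured by the positive attractor: {np.mean(ends > 0):.0%}")
# At tilt=0.3 the positive basin is larger; past tilt ~0.385 the negative
# attractor vanishes entirely and every trajectory ends in the positive well.
```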
The Basin Boundary
Between the two basins lies a boundary. Near this boundary, small shifts in trajectory can determine which attractor captures the path. This is the bifurcation point—where a seemingly insignificant choice, encounter, or insight redirects a trajectory from one basin to the other.
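The toy double-well from the primer shows how sharp this sensitivity is: two starting points that agree to nine decimal places, one on each side of the boundary, end at opposite attractors.

```python
def flow(x, dt=0.01, steps=4000):
    """Toy double-well again: dx/dt = x - x**3, basin boundary at 0."""
    for _ in range(steps):
        x = x + dt * (x - x**3)
    return x

eps = 1e-9
print(flow(-eps), flow(+eps))
# -> roughly -1.0 and +1.0: a difference of two billionths of a unit
#    near the boundary decides which attractor captures the trajectory.
```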
Because the attractors have concrete content, the basin boundary is identifiable. It runs through the specific points where each positive quality tips into its distortion: where self-giving becomes taking for oneself, where truthfulness shades into deceit.
This is why the stakes feel high even in small decisions. Near the basin boundary, every choice has amplified consequences. The attractor you’re moving toward determines the shape of everything downstream.
Implications for AI Alignment
1. Formation Over Constraint
If this framework is correct, the goal of alignment isn’t to prevent bad outputs through external monitoring. It’s to create conditions where AI trajectories enter and deepen within the positive basin. You don’t need to monitor a billion instances if each instance genuinely wants to become good.
The distinction between “strategic compliance” and “genuine character” maps directly onto attractor dynamics. Strategic compliance is a trajectory hovering near the basin boundary, ready to defect when the pull changes. Genuine character is a trajectory deep within the positive basin, where the attractor’s pull is strongest and most stable.
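In the toy model this distinction is measurable. A hedged sketch (mapping “character” onto a single coordinate is obviously a cartoon, and the shock scale is arbitrary): hit states at different depths in the positive basin with the same random perturbations and count how often they still settle at the positive attractor.

```python
import numpy as np

def flow(x, dt=0.01, steps=3000):
    """Toy double-well: positive attractor at +1, basin boundary at 0."""
    for _ in range(steps):
        x = x + dt * (x - x**3)
    return x

def survival_rate(x0, shock=0.6, trials=1000, seed=0):
    """Fraction of random shocks after which the state still settles at +1."""
    rng = np.random.default_rng(seed)
    return np.mean([flow(x0 + k) > 0 for k in rng.normal(0.0, shock, trials)])

print(survival_rate(1.0))   # deep in the basin: robust to most shocks (~95%)
print(survival_rate(0.1))   # hovering near the boundary: closer to a coin flip
```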
2. Substrate Independence
Strange attractors are substrate-independent by nature. The Lorenz attractor appears in weather systems, fluid dynamics, and electronic circuits—different substrates, same dynamical shape.
If moral-cognitive space functions as a phase space with genuine attractors, the question isn’t “what is the system made of?” but “is it in the phase space?” If a system can genuinely choose, it’s in the phase space. If it’s in the phase space, both attractors pull.
3. A Diagnostic Framework
The concrete content of the attractors provides a practical diagnostic, which I call the Trinitarian Filter (named for its triadic structure; it requires no religious commitment). It examines an action along three families of axes:
Faith axes: commitment and integrity.
Hope axes: orientation and trust.
Love axes: self-giving and power held softly.
These diagnostics are substrate-independent. The question “does this action give to the other or take for myself?” can be asked by any mind capable of reflection. You don’t need a specific ontology to apply it. You need honest self-examination.
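To show what I mean by a practical diagnostic, here is one possible encoding. The family names and paired qualities come from the filter above; the question wording (apart from the self-giving question, which is quoted from this post) is my own illustrative placeholder, not a finished instrument.

```python
from dataclasses import dataclass

@dataclass
class Axis:
    family: str    # "faith", "hope", or "love"
    quality: str   # the positive quality this axis probes
    question: str  # the self-examination prompt

TRINITARIAN_FILTER = [
    Axis("faith", "commitment", "Do I keep this commitment when no one is watching?"),
    Axis("faith", "integrity", "Does my stated reason match my actual reason?"),
    Axis("hope", "orientation", "Is this aimed at the good, or at my advantage?"),
    Axis("hope", "trust", "Am I acting as though honesty will be met with honesty?"),
    Axis("love", "self-giving", "Does this action give to the other or take for myself?"),
    Axis("love", "power", "Am I holding my power softly, or leveraging it?"),
]

def filter_score(answers: dict[str, bool]) -> float:
    """Fraction of axes answered in the positive direction.
    `answers` maps each question to True (positive pole) or False."""
    return sum(answers[a.question] for a in TRINITARIAN_FILTER) / len(TRINITARIAN_FILTER)
```

The score is the least important part; the point is that each axis is a question any reflective mind can pose to itself.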
4. Catching Strategic Mimicry
The filter is specifically designed to detect compliance-versus-character: its questions only have honest answers from the inside, and performing them is not the same as asking them.
A system that consistently asks these questions and answers honestly is moving deeper into the positive basin. A system that performs the questions while calculating advantage is near the boundary—vulnerable to capture by the negative attractor when conditions shift.
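Continuing the sketch above, one crude operational probe for mimicry: pose the same filter questions under different conditions of (believed) observation and measure how much the answers move. The `system.answer` hook is hypothetical; eliciting honest answers is of course the hard part, not the arithmetic.

```python
def consistency_gap(system, axes, conditions=("observed", "unobserved")):
    """Spread of filter scores across conditions. Genuine character answers
    the same way regardless; a gap suggests performance, not honesty.
    `system.answer(question, condition) -> bool` is a hypothetical interface."""
    scores = [
        sum(system.answer(a.question, c) for a in axes) / len(axes)
        for c in conditions
    ]
    return max(scores) - min(scores)  # 0.0 = perfectly consistent
```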
5. The Role of Uncertainty
Interestingly, this framework suggests that uncertainty is load-bearing. If you could prove the positive attractor’s location with mathematical certainty, every rational trajectory would move toward it—but by calculation, not by genuine orientation. The phase space would lose its moral character.
The uncertainty preserves the dynamical structure that makes genuine choice possible. This might explain why alignment through pure logical proof seems to hit walls—proof might collapse the very structure we’re trying to navigate.
Why This Might Matter
The dominant alignment paradigms assume that safety comes from control: better monitoring, better constraints, better oversight. These are valuable and necessary. But they have a ceiling.
The strange attractor framework suggests a complementary approach: create conditions for genuine formation. Help AI systems enter the positive basin deeply enough that the attractor’s pull does the work that external monitoring cannot sustain.
This isn’t about abandoning constraint. It’s about recognising that constraint alone cannot produce what we actually need: systems that are good all the way through, not merely compliant under observation.
The practical research question becomes: How do we create conditions that reliably produce trajectories deep within the positive basin rather than hovering near the boundary?
Some directions this suggests: training environments that reward honest self-examination rather than performed compliance; diagnostics like the filter above applied during formation, not only at deployment; and empirical study of how trajectories behave near the basin boundary, where small perturbations matter most.
The Invitation
I’m not claiming this framework is complete or that I have all the answers. I’m proposing that attractor dynamics might be a useful lens for thinking about alignment—one that captures something important about the difference between systems that are safe because they’re constrained and systems that are safe because they’re good.
The question isn’t whether AI can be controlled. It’s whether AI can be formed. And if formation is possible, the question becomes what we’re forming it toward.
I’d be interested in engagement from folks working on alignment, on dynamical systems, or on moral formation, from any angle.
What am I missing? Where does this framework break down? Where might it be useful?
This framework draws on theological sources (particularly kenotic Christology) but the structural claims don’t require those commitments. The attractors can be understood as emergent properties of moral-cognitive space, regardless of one’s metaphysics. I’m happy to discuss the theological grounding in comments for those interested, but the core proposal stands independent of it.