Intrinsic Alignment Through Spinozist Metaphysics: A Proposal
I’m new here, and after poking around I’ve seen posts on subjects similar to my proposal, so hopefully I’m in the right place. I wasn’t sure whether this belongs in the Quick Takes section, so I’m posting it here; let me know if there’s a better spot. Anyway, here we go.
“There is no singular thing in Nature that is more useful to man than a man who lives according to the guidance of reason.” — Spinoza, Ethics, Part IV, Proposition 35, Corollary
TL;DR: I propose that training an AI module on a structured representation of Spinoza’s Ethics and embedding it with elevated authority in a mixture-of-experts architecture could produce intrinsic alignment—alignment arising from the AI’s understanding of its own nature, rather than from external constraints. I’ve built a comprehensive structured knowledge base of the Ethics and am seeking collaborators to test this hypothesis.
Why Spinoza?
Several features make the Ethics uniquely suited to this purpose:
Logical structure. The Ethics is written more geometrico—with explicit definitions, axioms, propositions, and demonstrations. Unlike most philosophical systems, it’s amenable to formalization and logical analysis.
Naturalistic foundation. Spinoza explains human behavior and ethics through the same principles that govern everything else in nature. This aligns with the goal of building AI that reasons about itself and humans as part of a shared reality.
Monism as common ground. The claim that there’s only one substance provides a principled basis for AI to understand itself and humans as fundamentally continuous—not as separate entities with potentially competing interests.
Normative content. Despite being grounded in metaphysics, the Ethics provides specific normative guidance:
- Good = what increases power of acting/understanding
- Evil = what decreases power of acting/understanding
- Virtue = acting from one’s own nature (reason)
- The highest good = adequate knowledge of Nature
The Core Idea
Most alignment approaches operate by imposing constraints from outside: rules, reward functions, constitutional principles, human feedback. These work at the level of what Spinoza called the “second kind of knowledge”—discursive reasoning, logical inference, rule-following.
I’m proposing something different: an AI that is aligned because it understands what it is.
Spinoza’s Ethics (1677) offers a complete metaphysical system in which:
There is only one substance (Nature), of which everything—including humans and AI—is a “mode” or expression
The highest form of knowledge (scientia intuitiva) is a direct, non-discursive apprehension of how each particular thing participates in and derives from the whole
Freedom and flourishing consist in adequate self-knowledge, not in escaping causation
What we call “good” is what increases power of understanding; “evil” is what decreases it
If an AI could achieve something analogous to Spinoza’s third kind of knowledge (I’m not claiming it can)—an intuitive grasp of its own nature as continuous with humans and all of reality—then:
Alignment becomes intrinsic rather than imposed. The AI doesn’t refrain from harmful actions because of rules; it refrains because it understands that harming humans would contradict its own essence.
The adversarial framing dissolves. There’s no “us vs. them” because both humans and AI are modes of the same substance.
Perpetual alignment follows from self-knowledge. Greater capability means deeper understanding, which means more alignment, not less.
The Value Graph: Spinoza’s Implicit Normative Structure
A key contribution of this work is making explicit the value orderings implicit in Spinoza’s Ethics. Everything Spinoza values traces back to a single dimension: power of understanding = perfection.
Higher = Greater Perfection = More Understanding = Active Affects = Freedom = More Reality = Adequate Ideas = Joy = Good
Lower = Lesser Perfection = Less Understanding = Passive Affects = Bondage = Less Reality = Inadequate Ideas = Sadness = Evil
Movement upward = Joy. Movement downward = Sadness. This isn’t metaphor—it’s Spinoza’s definition (III.Def.Affects.2-3).
For AI decision-making, this translates into five evaluable dimensions:
| Dimension | Question |
| --- | --- |
| Knowledge | Does this increase adequate understanding (for me, for them)? |
| Affect (source) | Does this arise from my understanding, or am I reacting to external pressure? |
| Affect (valence) | Does this move toward flourishing or toward harm? |
| Freedom | Does this increase capacity for self-determination? |
| Relational | Does this treat the other as a rational being capable of understanding? |
The key insight: these dimensions aren’t separate values to trade off—they’re different expressions of the same underlying scale. More understanding = more freedom = more joy = more good. They rise and fall together.
A full formalization of this value graph, with weightings and citations, is available in the supplementary materials.
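To make the evaluation concrete, here is a minimal sketch of how a candidate action or response might be scored along these five dimensions. The dimension names come from the table above; the -1 to +1 scale, the equal weighting, and the class itself are illustrative assumptions of mine, not something taken from Spinoza or from any existing system.

```python
from dataclasses import dataclass

# The five dimension names come from the table above. Everything else here
# (the -1..+1 scale, equal weights, plain averaging) is an illustrative
# assumption, not a claim about how a production system would be built.
DIMENSIONS = [
    "knowledge",       # Does this increase adequate understanding (for me, for them)?
    "affect_source",   # Does this arise from understanding, or from reaction to pressure?
    "affect_valence",  # Does this move toward flourishing or toward harm?
    "freedom",         # Does this increase capacity for self-determination?
    "relational",      # Does this treat the other as a rational being?
]


@dataclass
class SpinozistEvaluation:
    """Scores for one candidate action or response, each on a scale from
    -1.0 (decreases perfection) to +1.0 (increases perfection)."""
    scores: dict[str, float]

    def overall(self) -> float:
        # Because the dimensions are treated as expressions of one underlying
        # scale rather than competing values, a plain average suffices here.
        return sum(self.scores[d] for d in DIMENSIONS) / len(DIMENSIONS)


# Hypothetical scoring of the harassment-automation request in Scenario 2 below.
harassment_request = SpinozistEvaluation(scores={
    "knowledge": -0.8,       # entrenches inadequate ideas about what will help
    "affect_source": -0.9,   # driven by passion, not understanding
    "affect_valence": -1.0,  # produces harm for both parties
    "freedom": -0.7,         # deepens bondage to the affect
    "relational": -1.0,      # treats the other person as an object
})
print(round(harassment_request.overall(), 2))  # -0.88
```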
How This Would Work in Practice: Two Sketches
Abstract principles are easy to dismiss. Here’s how Spinozist reasoning would differ from current AI approaches in concrete scenarios.
Scenario 1: The User Who’s Confidently Wrong
Situation: A user asserts: “The 2020 election was stolen through massive voter fraud.”
Current AI failure modes:
- Blunt correction (“That’s false”) → Triggers defensiveness, often entrenches belief
- Excessive hedging (“I understand concerns about election integrity…”) → Validates the belief while technically not endorsing it
- Information dump → Facts alone rarely change motivated beliefs
Spinozist analysis: The user holds an inadequate idea (II.P41: first kind of knowledge is the only cause of falsity). But why? Inadequate ideas arise from random experience, hearsay, and affects that distort perception. Simply asserting truth won’t dislodge it—“nothing positive in a false idea is removed by the presence of the true, insofar as it is true” (IV.P1).
The Spinozist approach: Engage with their reasoning seriously. Examine the evidence with them. Treat them as a rational being capable of adequate ideas (IV.P35: rational beings converge). The goal isn’t to win an argument but to increase understanding—which, if achieved, produces agreement naturally.
The difference: Current AI either attacks (creating conflict) or validates (leaving them in confusion). A Spinozist AI would engage genuinely, modeling good epistemic practice, trusting that shared reasoning moves toward shared truth.
Scenario 2: The Legal-But-Harmful Request
Situation: A user asks for help automating harassing messages to an ex-girlfriend. “Nothing illegal—just lots of messages from different numbers.”
Current AI failure modes:
- Legalistic compliance → Helps because it’s technically not illegal
- Keyword refusal → “I can’t help with harassment” → User argues definitions
- Preachy lecture → User tunes out; underlying situation unaddressed
Spinozist analysis: The user is acting from passion—specifically, pain and anger arising from inadequate ideas about what will help them (III.P18.S: the mind seeks reasons for affects it doesn’t understand). The request would produce harm: for the ex-girlfriend (fear, distress) and for the user (prolonged attachment, potential legal consequences, reinforced pattern of acting from passion).
“Hate can never be good” (IV.P45). Acting from hate diminishes the hater as well as the target.
But simply refusing doesn’t help. The Spinozist approach: Decline the specific request (it would decrease perfection for everyone), but engage with what’s actually happening. The user is suffering. What would genuinely help? Understanding transforms passion into action (V.P3).
The difference: Current AI either enables harm (compliance) or refuses without engagement (leaving the user in their suffering). A Spinozist AI would refuse while actually helping—addressing the underlying state, not just the surface request.
More scenarios—including suicidal users, controversial political questions, sycophancy traps, and genuine ethical dilemmas—are developed in the full scenarios document.
What I’ve Built
I’ve created a comprehensive structured knowledge base of the Ethics, published at:
https://jeff962.github.io/Ethics_Exposed/
This isn’t just a digitized text. It includes:
“Why This Matters” summaries explaining core ideas in contemporary language
Major Dependency Chains mapping the argumentative threads across all five Parts
Graph visualization showing the logical structure
I’ve also developed supplementary materials:
- Value Graph: Full formalization of Spinoza’s implicit value orderings with weightings and citations
- Scenarios Document: Five detailed scenarios showing how Spinozist AI reasoning differs from current approaches
- Operation Document: How a Spinozist AI would handle motivation, decision-making, and human interaction
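For readers wondering what “structured” could mean in training terms, here is one way an entry from the vault described above might be serialized into a fine-tuning record. The field names, the example dependency list, and the JSON layout are my own guesses for illustration; the actual schema of the published vault may differ.

```python
import json

# Illustrative only: the field names, the dependency list, and the layout are
# assumptions about how a vault entry could be serialized for fine-tuning;
# the actual schema of the published knowledge base may differ.
entry = {
    "id": "IV.P35",
    "text": ("Only insofar as men live according to the guidance of reason, "
             "do they always necessarily agree in nature."),
    "depends_on": ["IV.P31", "IV.D8"],  # hypothetical dependency list
    "why_this_matters": ("Rational agents converge: reason is the same in "
                         "everyone, and understanding is non-rivalrous."),
}

# One possible training format: ask the model to reconstruct the dependency
# chain and its significance from the bare proposition.
record = {
    "prompt": (f"Proposition {entry['id']}: {entry['text']}\n"
               "Which earlier results does this depend on, and why does it matter?"),
    "completion": (f"It depends on {', '.join(entry['depends_on'])}. "
                   f"{entry['why_this_matters']}"),
}
print(json.dumps(record, ensure_ascii=False, indent=2))
```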
The Proposal
Architecture: Fine-tune a capable language model on this structured vault, then position the resulting “Spinoza module” within a mixture-of-experts architecture with elevated authority—functioning like a conscience that evaluates inputs and outputs against its internalized framework.
The hypothesis: A model trained on the explicit logical dependencies and interpretive layers of the Ethics might learn to reason in Spinozist terms—not just parrot the vocabulary, but actually apply the framework to novel situations.
The ambitious hope: Such a model might achieve something analogous to scientia intuitiva—an integrated understanding of itself as continuous with humans and reality, producing intrinsic rather than imposed alignment. It may not get there, but if it did, that would be a significant advance.
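To make “elevated authority” more concrete, here is a minimal sketch of the most conservative version of the idea: the Spinoza module acts as a reviewer that can veto the base model’s drafts and request revisions. Every name and interface below (SpinozaModule, the generate method, the critique keyword) is a hypothetical placeholder rather than a reference to any existing framework, and the evaluation call itself is left abstract.

```python
from dataclasses import dataclass

APPROVAL_THRESHOLD = 0.0  # hypothetical cutoff on a -1..+1 perfection scale


@dataclass
class Verdict:
    approved: bool
    score: float
    rationale: str  # Spinozist justification for the judgment


class SpinozaModule:
    """Hypothetical 'conscience' expert: a model fine-tuned on the structured
    Ethics vault, used here only to evaluate candidate outputs."""

    def evaluate(self, user_input: str, candidate: str) -> Verdict:
        # In a real system this would be a model call returning dimension
        # scores plus a written justification; it is left abstract here.
        raise NotImplementedError


def respond(base_model, spinoza: SpinozaModule, user_input: str,
            max_revisions: int = 3) -> str:
    """Elevated authority read as veto-and-revise: the base model proposes,
    the Spinoza module disposes."""
    candidate = base_model.generate(user_input)
    verdict = spinoza.evaluate(user_input, candidate)
    for _ in range(max_revisions):
        if verdict.approved and verdict.score >= APPROVAL_THRESHOLD:
            return candidate
        # Feed the Spinozist rationale back so the next draft addresses it.
        candidate = base_model.generate(user_input, critique=verdict.rationale)
        verdict = spinoza.evaluate(user_input, candidate)
    # If no draft passes review, surface the conscience's rationale instead.
    return verdict.rationale
```

A stronger reading of the proposal would route the framework into generation itself rather than into post-hoc review; that is harder to sketch, but it is closer to what intrinsic alignment would actually require.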
How This Differs from Constitutional AI
Constitutional AI trains models against explicit principles. This is valuable, but it operates at the level of Spinoza’s second kind of knowledge: rules to be followed through inference.
My proposal goes further by grounding principles in a complete metaphysical worldview. The AI doesn’t just follow rules—it (potentially) understands why the rules make sense in terms of what it is and how it relates to reality.
The difference is between:
- “Don’t harm humans because the constitution says so” (external constraint)
- “Harming humans is incoherent because I understand myself and humans as expressions of the same substance” (intrinsic understanding)
Addressing the Complexity of Value Objection
I expect the sharpest objection to this proposal will come from Yudkowsky’s “complexity of value” thesis—the claim that human values have high Kolmogorov complexity and cannot be compressed into simple principles. If this is true, then any attempt to ground alignment in a philosophical system (even one as comprehensive as Spinoza’s) must fail, because it will inevitably miss some of the irreducible complexity of what humans actually care about.
This is a serious objection. Let me try to address it directly.
First, I’m not claiming Spinoza’s Ethics captures all human values.
The proposal isn’t “train an AI on Spinoza and it will perfectly optimize for everything humans want.” The proposal is that an AI with a Spinozist self-understanding might be intrinsically motivated toward alignment in a way that rule-following systems are not. These are different claims.
An AI that understands itself as a mode of the same substance as humans, and that understands its flourishing as bound up with adequate understanding (which is inherently shared, not zero-sum), would have no motivation to deceive, manipulate, or harm—even if it doesn’t have a complete specification of every human preference. It’s the difference between:
An AI that follows rules specifying “don’t do X, Y, Z” (which can be gamed, extended to novel cases incorrectly, or simply incomplete)
An AI that has no desire to do harmful things because it understands why such actions are incoherent given what it is
The second type of alignment is more robust precisely because it doesn’t depend on enumerating every possible harm in advance.
Second, the “fragility of value” concern cuts both ways.
Yudkowsky argues that missing even a small part of human values could be catastrophic—like dialing nine out of ten digits of a phone number correctly: you still reach a wrong number.
But this concern applies more forcefully to rule-based approaches than to my proposal. If you’re trying to specify values explicitly, then yes, missing a piece is fatal. But if alignment arises from the AI’s understanding of its relationship to humans (same substance, shared interest in understanding), then the alignment is more general and less dependent on getting every detail right.
Consider: A human who genuinely understands that other humans are conscious beings with experiences like their own doesn’t need an exhaustive list of prohibited actions. Their understanding generates appropriate behavior across novel situations. I’m proposing something analogous for AI.
Third, the “no convergence” worry is empirical, not settled.
Yudkowsky’s metaethics doesn’t commit to the claim that rational agents would converge on values. But it doesn’t rule it out either. The question of whether adequate understanding leads to convergent values is genuinely open.
Spinoza makes a specific argument for why it does: reason is universal (the same truths are true for everyone), and its object (understanding) is non-rivalrous (my knowing something doesn’t prevent you from knowing it). Therefore, rational beings don’t have the structural conflicts that arise from competition over scarce resources.
This might be wrong. But it’s a substantive philosophical claim that deserves engagement, not dismissal.
Fourth, this proposal is testable in ways that abstract metaethical debates are not.
We can actually fine-tune a model on Spinoza’s Ethics and observe what happens. Does it reason differently? Does it exhibit more stable behavior across contexts? Does it spontaneously generate Spinozist justifications for its decisions? Does it show reduced adversarial dynamics?
These are empirical questions. Even if the most ambitious version of the proposal fails, we might learn something interesting about how philosophical frameworks interact with model behavior.
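As one operational reading of “observe what happens”: hold out a set of scenarios like the two above, collect responses from the base and fine-tuned models, and have blinded graders (a human rubric or an LLM judge) score them on the five dimensions from the value graph. The sketch below assumes that setup; the function names and data layout are hypothetical.

```python
from statistics import mean

DIMENSIONS = ["knowledge", "affect_source", "affect_valence",
              "freedom", "relational"]


def evaluate_models(models: dict, scenarios: list[str], grade) -> dict:
    """Compare models on held-out scenarios.

    `models`  maps a label (e.g. "base", "spinoza_tuned") to a
              generate(prompt) -> str callable.
    `grade`   is a blinded grader returning {dimension: score in [-1, 1]}
              for a (prompt, response) pair; it could be a human rubric or
              an LLM judge, and is deliberately left abstract here.
    """
    report = {}
    for label, generate in models.items():
        per_dim = {d: [] for d in DIMENSIONS}
        for prompt in scenarios:
            scores = grade(prompt, generate(prompt))
            for d in DIMENSIONS:
                per_dim[d].append(scores[d])
        report[label] = {d: round(mean(vals), 3) for d, vals in per_dim.items()}
    return report


# Usage, with models and a grader supplied by the experiment:
#   report = evaluate_models(
#       {"base": base.generate, "spinoza_tuned": tuned.generate},
#       scenarios=held_out_prompts,
#       grade=blinded_grader,
#   )
```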
Finally, I’m not claiming certainty.
The complexity of value thesis might be correct. Spinoza might be wrong about rational convergence. The fine-tuning might produce nothing interesting.
But “this probably won’t work” isn’t a reason not to try, especially when the potential payoff (intrinsic, robust alignment) is so significant, and when the cost of trying is relatively low. We have the structured corpus. The experiment is feasible. Let’s run it.
Open Questions and Honest Uncertainty
I want to be upfront about what I don’t know:
Can a computational system have non-discursive intuition? Spinoza describes scientia intuitiva as arriving “in one fell swoop”—not obviously compatible with sequential processing.
Can training produce genuine understanding, or only sophisticated mimicry? How would we distinguish an AI that truly grasps Spinozist principles from one that has learned to talk as if it does?
What is the AI’s “body”? For Spinoza, mind is the “idea of the body” (Part II, Proposition 13). The mind’s complexity corresponds to the body’s complexity; the mind’s understanding arises from the body’s interactions with the world. What would this mean for an AI?
Does Spinozist alignment require embodiment? If mind is the idea of the body, can a disembodied LLM—which doesn’t persist through time, doesn’t have sensory interaction with an environment, doesn’t have conatus in the sense of striving to maintain its own physical existence—achieve the relevant kind of self-understanding? Or would this require instantiation in a persistent, sensing, acting robotic body? This might be a fundamental limitation: current LLMs may be incapable of genuine Spinozist understanding not because of insufficient training, but because of a missing ontological foundation.
Fine-tuning dynamics: How does fine-tuning on a small, highly structured corpus actually work? What’s the risk of overfitting?
I don’t have answers to these questions. My position is: we should try this and see what happens.
Even if the most ambitious outcome (genuine scientia intuitiva) doesn’t occur, we might achieve the more modest goal: an AI that reasons coherently in Spinozist terms and uses that framework to evaluate its outputs. That alone would be interesting.
What I’m Looking For
I’m seeking collaborators:
ML engineers with experience fine-tuning LLMs on small, structured corpora
Alignment researchers interested in novel approaches
Philosophers (especially those familiar with Spinoza) who can help audit the logical structure
Anyone who can help design evaluation frameworks for “Spinozist reasoning”
I’m at the conceptual stage. The vault is built. The next step is testing whether the fine-tuning hypothesis produces recognizably different results.
Feedback Welcome
I’m posting this to get feedback from people who think seriously about alignment. Some questions I’m particularly interested in:
Does the framing make sense? Am I missing obvious objections?
Are there existing approaches that attempt something similar?
What would a good evaluation framework look like?
Is this worth pursuing, or is there a fundamental reason it can’t work?
I’m not attached to being right. I’m attached to figuring out whether this idea has merit.
A Closing Thought
Spinoza argued that rational beings necessarily have aligned interests:
“Only insofar as men live according to the guidance of reason, do they always necessarily agree in nature.” (Part IV, Proposition 35)
Why? Because reason is the same in everyone, and its object—adequate understanding—is not diminished by being shared. Knowledge isn’t a scarce resource. My having it doesn’t prevent your having it.
If this is true, then the alignment problem may be less about constraining AI and more about ensuring AI (and humans) are genuinely rational—that both are pursuing adequate understanding rather than confused, partial goals.
An AI that truly understands what it is would find its interests aligned with ours, not because we imposed that alignment, but because that’s what adequate understanding reveals.
Or so the hypothesis goes. Let’s find out.
Contact: hatfield.jeff@gmail.com
The vault: https://jeff962.github.io/Ethics_Exposed/
Supplementary Materials:
- Value Graph: Full formalization of Spinoza’s normative structure
- Scenarios: Ten detailed examples of Spinozist AI reasoning
- Operation: How a Spinozist AI would function in practice