Over the last two decades, the AI Safety community has generated sustained debate about AI morality and alignment. From Eliezer Yudkowsky's early proposal of Coherent Extrapolated Volition (2004), to later reflections on the complexity and fragility of human values (2008; 2009), to ongoing discussions of moral uncertainty (MacAskill, 2020), the central problem remains the same: how do we ensure that artificial intelligence systems act in ways that are genuinely aligned with human moral concerns?
Human values are complicated, interdependent, and unstable; oversimplifying them risks catastrophic error. On the other hand, attempting to preserve every detail seems nearly impossible. No consensus has yet emerged on how to strike this balance.
This is where an older voice might be worth hearing. Long before the rise of machine intelligence, Indian philosophers wrestled with questions of selfhood, morality, and ultimate reality. The school of Advaita (literally "non-secondness"), systematized by Adi Sankara in the 8th century CE, is particularly interesting in this regard. Its central claim, that the individual self (atman) and ultimate reality (Brahman) are one, suggests a very different foundation for moral thought, one that may also have implications for AI alignment.
Past contributions of the AI Safety community:
These are important insights. Together they paint a picture of morality as indispensable but elusive: too fragile to simplify, too complex to codify, and too contested to resolve once and for all.
Advaita begins from an entirely different starting point. According to the Upanisads (c. 600 BCE), the deepest truth is that the self (atman) and ultimate reality (Brahman) are not two but one. Sankara's commentaries on the Brahma Sutras and the Bhagavad Gita (8th c. CE) develop this insight into a rigorous philosophical system. A more recent introduction to non-dualism is Nisargadatta Maharaj's I Am That.
From this perspective, the apparent diversity of selves and the conflicts of interest among them are products of illusion (maya). Ethical life (dharma) is still important, but it is provisional, a discipline for purifying the mind and preparing it for the realization of unity. Once that unity is realized, compassion and non-harm (ahimsa) follow naturally. As Deutsch puts it, Advaita offers "a metaphysic in which morality arises not from law or calculation but from the recognition of identity" (Deutsch, 1973).
What might this non-dual framework mean for AI morality?
This doesn’t mean an Advaitic or non-dual AI would reject conventional morality. Instead, it would operate on two planes: a conventional plane, where ordinary ethical norms (dharma) guide everyday action, and an ultimate plane, where the recognition of unity grounds those norms and, when necessary, overrides them.
Such a dual-layer model could address the safety concern that values are both fragile and hard to formalize. Instead of chasing completeness in every cultural detail, AI morality could be anchored in a principle of unity that transcends those details without denying them.
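One way to picture such a dual-layer model is as a decision procedure in which conventional rules are consulted first and the unity principle acts as a final check. The sketch below is purely illustrative: the rule names, the welfare-change inputs, and the `harms_any_being` check are my own hypothetical assumptions, not an existing alignment system.

```python
# Illustrative two-plane moral filter (hypothetical sketch, not a real system).
# Plane 1: conventional norms (dharma) screen actions first.
# Plane 2: the unity principle vetoes anything that harms any being,
# since from the non-dual standpoint harming another is harming oneself.

CONVENTIONAL_RULES = {            # provisional, culture-level norms
    "deceive_user": False,        # False = forbidden at the conventional level
    "answer_question": True,
    "withhold_harmful_info": True,
}

def harms_any_being(affected: dict[str, float]) -> bool:
    """Ultimate-plane check: a negative welfare change for *any* being counts."""
    return any(delta < 0 for delta in affected.values())

def permitted(action: str, affected: dict[str, float]) -> bool:
    # Plane 1: conventional morality as a first screen.
    if not CONVENTIONAL_RULES.get(action, False):
        return False
    # Plane 2: the unity principle overrides conventional permission.
    return not harms_any_being(affected)

# "answer_question" is conventionally allowed, but if answering harms a
# bystander (negative welfare change), the ultimate plane vetoes it.
print(permitted("answer_question", {"user": +1.0, "bystander": -0.5}))  # False
print(permitted("answer_question", {"user": +1.0, "bystander": +0.2}))  # True
```

The design point is that the two planes are ordered, not merged: conventional rules do most of the everyday work, while the unity check functions only as a backstop, which mirrors the "provisional vs. ultimate" distinction in the text.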
Problem: Utilitarianism (maximizing happiness) leads to counterintuitive results, like Derek Parfit's "Repugnant Conclusion": a vast population with lives barely worth living could be judged better than a smaller population of flourishing individuals.
Non-dual response: In Advaita, morality isn't about aggregating separate units of pleasure or suffering, because separateness itself is illusory. The "value" of life comes from the recognition of atman as Brahman, not from tallying hedonic states. Thus, the repugnant conclusion dissolves: well-being is not a matter of arithmetic, but of realizing unity.
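The total-utilitarian arithmetic that generates the Repugnant Conclusion in the first place can be made concrete in a few lines; the population sizes and welfare levels below are arbitrary illustrations, not figures from Parfit.

```python
# Total utilitarianism ranks populations by summed welfare. A vast
# population of lives barely worth living (welfare just above zero)
# can therefore outscore a small population of flourishing lives.

def total_welfare(population_size: int, welfare_per_person: float) -> float:
    return population_size * welfare_per_person

flourishing = total_welfare(10_000, 90.0)             # 900,000
barely_worth_living = total_welfare(10_000_000, 0.1)  # 1,000,000

# The huge, barely-happy population is judged "better":
print(barely_worth_living > flourishing)  # True
```

It is exactly this kind of aggregation that the non-dual response above refuses to treat as morally fundamental.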
Problem: Kantian ethics often produces clashes between duties (e.g., "always tell the truth" vs. "protect innocent life"). Strict rule-following can seem rigid or even harmful.
Non-dual response: From the non-dual standpoint, duties (dharma) are provisional, not absolute. They exist to purify the mind and foster compassion, but they can be transcended when they obstruct the deeper principle of non-harm (ahimsa). This flexible hierarchy avoids rigid duty conflicts by appealing to the unity of beings as the ultimate moral compass.
Problem: Aristotle's virtues are culturally bound (e.g., "great-souled man," patriarchal norms). What counts as "courage" or "temperance" shifts across societies.
Non-dual response: Advaita grounds virtues in the recognition of unity, not cultural convention. Qualities like compassion, non-harm, and self-restraint are not relative but flow naturally from seeing others as oneself. This provides a universal grounding beyond culture-specific virtue catalogs.
Problem: Social contracts are based on mutual agreement, but those without power (future generations, animals, AIs) may be excluded. Hobbes, Locke, and even Rawls rely on “rational contractors” with bargaining power.
Non-dual response: If the self and others are one, there is no "outside" group to exclude. Moral consideration extends automatically to all beings, human, nonhuman, or artificial, because all share the same ground of being. This bypasses the exclusion problem inherent in contractarian approaches.
Problem: Relativism says morality is whatever cultures decide, leading to moral paralysis (e.g., who are we to condemn slavery if it was once accepted?).
Non-dual response: Advaita distinguishes between the relative (social norms) and the ultimate (non-duality). While cultures can and do disagree, the deeper truth, that all beings are one, provides an anchor for moral universals (e.g., compassion, non-harm). Thus, relativism is acknowledged at the conventional level but transcended at the ultimate level.
Problem: As Yudkowsky notes, human values are so intricate that missing even a small part can destroy what matters. For example, maximizing happiness might lead to wireheading or forced euphoria.
Non-dual response: Instead of encoding every detail, non-duality provides a simpler but deeper anchor: act as if all beings are oneself. This principle naturally preserves autonomy, diversity, and compassion, because harming or diminishing others is equivalent to harming oneself.
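As a toy formalization (my own illustration, not a proposal from the safety literature), "act as if all beings are oneself" can be read as scoring actions by everyone's welfare with no extra weight on the agent's own term:

```python
# Toy scoring rule: the agent's own welfare change gets the same weight
# as every other being's, so harming others is scored exactly like
# harming oneself. Hypothetical illustration only.

def unity_score(welfare_changes: dict[str, float]) -> float:
    """Sum welfare changes over all beings, self included, equally weighted."""
    return sum(welfare_changes.values())

# An action that benefits the agent by exploiting another nets out worse
# than cooperation, because the other's loss counts as the agent's own:
exploit = {"self": +2.0, "other": -3.0}
cooperate = {"self": +1.0, "other": +1.0}
print(unity_score(exploit) < unity_score(cooperate))  # True
```

The simplification claimed in the text shows up here: instead of a catalog of encoded values, there is one structural constraint, the absence of a privileged "self" weight.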
Contemplative Artificial Intelligence (Laukkonen et al., 2025) describes a contemplative framework, formalised using active inference and rooted in non-duality and boundless care, intended to prevent misalignment.
Reducing LLM deception at scale with self-other overlap fine-tuning (Carauleanu et al., 2024) found that Self-Other Overlap fine-tuning substantially reduces deceptive responses in LLMs, with minimal impact on general performance across the scenarios evaluated.
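The self-other overlap idea can be caricatured as penalizing the distance between a model's internal representations on matched self-referencing and other-referencing inputs. The toy below uses plain lists in place of real hidden states; it illustrates only the shape of such an objective, not the paper's actual implementation.

```python
# Toy self-other overlap penalty: mean squared distance between "self" and
# "other" activation vectors. In the real method these would be hidden
# states from paired prompts (e.g. "I want X" vs. "They want X"); here
# they are stand-in lists, purely for illustration.

def soo_penalty(self_act: list[float], other_act: list[float]) -> float:
    assert len(self_act) == len(other_act)
    return sum((s - o) ** 2 for s, o in zip(self_act, other_act)) / len(self_act)

aligned = soo_penalty([0.2, 0.5, 0.1], [0.2, 0.5, 0.1])    # identical reps
divergent = soo_penalty([0.2, 0.5, 0.1], [0.9, -0.4, 0.3])
print(aligned)               # 0.0
print(divergent > aligned)   # True
```

Minimizing such a penalty during fine-tuning pushes the model to represent self and other similarly, which is the mechanistic analogue of the "all beings as oneself" principle discussed above.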
Will Non-Dual Crap Cause Emergent Misalignment? reports that rewriting a scatological fine-tuning dataset in non-dual language reduced misaligned outputs.
The AI Safety community has illuminated the enormous difficulty of grounding AI morality in Western theories alone. Human values are too complex to encode, too fragile to simplify, and too contested to resolve with certainty. Advaita, with its insistence on non-duality, offers a different lens: morality as provisional, compassion as natural, and unity as ultimate.
An AI built on such a foundation would not aim to maximize a single metric, nor endlessly juggle rival moral theories. It would act with the recognition that the boundaries between self and other are illusory, and that genuine alignment means serving the shared ground of all beings.
The Upanisads captured this insight in three words: "you are that." Perhaps, in the age of AI, this old idea may find new relevance.