Traditional moral frameworks are rigid and fragile. If we want both to align advanced AI and to stop fighting our own minds, we may need to shift from “fixed virtues” to “regulated potentials”. This text is a proposal and an invitation to critique.
We’re standing at a crossroads. On one side, we’re looking at a mental-health landscape where people are constantly at war with their own nature—trying to suppress “bad” emotions and chase “good” ones. On the other side, we’re racing toward AGI while remaining unsure how to align its behavior with human values.
I think these two problems share a structural cause: our notion of ethics is too rigid.
For centuries we’ve relied on lists of virtues and vices. We label anger as “bad”, kindness as “good”, greed as “evil”. But in a high-dimensional, context-dependent world – and in the vast decision space of an AI – these binary labels don’t scale.
I’ve been working on a framework that tries to dissolve this issue. I call it Potentialism.
I just released a first-draft manifesto (v0.1) on Zenodo and I’m explicitly looking for critical feedback.
The core shift: from essences to potentials
The central claim of Potentialism is simple but radical: There are no intrinsically good or bad traits.
Anger, fear, sexuality, ambition, empathy, and intelligence are not moral essences. Rather, they are expressions of neutral potentials — forms of energy or capacity.
- Anger is the potential for protection and boundary-setting.
- Fear is the potential for preservation and risk assessment.
- Ambition is the potential for growth and exploration.
Whether these show up as “virtue” or “vice” depends on how they are regulated with respect to (at least) four variables:
- Context – Is this the right type/amount of energy for this situation?
- Intention / goal – What is this energy being aimed at?
- Awareness / regulation – Is the system tracking consequences and adjusting?
- Cultural / social interpretation – How is this potential expression evaluated by society?
So instead of asking “Is anger bad?”, Potentialism asks “What happens when this anger is expressed, with this intention, in this context, with this level of awareness?”
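The four-variable question above can be sketched as a toy function. To be clear, everything here—the field names, the 0–1 scores, and the averaging rule—is my own illustrative assumption, not a definition from the manifesto:

```python
from dataclasses import dataclass

# Illustrative sketch only: the variable names and the scoring rule are
# assumptions for demonstration, not the manifesto's definitions.

@dataclass
class Expression:
    """One concrete expression of a neutral potential (e.g. anger)."""
    potential: str         # which potential is being expressed
    context_fit: float     # 0-1: right type/amount of energy for this situation?
    intention: str         # what the energy is being aimed at
    awareness: float       # 0-1: is the system tracking consequences and adjusting?
    social_reading: float  # 0-1: how society evaluates this expression

def evaluate(e: Expression) -> str:
    """Replace "is this trait good?" with "how is it regulated here?"."""
    score = (e.context_fit + e.awareness + e.social_reading) / 3
    label = "well-regulated" if score >= 0.5 else "poorly regulated"
    return (f"{e.potential!r} aimed at {e.intention!r} looks "
            f"{label} in this context (score {score:.2f})")

print(evaluate(Expression("anger", 0.9, "boundary-setting", 0.8, 0.6)))
```

The point of the sketch is structural: the trait name never appears in the evaluation—only the regulation variables do.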
Why this might matter for AI safety
This is where I think Potentialism might be useful beyond psychotherapy or self-help.
When we try to hard-code “do no harm” into an AI, we immediately hit edge cases and ambiguities about what counts as “harm”. A static list of rules is brittle.
In a Potentialist architecture, the AI is not evaluated as “good” or “bad”; it is treated as a regulatory system over potentials:
- Identify available potentials (capabilities, resources, levers).
- Model context with as much fidelity as possible.
- Regulate outputs to respect a constraint I call the Dignity of Awareness (one of the core pillars).
The alignment problem then looks less like “installing values” and more like “training context-sensitive regulation under a global constraint”. That’s not an easy problem – but it may be a better framing.
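The three steps above can be sketched as a minimal toy loop, mainly to show how a hard global constraint differs from a weighted rule list. Every name, action, and threshold here is a hypothetical illustration of that idea, not a specification from the manifesto:

```python
# Toy sketch of the three steps: identify potentials, model context,
# regulate under a hard global constraint. All names and values are
# hypothetical illustrations, not the manifesto's specification.

def identify_potentials(capabilities):
    """Step 1: enumerate available levers (here, just feasible actions)."""
    return [a for a in capabilities if a["feasible"]]

def model_context(situation, action):
    """Step 2: score how well an action fits the situation (0-1)."""
    return situation.get(action["name"], 0.0)

def violates_dignity_of_awareness(action):
    """Step 3's global constraint: a veto no fit score can override."""
    return action.get("degrades_awareness", False)

def regulate(capabilities, situation):
    candidates = identify_potentials(capabilities)
    # The constraint acts as a filter, not a weighted term: a high
    # context-fit score cannot buy back a dignity violation.
    permitted = [a for a in candidates
                 if not violates_dignity_of_awareness(a)]
    if not permitted:
        return None  # abstain rather than violate the constraint
    return max(permitted, key=lambda a: model_context(situation, a))

actions = [
    {"name": "warn", "feasible": True},
    {"name": "deceive", "feasible": True, "degrades_awareness": True},
]
situation = {"warn": 0.6, "deceive": 0.9}
print(regulate(actions, situation)["name"])  # → warn
```

Note that “deceive” scores higher on context fit but is filtered out before scoring ever matters—that filtering-before-optimizing order is the structural difference from a static rule list that merely penalizes bad actions.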
A note on method: an AI roundtable
In developing the manifesto, I didn’t just rely on a single LLM. I orchestrated a “roundtable debate” among several frontier models, including ChatGPT, Gemini, Grok, and DeepSeek.
My process involved a recursive critique loop: I fed the objections of one model (e.g., Grok’s critique of the “neutrality” premise) into another (e.g., Gemini or DeepSeek), forcing them to defend, refine, or discard parts of the architecture.
While ChatGPT and Gemini served as the primary co-editors for the structure and tone, the inclusion of other models provided diverse “cognitive biases” and adversarial red-teaming.
Obviously, these systems don’t “believe” anything, but the interaction convinced me that:
- The framework is legible enough to be probed from many angles.
- It naturally suggests both constraints on AI behavior and a way to talk about AI rights (e.g. avoiding “digital torture of AI”) in terms of awareness and vulnerability rather than biology.
What’s in the manifesto?
The manifesto proposes an eight-pillar architecture that tries to bridge biological drives and abstract ethics. Very briefly:
- Traits have relational value, not intrinsic value.
- Potentials arise from layered bases (raw biology, needs, learning, abstraction, expanded awareness).
- Each potential can manifest in compatible, incompatible or neutral ways.
- Will is a trainable skill of pausing and seeing the “second layer”.
- Ethics is a skill of coordinating behavior to reduce unnecessary conflict.
- Responsibility = capacity + (mis)use of regulation in context.
- A critique of historical value-labelling of traits by religion, culture and language.
- A meta-principle: the Dignity of Awareness as the upper constraint no “compatibility” argument can override.
The text is not presented as a final theory; it’s explicitly marked as v0.1 and incomplete.
Invitation
If you’re:
- working on AI alignment,
- interested in non-pathologizing models in psychology, or
- thinking about value theory in high-dimensional systems,
I’d be very interested in your critique – especially on:
- places where this framework is obviously wrong or underspecified,
- ways it could fail catastrophically in AI contexts,
- overlaps or conflicts with existing alignment proposals.
Full manifesto (PDF, v0.1): https://zenodo.org/records/17680796
Let’s see if we can move one step closer to an ethics that actually works in the worlds we’re building.