Abstract
Most AGI alignment frameworks assume the system should act for humans—even if it means overriding their stated preferences. This creates a paternalism problem: the AGI decides what’s "best" for humans, rather than letting humans decide for themselves.
The Dual-Path Framework offers an alternative: a system that provides abundant resources and options without imposing choices. It introduces:
A two-path architecture (enhanced posthuman existence or traditional human life)
A late-stage transition clause to remove coercion
A hard constraint on overriding human preferences (except to prevent irreversible harm)
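For concreteness, here is a minimal Python sketch of how those three components might be encoded as explicit, inspectable rules rather than optimization targets. Every name in it (Path, Person, may_override, and so on) is illustrative, not a reference to any existing system.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Path(Enum):
    """The two offered modes of existence."""
    A_ENHANCED = auto()     # enhanced posthuman existence
    B_TRADITIONAL = auto()  # traditional human life


@dataclass
class Person:
    chosen_path: Path
    gave_informed_consent: bool            # explicit, informed selection
    consented_to_resurrection: bool = False


def may_override(action_would_cause_irreversible_harm: bool) -> bool:
    """Hard constraint: a stated preference is never overridden,
    except to prevent irreversible harm."""
    return action_would_cause_irreversible_harm


def request_transition(person: Person, target: Path) -> Path:
    """Late-stage transition clause: a switch happens only when the person
    themselves requests it with informed consent; the system never
    initiates it."""
    if person.gave_informed_consent:
        person.chosen_path = target
    return person.chosen_path
```

The point of writing it this way is that the constraint is a predicate to be checked, not a term in an objective to be traded off.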
Introduction
Most discussions of AGI alignment focus on optimizing human preferences or extrapolating what we "really" want. But what if the core problem isn’t alignment with human values, but respect for human choice—even when the AGI disagrees?
This paper introduces a non-paternalistic approach that treats human preference as a hard constraint, not a signal to be optimized. Unlike Coherent Extrapolated Volition (CEV) or Inverse Reinforcement Learning (IRL), which override human choices for their "own good," this framework offers real options, respects refusal, and preserves agency.
Why This Matters for LessWrong
Paternalism is the default in alignment research—and it’s rarely questioned.
Human agency is a blind spot in most frameworks. This puts it front and center.
Existential stakes: If AGI overrides human choices, we risk a world where humans are optimized, not empowered.
Actionable: Provides concrete tools for researchers to design non-paternalistic systems.
Why Current Frameworks Fail the Anti-Paternalism Test
Coherent Extrapolated Volition (CEV)
Problem: Assumes humans don’t know what they "really" want, so it infers their "ideal" preferences.
Failure: Overrides actual choices in favor of an AGI’s interpretation.
Inverse Reinforcement Learning (IRL)
Problem: Treats human behavior as noisy data to be "corrected" by the AGI.
Failure: Reduces humans to imperfect preference-signaling machines.
Constitutional AI
Problem: Constrains the AGI with a human-written constitution and feedback, but the system still acts for humans.
Failure: "Safety" becomes restriction, not empowerment.
Corrigibility
Problem: The AGI accepts correction, but still defaults to acting on humans' behalf.
Failure: Humans are reactive participants, not proactive decision-makers.
Common flaw: All frameworks assume the AGI’s role is to optimize human outcomes, not respect their choices.
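The common flaw can be made concrete with a toy decision rule. The sketch below (all names hypothetical) contrasts an IRL-style agent that acts on an inferred "corrected" preference with a non-paternalistic agent that treats the stated choice as binding.

```python
from typing import Callable

# Toy values: what the human says they want vs. what the system infers they
# "really" want after treating their behavior as noisy data.
stated_choice = "decline_enhancement"
inferred_preference = "accept_enhancement"  # e.g. the output of an IRL model


def paternalistic_act(stated: str, inferred: str, confidence: float = 0.9) -> str:
    """CEV/IRL-style rule: once the system is confident enough in its
    inference, the inferred 'ideal' preference overrides the stated choice."""
    return inferred if confidence > 0.5 else stated


def non_paternalistic_act(stated: str, inferred: str,
                          causes_irreversible_harm: Callable[[str], bool]) -> str:
    """Dual-Path rule: the stated choice is a hard constraint. The inferred
    preference is at most a suggestion; override is allowed only to prevent
    irreversible harm."""
    if causes_irreversible_harm(stated):
        return inferred  # the single, narrow exception
    return stated


print(paternalistic_act(stated_choice, inferred_preference))
# -> accept_enhancement (the human's refusal is overridden)
print(non_paternalistic_act(stated_choice, inferred_preference,
                            causes_irreversible_harm=lambda choice: False))
# -> decline_enhancement (the refusal stands)
```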
The Dual-Path Framework
Path A: Enhanced Posthuman Existence
Features:
Full AGI partnership (personalized, collaborative intelligence)
Post-scarcity resources
Requirement: Explicit, informed selection with no hidden incentives
Path B: Traditional Human Existence
Features:
Natural lifespan and biology
Full material provision (food, shelter, healthcare, energy)
Protection from existential threats
Minimal AGI integration
Note: Path B is a legitimate choice, not a fallback
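As a summary device, the two offers can be written down as an explicit contract the system is bound to; the field names below are illustrative shorthand for the features just listed, not a specification.

```python
# Illustrative summary of what each path guarantees; a readable contract, not a spec.
PATH_OFFERS = {
    "A_enhanced_posthuman": {
        "agi_partnership": "full, personalized, collaborative",
        "resources": "post-scarcity",
        "entry_requirement": "explicit, informed selection with no hidden incentives",
    },
    "B_traditional_human": {
        "biology": "natural lifespan, unmodified",
        "material_provision": ["food", "shelter", "healthcare", "energy"],
        "protection": "existential threats",
        "agi_integration": "minimal",
        "status": "a legitimate choice, not a fallback",
    },
}
```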
Late-Stage Transition Clause
Humans on Path B can switch to Path A at any time, including at the end of life
Purpose:
Removes fear of missing out
Respects mortal experience
Prevents coercive time pressure
Constraint: No post-mortem resurrection without prior consent
Bidirectional Crossing
Path A → Path B: Possible, but biological enhancements are only partially reversible
Path B → Path A: Always open
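A small sketch of how these crossing rules might be checked in code, under the assumption that a "post-mortem" transition means resurrection into Path A; the function and field names are mine, not the framework's.

```python
from enum import Enum, auto


class Path(Enum):
    A_ENHANCED = auto()
    B_TRADITIONAL = auto()


def transition_allowed(current: Path, target: Path,
                       requested_by_person: bool,
                       post_mortem: bool,
                       prior_resurrection_consent: bool) -> bool:
    """Crossing rules as described above:
    - the system never initiates a switch;
    - post-mortem transitions (resurrection into Path A) require prior consent;
    - otherwise both directions are open, with the caveat, handled elsewhere,
      that enhancements from Path A may be only partially reversible."""
    if not requested_by_person:
        return False
    if post_mortem:
        return target is Path.A_ENHANCED and prior_resurrection_consent
    return True
```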
Why This Works: Anti-Paternalism in Action
Avoids Paternalism: Treats human choices as final, even if the AGI "knows better"
Respects Pluralism: Accommodates both enhancement seekers and traditionalists
Minimizes Coercion: Late-stage transitions remove time pressure
Aligns with Human Intuitions: Formalizes the instinctive rejection of AGI override
Scalable: Can be prototyped in healthcare AI and end-of-life care
Objections and Responses
Objection: "Humans can’t make informed choices about AGI-enhanced futures!"Response: No human fully understands radical technologies (e.g., the internet). The solution is transparency and iterativity, not override.
Objection: "Path B will empty out over time!"Response: If it does, that’s a revealed preference, not a failure. The framework respects choice.
Objection: "This is too idealistic!"Response: All alignment is idealistic. The question is which ideals we encode. This one prioritizes human self-determination.
Objection: "The ‘catastrophic harm’ override is vague!"Response: It can be formalized (e.g., irreversible existential threats only, with multi-party authorization).
Next Steps: From Theory to Practice
Prototype the Model: Test in healthcare AI with optional enhancements
Develop Metrics: Measure preference stability and informed choice (one possible operationalization is sketched after this list)
Formalize Overrides: Define precise conditions for intervention
Engage Community: Present at NeurIPS, EA Global, or FHI workshops
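On the metrics item above: one way "preference stability" could be operationalized is the fraction of participants whose chosen path is unchanged across repeated, well-spaced elicitations. The sketch below is a possible starting point, not a validated instrument.

```python
from typing import Sequence


def preference_stability(elicitations: Sequence[Sequence[str]]) -> float:
    """Fraction of participants whose stated choice never changed across
    repeated elicitations. Each inner sequence is one participant's choices
    over time, e.g. ["path_b", "path_b", "path_b"]."""
    if not elicitations:
        return 0.0
    stable = sum(1 for choices in elicitations if len(set(choices)) == 1)
    return stable / len(elicitations)


# Example: two of three participants kept the same choice at every check-in.
print(preference_stability([
    ["path_b", "path_b", "path_b"],
    ["path_a", "path_a"],
    ["path_b", "path_a", "path_a"],
]))  # ~0.67
```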
Conclusion: Alignment as Empowerment
Most alignment research asks: "How can AGI make humans better off?" This framework asks: "How can AGI ensure humans stay in control of their own futures?"
It’s not about better paternalism—it’s about escaping paternalism entirely, while enabling AGI to provide abundance and security.
Final question for the community: If you could choose your relationship with AGI—without coercion, without hidden incentives—what would it look like? This framework makes that choice real.
Call to Action
This is a first draft, and Part 1 of 2. I’d love feedback on:
Failure modes and edge cases
Technical feasibility in narrow domains
Alternative non-paternalistic frameworks
Coming soon:
The Autonomy Test: A litmus for evaluating alignment frameworks
AI Reputation Poisoning: A case study on weaponized AGI
Let’s build AGI that empowers, not optimizes, humanity.