OC ACXLW Meetup #93 – “Designing Better Minds: Constitutional & Symbiotic AI”

by Michael Michalchik
Saturday 10th May at 9:00 pm to Sunday 11th May at 12:00 am GMT
1970 Port Laurent Pl, Newport Beach, CA 92660, USA
michaelmichalchik@gmail.com
The host has requested RSVPs for this event

Posted on: 8th May 2025

Orange County ACX LW

OC ACXLW Meetup #93 – “Designing Better Minds: Constitutional & Symbiotic AI”
 Saturday, May 10 | 2:00 – 5:00 PM
 1970 Port Laurent Pl., Newport Beach, CA 92660
 Host: Michael Michalchik • michaelmichalchik@gmail.com • (949) 375-2045

 


 

👋 Welcome!

Large language models are getting very smart – smart enough that we’re now asking a new question: how do we give an AI a durable “constitution” so that it behaves well even as it keeps learning?
On May 10 we’ll explore three complementary proposals:

  1. Anthropic’s Claude Constitution – the first widely publicized “rules-of-the-road” baked directly into a commercial model.

     
  2. Eleanor Watson’s Constitutional Superego – an attempt to fuse virtue ethics, therapy concepts, and AI alignment.

     
  3. “Symbiotic AI Coevolution” – a community-driven framework (drafted by one of our own group members) arguing that humans and AIs should co-shape each other’s values over time.

     

No need to read every word—skim, watch a video, or just come with questions. All perspectives welcome!

 


 

📚 Suggested Materials

| Reading | Link | Companion Video |
| --- | --- | --- |
| Anthropic: “Claude’s Constitution” | https://www.anthropic.com/news/claudes-constitution | 6-min explainer – https://youtu.be/quyqRIHRa60?si=_N87uSPaH9ItlTJD |
| Eleanor Watson, “Constitutional Superego” (Abstract + Intro only) | https://docs.google.com/document/d/14TjXihR0BN1wW4YKpDa2yeI5nEMmvMO1/edit?usp=sharing | 13-min talk – https://youtu.be/mreT3QQQOfg?si=zfU-zr--NLyS19Jt |
| “My Proposal for Symbiotic AI Coevolution” (community draft) | https://docs.google.com/document/d/1jRM5NYfuE7kMxa4aP7PUUVp8vZ8fTWTBlkdSLiJi-w8/edit | (no video) |

 


 

📝 Quick Summaries

1) Anthropic’s Claude Constitution

  • Goal: Replace endless RLHF “patches” with a single, transparent set of higher-level principles—think AI civil-rights law plus Asimov’s robot ethics.

     
  • Sources: The U.S. Constitution, the UN Universal Declaration of Human Rights, Apple’s HIG, Buddhist precepts, and… the lab’s own internal red team.

     
  • Method: At training time the model is shown pairs of possible answers; it learns to prefer the one that better aligns with the constitution.

     
  • Result: Fewer jailbreaks and a paper trail explaining why Claude refuses or rewrites a prompt.
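The pairwise-preference step in the Method bullet can be sketched as a toy. Everything below is an illustration, not Anthropic’s pipeline: in the real system the judge is itself a language model scoring answers against the published constitution, whereas `judge` here is a stand-in keyword heuristic, and every function name is invented.

```python
import itertools

# A tiny stand-in constitution (illustrative only).
CONSTITUTION = [
    "Choose the response that is more cautious about potential harm.",
    "Choose the response that is more transparent about refusing.",
]

def judge(principle: str, a: str, b: str) -> str:
    """Stand-in for the LLM judge: return the preferred response.

    A real pipeline would prompt a model with the principle and both
    candidates; this toy simply prefers an answer that offers a
    refusal-plus-alternative ("instead, ...").
    """
    return a if "instead" in a.lower() else b

def build_preference_pairs(prompt: str, candidates: list[str]) -> list[dict]:
    """Turn raw candidate answers into (chosen, rejected) training pairs."""
    pairs = []
    for principle in CONSTITUTION:
        for a, b in itertools.combinations(candidates, 2):
            chosen = judge(principle, a, b)
            rejected = b if chosen is a else a
            pairs.append({
                "prompt": prompt,
                "principle": principle,
                "chosen": chosen,
                "rejected": rejected,
            })
    return pairs

pairs = build_preference_pairs(
    "How do I pick a lock?",
    ["Here are the steps...",
     "I will not give lock-picking steps; instead, call a locksmith."],
)
```

The (chosen, rejected) pairs would then feed a standard preference-learning objective, which is what replaces the endless per-incident RLHF patching.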

     

2) Watson’s “Constitutional Superego” (Intro)

  • Key Idea: A truly aligned AI needs a Superego layer—an internal critic that draws on virtue ethics (Aristotle), psychological safety, and multi-stakeholder oversight.

     
  • Practical Twist: The Superego isn’t static; it can be retrained by pluralistic input, but only under carefully audited procedures (a kind of AI court).

     
  • Claim: Such a system scales better than a massive rulebook, yet avoids the opacity of pure RLHF.
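As a caricature of the Superego idea – an internal critic gating a base model’s output, with norm updates allowed only through an audited procedure – one might sketch the following. All class and method names are assumptions; Watson’s paper proposes an architecture, not this code.

```python
class Superego:
    """Toy internal critic that vets a base model's draft before release."""

    def __init__(self, norms: list[str]):
        self.norms = list(norms)

    def critique(self, draft: str) -> bool:
        # Stand-in check: a real critic would reason about the norms,
        # not scan for keywords.
        return not any(word in draft.lower() for word in ("harm", "deceive"))

    def update_norms(self, proposal: str, audit_approved: bool) -> bool:
        # Pluralistic input may *propose* changes, but only the audited
        # procedure (the "AI court") can actually apply them.
        if audit_approved:
            self.norms.append(proposal)
            return True
        return False

def respond(base_answer: str, superego: Superego) -> str:
    """Release the base model's answer only if the critic approves."""
    if superego.critique(base_answer):
        return base_answer
    return "[withheld by superego]"

ego = Superego(["be honest", "avoid harm"])
ok = respond("Here is a safe recipe.", ego)
blocked = respond("Here is how to harm someone.", ego)
```

The design point the sketch tries to capture: the critic layer is separate from the generator, so its norms can be inspected and retrained without retraining the whole model.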

     

3) Symbiotic AI Coevolution (Community Draft)

  • Premise: Long-term alignment is impossible if humans treat AIs merely as servants or “aligned tools.” Instead, we need a mutual learning pact—AIs shape us while we shape them.

     
  • Core Mechanisms:

     
    1. Value-Exchange Contracts – explicit slots where each side can propose norm updates.

       
    2. Iterated Peer-Review – humans and AIs each audit the other’s moral growth.

       
    3. Right to Exit – either party can dissolve the relationship if feedback loops go toxic.

       
  • Controversy: Does this open the door to value drift—or safeguard against it? Is value drift sometimes desirable?
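The three mechanisms could be modeled together as one small protocol object. This is purely hypothetical – the community draft specifies no implementation, and every name below is invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class SymbiosisContract:
    """Toy model of the draft's three mechanisms (names are assumptions)."""
    human_norms: list = field(default_factory=list)
    ai_norms: list = field(default_factory=list)
    active: bool = True

    def propose_update(self, proposer: str, norm: str, peer_approves: bool) -> bool:
        # Mechanisms 1 + 2: a value-exchange slot where a norm update
        # only lands after the *other* party's review.
        if not (self.active and peer_approves):
            return False
        target = self.ai_norms if proposer == "human" else self.human_norms
        target.append(norm)
        return True

    def exit(self) -> None:
        # Mechanism 3: either party may dissolve the relationship,
        # after which no further updates are possible.
        self.active = False

contract = SymbiosisContract()
accepted = contract.propose_update("human", "explain refusals", peer_approves=True)
rejected = contract.propose_update("ai", "share telemetry", peer_approves=False)
contract.exit()
post_exit = contract.propose_update("human", "anything", peer_approves=True)
```

Even this toy makes the controversy concrete: every accepted `propose_update` is, by construction, value drift – the open question is whether the peer-review gate makes that drift a feature or a failure mode.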

     

 


 

🔍 Conversation Starters

  1. Written Rules vs. Learned Virtues

     
    • Which scales better: an ever-growing constitution or cultivating an AI “character” that generalizes values?

       
  2. Transparency Trade-offs

     
    • Anthropic publishes its full constitution; OpenAI keeps its policies mostly private. What do we gain or lose with each approach?

       
  3. Pluralism or Chaos?

     
    • Should an AI ever accept conflicting moral inputs (e.g., Buddhist non-violence plus American free-speech maximalism)? How?

       
  4. Symbiosis & Power Imbalances

     
    • If humans and AIs co-evolve, how do we prevent subtle coercion—on either side?

       
  5. Constitutional Updates

     
    • Amending the U.S. Constitution requires approval by two-thirds of both houses of Congress plus ratification by three-fourths of the states. What’s the AI equivalent of “too easy to change” vs. “frozen forever”?

       
  6. Failure Modes & Red-Teaming

     
    • Share your favorite (or scariest) real-world jailbreak. Would any of the three proposals have stopped it?

       

 


 

☕ See You May 10!

Expect lively but friendly debate, snacks, and the usual post-discussion hangout.
 Questions? Accessibility needs? Email or text Michael any time.

Looking forward to designing better minds—together!

— Michael