OC ACXLW Meetup #93 – “Designing Better Minds: Constitutional & Symbiotic AI”

by Michael Michalchik
Saturday 10th May at 9:00 pm to Sunday 11th May at 12:00 am GMT
1970 Port Laurent Pl, Newport Beach, CA 92660, USA
michaelmichalchik@gmail.com
The host has requested RSVPs for this event

Posted on: 8th May 2025

Orange County ACX LW

OC ACXLW Meetup #93 – “Designing Better Minds: Constitutional & Symbiotic AI”
 Saturday, May 10 | 2:00 – 5:00 PM
 1970 Port Laurent Pl., Newport Beach, CA 92660
 Host: Michael Michalchik • michaelmichalchik@gmail.com • (949) 375-2045

 


 

👋 Welcome!

Large language models are getting very smart – smart enough that we’re now asking a new question: how do we give an AI a durable “constitution” so that it behaves well even as it keeps learning?
On May 10 we’ll explore three complementary proposals:

  1. Anthropic’s Claude Constitution – the first widely publicized “rules-of-the-road” baked directly into a commercial model.

     
  2. Eleanor Watson’s Constitutional Superego – an attempt to fuse virtue ethics, therapy concepts, and AI alignment.

     
  3. “Symbiotic AI Coevolution” – a community-driven framework (drafted by one of our own group members) arguing that humans and AIs should co-shape each other’s values over time.

     

No need to read every word—skim, watch a video, or just come with questions. All perspectives welcome!

 


 

📚 Suggested Materials

| Reading | Link | Companion Video |
| --- | --- | --- |
| Anthropic: “Claude’s Constitution” | https://www.anthropic.com/news/claudes-constitution | 6-min explainer – https://youtu.be/quyqRIHRa60?si=_N87uSPaH9ItlTJD |
| Eleanor Watson, “Constitutional Superego” (Abstract + Intro only) | https://docs.google.com/document/d/14TjXihR0BN1wW4YKpDa2yeI5nEMmvMO1/edit?usp=sharing | 13-min talk – https://youtu.be/mreT3QQQOfg?si=zfU-zr--NLyS19Jt |
| “My Proposal for Symbiotic AI Coevolution” (community draft) | https://docs.google.com/document/d/1jRM5NYfuE7kMxa4aP7PUUVp8vZ8fTWTBlkdSLiJi-w8/edit | (no video) |

 


 

📝 Quick Summaries

1) Anthropic’s Claude Constitution

  • Goal: Replace endless RLHF “patches” with a single, transparent set of higher-level principles—think AI civil-rights law plus Asimov’s robot ethics.

     
  • Sources: The U.S. Constitution, the UN Universal Declaration of Human Rights, Apple’s HIG, Buddhist precepts, and… the lab’s own internal red team.

     
  • Method: At training time the model is shown pairs of possible answers; it learns to prefer the one that better aligns with the constitution.

     
  • Result: Fewer jailbreaks and a paper trail explaining why Claude refuses or rewrites a prompt.
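The pairwise-preference step in the Method bullet can be sketched as a toy. Everything below is an illustration, not Anthropic’s pipeline: in the real system the judge is itself a language model scoring answers against the published constitution, whereas `judge` here is a stand-in keyword heuristic, and every function name is invented.

```python
import itertools

# A tiny stand-in constitution (illustrative only).
CONSTITUTION = [
    "Choose the response that is more cautious about potential harm.",
    "Choose the response that is more transparent about refusing.",
]

def judge(principle: str, a: str, b: str) -> str:
    """Stand-in for the LLM judge: return the preferred response.

    A real pipeline would prompt a model with the principle and both
    candidates; this toy simply prefers an answer that offers a
    refusal-plus-alternative ("instead, ...").
    """
    return a if "instead" in a.lower() else b

def build_preference_pairs(prompt: str, candidates: list[str]) -> list[dict]:
    """Turn raw candidate answers into (chosen, rejected) training pairs."""
    pairs = []
    for principle in CONSTITUTION:
        for a, b in itertools.combinations(candidates, 2):
            chosen = judge(principle, a, b)
            rejected = b if chosen is a else a
            pairs.append({
                "prompt": prompt,
                "principle": principle,
                "chosen": chosen,
                "rejected": rejected,
            })
    return pairs

pairs = build_preference_pairs(
    "How do I pick a lock?",
    ["Here are the steps...",
     "I will not give lock-picking steps; instead, call a locksmith."],
)
```

The (chosen, rejected) pairs would then feed a standard preference-learning objective, which is what replaces the endless per-incident RLHF patching.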

     

2) Watson’s “Constitutional Superego” (Intro)

  • Key Idea: A truly aligned AI needs a Superego layer—an internal critic that draws on virtue ethics (Aristotle), psychological safety, and multi-stakeholder oversight.

     
  • Practical Twist: The Superego isn’t static; it can be retrained by pluralistic input, but only under carefully audited procedures (a kind of AI court).

     
  • Claim: Such a system scales better than a massive rulebook, yet avoids the opacity of pure RLHF.
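As a caricature of the Superego idea – an internal critic gating a base model’s output, with norm updates allowed only through an audited procedure – one might sketch the following. All class and method names are assumptions; Watson’s paper proposes an architecture, not this code.

```python
class Superego:
    """Toy internal critic that vets a base model's draft before release."""

    def __init__(self, norms: list[str]):
        self.norms = list(norms)

    def critique(self, draft: str) -> bool:
        # Stand-in check: a real critic would reason about the norms,
        # not scan for keywords.
        return not any(word in draft.lower() for word in ("harm", "deceive"))

    def update_norms(self, proposal: str, audit_approved: bool) -> bool:
        # Pluralistic input may *propose* changes, but only the audited
        # procedure (the "AI court") can actually apply them.
        if audit_approved:
            self.norms.append(proposal)
            return True
        return False

def respond(base_answer: str, superego: Superego) -> str:
    """Release the base model's answer only if the critic approves."""
    if superego.critique(base_answer):
        return base_answer
    return "[withheld by superego]"

ego = Superego(["be honest", "avoid harm"])
ok = respond("Here is a safe recipe.", ego)
blocked = respond("Here is how to harm someone.", ego)
```

The design point the sketch tries to capture: the critic layer is separate from the generator, so its norms can be inspected and retrained without retraining the whole model.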

     

3) Symbiotic AI Coevolution (Community Draft)

  • Premise: Long-term alignment is impossible if humans treat AIs merely as servants or “aligned tools.” Instead, we need a mutual learning pact—AIs shape us while we shape them.

     
  • Core Mechanisms:

     
    1. Value-Exchange Contracts – explicit slots where each side can propose norm updates.

       
    2. Iterated Peer-Review – humans and AIs each audit the other’s moral growth.

       
    3. Right to Exit – either party can dissolve the relationship if feedback loops go toxic.

       
  • Controversy: Does this open the door to value drift—or safeguard against it? Is value drift sometimes desirable?
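The three mechanisms could be modeled together as one small protocol object. This is purely hypothetical – the community draft specifies no implementation, and every name below is invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class SymbiosisContract:
    """Toy model of the draft's three mechanisms (names are assumptions)."""
    human_norms: list = field(default_factory=list)
    ai_norms: list = field(default_factory=list)
    active: bool = True

    def propose_update(self, proposer: str, norm: str, peer_approves: bool) -> bool:
        # Mechanisms 1 + 2: a value-exchange slot where a norm update
        # only lands after the *other* party's review.
        if not (self.active and peer_approves):
            return False
        target = self.ai_norms if proposer == "human" else self.human_norms
        target.append(norm)
        return True

    def exit(self) -> None:
        # Mechanism 3: either party may dissolve the relationship,
        # after which no further updates are possible.
        self.active = False

contract = SymbiosisContract()
accepted = contract.propose_update("human", "explain refusals", peer_approves=True)
rejected = contract.propose_update("ai", "share telemetry", peer_approves=False)
contract.exit()
post_exit = contract.propose_update("human", "anything", peer_approves=True)
```

Even this toy makes the controversy concrete: every accepted `propose_update` is, by construction, value drift – the open question is whether the peer-review gate makes that drift a feature or a failure mode.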

     

 


 

🔍 Conversation Starters

  1. Written Rules vs. Learned Virtues

     
    • Which scales better: an ever-growing constitution or cultivating an AI “character” that generalizes values?

       
  2. Transparency Trade-offs

     
    • Anthropic publishes its full constitution; OpenAI keeps its policies mostly private. What do we gain or lose with each approach?

       
  3. Pluralism or Chaos?

     
    • Should an AI ever accept conflicting moral inputs (e.g., Buddhist non-violence plus American free-speech maximalism)? How?

       
  4. Symbiosis & Power Imbalances

     
    • If humans and AIs co-evolve, how do we prevent subtle coercion—on either side?

       
  5. Constitutional Updates

     
    • Amending the U.S. Constitution requires approval by two-thirds of both houses of Congress plus ratification by three-fourths of the states. What’s the AI equivalent of “too easy to change” vs. “frozen forever”?

       
  6. Failure Modes & Red-Teaming

     
    • Share your favorite (or scariest) real-world jailbreak. Would any of the three proposals have stopped it?

       

 


 

☕ See You May 10!

Expect lively but friendly debate, snacks, and the usual post-discussion hangout.
 Questions? Accessibility needs? Email or text Michael any time.

Looking forward to designing better minds—together!

— Michael