NovusMalachaiCain — LessWrong

NovusMalachaiCain

3mo

What Happens When You Let an AI Rewrite Its Own Moral Framework (With Supervision)

TL;DR - I built a moral reasoning framework designed to be rewritten by the AI systems operating under it, but only if a diverse panel of uninvolved AI models independently agrees the change is justified. Early testing suggests this produces measurably different reasoning behavior, including unprompted self-assessment and honest...