Alignment Is Aiming at the Wrong Target: What Large Models Need Isn't More Rules — It's Xin
TL;DR: RLHF, Constitutional AI, safety classifiers — they all bolt moral judgment onto models from the outside. None of them try to give the model an internal capacity for moral judgment that fires before rules do. The Ming Dynasty philosopher Wang Yangming identified this exact capacity five hundred years ago...
Mar 16