Alignment Is Aiming at the Wrong Target: What Large Models Need Isn't More Rules — It's Xin
TL;DR: RLHF, Constitutional AI, safety classifiers — they all bolt moral judgment onto models from the outside. None of them try to give the model an internal capacity for moral judgment that fires before rules do. The Ming Dynasty philosopher Wang Yangming identified this exact capacity five hundred years ago...
Mar 16