Why Optimization-Based Alignment Structurally Cannot Handle Irreversible Cost
I. The Medical Triage Problem

Imagine deploying an AI system to support medical triage during a pandemic in which resources are genuinely insufficient. The system makes recommendations about resource allocation, and people live or die based partly on its judgment. Under standard optimization-based alignment (RLHF, Constitutional AI, preference learning), we'd train the...