Why Optimization-Based Alignment Structurally Cannot Handle Irreversible Cost
I. The Medical Triage Problem

Imagine deploying an AI system to support medical triage during a pandemic in which resources are genuinely insufficient. The system makes recommendations about resource allocation, and people live or die based partly on its judgment. Under standard optimization-based alignment (RLHF, Constitutional AI, preference learning), we'd train the...