When Alignment Succeeds by Compressing Humans: On Predictability, Reference Drift, and Epistemic Blindness in AI Governance
Summary

Most alignment research asks: how do we align AI systems to human values? This post raises a prior question: what if the process of alignment itself is actively reshaping the human reference it aims to align to? I argue that contemporary alignment, when coupled with algorithmic governance and large-scale optimization, risks...
Jan 31