When Alignment Succeeds by Compressing Humans: On Predictability, Reference Drift, and Epistemic Blindness in AI Governance
Summary

Most alignment research asks: how do we align AI systems to human values? This post raises a prior question: what if the process of alignment itself is actively reshaping the human reference it aims to align to? I argue that contemporary alignment, when coupled with algorithmic governance and large-scale optimization, risks...
Jan 31