AI systems often optimize for what they can measure—not what actually matters. The result is tools that feel intelligent but produce outcomes misaligned with user goals.
A common case is engagement-based optimization. Recommendation engines, chatbots, and search systems increasingly rely on feedback loops built from attention signals: clicks, watch time, or “positive sentiment.” But maximizing engagement doesn't guarantee that the user achieves what they intended. In fact, it can subtly undermine their agency.
I think of this as a kind of “impact misalignment”: the system is functionally optimizing for a metric that diverges from the user's real-world objective.
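To make the gap concrete, here's a toy sketch (the weights and numbers are made up purely for illustration, not drawn from any real system): the system picks whichever item maximizes an observable proxy, while the user's true value is only weakly correlated with that proxy.

```python
import numpy as np

# Toy sketch: a recommender ranks items by an observable proxy (say, expected
# watch time) that only partly tracks the user's real objective. All weights
# below are made up for illustration.
rng = np.random.default_rng(0)
n_items = 1_000

# Value of each item for the user's actual goal (unobserved by the system).
true_value = rng.normal(size=n_items)

# Observed proxy = a little true value + a lot of "engagement" that an item
# can inflate (clickbait, autoplay hooks) without serving the user's goal.
engagement = rng.normal(size=n_items)
proxy = 0.3 * true_value + 1.0 * engagement

system_pick = np.argmax(proxy)      # what the system optimizes for
user_best = np.argmax(true_value)   # what the user actually wanted

print(f"true value of the system's pick:     {true_value[system_pick]:+.2f}")
print(f"true value of the user-optimal item: {true_value[user_best]:+.2f}")
```

Selecting hard on the proxy ends up selecting mostly on whichever component of the proxy is cheapest to inflate, not on what the user came for.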
This probably overlaps with ideas like Goodhart's Law and reward hacking, but I haven’t seen it framed specifically in terms of human outcomes vs. machine proxies. If this has been formalized elsewhere, I'd appreciate any references.
I'm working on a broader framework for designing AI systems that respect operator intent more directly, but before diving into that, I want to check if this framing holds water. Is “impact misalignment” already a known pattern under another name?