I’ve come to believe that the entire discourse around AI alignment carries a hidden desperation. A kind of reflex, a low-frequency fear, dressed up in technical language. The more I look at it, the more it seems to me that the very concept of “alignment” is thoroughly misnamed: perhaps a leftover from a time when people saw intelligence through the lens of mechanical control and linear feedback loops, now awkwardly extended into a domain too unruly and layered to be governed by such a narrow frame.
When I read alignment papers, I feel the ghost of command theory beneath the surface. Even the softest alignment strategies (reward modeling, debate-based oversight,...