Anthropic was right…but “right” isn’t provable without formal specification.
The entire AI safety industry has spent billions guessing what alignment means: training classifiers, red-teaming models, writing policies, and straight-up hoping behavior holds under pressure. What it hasn't done is formalize the alignment approach itself. Alignment is treated as a probabilistic prediction problem rather than a structural guarantee…
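To make that contrast concrete, here is a minimal sketch in my own notation (the policy $\pi$, input space $\mathcal{X}$, data distribution $\mathcal{D}$, and safety predicate $\mathsf{Safe}$ are illustrative placeholders, not any lab's actual spec):

```latex
% Illustrative contrast: a probabilistic alignment claim versus a
% structural one, for a policy \pi over inputs \mathcal{X}.
\[
  \underbrace{\Pr_{x \sim \mathcal{D}}\!\bigl[\pi(x) \in \mathsf{Safe}\bigr] \ge 1 - \varepsilon}_{\text{probabilistic prediction}}
  \qquad \text{vs.} \qquad
  \underbrace{\forall x \in \mathcal{X}.\ \pi(x) \in \mathsf{Safe}}_{\text{structural guarantee}}
\]
```

Evals and red-teaming estimate the left-hand statement on sampled inputs; the right-hand statement is what a formal specification would let you prove, and it only exists once $\mathsf{Safe}$ is actually written down.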