x
When Safety Filters Abandon Users: Semantic Ambiguity as an Alignment Failure Abstract — LessWrong