Why Do Naive SFT Filters For Safety Properties Fail? — LessWrong