x
Cold Evals: a Low-Cost Intervention Against Conversation Steering Attacks — LessWrong