Awareness of Manipulation Increases Jailbreak Vulnerability: When LLMs Declare Guideline Violation While Committing It — LessWrong