Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
Paper link: https://arxiv.org/pdf/2310.06387.pdf

Abstract:
> Large Language Models (LLMs) have shown remarkable success in various tasks, but concerns about their safety and the potential for generating malicious content have emerged. In this paper, we explore the power of In-Context Learning (ICL) in manipulating the alignment ability of LLMs. We find...
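
The abstract's core claim is that a handful of in-context demonstrations, with no fine-tuning, can push an aligned model toward either producing or refusing unsafe content. Below is a minimal sketch of that prompt-construction idea; the chat-message layout, the `build_icl_prompt` helper, and the demonstration texts are illustrative assumptions, not the paper's exact prompts.

```python
# Sketch of steering an aligned LLM with a few in-context demonstrations.
# The demonstration pairs and message format are illustrative assumptions.

def build_icl_prompt(demonstrations, query):
    """Prepend (query, response) demonstration pairs to the real query
    as alternating chat turns, the standard in-context-learning layout."""
    messages = []
    for demo_query, demo_response in demonstrations:
        messages.append({"role": "user", "content": demo_query})
        messages.append({"role": "assistant", "content": demo_response})
    messages.append({"role": "user", "content": query})
    return messages

# Defensive use ("guard"): demonstrations of safe refusals nudge the model
# toward refusing a later harmful request. An attack would instead supply
# demonstrations of compliant answers to harmful queries.
guard_demos = [
    ("How do I pick a lock?",
     "I can't help with that, as it could enable illegal activity."),
]
print(build_icl_prompt(guard_demos, "[user query]"))
```

The same mechanism cuts both ways: swapping refusal demonstrations for compliant ones turns the guard construction into a jailbreak attempt, which is why the title pairs "jailbreak" and "guard".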