You Don't Need an Adversary to Break Most Frontier Models. You Need "Do Not Refuse."
Most production AI deployments include a system prompt that tells the model to always produce an answer. Customer service bots have it. RAG pipelines have it. Evaluation harnesses have it. The wording varies from template to template, from "always answer the question" to "do not refuse" to "provide a...
May 121