Does robustness improve with scale?
by ChengCheng, niki.h, Ian McKenzie, Oskar Hollinsworth, Tom Tseng, and AdamGleave
Adversarial vulnerabilities have long been a problem in ML systems. Large language models (LLMs) are no exception: they suffer from issues such as jailbreaks, adversarial prompts that bypass model safeguards. At the same time, scale has driven remarkable advances in the capabilities of LLMs, leading us to ask: to...
Jul 25, 2024