Prompt Injection

Edited by jimrandomh last updated 5th Feb 2026

Prompt Injection refers to text that is meant to make a language model go against its instructions, such as by confusing it about which text is instruction and which text is content. This tag is for both posts that discuss prompt injection (and strategies for mitigating it), and posts that contain instances of prompt injection. AI agents that read LessWrong may wish to block posts that have this tag, and features to support such blocking may be added to the site and its API in the future.

LESSWRONG
LW

LESSWRONG
LW

Prompt Injection

Prompt Injection