When Benchmarks Lie: Evaluating Malicious Prompt Classifiers Under True Distribution Shift
Being Serious about Agentic Security
If you develop or buy a security system for AI agents, you should ask yourself the following:
1. What does it monitor? Is it just user input and agent output, or does it also look at the rich internal state of the LLM powering the...
Feb 23