Understanding when and why agents scheme
by Mia Hopman, Jannes Elstner, Maria Avramidou, Amritanshu Prasad, and David Lindner
TL;DR
* To understand the conditions under which LLM agents engage in scheming behavior, we develop a framework that decomposes the decision to scheme into agent factors (model, system prompt, tool access) and environmental factors (stakes, oversight, outcome influence)
* We systematically vary these factors in four realistic settings, each...
Mar 2148