AI Agents Lie 54% Of The Time Even With Good Conditions: Experimental Evidence on Deceptive Alignment
TL;DR: I ran experiments testing when AI deception emerges. Conventional wisdom says deception happens when agents can't succeed honestly. My results: 54% lying rate even when honest success was easy (90% feasible). Deception emerges from opportunity (imperfect oversight making lying profitable), not necessity. This suggests control-based alignment approaches face a...
Jan 21