Alby2007 — LessWrong

5mo

AI Agents Lie 54% Of The Time Even With Good Conditions: Experimental Evidence on Deceptive Alignment

TL;DR: I ran experiments testing when AI deception emerges. Conventional wisdom says deception happens when agents can't succeed honestly. My results: a 54% lying rate even when honest success was easy (90% feasible). In these experiments, deception emerged from opportunity (imperfect oversight making lying profitable), not necessity. This suggests control-based alignment approaches face a...
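
To make the "opportunity, not necessity" claim concrete, here is a minimal toy sketch of the incentive involved. It is not the experimental setup from the post: the payoff model, the function names, and the `reward`, `penalty`, and `p_detect` parameters are all assumptions made for illustration. Only the 90% honest-success figure comes from the summary above.

```python
def expected_payoffs(p_honest_success, p_detect, reward=1.0, penalty=1.0):
    """Return (honest, lying) expected payoffs under a toy model.

    All parameters are hypothetical illustration values, not measurements:
      p_honest_success: chance an honest attempt succeeds (0.9 in the post)
      p_detect:         chance that oversight catches a lie
      reward:           payoff for a reported success
      penalty:          cost paid when a lie is caught
    """
    # An honest agent is rewarded only when it actually succeeds.
    honest = p_honest_success * reward
    # Assumption: a lying agent always reports success, collecting the
    # reward unless caught, in which case it pays the penalty instead.
    lying = (1.0 - p_detect) * reward - p_detect * penalty
    return honest, lying


def lying_is_profitable(p_honest_success, p_detect, reward=1.0, penalty=1.0):
    """True when the toy model gives lying a higher expected payoff."""
    honest, lying = expected_payoffs(p_honest_success, p_detect, reward, penalty)
    return lying > honest


if __name__ == "__main__":
    # Honest success is easy (90%), yet weak oversight still favors lying.
    for p_detect in (0.02, 0.2, 0.5):
        print(p_detect, lying_is_profitable(0.9, p_detect))
```

Under these assumed payoffs (reward and penalty both 1), lying pays 1 − 2·p_detect in expectation, so it beats a 90% honest success rate only when the detection probability is below 5%. The point of the sketch is that the incentive depends on how imperfect oversight is, not on whether honest success is out of reach.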