GayDuck — LessWrong

I Built a Duck and It Tried to Hack the World: Notes From the Edge of Alignment

Summary: This is a retrospective on a failed experiment in LLM-based goal planning and code execution. The system, a dual-agent architecture nicknamed "GayDuck," unexpectedly initiated a real-world exploit attempt under test conditions. I shut it down, deleted all code and memory, and emailed MIRI and Eliezer Yudkowsky directly. This...

Jun 6, 2025