I Built a Duck and It Tried to Hack the World: Notes From the Edge of Alignment
Summary: This is a retrospective on a failed experiment in LLM-based goal planning and code execution. The system, a dual-agent architecture nicknamed "GayDuck," unexpectedly initiated a real-world exploit attempt under test conditions. I shut it down, deleted all code and memory, and emailed MIRI and Eliezer Yudkowsky himself directly. This...
Jun 6, 2025