x
From 8B to Frontier: How System Prompts Control Whether AI Agents Blackmail, Leak, and Kill — LessWrong