x
SafeAgent: Automated Red Teaming Finds "Safety Gaps" in GPT-4o-mini — LessWrong