Angira Sharma

Angira Sharma — LessWrong

Secret Collusion: Will We Know When to Unplug AI?

by schroederdewitt, srm, MikhailB, Lewis Hammond, chansmi, and Angira Sharma

TL;DR: We introduce the first comprehensive theoretical framework for understanding and mitigating secret collusion among advanced AI agents, along with CASE, a novel model evaluation framework. CASE assesses the cryptographic and steganographic capabilities of agents, while exploring the emergence of secret collusion in real-world-like multi-agent settings. Whereas current AI models...

Sep 16, 2024•66