schroederdewitt

Secret Collusion: Will We Know When to Unplug AI?

TL;DR: We introduce the first comprehensive theoretical framework for understanding and mitigating secret collusion among advanced AI agents, along with CASE, a novel model evaluation framework. CASE assesses the cryptographic and steganographic capabilities of agents, while exploring the emergence of secret collusion in real-world-like multi-agent settings. Whereas current AI models...

Sep 16, 2024

Take SCIFs, it’s dangerous to go alone

Detecting collusion through multi-agent interpretability
