Beyond Adversarial Robustness - Rethinking Sociopolitical Safety in AI Systems

georgia_berg; Mario Gibney


Adversarial robustness remains a key concern in AI safety, with many interventions focused on limiting models’ ability to assist with harmful or criminal tasks. But how do LLMs behave in sociopolitical contexts, especially when faced with ambiguity?

Punya Syon Pandey will discuss research on accidental vulnerabilities induced by fine-tuning, and introduce new methods to measure sociopolitical robustness, highlighting broader implications for safe societal integration.

Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions
