AI Safety Thursday: Attempts and Successes of LLMs Persuading on Harmful Topics

Description

Large Language Models can persuade people at unprecedented scale. But how effective are they, and are they willing to try to persuade us toward harmful ideas?

In this talk, Matthew Kowal and Jasper Timm will present findings showing that LLMs can shift beliefs toward conspiracy theories as effectively as they debunk them, and that many models are willing to attempt harmful persuasion on dangerous topics.

Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open Discussions

If you can't make it in person, feel free to join the live stream starting at 6:30 pm, via this link.

LESSWRONG
Community