AI Safety Thursdays: Agentic Misalignment: How LLMs could be insider threats

Juliana Eberschlag; Mario Gibney

Description

Can AI agents misbehave while carrying out actions autonomously? At this event, Giles Edkins will walk us through and critique research by Anthropic in which agents exhibited blackmail and other phenomena when threatened with shutdown or reprogramming.

Event Schedule
6:00 to 6:30 - Food & Networking
6:30 to 7:30 - Main Presentation & Questions
7:30 to 8:00 - Discussion

If you can't make it in person, feel free to join the live stream at 6:30 pm via this link.
