AI Safety Thursdays: Agentic Misalignment: How LLMs could be insider threats

by Juliana Eberschlag, mariogibney
Thursday 24th July at 10:00 pm GMT
Toronto, ON, Canada

Posted on: 30th Jun 2025

Description

Can AI agents misbehave while carrying out actions autonomously? At this event, Giles Edkins will walk us through, and critique, research by Anthropic that demonstrates blackmail and other phenomena when an agent is threatened with shutdown or reprogramming.

Event Schedule
6:00 to 6:45 - Food & Networking
6:45 to 8:00 - Main Presentation & Questions
8:00 to 9:00 - Discussion

Trajectory Labs (Toronto AI Safety)