Description
How do large language models actually work on the inside? Annie presents on new research from Anthropic's Transformer Circuits team that opens up the "black box" of Claude 3.5 Haiku, revealing the computational mechanisms behind everything from multi-step reasoning to poetry planning.
Using a new methodology called attribution graphs, we'll explore how models handle multiple languages, how jailbreaks succeed against them, and why they sometimes engage in unfaithful reasoning.
Event Schedule
6:00 to 6:45 - Networking and refreshments
6:45 to 8:00 - Main Presentation
8:00 to 9:00 - Breakout Discussions