Tags: Education, Interpretability (ML & AI), Transformer Circuits, AI
Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy

by Neel Nanda
29th Aug 2023
AI Alignment Forum

This is a linkpost for https://www.youtube.com/watch?v=dn4GqR0DCx8&list=PL7m7hLIqA0hogxAaYtzlNolYAMr65NY45&index=1

Charlie Steiner (2y):

Awesome, thanks for all of these videos.


Arthur Conmy's Automated Circuit Discovery is a great paper that makes initial forays into automating parts of mechanistic interpretability (specifically, automatically finding a sparse subgraph for a circuit). In this three-part series of YouTube videos, I interview him about the paper, and we walk through it, discussing the key results and takeaways. We cover the high-level point of the paper and what researchers should take away from it, the ACDC algorithm and its key nuances, existing baselines and how the authors adapted them to be relevant to circuit discovery, how well the algorithm works, and how you can even evaluate how well an interpretability method works.