Transformer Circuits

Posts tagged Transformer Circuits
Relevance | Karma | Title | Authors | Age | Comments | Ω
3 | 33 | Finding Neurons in a Haystack: Case Studies with Sparse Probing | wesg, Neel Nanda | 2y | 6 | Ω
2 | 134 | An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2 | Neel Nanda | 6mo | 16 | Ω
2 | 114 | Interpreting OpenAI's Whisper | EllenaR | 1y | 13 | -
2 | 106 | 200 Concrete Open Problems in Mechanistic Interpretability: Introduction | Neel Nanda | 2y | 0 | Ω
2 | 69 | Finding Sparse Linear Connections between Features in LLMs | Logan Riggs, Sam Mitchell, Adam Kaufman | 1y | 5 | Ω
2 | 50 | How to Think About Activation Patching | Neel Nanda | 2y | 5 | Ω
2 | 44 | Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla | Neel Nanda, Tom Lieberum, Matthew Rahtz, János Kramár, Geoffrey Irving, Rohin Shah, Vlad Mikulik | 1y | 3 | Ω
2 | 36 | Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy | Neel Nanda | 1y | 1 | Ω
2 | 34 | 200 COP in MI: Exploring Polysemanticity and Superposition | Neel Nanda | 2y | 6 | Ω
2 | 33 | 200 COP in MI: Interpreting Algorithmic Problems | Neel Nanda | 2y | 2 | Ω
2 | 30 | A Walkthrough of Interpretability in the Wild (w/ authors Kevin Wang, Arthur Conmy & Alexandre Variengien) | Neel Nanda | 2y | 15 | Ω
2 | 20 | A Walkthrough of In-Context Learning and Induction Heads (w/ Charles Frye) Part 1 of 2 | Neel Nanda | 2y | 0 | Ω
2 | 16 | 200 COP in MI: Analysing Training Dynamics | Neel Nanda | 2y | 0 | Ω
2 | 16 | 200 COP in MI: Looking for Circuits in the Wild | Neel Nanda | 2y | 5 | Ω
2 | 16 | Understanding the tensor product formulation in Transformer Circuits | Tom Lieberum | 3y | 2 | -
Showing 15 of 37 posts.