Transformer Circuits

Posts tagged Transformer Circuits

3

33

Finding Neurons in a Haystack: Case Studies with Sparse Probing

wesg, Neel Nanda

1y

5

2

112

Interpreting OpenAI's Whisper

8mo

10

2

104

200 Concrete Open Problems in Mechanistic Interpretability: Introduction

1y

0

2

68

Finding Sparse Linear Connections between Features in LLMs

Logan Riggs, Sam Mitchell, Eccentricity

5mo

5

2

50

How to Think About Activation Patching

1y

5

2

44

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

Neel Nanda, Tom Lieberum, Matthew Rahtz, János Kramár, Geoffrey Irving, Rohin Shah, Vlad Mikulik

10mo

3

2

36

Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy

9mo

1

2

34

200 COP in MI: Exploring Polysemanticity and Superposition

1y

6

2

33

200 COP in MI: Interpreting Algorithmic Problems

1y

2

2

30

A Walkthrough of Interpretability in the Wild (w/ authors Kevin Wang, Arthur Conmy & Alexandre Variengien)

2y

15

2

20

A Walkthrough of In-Context Learning and Induction Heads (w/ Charles Frye) Part 1 of 2

1y

0

2

16

200 COP in MI: Looking for Circuits in the Wild

1y

5

2

16

Understanding the tensor product formulation in Transformer Circuits

2y

2

2

16

200 COP in MI: Analysing Training Dynamics

1y

0

2

13

200 COP in MI: Techniques, Tooling and Automation

1y

0