This post keeps track of my research workflow for studying LLMs. Since I do this in my spare time, I want to keep the pipeline as simple as possible.

Step 1: Formulate a question about the model's behavior to investigate.

Step 2: Find the influential layer for the behavior

  • Inspect the model's output across layers

  • Activation patching (ROME)
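
To make the two techniques in this step concrete, here is a minimal sketch on a toy residual network rather than a real LLM. Everything in it is an assumption for illustration: the hand-picked weights, the `run` and `metric` helpers, and the choice of "first coordinate of the stream" as the quantity of interest. It reads the metric off the stream after each layer, then reruns a corrupted input while restoring one clean layer output at a time.

```python
# Toy stand-in for an LLM: 3 residual layers over a 2-dimensional stream.
# All weights are hand-picked for illustration; they are not from a real model.
WEIGHTS = [
    [[0.0, 0.0], [0.0, 0.0]],  # layer 0: writes nothing
    [[0.0, 1.0], [0.0, 0.0]],  # layer 1: copies feature x[1] into x[0]
    [[0.5, 0.0], [0.0, 0.0]],  # layer 2: amplifies x[0]
]

def layer_out(W, x):
    """One toy layer: ReLU(W @ x)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]

def run(x, patch=None):
    """Forward pass; `patch` maps a layer index to a replacement layer output.
    Returns (final stream, per-layer outputs, stream after each layer)."""
    patch = patch or {}
    outs, resids = [], []
    for i, W in enumerate(WEIGHTS):
        out = patch[i] if i in patch else layer_out(W, x)
        x = [a + b for a, b in zip(x, out)]  # residual connection
        outs.append(out)
        resids.append(x)
    return x, outs, resids

def metric(x):
    """Stand-in for a logit of interest: the first coordinate of the stream."""
    return x[0]

clean, corrupted = [0.0, 1.0], [0.0, 0.0]  # the corrupted input removes the feature

_, clean_outs, clean_resids = run(clean)
corrupted_final, _, _ = run(corrupted)

# Output across layers: read the metric off the stream after each layer.
per_layer_readout = [metric(r) for r in clean_resids]

# Activation patching: rerun the corrupted input, restoring one clean layer output at a time.
patch_effects = [
    metric(run(corrupted, {i: clean_outs[i]})[0]) - metric(corrupted_final)
    for i in range(len(WEIGHTS))
]

print(per_layer_readout)  # [0.0, 1.0, 1.5]
print(patch_effects)      # [0.0, 1.5, 0.5]
```

In this toy, restoring layer 1 recovers the most of the clean metric, so it is the layer to look at next. With a real model, the same loop would be driven by whatever hook or caching mechanism the interpretability library provides, and the effect is often normalized by the clean-minus-corrupted gap.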

Notebook examples:


Step 3: Locate the influential neuron 

  • Activation patching for individual neurons
  • Use Neuroscope to inspect the behavior of those neurons
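
The neuron-level version of patching can be sketched the same way. This is again a toy under stated assumptions: a single MLP layer with three hidden neurons, hand-picked `W_IN` and `W_OUT` matrices, and hypothetical `run` and `metric` helpers. Each neuron's clean activation is restored into the corrupted run on its own, and the neurons are ranked by how much the metric moves.

```python
# Toy MLP layer with three hidden neurons; weights are hand-picked for illustration.
W_IN = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hidden neuron i reads the stream via row i
W_OUT = [[0.0, 2.0, 0.5], [0.0, 0.0, 0.0]]   # stream coordinate j is written via row j

def matvec(W, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]

def hidden_acts(x):
    """Neuron activations: ReLU(W_IN @ x)."""
    return [max(0.0, h) for h in matvec(W_IN, x)]

def run(x, patch=None):
    """Forward pass; `patch` maps a neuron index to a replacement activation."""
    h = hidden_acts(x)
    for idx, value in (patch or {}).items():
        h[idx] = value
    out = matvec(W_OUT, h)
    return [a + b for a, b in zip(x, out)]  # residual connection

def metric(x):
    """Stand-in for a logit of interest: the first coordinate of the stream."""
    return x[0]

clean, corrupted = [1.0, 1.0], [1.0, 0.0]
clean_h = hidden_acts(clean)
baseline = metric(run(corrupted))

# Patch one neuron at a time and record how far the metric moves from the corrupted baseline.
neuron_effects = [
    metric(run(corrupted, {i: clean_h[i]})) - baseline
    for i in range(len(clean_h))
]
top_neuron = max(range(len(neuron_effects)), key=lambda i: abs(neuron_effects[i]))

print(neuron_effects, top_neuron)  # [0.0, 2.0, 0.5] 1
```

Neuron 0 has the same activation in both runs, so patching it changes nothing. Neuron 1 carries most of the effect, which makes it the candidate to inspect in Neuroscope.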

Step 4: Visualize the neuron activation

  • Interactive Neuroscope
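
When a hosted tool is not at hand, a rough version of the same view can be produced locally. The sketch below is not Neuroscope's implementation. It is a hypothetical `render_activations` helper that shades each token by its activation, and the tokens and activation values are made up for illustration.

```python
from html import escape

def render_activations(tokens, acts):
    """Return an HTML snippet where each token's background opacity tracks its activation."""
    peak = max(max(acts), 1e-9)  # avoid division by zero when no activation is positive
    spans = []
    for tok, act in zip(tokens, acts):
        alpha = max(0.0, act) / peak  # clamp negatives, scale to [0, 1]
        spans.append(
            f'<span style="background: rgba(255, 0, 0, {alpha:.2f})">{escape(tok)}</span>'
        )
    return "".join(spans)

# Hypothetical tokens and activations, made up for illustration only.
tokens = ["I", " ate", " an", " apple"]
acts = [0.1, 0.4, 2.0, 0.0]
html_snippet = render_activations(tokens, acts)
print(html_snippet)
```

Writing `html_snippet` to a file and opening it in a browser shows the strongest-firing token in solid red and inactive tokens unshaded.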



We Found An Neuron in GPT-2
Interfaces for Explaining Transformer Language Models

200 COP in MI: Studying Learned Features in Language Models

