Labelling, Variables, and In-Context Learning in Llama2

by Joshua Penman
3rd Aug 2024

This is a linkpost for https://colab.research.google.com/gist/JoshuaSP/0b26dab14c618d0325faf2236ebe8825/variables-and-in-context-learning-in-llama2.ipynb#scrollTo=bFn7VFxFLwda

Hi LessWrong! This is my first post here, sharing my first piece of mechanistic interpretability work.

I studied in-context learning in Llama2. The idea was to look at what happens when we associate two concepts in the LLM's context, an object (e.g. "red square") and a label (e.g. "Bob"): how is that information transmitted through the model?
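
As a concrete (hypothetical) example of this kind of setup, the sketch below binds objects to labels in context and checks whether the model retrieves the right object for a label. The model variant, prompt wording, and TransformerLens-style API are assumptions for illustration; the linked notebook's exact prompts and code differ.

```python
# Hypothetical toy setup (not the notebook's exact code): bind objects to
# labels in context, then check whether the model retrieves the association.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("meta-llama/Llama-2-7b-hf")

prompt = "The red square belongs to Bob. The blue circle belongs to Alice. Bob's shape is the"
logits = model(prompt)  # [batch, position, vocab]

# If the in-context association was picked up, " red" should outrank " blue"
# (assuming both are single tokens for this tokenizer)
red_id = model.to_single_token(" red")
blue_id = model.to_single_token(" blue")
print(logits[0, -1, red_id].item(), logits[0, -1, blue_id].item())
```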

In this toy example, I found several interesting things:

  • Information about the association is passed by reference, not by value. In other words, what gets passed forward is a pointer saying "this information is here", and the information itself is loaded later.
  • The reference position is not a token position but rather the semantic location of the information in question (e.g. "the third item in the list"). I suspect later heads actually load a "cloud" of data around that location, mediated by punctuation or other markers of structure, which may also be related to why frontier models are so sensitive to prompt structure.
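
For readers who want the flavor of the experiments, here is a minimal sketch of the kind of activation patching that can probe where the object-label binding lives. It assumes a TransformerLens-style setup; the model variant, prompts, and metric are illustrative assumptions, not the linked notebook's exact code.

```python
# Minimal activation-patching sketch (illustrative assumptions, not the
# notebook's exact code): corrupt the object-label binding, then patch the
# clean run's residual stream back in, one (layer, position) at a time.
import torch
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Clean run binds "Bob" to the red square; the corrupted run swaps the binding.
clean_prompt   = "Bob has a red square. Alice has a blue circle. Bob has a"
corrupt_prompt = "Bob has a blue circle. Alice has a red square. Bob has a"

clean_tokens   = model.to_tokens(clean_prompt)
corrupt_tokens = model.to_tokens(corrupt_prompt)
assert clean_tokens.shape == corrupt_tokens.shape  # positions must line up

red_id  = model.to_single_token(" red")
blue_id = model.to_single_token(" blue")

def logit_diff(logits):
    # Preference for the clean answer (" red") over the corrupted one (" blue")
    return (logits[0, -1, red_id] - logits[0, -1, blue_id]).item()

_, clean_cache = model.run_with_cache(clean_tokens)

# Patch clean residual-stream activations into the corrupted run; large
# recoveries of the clean answer show where the association information sits.
results = torch.zeros(model.cfg.n_layers, corrupt_tokens.shape[1])
for layer in range(model.cfg.n_layers):
    hook_name = utils.get_act_name("resid_pre", layer)
    for pos in range(corrupt_tokens.shape[1]):
        def patch(resid, hook, pos=pos):
            resid[:, pos, :] = clean_cache[hook.name][:, pos, :]
            return resid
        patched_logits = model.run_with_hooks(
            corrupt_tokens, fwd_hooks=[(hook_name, patch)]
        )
        results[layer, pos] = logit_diff(patched_logits)
```

Positions and layers where patching restores the clean answer are candidates for where the association is carried; sweeping attention-head outputs instead of the residual stream is the natural next step for localizing which heads move that information.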