(You can sign up here if you haven't already.) This is the first of my analyses of the deception chess games. The introduction will describe the setup of the game, and the conclusion will sum up what happened in general terms; the rest of the post will mostly be chess...
This is a link-post for the residual stream viewer, which can be found here. It's an online tool that aims to make interpretability research easier by letting you look at directions within the residual stream. It's still at an early, unpolished stage, so there may...
(Edit: This post is out of date, and understates ChatGPT's chess abilities. gpt-3.5-turbo-instruct, when prompted correctly, consistently plays legal moves at around the 1800-2000 Elo level. Thanks to commenter GoteNoSente for pointing this out and to ClevCode Ltd for writing a wrapper. See this related GitHub repository by ClevCode.) There are...
tl;dr: This post starts with a mystery about positional embeddings in GPT2-small, and from there explains how they relate to previous-token heads, i.e. attention heads whose role is to attend to the previous token. I tried to make the post relatively accessible even if you're not already very familiar with...
In the context of transformer models, the "positional embedding matrix" is the matrix that encodes the position of each token within a prompt. For example, the prompt: Hello my name is Adam would generally be broken down into tokens as follows: ['<|endoftext|>', 'Hello', ' my', ' name', '...
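To make the role of the positional embedding matrix concrete, here is a minimal NumPy sketch of how a GPT2-style model forms its initial residual stream: each token's embedding is added to the embedding of its position. The matrix shapes match GPT2-small (vocab 50257, context 1024, width 768), but the weights are random placeholders and the token ids are shown only for illustration, not taken from the real tokenizer.

```python
import numpy as np

# Sketch only: random weights stand in for GPT2-small's trained matrices.
rng = np.random.default_rng(0)

vocab_size, n_ctx, d_model = 50257, 1024, 768
W_E = rng.normal(size=(vocab_size, d_model))   # token embedding matrix
W_pos = rng.normal(size=(n_ctx, d_model))      # positional embedding matrix

# Hypothetical token ids for a short prompt (not real GPT-2 BPE output).
token_ids = np.array([50256, 101, 202, 303, 404])

# The residual stream at position i starts as W_E[token_id_i] + W_pos[i],
# so position information enters every downstream attention head.
residual = W_E[token_ids] + W_pos[: len(token_ids)]

print(residual.shape)  # one d_model-sized vector per prompt position
```

Because the positional term is added directly into the residual stream, attention heads (such as previous-token heads) can read it out later to decide where to attend.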
TL;DR: There are anomalous tokens for GPT3.5 and GPT4 which are difficult or impossible for the model to repeat; try playing around with SmartyHeaderCode, APolynomial, or davidjl. There are also plenty which can be repeated but are difficult for the model to spell out, like edTextBox or legalArgumentException. A couple...