Oh shoot, yeah. I'm probably just looking at the rotary embeddings, then. Forgot about that, thanks.
I'm pretty confused; this doesn't seem to happen for any other models, and I can't think of a great explanation.
Has anyone investigated this further?
Here are graphs I made for GPT-2, Mistral 7B, and Pythia 14M.
Three dimensions indeed explain almost all of the information in GPT-2's positional embeddings, whereas Mistral 7B and Pythia 14M both seem to make use of all the dimensions.
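
For anyone who wants to poke at the GPT-2 case themselves, here's a minimal sketch of the kind of measurement I mean, assuming "information explained" is taken as PCA explained variance on the learned absolute position-embedding matrix (loaded via Hugging Face's `GPT2Model`); my actual graphs may have been produced slightly differently:

```python
# Sketch: how much variance of GPT-2's learned position embeddings
# is captured by the top principal components.
import numpy as np
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")
wpe = model.wpe.weight.detach().numpy()           # (n_positions=1024, d_model=768)

centered = wpe - wpe.mean(axis=0, keepdims=True)  # center before PCA
s = np.linalg.svd(centered, compute_uv=False)     # singular values
explained = s**2 / np.sum(s**2)                   # variance ratio per component
cumulative = np.cumsum(explained)

print("Variance explained by top 3 components:", cumulative[2])
```

Note that this only applies cleanly to models with a learned absolute position-embedding matrix; rotary-embedding models like Mistral apply position information inside attention rather than via such a matrix, so the analogous measurement there is murkier.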
Is all the money gone by now? I'd be very happy to take a bet if not.
tldr: I’m a little confused about what Anthropic is aiming for as an alignment target, and I think it would be helpful if they publicly clarified this and/or considered it more internally.
Mostly, I want to avoid a scenario where Anthropic does the default thing without considering tough, high-level strategy questions until the last minute. I also think it would be nice to do concrete empirical research now which lines up well with what we should expect to see later.
Thanks for reading!