Sean Osier

Comments

LLMs Universally Learn a Feature Representing Token Frequency / Rarity
Sean Osier · 1y · 10

Very true. If a token truly never appears in the training data, its embedding never gets trained at all. Similarly, if a token is seen only once or twice, it ends up "undertrained," and the token frequency feature doesn't perform as well on it. The two least frequent tokens in the nanoGPT model are an excellent example of this: they appear only once or twice, never get properly learned, and consequently end up as big outliers.
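For concreteness, here is a minimal sketch of how one might flag such undertrained tokens by counting occurrences in the tokenized training set. The function name and the count threshold are hypothetical illustrations, not something from the post:

```python
import numpy as np

def find_undertrained_tokens(token_ids: np.ndarray, vocab_size: int, min_count: int = 3):
    """Flag token ids seen fewer than min_count times in the training corpus.

    Hypothetical helper: assumes the corpus is a flat array of token ids."""
    counts = np.bincount(token_ids, minlength=vocab_size)  # occurrences per token id
    rare = np.flatnonzero(counts < min_count)              # likely-undertrained token ids
    return rare, counts

# Toy example: ids 2 and 4 appear once, id 3 never appears at all
ids = np.array([0, 1, 2, 0, 1, 0, 4])
rare, counts = find_undertrained_tokens(ids, vocab_size=5, min_count=2)
print(rare)    # [2 3 4]
print(counts)  # [3 2 1 0 1]
```

Tokens flagged this way are exactly the ones you would expect to behave as outliers under a learned token-frequency feature.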

Mathematical Circuits in Neural Networks
Sean Osier · 3y · 20

Thanks for the comment! I didn't get around to testing that, but it's exactly the kind of thing I had in mind for "Next Steps" #3: training regimens that more reliably produce optimal, interpretable models.

Mathematical Circuits in Neural Networks
Sean Osier · 3y · 20

Interesting, I'll definitely look into that! It sounds quite related.

Posts

LLMs Universally Learn a Feature Representing Token Frequency / Rarity · 13 karma · 1y · 5 comments
Mathematical Circuits in Neural Networks · 34 karma · 3y · 4 comments