One explanation for pathological errors is feature suppression/feature shrinkage (link). I'd be interested to see whether the errors remain pathological after applying the finetuning methodology I proposed for fixing shrinkage; your method of fixing the norm of the input is close, but not quite the same.
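For intuition on why shrinkage arises at all, here is a toy sketch (purely illustrative; the scalar setup and `soft_threshold` helper are my own simplification, not the proposed finetuning method). An L1 sparsity penalty on a feature activation biases its optimum toward zero: for a single feature, minimizing `0.5*(a - x)**2 + lam*abs(a)` has the closed-form soft-threshold solution, so the recovered activation is systematically smaller than the true value `x`.

```python
def soft_threshold(x: float, lam: float) -> float:
    """Closed-form minimizer of 0.5*(a - x)**2 + lam*abs(a)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

# The recovered activation shrinks as the sparsity coefficient grows,
# and small true activations are suppressed to exactly zero.
true_activation = 1.0
for lam in (0.0, 0.1, 0.3):
    print(lam, soft_threshold(true_activation, lam))
```

Finetuning the decoder (or rescaling activations) after training can correct this bias, which is why the distinction from simply fixing the input norm matters.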

The original perplexity of the LLM was ~38 on the OpenWebText slice I used. Thanks for the compliments!