Kushal Thaman

Comments

Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping
Kushal Thaman · 2y · 41

Thanks for the post! Do you think there is an amount of pretraining beyond which no fine-tuning (say, on a completely non-complementary task, far from the pre-training distribution) can push the network out of its loss basin? That is, a 'point of no return' such that even for very large learning rates and amounts of fine-tuning, the resulting network remains linearly mode-connected (LMC) to the pre-trained one?
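(For concreteness, here is a minimal sketch of the LMC check being referenced: interpolate parameter-wise between two checkpoints and measure the loss barrier along the path. It assumes PyTorch models with identical architectures; `eval_loss` and the two state dicts are hypothetical placeholders, not anything from the post.)

```python
# Sketch of a linear mode connectivity (LMC) check between two checkpoints,
# e.g. a pre-trained model and a heavily fine-tuned one. Assumes PyTorch
# models with identical architectures; `eval_loss` is a hypothetical
# callable returning the model's loss on some fixed evaluation set.
import torch

def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Parameter-wise linear interpolation: (1 - alpha) * sd_a + alpha * sd_b."""
    out = {}
    for k in sd_a:
        if torch.is_floating_point(sd_a[k]):
            out[k] = (1 - alpha) * sd_a[k] + alpha * sd_b[k]
        else:
            out[k] = sd_a[k]  # leave integer buffers (e.g. step counters) alone
    return out

@torch.no_grad()
def lmc_barrier(model, sd_a, sd_b, eval_loss, num_points=11):
    """Max loss along the linear path minus the mean endpoint loss.
    A barrier near zero is the usual operational criterion for the two
    solutions being linearly mode-connected."""
    losses = []
    for i in range(num_points):
        alpha = i / (num_points - 1)
        model.load_state_dict(interpolate_state_dicts(sd_a, sd_b, alpha))
        losses.append(eval_loss(model))
    return max(losses) - 0.5 * (losses[0] + losses[-1]), losses
```

The 'point of no return' question then amounts to asking whether this barrier stays near zero across every fine-tuning run, no matter how large the learning rate or how long the fine-tuning.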

43 · Incidental polysemanticity · Ω · 2y · 7