Kushal Thaman

Comments

Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping
Kushal Thaman · 2y · 41

Thanks for the post! Do you think there is an amount of pretraining beyond which no fine-tuning (say, on a completely non-complementary task, far from the pre-training distribution) can push the network out of its loss basin? That is, a 'point of no return' such that even for very large learning rates and amounts of fine-tuning, the resulting network remains linearly mode-connected (LMC) to the pre-trained one?
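(For concreteness, here is a minimal sketch of the LMC check being referenced: interpolate parameter-wise between two checkpoints and measure the loss barrier along the path. It assumes PyTorch models with identical architectures; `eval_loss` and the two state dicts are hypothetical placeholders, not anything from the post.)

```python
# Sketch of a linear mode connectivity (LMC) check between two checkpoints,
# e.g. a pre-trained model and a heavily fine-tuned one. Assumes PyTorch
# models with identical architectures; `eval_loss` is a hypothetical
# callable returning the model's loss on some fixed evaluation set.
import torch

def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Parameter-wise linear interpolation: (1 - alpha) * sd_a + alpha * sd_b."""
    out = {}
    for k in sd_a:
        if torch.is_floating_point(sd_a[k]):
            out[k] = (1 - alpha) * sd_a[k] + alpha * sd_b[k]
        else:
            out[k] = sd_a[k]  # leave integer buffers (e.g. step counters) alone
    return out

@torch.no_grad()
def lmc_barrier(model, sd_a, sd_b, eval_loss, num_points=11):
    """Max loss along the linear path minus the mean endpoint loss.
    A barrier near zero is the usual operational criterion for the two
    solutions being linearly mode-connected."""
    losses = []
    for i in range(num_points):
        alpha = i / (num_points - 1)
        model.load_state_dict(interpolate_state_dicts(sd_a, sd_b, alpha))
        losses.append(eval_loss(model))
    return max(losses) - 0.5 * (losses[0] + losses[-1]), losses
```

The 'point of no return' question then amounts to asking whether this barrier stays near zero across every fine-tuning run, no matter how large the learning rate or how long the fine-tuning.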

43 · Incidental polysemanticity · Ω · 2y · 7