LESSWRONG
LW

2973
J L
2Ω2010
Message
Dialogue
Subscribe

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
No posts to display.
No wikitag contributions to display.
Understanding “Deep Double Descent”
J L4yΩ230

Apologies if it's obvious, but why the focus on SGD? I'm assuming it's not meant as shorthand for other types of optimization algorithms given the emphasis on SGD's specific inductive bias, and the Deep Double Descent paper mentions that the phenomena hold across most natural choices in optimizers. 

Reply