Chris Mingard — LessWrong

Architecture-aware optimisation: train ImageNet and more without hyperparameters

A deep learning system is composed of many interrelated components: architecture, data, loss function and gradients. These components interact in a structured way, yet the most popular optimisers (e.g. Adam and SGD) do not use this information. This means there are leftover degrees of freedom...

Apr 22, 2023