The Theory Behind Loss Curves — LessWrong