GD’s Implicit Bias on Separable Data — LessWrong