Why does gradient descent always work on neural networks? — LessWrong