DSLT 2. Why Neural Networks obey Occam's Razor — LessWrong