Penalize Model Complexity Via Self-Distillation — LessWrong