What's the Right Way to think about Information Theoretic quantities in Neural Networks? — LessWrong