Grokking, memorization, and generalization — a discussion — LessWrong