
  1. In the case of convolutional neural networks, the subnetwork can vary depending on the pruning granularity you choose. In the LTH paper, the authors apply unstructured pruning, i.e. they set individual weights in the convolution filters to zero. But you could imagine applying structured pruning instead, zeroing out vectors, kernels or even complete filters. The architecture of the subnetwork is thus not necessarily different, but the information flowing through it is, since the network is now sparse. So you generally don't want to change the overall architecture of the network, and you keep skip connections intact, for example.
  2. Although the LTH was discovered empirically on MLPs and CNNs in supervised learning, I see more and more occurrences of the LTH in other training paradigms, e.g. this one in the RL context.
  3. AFAICT, when the word "dense" was used here, it was always in opposition to "sparse".
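
To make the granularity point from item 1 concrete, here is a minimal sketch in plain Python. It is my own toy illustration, not code from the LTH paper: the function names, the L1-norm criterion for ranking filters, and the representation of a layer as a list of flattened filters are all assumptions made for the example.

```python
def unstructured_mask(filters, sparsity):
    """Zero out the smallest-magnitude individual weights across the whole layer."""
    ranked = sorted(
        (abs(w), i, j) for i, f in enumerate(filters) for j, w in enumerate(f)
    )
    n_prune = int(len(ranked) * sparsity)
    mask = [[1] * len(f) for f in filters]
    for _, i, j in ranked[:n_prune]:
        mask[i][j] = 0
    return mask


def filter_mask(filters, sparsity):
    """Structured variant: zero out whole filters, ranked by L1 norm (an assumption)."""
    ranked = sorted((sum(abs(w) for w in f), i) for i, f in enumerate(filters))
    n_prune = int(len(filters) * sparsity)
    pruned = {i for _, i in ranked[:n_prune]}
    return [[0 if i in pruned else 1 for _ in f] for i, f in enumerate(filters)]


def apply_mask(filters, mask):
    """Pruned weights become 0; the shape of the layer is left untouched."""
    return [[w * m for w, m in zip(f, mf)] for f, mf in zip(filters, mask)]


# Hypothetical layer: 4 filters, each flattened to 4 weights.
layer = [
    [0.9, -0.1, 0.4, 0.05],
    [0.2, -0.3, 0.1, 0.15],
    [-0.8, 0.7, 0.02, 0.6],
    [0.01, 0.03, -0.02, 0.5],
]
mask_u = unstructured_mask(layer, 0.5)
mask_f = filter_mask(layer, 0.5)
sparse_u = apply_mask(layer, mask_u)
sparse_f = apply_mask(layer, mask_f)
```

Both masks remove half of the weights and both leave the layer with the same shape, which is the sense in which the architecture does not change. The unstructured mask scatters zeros across all filters, while the structured one silences two filters completely.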


Concerning the "late resetting", your first intuition was correct: instead of resetting the weights to their value at iteration 0 (the initialization), the authors reset them to their value at a later iteration (after 1%-7% of the total number of iterations). They have also written another paper studying what happens in early training and why the "late resetting" might make sense.
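
Here is a rough sketch of how late resetting differs from the original reset, again in plain Python. Everything in it is a toy assumption of mine rather than the authors' setup: the "training" is gradient descent on a simple quadratic, and the names `train`, `magnitude_mask` and `prune_with_rewind`, the learning rate, and the retraining length are all made up for illustration.

```python
def train(weights, mask, targets, n_iters, lr=0.1, snapshot_at=None):
    """Toy masked gradient descent on 0.5 * sum((w - t)^2).

    Returns the final weights and a copy of the weights as they were
    at the start of iteration `snapshot_at` (None if never reached).
    """
    w = list(weights)
    snapshot = None
    for it in range(n_iters):
        if it == snapshot_at:
            snapshot = list(w)
        w = [m * (wi - lr * (wi - ti)) for wi, ti, m in zip(w, targets, mask)]
    return w, snapshot


def magnitude_mask(weights, prune_fraction):
    """Keep the largest-magnitude weights, zero the rest."""
    ranked = sorted((abs(w), i) for i, w in enumerate(weights))
    mask = [1] * len(weights)
    for _, i in ranked[: int(len(weights) * prune_fraction)]:
        mask[i] = 0
    return mask


def prune_with_rewind(init, targets, n_iters, rewind_iter, prune_fraction):
    """rewind_iter=0 is the original reset; rewind_iter>0 is late resetting."""
    dense = [1] * len(init)
    trained, snapshot = train(init, dense, targets, n_iters, snapshot_at=rewind_iter)
    mask = magnitude_mask(trained, prune_fraction)
    # Surviving weights go back to their value at `rewind_iter`, not to the final values.
    rewound = [w * m for w, m in zip(snapshot, mask)]
    # Retraining length is an assumption: the remaining iterations only.
    final, _ = train(rewound, mask, targets, n_iters - rewind_iter)
    return mask, rewound, final


init = [0.5, -0.2, 0.1, 0.3]
targets = [1.0, 0.0, -1.0, 0.05]
mask0, reset_to_init, _ = prune_with_rewind(init, targets, 100, 0, 0.5)
mask3, reset_late, final3 = prune_with_rewind(init, targets, 100, 3, 0.5)
```

With `rewind_iter=3` out of 100 iterations (inside the 1%-7% window mentioned above), the surviving weights restart from where they were after three updates instead of from the initialization, while the mask itself is the same in both runs.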

Hope that helps!