In contrast, without manual optimization, a huge problem is that neural networks learn features specific to the training set (like the position of the digit) that do not apply to the test set.
This makes regular DNNs really prone to position shifting. The CNN model, in my understanding, is more of a workaround: it manually tells the network to separate two kinds of information in the raw pixels, the actual features and their locations, as feature maps, so the network is not confused when features shift between locations.
However, this also means CNNs assume that all the features are the same size...
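
To make the position-shift point concrete, here is a minimal NumPy sketch under toy assumptions. The 3x3 plus-shaped "digit", the 8x8 canvas, and the single "dense unit" whose weights simply memorize the training image are all made up for illustration; nothing here is a trained network. It only shows that a shared convolution kernel responds equally strongly wherever the pattern sits (the peak in the feature map moves with it), while a position-bound dense template stops responding once the pattern moves.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation, 'valid' padding, stride 1."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def place(pattern, row, col, size=8):
    """Put the pattern on a blank size x size canvas at (row, col)."""
    canvas = np.zeros((size, size))
    ph, pw = pattern.shape
    canvas[row:row + ph, col:col + pw] = pattern
    return canvas

def peak(fmap):
    """Location of the strongest response in a feature map."""
    idx = np.unravel_index(np.argmax(fmap), fmap.shape)
    return tuple(int(k) for k in idx)

# Hypothetical toy "digit": a plus sign.
pattern = np.array([[0., 1., 0.],
                    [1., 1., 1.],
                    [0., 1., 0.]])

original = place(pattern, 1, 1)   # position seen in "training"
shifted = place(pattern, 4, 3)    # same pattern, moved

# Convolution: one shared kernel slides over every location.
fmap_original = conv2d_valid(original, pattern)
fmap_shifted = conv2d_valid(shifted, pattern)
peak_original = peak(fmap_original)
peak_shifted = peak(fmap_shifted)

# Global max pooling keeps "what" and discards "where".
pooled_original = fmap_original.max()
pooled_shifted = fmap_shifted.max()

# Assumed dense unit: weights tied to the training position.
dense_weights = original.ravel()
dense_original = float(dense_weights @ original.ravel())
dense_shifted = float(dense_weights @ shifted.ravel())

print("conv peaks:", peak_original, peak_shifted)
print("pooled responses:", pooled_original, pooled_shifted)
print("dense responses:", dense_original, dense_shifted)
```

The kernel here has a fixed 3x3 size, so the same sketch would fail to match a larger or smaller copy of the pattern, which is the size assumption mentioned above.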
You are right.