Prof. Geoffrey Hinton has pointed out that pooling layers remove spatial feature information. But does flattening the last convolutional layer's features for the fully connected (FC) layer essentially remove the spatial information as well?
1 Answer
Your question text is not very clear, but I'll try to give you what you need. Max-pooling layers give CNNs a degree of spatial invariance because they only check whether a feature is present in a window, not exactly where it is. If you stack them through multiple layers, your network gains a rough spatial invariance, but not much. Moreover, Professor Hinton has stated somewhere that using pooling layers is a mistake and that it's a disaster they work so well (I'm paraphrasing, not quoting exactly). If you want a network to be spatially invariant, you should use spatial transformers, which are differentiable modules that can be inserted into CNNs without any extra supervision. Take a look here. A small sketch of what pooling discards follows below.
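To make that concrete, here is a minimal numpy sketch (the 2x2 pooling function and the toy feature maps are my own illustration, not from any particular library): two feature maps whose only difference is where the activation sits inside a pooling window produce identical pooled outputs.

```python
import numpy as np

def max_pool_2x2(x):
    """Max-pool a 2D array with non-overlapping 2x2 windows."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Two toy feature maps: the same activation, shifted by one pixel
# but still inside the same 2x2 pooling window.
a = np.zeros((4, 4)); a[0, 0] = 1.0
b = np.zeros((4, 4)); b[1, 1] = 1.0

# Both pool to the same output: the exact position is discarded;
# only the presence of the feature in each window survives.
print(max_pool_2x2(a))  # [[1. 0.] [0. 0.]]
print(max_pool_2x2(b))  # [[1. 0.] [0. 0.]]
```

This is exactly the "checks existence, not location" behavior described above, compounded as pooling layers are stacked.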
Finally, to answer your question: pooling layers do discard spatial information, which is why capsule networks (CapsNets) were introduced.
About the fully connected layers, there are two main points: the inputs and the outputs. In a convolutional layer, each output value depends only on the limited input region it is applied to (its receptive field), so there is a single output responsible for each spatial location. In an MLP, all the inputs go into each neuron, and each neuron produces just a single value from all of them. I'd guess this is why fully connected layers don't keep spatial information and are mostly used for classification tasks: they just classify the features extracted by the CNN.
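As a rough numpy sketch of that point (the shapes and the random weight matrix are arbitrary assumptions for illustration): flattening itself is lossless, since every spatial position still maps to a known index in the vector, but a fully connected layer then mixes all positions into each output value, so location is no longer represented explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy conv feature map: 2 channels over a 3x3 spatial grid.
feat = rng.standard_normal((2, 3, 3))

# Flattening is lossless: position (channel, row, col) still maps
# to a known flat index, so no information is destroyed yet.
flat = feat.reshape(-1)  # shape (18,)
assert flat[0 * 9 + 1 * 3 + 2] == feat[0, 1, 2]

# A fully connected layer then mixes every position into every
# output: each output neuron is a weighted sum over ALL indices,
# so "where" is no longer explicitly represented in the outputs.
W = rng.standard_normal((10, flat.size))
out = W @ flat  # shape (10,)
print(out.shape)
```

So it is less the flattening itself and more the all-to-all mixing in the FC layer that collapses the spatial structure.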