
Can CNNs predict well if they are trained on canonical-like images but tested on versions of those images that are slightly shifted?

I tried this using the MNIST dataset and found the contrary: the accuracy on the shifted test set was very low compared to that of MLPs.
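For concreteness, here is a minimal NumPy sketch (the helper name `shift_images` and the fill convention are my own) of how such a shifted test set can be built from canonical images:

```python
import numpy as np

def shift_images(images, dx, dy):
    """Shift a batch of images (N, H, W) by (dx, dy) pixels, filling
    the vacated border with zeros (the MNIST background value)."""
    shifted = np.roll(images, shift=(dy, dx), axis=(1, 2))
    # np.roll wraps around, so zero out the pixels that wrapped.
    if dy > 0:
        shifted[:, :dy, :] = 0
    elif dy < 0:
        shifted[:, dy:, :] = 0
    if dx > 0:
        shifted[:, :, :dx] = 0
    elif dx < 0:
        shifted[:, :, dx:] = 0
    return shifted

# Toy batch: one 5x5 "image" with a single bright pixel at (2, 2).
batch = np.zeros((1, 5, 5))
batch[0, 2, 2] = 1.0

moved = shift_images(batch, dx=1, dy=0)
print(np.argwhere(moved[0] == 1.0))  # pixel is now at (2, 3)
```

Training on the original batch and evaluating on the output of `shift_images` reproduces the train/test mismatch described above.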

Green Falcon
Boris

3 Answers


If you use max-pooling layers, they may be insensitive to small shifts, but only to a limited extent. If you want your network to be invariant to transformations such as translations, shifts, or other common transformations, you have two solutions, at least as far as I know:

  • Increasing the size of data-set
  • Using spatial transformers

Take a look at What is the state-of-the-art ANN architecture for MNIST and Why do convolutional neural networks work.

Thanks to one of our friends, another way is to use transfer learning after data augmentation.

Green Falcon

Convolution is shift-equivariant except for border effects. Fully connected layers aren't.

Pooling (without subsampling/stride) can be seen as a kind of smoothing, and its output is often the same for many neighboring positions. Subsampling this (applying a stride) therefore yields an operation that is fairly invariant to small shifts.

Global pooling is fully shift invariant, again except for border effects.
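These properties can be checked numerically. The sketch below (plain NumPy; the naive convolution loop is my own) shifts a pattern that stays away from the image borders, so no border effects arise: the feature map shifts with the input (equivariance), while its global max is unchanged (invariance):

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive 'valid' cross-correlation of a 2-D image with a kernel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
img = np.zeros((12, 12))
img[3:6, 3:6] = rng.random((3, 3))           # small pattern away from borders
kernel = rng.random((3, 3))

shifted = np.roll(img, (2, 2), axis=(0, 1))  # shift the pattern by 2 pixels

a = conv2d_valid(img, kernel)
b = conv2d_valid(shifted, kernel)

# Equivariance: the feature map shifts along with the input.
print(np.allclose(np.roll(a, (2, 2), axis=(0, 1)), b))  # True
# Invariance of global (max) pooling: same maximum either way.
print(np.isclose(a.max(), b.max()))                      # True
```

Once the pattern gets close to the border, part of its response is clipped by the 'valid' convolution and both checks start to fail, which is exactly the border effect mentioned above.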

As for rotations: standard architectures aren't inherently robust to those.

isarandi

Try the MS COCO dataset: it is very diverse, and you can train the network for detection/segmentation. The best-performing networks, such as Mask R-CNN, reach about 44% mAP on test data, or 68% at 0.5 IoU. They handle all the challenges, including rotation, fairly well, but are quite hard to train.

Alex