Finding outliers in Image dataset

Question

I have been working on an image classification task for which I extract image frames from video streams collected for different classes.

I have already trained an image classification model (using transfer learning); however, due to outliers (or overlap in the class distributions), the accuracy of the model is poor, and it does not generalize to new images / video streams.

Could you please help me with the queries below:

How are the samples distributed in each class? Can I use a visualization technique (for example, a histogram) to see the sample distribution?
Also, going through the images one by one is a tedious process, so is there a technique with which I can find the outliers (outlier images) among the samples, so that I can remove them before training the model?

Any updates on this?

Thank you

Andreas Look · Accepted Answer · 2019-06-18T13:21:42.627

Detecting outliers is not an easy task. One option is to look at uncertainty measurements. Keep in mind that there are different kinds of outliers. For example, an outlier can be an out-of-distribution sample (you want to distinguish cats from dogs, but you input a penguin), or you can have "outliers" because the class estimate is unclear (check out chihuahua-or-muffin).
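As a rough illustration of an uncertainty measurement, one simple option is the entropy of the softmax output of the classifier you already trained. This is only a sketch: the `probs` array is made-up example data, and `predictive_entropy` and `most_uncertain` are hypothetical helper names, not part of any library. Softmax entropy from a single model is also known to be an imperfect signal, so treat the flagged images as candidates for manual review rather than confirmed outliers.

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Entropy of each row of class probabilities, shape (N, C) -> (N,)."""
    probs = np.asarray(probs, dtype=float)
    # eps avoids log(0) when a class probability is exactly zero
    return -np.sum(probs * np.log(probs + eps), axis=1)

def most_uncertain(probs, k=5):
    """Indices of the k samples with the highest predictive entropy."""
    entropy = predictive_entropy(probs)
    return np.argsort(entropy)[::-1][:k]

# Made-up softmax outputs for three images and two classes (assumption)
probs = np.array([
    [0.98, 0.02],  # confident prediction
    [0.55, 0.45],  # unclear class estimate
    [0.10, 0.90],  # fairly confident prediction
])
flagged = most_uncertain(probs, k=1)
```

Here `flagged` contains the index of the image whose class estimate is least clear, which is the kind of sample worth inspecting by hand first.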

I recommend reading the deep ensembles paper and checking out its references. It is fairly easy to follow, and it shows a method for finding outliers. In essence, an outlier will cause disagreement among the models in the ensemble. Modeling that disagreement is the key part of the paper. Once you have the disagreement, you can look at the samples with high disagreement and decide what to do with them (for example, add new classes or remove the samples).
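To make the disagreement idea concrete, here is a minimal sketch, assuming you have trained several copies of your classifier with different random initializations and collected their softmax outputs. Disagreement is scored here as the average KL divergence between each member's prediction and the ensemble mean, which is one common choice and not necessarily the exact procedure from the paper. The function name `ensemble_disagreement` and the example numbers are hypothetical.

```python
import numpy as np

def ensemble_disagreement(member_probs, eps=1e-12):
    """Mean KL(member || ensemble mean) per sample.

    member_probs: array of shape (M, N, C) holding the softmax outputs
    of M ensemble members for N samples and C classes.
    Returns an array of shape (N,); larger means more disagreement.
    """
    p = np.asarray(member_probs, dtype=float)
    mean = p.mean(axis=0)  # ensemble prediction, shape (N, C)
    # KL divergence of each member from the ensemble mean, shape (M, N)
    kl = np.sum(p * (np.log(p + eps) - np.log(mean + eps)), axis=2)
    return kl.mean(axis=0)

# Made-up outputs of 3 members for 2 images and 2 classes (assumption)
member_probs = np.array([
    [[0.9, 0.1], [0.9, 0.1]],
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.9, 0.1], [0.5, 0.5]],
])
scores = ensemble_disagreement(member_probs)
ranked = np.argsort(scores)[::-1]  # most disputed samples first
```

All members agree on the first image, so its score is zero, while the second image gets a positive score and appears first in `ranked`. In practice you would review the top of that ranking and decide per sample whether to relabel it, remove it, or add a new class.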

score 2 · Answer 2 · answered Aug 14 '20 at 04:45

BiGAN (Bidirectional GAN, a variation of the Generative Adversarial Network) can be explored for anomaly detection. I have used it for my use case. Since I don't have a large image dataset, my results are not that good.

vipin bansal