Categorization of approaches to deal with imbalanced classes

Question

What is the best way to categorize the approaches that have been developed to deal with the class imbalance problem?

This article categorizes them into:

Preprocessing: includes oversampling, undersampling and hybrid methods,
Cost-sensitive learning: includes direct methods and meta-learning, the latter of which further divides into thresholding and sampling,
Ensemble techniques: includes cost-sensitive ensembles and data preprocessing in conjunction with ensemble learning.

The second classification:

Data Pre-processing: includes distribution change and weighting the data space. One-class learning is considered a form of distribution change.
Special-purpose Learning Methods
Prediction Post-processing: includes the threshold method and cost-sensitive post-processing
Hybrid Methods:

The third article:

The last classification also considers output adjustment as an independent approach.

Thanks in advance.

score 5 · Answer 1 · edited Sep 24 '18 at 21:41

The way I see it, all three categorizations agree on many things. For example, all three have a category for pre-processing steps.

I would tend to mostly agree with the third categorization, as it's more generic and encompasses more things.

The data-level category includes any pre-processing steps dealing with class imbalance (e.g. over/undersampling).
The algorithm-level category could be considered to include the second categories of the first two articles. Any change to the algorithm that deals with class imbalance would go here (e.g. class weighting).
Finally, a hybrid category for combining the two.
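
To make the data-level vs. algorithm-level distinction concrete, here is a minimal sketch in plain Python. The function names `random_oversample` and `balanced_class_weights` are my own, not taken from any of the articles, and the weighting formula (n_samples / (n_classes * class_count)) is just one common heuristic that I am assuming for illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Data-level: duplicate randomly chosen minority rows until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Candidate rows are drawn from the original data only.
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def balanced_class_weights(y):
    """Algorithm-level: leave the data alone and weight each class
    inversely to its frequency (assumed heuristic, see lead-in)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Toy data: 8 majority rows (class 0), 2 minority rows (class 1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_res, y_res = random_oversample(X, y)   # changes the training data
weights = balanced_class_weights(y)      # changes the learner's loss
```

The first function changes what the learner sees, the second changes how the learner scores its mistakes. A hybrid method in the third categorization's sense would do some of both.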

The only thing missing from the first two articles is the post-processing steps, which, to be honest, aren't used in practice as often as the others.
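
For completeness, a post-processing (output adjustment) step could look roughly like the sketch below: the model is trained as usual and only the decision threshold on its predicted probabilities is moved. The helper names and the example costs are hypothetical. The threshold rule assumes zero cost for correct predictions and reasonably calibrated probabilities, in which case predicting the positive class when p >= cost_fp / (cost_fp + cost_fn) minimizes expected cost.

```python
def cost_threshold(cost_fp, cost_fn):
    """Threshold on P(positive) that minimizes expected cost, assuming
    correct predictions cost nothing."""
    return cost_fp / (cost_fp + cost_fn)


def apply_threshold(probs, threshold):
    """Turn predicted positive-class probabilities into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical predicted probabilities for the minority (positive) class.
probs = [0.05, 0.25, 0.4, 0.7]

# Default decision rule.
default_preds = apply_threshold(probs, 0.5)

# Assume a missed positive is 4x as costly as a false alarm.
threshold = cost_threshold(cost_fp=1.0, cost_fn=4.0)
adjusted_preds = apply_threshold(probs, threshold)
```

With these assumed costs the threshold drops from 0.5 to 0.2, so more examples get flagged as the minority class without retraining anything.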
