What is the best way to categorize the approaches that have been developed to deal with the class imbalance problem?

This article categorizes them into:

  1. Preprocessing: includes oversampling, undersampling and hybrid methods,
  2. Cost-sensitive learning: includes direct methods and meta-learning, the latter of which further divides into thresholding and sampling,
  3. Ensemble techniques: includes cost-sensitive ensembles and data preprocessing in conjunction with ensemble learning.

The second classification:

  1. Data Pre-processing: includes distribution change and weighting the data space. One-class learning is considered a form of distribution change.
  2. Special-purpose Learning Methods
  3. Prediction Post-processing: includes threshold method and cost-sensitive post-processing
  4. Hybrid Methods

The third article:

  1. Data-level methods
  2. Algorithm-level methods
  3. Hybrid methods

The last classification also considers output adjustment as an independent approach.

Thanks in advance.

ebrahimi

1 Answer

The way I see it, all three categorizations agree on many things. For example, all three have a category for pre-processing steps.

I would tend to mostly agree with the third categorization, as it's more generic and encompasses more things.

  • The data-level category includes any pre-processing steps dealing with class imbalance (e.g. over/under sampling).
  • The algorithm-level could be considered to include the second categories of the first two articles. Any change to the algorithm that deals with class imbalance would go here (e.g. class weighting).
  • Finally, a hybrid category for combining the two.
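To make the first two categories concrete, here is a minimal pure-Python sketch, not taken from any of the articles. The helper names `random_oversample` and `balanced_class_weights` are made up for illustration, and the weighting formula (number of samples divided by number of classes times the class count) is one common heuristic that I am assuming here, not the only choice.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Data-level: duplicate randomly chosen minority samples
    until every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # All original samples belonging to this class
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def balanced_class_weights(y):
    """Algorithm-level: weight each class inversely to its frequency,
    so that errors on rare classes cost more during training."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Toy data (hypothetical): four majority samples, two minority samples
X = [[0.1], [0.2], [0.3], [0.4], [0.9], [1.0]]
y = [0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y)   # data-level: changes the data
weights = balanced_class_weights(y)      # algorithm-level: changes the loss
```

The data-level route leaves the learner untouched and changes what it sees, while the algorithm-level route leaves the data untouched and changes how errors are weighted. A hybrid method would apply both.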

The only thing missing from the first two articles is the post-processing steps, which, to be honest, aren't used in practice as often as the others.
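For completeness, a post-processing step in its simplest form just moves the decision threshold applied to the predicted probabilities. The sketch below uses made-up scores and hypothetical helper names (`predict_with_threshold`, `recall`), and the 0.25 threshold is an arbitrary value chosen for the example.

```python
def predict_with_threshold(probs, threshold=0.5):
    """Turn positive-class probabilities into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probs]


def recall(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    positives = [pred for true, pred in zip(y_true, y_pred) if true == 1]
    return sum(positives) / len(positives)


# Hypothetical scores from a model that under-predicts the minority class
y_true = [0, 0, 0, 0, 1, 1]
probs = [0.05, 0.10, 0.20, 0.35, 0.30, 0.60]

default_pred = predict_with_threshold(probs)        # threshold 0.5
lowered_pred = predict_with_threshold(probs, 0.25)  # favors the minority class

default_recall = recall(y_true, default_pred)
lowered_recall = recall(y_true, lowered_pred)
```

Lowering the threshold recovers the missed minority sample here, at the price of one extra false positive, which is the usual trade-off with this kind of output adjustment.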

Stephen Rauch
ItsMeMario