6

I have read that the SMOTE package is implemented for binary classification. In the case of n classes, it creates additional examples for the smallest class. Can I balance all the classes by running the algorithm n-1 times?

sophros
atos

4 Answers

1


I am pretty sure that the SMOTE implementation in Python can also be used for multi-class problems.
You can check how it is applied to the iris dataset:

http://contrib.scikit-learn.org/imbalanced-learn/stable/auto_examples/plot_ratio_usage.html
Please correct me if I am wrong.

Ethan
kanav anand
1

Yes, that is what SMOTE does. Whether you repeat the oversampling manually for each class or run an algorithm that does it for you, you get the same result.
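
To make the "do it manually" route concrete, here is a minimal pure-Python sketch of oversampling one class at a time. It is a simplified, SMOTE-like illustration and not the imbalanced-learn implementation: the function name `smote_like_oversample`, the parameter `k`, and the toy data are all hypothetical, and it assumes every class has at least two samples.

```python
import random
from collections import Counter, defaultdict


def smote_like_oversample(X, y, k=3, seed=0):
    """Oversample every non-majority class up to the majority count.

    Simplified illustration: each new point is interpolated between a
    randomly chosen sample and one of its k nearest same-class neighbours
    (squared Euclidean distance). Assumes >= 2 samples per class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    target = max(len(rows) for rows in by_class.values())

    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():      # one pass per class
        for _ in range(target - len(rows)):   # majority class: zero iterations
            base = rng.choice(rows)
            # nearest same-class neighbours of `base`, excluding itself
            others = [r for r in rows if r is not base]
            others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
            neighbour = rng.choice(others[:k])
            gap = rng.random()                # position along the segment
            X_out.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
            y_out.append(label)
    return X_out, y_out


# Toy data: class 0 has 6 samples, class 1 has 3, class 2 has 2.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.1], [0.2, 0.3],
     [5.0, 5.0], [5.2, 5.1], [5.1, 4.9],
     [9.0, 0.0], [9.1, 0.2]]
y = [0] * 6 + [1] * 3 + [2] * 2

X_res, y_res = smote_like_oversample(X, y)
print(sorted(Counter(y_res).items()))  # [(0, 6), (1, 6), (2, 6)]
```

The loop generates points only for the classes below the majority count, so with n classes it effectively performs n-1 oversampling passes in a single call.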

There are a couple of other techniques that can be used to balance multiclass data. I am attaching two links for your reference.

Link 1

Link 2

Link 3 has implementations of a couple of oversampling techniques:

Link 3

ROSE can also be used for oversampling.

Hope my answer is helpful. Do let me know if you have any additional questions.

Toros91
0

Not necessarily.

There is an option named sampling_strategy that accepts a dictionary mapping each class label to the number of samples it should have after oversampling.

For example, let's assume we have 3 classes in the target column and we want 1K samples for each class, instead of the relatively small number of data points that some classes have now. The implementation is below:

from imblearn.over_sampling import SMOTE

# Desired number of samples per class after resampling
d = {'A': 1000, 'B': 1000, 'C': 1000}
sm = SMOTE(sampling_strategy=d)
X_train, y_train = sm.fit_resample(X_train, y_train)
0

The SMOTE implementation provided by imbalanced-learn, in Python, can also be used for multi-class problems.

Check out the following plots available in the docs:

[Two plots from the imbalanced-learn documentation]

Also, the following snippet:

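from collections import Counter
from imblearn.over_sampling import SMOTE

# X, y: feature matrix and multi-class labels defined earlier in the docs example
X_resampled, y_resampled = SMOTE().fit_resample(X, y)
print(sorted(Counter(y_resampled).items()))
# Output shown in the docs: [(0, 4674), (1, 4674), (2, 4674)]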

source: https://imbalanced-learn.readthedocs.io/en/stable/over_sampling.html#from-random-over-sampling-to-smote-and-adasyn

onofricamila