
I am working on a very imbalanced dataset and used SMOTEENN (SMOTE + ENN) to rebalance it. The following test was made with a Random Forest Classifier:

My train and test scores before using SMOTEENN:

print('Train Score: ', rf_clf.score(x_train, y_train))
print('Test Score: ', rf_clf.score(x_test, y_test))
Train Score: 0.92
Test Score: 0.91

After using SMOTEENN:

print('Train Score: ', rf_clf.score(x_train, y_train))
print('Test Score: ', rf_clf.score(x_test, y_test))
Train Score: 0.49
Test Score: 0.85

Edit: here is the code used for scaling, resampling and training:

import time

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import EditedNearestNeighbours
from imblearn.combine import SMOTEENN

x_train, x_test, y_train, y_test = train_test_split(feats, targ, test_size=0.3, random_state=47)

# Scale features; the scaler is fit on the training split only
scaler = MinMaxScaler()
scaler_x_train = scaler.fit_transform(x_train)
scaler_x_test = scaler.transform(x_test)
X = scaler_x_train
y = y_train.values

# Resample the scaled training data with SMOTEENN
oversample = SMOTEENN(random_state=101, smote=SMOTE(),
                      enn=EditedNearestNeighbours(sampling_strategy='majority'))
start = time.time()
X, y = oversample.fit_resample(X, y)
stop = time.time()
print(f"Resampling time: {stop - start}s")

# Fit a one-vs-rest Random Forest on the resampled data
rf_model = RandomForestClassifier(n_estimators=200, class_weight='balanced',
                                  criterion='entropy', random_state=0,
                                  verbose=1, max_depth=2)
rf_mod = OneVsRestClassifier(rf_model)
rf_mod.fit(X, y)

Mimi

1 Answer


You are probably not applying the same preprocessing and resampling logic consistently to the train and test data. If you put the steps into an imbalanced-learn Pipeline, the resampling is applied only during fitting and is handled for you automatically.
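A minimal sketch of that idea, assuming the same feats/targ split as in the question; the step names and the simplified Random Forest hyperparameters are just illustrative:

from imblearn.pipeline import Pipeline  # imblearn's Pipeline, which accepts resamplers as steps
from imblearn.combine import SMOTEENN
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(feats, targ, test_size=0.3, random_state=47)

pipe = Pipeline(steps=[
    ('scaler', MinMaxScaler()),
    ('resample', SMOTEENN(random_state=101)),
    ('rf', RandomForestClassifier(n_estimators=200, random_state=0)),
])

# fit() scales and resamples only the training data;
# score()/predict() apply just the scaler to incoming data, with no resampling.
pipe.fit(x_train, y_train)
print('Train Score:', pipe.score(x_train, y_train))
print('Test Score: ', pipe.score(x_test, y_test))

This way both scores are computed on data with the original class distribution, so they remain comparable before and after adding the resampling step.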

Brian Spiering