
I get different training accuracy scores depending on whether I apply the steps manually or through a pipeline.

MANUAL:

from imblearn import over_sampling
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

#standardize data
sc = StandardScaler()
X_train[['age','balance','duration']] = sc.fit_transform(X_train[['age','balance','duration']])
X_test[['age','balance','duration']] = sc.transform(X_test[['age','balance','duration']])

#applying SMOTE
X_oversampling, y_oversampling = over_sampling.SMOTE(random_state=42).fit_resample(X_train, y_train)

#modelling
model_lr = LogisticRegression()
model_lr.fit(X_oversampling, y_oversampling)

#evaluation
y_pred = model_lr.predict(X_test)
y_pred_train = model_lr.predict(X_oversampling)
print(f'Train Accuracy Score : {round(accuracy_score(y_oversampling,y_pred_train),4)}')
print(f'Test Accuracy Score : {round(accuracy_score(y_test,y_pred),4)}')

#result
Train Accuracy Score : 0.835
Test Accuracy Score : 0.82

WITH PIPELINE:

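from imblearn.pipeline import Pipeline  # imblearn's Pipeline, which accepts samplers such as SMOTE

pipeline_logreg = Pipeline([('sampling', over_sampling.SMOTE(random_state=42)),
                            ('logreg', LogisticRegression())])
pipeline_logreg.fit(X_train, y_train)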

I don't include StandardScaler in the pipeline because I have already scaled the data manually in the code above (the #standardize data step).

#evaluation
y_pred = pipeline_logreg.predict(X_test)
y_pred_train = pipeline_logreg.predict(X_train)
print(f'Train Accuracy Score : {round(accuracy_score(y_train,y_pred_train),4)}')
print(f'Test Accuracy Score : {round(accuracy_score(y_test,y_pred),4)}')

#result
Train Accuracy Score : 0.8261
Test Accuracy Score : 0.82

So why does the training accuracy differ between the two approaches, when the test accuracy is the same?
