Using a SMOTE-Trained Model and Its Optimal Cutoff on Unbalanced Test Data

Asked May 12 '24 at 13:07

Active May 12 '24 at 15:34

My original dataset has a binary dependent variable in which only 3% of the values are 1. I first split the data into training and test sets (80/20). Because the independent variables are a mix of numeric and binary, I apply SMOTENC (in R) to the training set to create a balanced training dataset. I then fit a logistic regression model on this balanced data and choose the cutoff that maximizes the F-measure.
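For concreteness, the cutoff search described above can be sketched as follows. This is a minimal illustration in Python rather than the R used in the question. The function names `f_measure` and `best_cutoff` and the toy labels and scores are hypothetical, not taken from the original code.

```python
from typing import Sequence, Tuple


def f_measure(y_true: Sequence[int], scores: Sequence[float], cutoff: float) -> float:
    """F1 score when every score >= cutoff is predicted as class 1."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_cutoff(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Try each distinct score as a cutoff; return (cutoff, F1) with the highest F1.

    Ties are resolved in favour of the lowest cutoff.
    """
    best_c, best_f = 0.0, -1.0
    for c in sorted(set(scores)):
        f = f_measure(y_true, scores, c)
        if f > best_f:
            best_c, best_f = c, f
    return best_c, best_f


# Toy data (hypothetical): predicted probabilities and true labels.
labels = [0, 0, 0, 1, 0, 1, 1]
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.9]
cutoff, f1 = best_cutoff(labels, probs)
print(cutoff, round(f1, 3))  # 0.4 0.857
```

The cutoff returned this way is only optimal for the class mix of the data it was searched on, which here would be the balanced training set.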

However, what cutoff should I use on the test dataset?

Because the test set is still unbalanced, applying the optimal cutoff found on the balanced training dataset gives disastrous results.
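A small calculation illustrates why the F-measure at a fixed cutoff can collapse when class balance changes. It is a sketch under a strong assumption: that the cutoff has the same true positive rate and false positive rate on both datasets, which holds only approximately for SMOTE-generated data. The rates 0.8 and 0.2 and the function name `f_measure_at_prevalence` are hypothetical.

```python
def f_measure_at_prevalence(tpr: float, fpr: float, prevalence: float) -> float:
    """F1 implied by a cutoff's TPR and FPR at a given positive-class share."""
    tp = tpr * prevalence              # expected true positives per observation
    fn = (1.0 - tpr) * prevalence      # expected false negatives per observation
    fp = fpr * (1.0 - prevalence)      # expected false positives per observation
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Assumed (hypothetical) operating point of one fixed cutoff.
TPR, FPR = 0.8, 0.2

balanced = f_measure_at_prevalence(TPR, FPR, 0.50)    # balanced training mix
unbalanced = f_measure_at_prevalence(TPR, FPR, 0.03)  # 3% positives, as in the test set

print(round(balanced, 3), round(unbalanced, 3))  # 0.8 0.194
```

With 3% positives, false positives from the large negative class swamp the true positives, so precision (and with it the F-measure) falls even though the cutoff itself has not changed.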

edited May 12 '24 at 15:34

nwaldo

asked May 12 '24 at 13:07

CraigS
