My original dataset has a binary dependent variable in which only 3% of the values are 1. I first split the data into training and test sets (80/20). Because the predictors include both numeric and binary variables, I apply SMOTENC (in R) to the training set to create a balanced training dataset. I then fit a logistic regression model on this balanced dataset and use the F-measure to choose the optimal cutoff.
However, what cutoff should I use on the test dataset?
Because the test set keeps the original imbalance, applying the optimal cutoff found on the balanced training dataset gives disastrous results.
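
For concreteness, the cutoff-selection step described above could look roughly like the sketch below. It is written in plain Python for illustration (the workflow in this post is in R), and the function names `f_measure` and `best_cutoff`, the toy labels, and the toy probabilities are all hypothetical stand-ins, not the actual data or model.

```python
# Minimal sketch: sweep candidate cutoffs and keep the one with the best F-measure.
# All names and numbers here are illustrative assumptions, not the real pipeline.

def f_measure(y_true, y_pred):
    """F1 score for binary labels coded as 0/1 (returns 0.0 if there are no true positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_cutoff(y_true, probs, cutoffs):
    """Return (cutoff, F-measure) for the cutoff that maximizes the F-measure."""
    best_c, best_f = None, -1.0
    for c in cutoffs:
        # Classify as 1 when the predicted probability reaches the cutoff.
        preds = [1 if p >= c else 0 for p in probs]
        f = f_measure(y_true, preds)
        if f > best_f:  # strict improvement only
            best_c, best_f = c, f
    return best_c, best_f


# Toy balanced "training" labels and hypothetical predicted probabilities.
y_train = [0, 0, 0, 1, 0, 1, 1, 1]
p_train = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]

# Candidate cutoffs 0.05, 0.10, ..., 0.95.
grid = [i / 100 for i in range(5, 100, 5)]

cutoff, score = best_cutoff(y_train, p_train, grid)
print(cutoff, score)
```

Because the sweep only replaces the current best on a strict improvement, ties are resolved in favour of the lowest cutoff. The cutoff chosen this way depends on the class proportions of the data it was tuned on, which is why it matters whether that data is balanced or has the original 3% prevalence.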