
My dataset has a binary dependent variable in which only 3% of the values are 1. First, I split the data into training and test sets using an 80/20 split. Because the independent variables are a mix of numeric and binary, I apply SMOTENC (in R) to the training set to create a balanced training dataset. I then fit a logistic regression model on this balanced dataset and choose the cutoff that maximizes the F-measure.
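To make the cutoff-selection step concrete, here is a minimal sketch of what I mean by choosing the cutoff that maximizes the F-measure. It is written in plain Python rather than R purely for illustration. The function names `f_measure` and `best_cutoff`, the 0.01-step grid, and the toy labels and probabilities are all hypothetical and are not my actual data or code.

```python
def f_measure(y_true, y_prob, cutoff):
    """F1 score when every probability >= cutoff is predicted as class 1."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if p >= cutoff and y == 1)  # true positives
    fp = sum(1 for y, p in pairs if p >= cutoff and y == 0)  # false positives
    fn = sum(1 for y, p in pairs if p < cutoff and y == 1)   # false negatives
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all
    return 2 * tp / denom if denom else 0.0


def best_cutoff(y_true, y_prob, grid=None):
    """Return the first cutoff on the grid with the highest F-measure."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]  # assumed grid: 0.01 .. 0.99
    return max(grid, key=lambda c: f_measure(y_true, y_prob, c))


# Hypothetical toy data standing in for the balanced training set:
# true labels and the fitted model's predicted probabilities.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]

cutoff = best_cutoff(y_true, y_prob)
print(cutoff, f_measure(y_true, y_prob, cutoff))
```

The cutoff found this way depends on the class proportions of the data it is computed on, which is the source of my problem.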

However, what cutoff should I use on the test dataset?

Since the test set keeps the original imbalance, applying the optimal cutoff found on the balanced training dataset gives disastrous results.

nwaldo
CraigS
