
I’m working with an imbalanced dataset to predict strokes, where the positive class (stroke occurrence) is significantly underrepresented. Initially, I used logistic regression, but due to the class imbalance, I switched to a Random Forest model. After applying techniques such as random oversampling and adjusting the classification threshold, I've managed to improve my recall to approximately 61.3%. However, I am still facing a high false positive rate (178 instances) in my confusion matrix, which negatively impacts precision (17.6%). What additional strategies can I explore to further enhance precision while maintaining a good recall?

2 Answers


You can generally try:

  • using an evaluation metric that accounts for the imbalance (e.g. precision–recall AUC or F1)
  • class weights
  • stronger algorithms (gradient-boosted decision trees, GBDTs)
  • tuning the prediction threshold, or working with the continuous predicted probabilities directly
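The last two points can be sketched together: train a Random Forest with balanced class weights, then sweep the decision threshold over the predicted probabilities instead of accepting the default 0.5 cut-off. This is a minimal illustration on a synthetic imbalanced dataset; the data, parameter values, and thresholds below are assumptions for demonstration, not taken from the question.

```python
# Sketch: class weights + threshold tuning (illustrative parameters only).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic dataset with ~5% positives, mimicking a rare-event problem.
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" reweights classes inversely to their frequency.
clf = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
).fit(X_tr, y_tr)

# Use the continuous probability output and sweep the threshold to trade
# precision against recall; a higher threshold usually raises precision.
proba = clf.predict_proba(X_te)[:, 1]
for thr in (0.3, 0.5, 0.7):
    pred = proba >= thr
    print(f"thr={thr}: precision={precision_score(y_te, pred):.2f} "
          f"recall={recall_score(y_te, pred):.2f}")
```

Picking the threshold that best balances precision and recall on a validation set (e.g. via `precision_recall_curve`) is usually cheaper than resampling and composes with any of the other strategies above.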
Lucas Morin

I would go with SMOTE if I were you!

Shaggy