I’m working with an imbalanced dataset to predict strokes, where the positive class (stroke occurrence) is significantly underrepresented. Initially, I used logistic regression, but due to the class imbalance, I switched to a Random Forest model. After applying techniques such as random oversampling and adjusting the classification threshold, I've managed to improve my recall to approximately 61.3%. However, I am still facing a high false positive rate (178 instances) in my confusion matrix, which negatively impacts precision (17.6%). What additional strategies can I explore to further enhance precision while maintaining a good recall?
2 Answers
You can generally try:
- using an evaluation metric that takes the imbalance into account (e.g. PR-AUC / average precision rather than accuracy)
- class weights, instead of or in addition to oversampling
- stronger algorithms (gradient-boosted decision trees such as XGBoost or LightGBM)
- tuning the prediction threshold on a precision-recall curve, or working with the continuous predicted probabilities directly
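A minimal sketch of the last three points combined, using scikit-learn on a synthetic imbalanced dataset (the stroke data itself isn't available here, so `make_classification` stands in for it; the recall floor of 0.6 mirrors the ~61% recall mentioned in the question):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in: ~5% positive class, similar to a stroke dataset
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0
)

# class_weight="balanced" upweights errors on the rare positive class,
# an alternative to random oversampling
clf = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
clf.fit(X_tr, y_tr)

# Work with continuous scores, and evaluate with an imbalance-aware
# metric (average precision, i.e. area under the PR curve)
proba = clf.predict_proba(X_te)[:, 1]
print("average precision:", average_precision_score(y_te, proba))

# Pick the threshold that maximises precision subject to recall >= 0.6
prec, rec, thr = precision_recall_curve(y_te, proba)
ok = rec[:-1] >= 0.6  # thr has one fewer entry than prec/rec
best = np.argmax(np.where(ok, prec[:-1], -1.0))
print(f"threshold={thr[best]:.3f} "
      f"precision={prec[best]:.3f} recall={rec[best]:.3f}")
```

Sweeping the recall floor in the last step traces out the precision/recall trade-off explicitly, which is usually more informative on imbalanced data than a single confusion matrix at the default 0.5 threshold.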
Lucas Morin