I’m working with an imbalanced dataset to predict strokes, where the positive class (stroke occurrence) is significantly underrepresented. Initially, I used logistic regression, but due to the class imbalance, I switched to a Random Forest model. After applying techniques such as random oversampling and adjusting the classification threshold, I've managed to improve my recall to approximately 61.3%. However, I am still facing a high false positive rate (178 instances) in my confusion matrix, which negatively impacts precision (17.6%). What additional strategies can I explore to further enhance precision while maintaining a good recall?
2 Answers
You can generally try:
- using an evaluation metric that takes the imbalance into account (e.g. PR-AUC / average precision rather than accuracy)
- class weights, instead of or in addition to oversampling
- stronger algorithms (gradient-boosted decision trees such as XGBoost or LightGBM)
- tuning the prediction threshold on a precision-recall curve, or working with the continuous predicted probabilities directly
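A minimal sketch of the last three points combined, using scikit-learn on a synthetic imbalanced dataset (the stroke data itself isn't available here, so `make_classification` stands in for it; the recall floor of 0.6 mirrors the ~61% recall mentioned in the question):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in: ~5% positive class, similar to a stroke dataset
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0
)

# class_weight="balanced" upweights errors on the rare positive class,
# an alternative to random oversampling
clf = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
clf.fit(X_tr, y_tr)

# Work with continuous scores, and evaluate with an imbalance-aware
# metric (average precision, i.e. area under the PR curve)
proba = clf.predict_proba(X_te)[:, 1]
print("average precision:", average_precision_score(y_te, proba))

# Pick the threshold that maximises precision subject to recall >= 0.6
prec, rec, thr = precision_recall_curve(y_te, proba)
ok = rec[:-1] >= 0.6  # thr has one fewer entry than prec/rec
best = np.argmax(np.where(ok, prec[:-1], -1.0))
print(f"threshold={thr[best]:.3f} "
      f"precision={prec[best]:.3f} recall={rec[best]:.3f}")
```

Sweeping the recall floor in the last step traces out the precision/recall trade-off explicitly, which is usually more informative on imbalanced data than a single confusion matrix at the default 0.5 threshold.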
Lucas Morin