
I have a multi-class classification problem with 5 classes (tabular data). I used an XGBoost model; it scores well on 3 of the classes but poorly on the remaining 2. I tried up-sampling and class weights, but those 2 classes still underperform. Any suggestions, please?

heroMhf

1 Answer


Up-sampling only helps if the reason for the low performance is class imbalance. If the classes are already balanced, adding more cases of the weak classes will not improve the model. If you are sure that imbalance is the cause, you can try the opposite of up-sampling: down-sampling the majority classes. Beyond that, you can try generating synthetic data; the SDV library is really good and easy to use.

If the cause is neither class imbalance nor a lack of data, and the problem is simply difficult, you will have to look at other options: feature engineering, gathering more data, tuning the model more carefully, and so on.

In short, your first step should be to find out why the model struggles with exactly these 2 classes. Why them?
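To make the "why them?" step concrete, here is a minimal sketch in plain Python. It is only an illustration: the labels and predictions are made-up toy data, and the helper names (`confusion_counts`, `per_class_recall`, `downsample_majority`) are hypothetical, not part of XGBoost or any library. It shows how to check which classes the weak ones get confused with, and how a simple random down-sampling of the majority classes could look.

```python
import random
from collections import Counter, defaultdict


def confusion_counts(y_true, y_pred):
    """Map each true class to a Counter of the classes it was predicted as."""
    table = defaultdict(Counter)
    for t, p in zip(y_true, y_pred):
        table[t][p] += 1
    return table


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    table = confusion_counts(y_true, y_pred)
    return {c: table[c][c] / sum(table[c].values()) for c in table}


def downsample_majority(rows, labels, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    keep.sort()  # preserve the original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy data (assumed for illustration): classes 3 and 4 are the weak ones.
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
y_pred = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 4, 3, 3, 4]

print("class sizes:", Counter(y_true))
print("recall per class:", per_class_recall(y_true, y_pred))
for cls in (3, 4):
    print(f"class {cls} predicted as:", dict(confusion_counts(y_true, y_pred)[cls]))

# Down-sample on the training split only, never on the evaluation split.
rows = list(range(len(y_true)))  # stand-in for feature rows
rows_ds, labels_ds = downsample_majority(rows, y_true)
print("class sizes after down-sampling:", Counter(labels_ds))
```

If the two weak classes are mostly mistaken for each other, as in this toy data, the features probably do not separate them well, and resampling alone is unlikely to help. With scikit-learn installed, `confusion_matrix` and `classification_report` from `sklearn.metrics` give the same information with less code.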