How to deal with imbalanced categorical variables in regression tasks?

Question

I want to predict real estate prices using several machine learning algorithms. My dataset contains numerical and categorical predictors. I have already eliminated the outliers of the numerical variables. Now I'm wondering how to deal with "outliers" (i.e., imbalanced classes) of the categorical variables, but I could not find anything on this topic. Do I have to deal with the imbalanced classes (outliers) at all, or is this only relevant for classification tasks?

Side note, if important: I encoded the categorical variables using one-hot encoding.

otaku · Answer 1 · 2022-07-16T07:00:09.953

-1

EDITED:

You should not remove outliers: a model trained without them will not give good predictions when it meets similar outliers in unseen data. One way to help the model generalize when a few categories dominate the data is to resample your data with replacement, also called bootstrapping.

Bootstrapping lets the rare categories appear more often in the training sample. Note that it repeats existing rows rather than adding new information.
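As a rough illustration of the resampling idea, here is a minimal pandas sketch that draws extra rows with replacement for any category level that has fewer than a chosen number of rows. The helper name `oversample_rare_categories`, the `min_count` threshold, and the toy columns (`heating`, `area_m2`, `price`) are all made up for this example; they are not from the question's dataset.

```python
import pandas as pd


def oversample_rare_categories(df, column, min_count, random_state=0):
    """Resample rows with replacement so that every level of `column`
    has at least `min_count` rows. Hypothetical helper for illustration."""
    parts = []
    for _, group in df.groupby(column):
        if len(group) < min_count:
            # Draw the missing number of rows from this level, with replacement.
            extra = group.sample(
                n=min_count - len(group),
                replace=True,
                random_state=random_state,
            )
            group = pd.concat([group, extra])
        parts.append(group)
    return pd.concat(parts, ignore_index=True)


# Toy data (invented): "oil" is the rare level.
df = pd.DataFrame({
    "heating": ["gas"] * 8 + ["oil"] * 2,
    "area_m2": [50, 62, 71, 80, 95, 110, 120, 130, 85, 140],
    "price": [150, 180, 200, 230, 260, 300, 320, 350, 210, 330],
})

balanced = oversample_rare_categories(df, "heating", min_count=5)
counts = balanced["heating"].value_counts()
```

If you try something like this, it is probably safest to resample only the training split, so that copies of the same row do not end up in both the training and the test data.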

edited Jul 16 '22 at 07:00

answered Jul 15 '22 at 13:18

