If a dataset has missing values in both categorical and continuous variables, how can I replace them with the mode for the categorical variables and the mean for the continuous variables?
How is this data stored? Can you provide any sample data? – sorak Mar 06 '18 at 18:19
2 Answers
When the missing data are missing at random, you could impute the missing values using multiple imputation.
For more information about multiple imputation, I would recommend the book Applied Missing Data Analysis by C. K. Enders (2010). It also has a great companion website.
For multiple imputation in R you could use the mice package. Here is the link to the package on CRAN, the link to the documentation, and the link to the article in the Journal of Statistical Software.
There are other packages for multiple imputation.
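After each of the m imputed datasets has been analysed, the per-dataset results are combined with Rubin's rules. In R, mice handles this step for you (its pool() function). The sketch below shows only that pooling arithmetic in plain Python. The function name pool_rubin and the numbers in the example are hypothetical, and it assumes you already have one point estimate and one squared standard error per imputed dataset.

```python
from statistics import mean


def pool_rubin(estimates, variances):
    """Combine m per-imputation results with Rubin's rules (hypothetical helper).

    estimates: the point estimate (e.g. a regression coefficient) from each
               imputed dataset.
    variances: the squared standard error of that estimate in each dataset.
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)                                  # pooled point estimate
    u_bar = mean(variances)                                  # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    t = u_bar + (1 + 1 / m) * b                              # total variance
    return q_bar, t


# Made-up results from m = 3 imputed datasets, for illustration only.
est, var = pool_rubin([2.0, 2.2, 2.4], [0.10, 0.12, 0.14])
print(est, var)
```

The (1 + 1/m) factor inflates the between-imputation variance to account for using a finite number of imputations; this extra term is what a single mean/mode fill cannot capture.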
L. Bakker
You can use either fillna() or interpolate() in pandas.
For more details about these two methods, please refer to my answer to this Stack Overflow question: Missing values in Time Series in python
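A minimal pandas sketch of what the question asks for, assuming the column dtype tells the two kinds apart: numeric columns are treated as continuous and filled with their mean, and every other column (object, category) is filled with its mode. The helper name fill_mode_mean and the sample columns are hypothetical. Note that integer-coded categories would be treated as continuous under this assumption.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def fill_mode_mean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with numeric columns filled
    by their mean and all other columns filled by their mode."""
    filled = df.copy()
    for col in filled.columns:
        if is_numeric_dtype(filled[col]):
            # Continuous variable: mean() skips NaN, then fillna() replaces the gaps.
            filled[col] = filled[col].fillna(filled[col].mean())
        else:
            # Categorical variable: mode() may return several values on ties,
            # so take the first; skip columns that are entirely missing.
            modes = filled[col].mode()
            if not modes.empty:
                filled[col] = filled[col].fillna(modes.iloc[0])
    return filled


df = pd.DataFrame({
    "colour": ["red", None, "blue", "red"],
    "height": [1.0, 2.0, None, 3.0],
})
print(fill_mode_mean(df))

# For ordered data such as a time series, interpolate() fills numeric gaps
# from neighbouring values instead of the column mean (linear by default).
s = pd.Series([1.0, None, 3.0])
print(s.interpolate())
```

The function works on a copy so the original frame keeps its missing values, which makes it easy to compare the filled and unfilled data afterwards.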
Yogesh Awdhut Gadade