I am working on a binary classification problem where I have a mix of continuous and categorical variables.
I encoded the categorical variables using the get_dummies function in pandas.
My questions are:
1) I see that there is a parameter called drop_first, which is usually set to True. Why do we have to do this?
For example, suppose the gender column has two values, Male and Female. If I use drop_first=True, it returns only one column, e.g. gender_male, with binary values 1 and 0.
If my feature importance reports gender_male as an important feature, am I right to infer that only the Male gender influences the outcome (because Male is encoded as 1 and Female as 0), and that Female (the 0s) doesn't affect the model outcome? Or do 0s in general play no role in ML model predictions?
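For reference, here is a minimal sketch of the two-value case. The toy DataFrame is made up for illustration, and dtype=int is only there so the dummies print as 1/0 rather than True/False. pandas names the column after the value as written, so it comes out as gender_Male:

```python
import pandas as pd

# Toy data, made up purely for illustration
df = pd.DataFrame({"gender": ["Male", "Female", "Female", "Male"]})

# drop_first=True keeps k-1 dummy columns for a k-level categorical
encoded = pd.get_dummies(df, columns=["gender"], drop_first=True, dtype=int)

print(encoded.columns.tolist())         # ['gender_Male']
print(encoded["gender_Male"].tolist())  # [1, 0, 0, 1]
```

So a single column carries both groups: 1 for the Male rows and 0 for the Female rows.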
2) Suppose gender has three values, for example Male, Female, and Transgender. In this case, if I use drop_first=True, it would return only two columns:
gender_male with 1 and 0. Here 0 represents Transgender, right?
gender_female with 1 and 0. Here 0 represents Transgender, right?
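When I try the three-value case on toy data (again made up, with dtype=int only for readable output), the columns that remain are not the ones I wrote above. As far as I can tell, get_dummies drops the first level in sorted order, which for these labels is Female rather than Transgender:

```python
import pandas as pd

# Toy data, made up purely for illustration
df3 = pd.DataFrame({"gender": ["Male", "Female", "Transgender", "Female"]})

full = pd.get_dummies(df3, columns=["gender"], dtype=int)
reduced = pd.get_dummies(df3, columns=["gender"], drop_first=True, dtype=int)

print(full.columns.tolist())
# ['gender_Female', 'gender_Male', 'gender_Transgender']
print(reduced.columns.tolist())
# ['gender_Male', 'gender_Transgender']

# Rows where every remaining dummy is 0 belong to the dropped level
all_zero = reduced.sum(axis=1) == 0
print(df3.loc[all_zero, "gender"].tolist())  # ['Female', 'Female']
```

In this run, a 0 in a single column only means "not that level"; it takes 0 in both remaining columns to identify the dropped level.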
3) What is the disadvantage of not using drop_first=True? Is it only the increase in the number of columns?
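To make question 3 concrete, here is a sketch comparing the two encodings on made-up toy data (the variable names are mine). Besides the extra column, the only difference I can see is that the extra column adds no new information, since it can be rebuilt from the others:

```python
import pandas as pd

# Toy data, made up purely for illustration
df = pd.DataFrame({"gender": ["Male", "Female", "Transgender", "Male"]})

full = pd.get_dummies(df, columns=["gender"], dtype=int)
reduced = pd.get_dummies(df, columns=["gender"], drop_first=True, dtype=int)

print(full.shape[1], reduced.shape[1])  # 3 2

# With all dummies kept, exactly one column is 1 in every row
print(full.sum(axis=1).tolist())  # [1, 1, 1, 1]

# The dropped column can be reconstructed from the remaining ones
rebuilt = 1 - reduced.sum(axis=1)
print(rebuilt.tolist() == full["gender_Female"].tolist())  # True
```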
Can you help me with the above queries?