Questions tagged [dummy-variables]
45 questions
9
votes
2 answers
In which cases shouldn't we drop the first level of categorical variables?
Beginner in machine learning, I'm looking into the one-hot encoding concept.
Unlike in statistics, where you always want to drop the first level to have k-1 dummies (as discussed here on SE), it seems that some models need to keep it and have k…
Dan Chaltiel
- 341
- 2
- 10
7
votes
3 answers
How to give a higher importance to certain features in a (k-means) clustering model?
I am clustering data with numeric and categorical variables. To process the categorical variables for the cluster model, I create dummy variables. However, I feel like this results in a higher importance for these dummy variables because multiple…
Eva
- 81
- 1
- 4
6
votes
3 answers
Obtaining consistent one-hot encoding of train / production data
I'm building an app that will require user input. Currently, on the training set, I run the following code, in which data is a pandas dataframe with a combination of categorical and numerical columns.
dummified_data = pd.get_dummies(data)
train_data =…
Andrew Maurer
- 338
- 2
- 9
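One common way to keep the encoding consistent between training and production is to store the training column layout and reindex new data against it. The sketch below assumes pandas is available; the frames `train` and `prod` and their values are hypothetical and only illustrate the idea, not the asker's actual data.

```python
import pandas as pd

# Hypothetical training and production frames (illustrative data only).
train = pd.DataFrame({"color": ["red", "green", "blue"], "size": [1.0, 2.0, 3.0]})
prod = pd.DataFrame({"color": ["green", "purple"], "size": [4.0, 5.0]})

# Encode the training data and remember its column layout.
train_dummies = pd.get_dummies(train, columns=["color"])
train_columns = train_dummies.columns  # persist this alongside the model

# Encode production data, then force it into the training layout:
# levels missing in production become all-zero columns,
# and levels never seen in training are dropped.
prod_dummies = pd.get_dummies(prod, columns=["color"]).reindex(
    columns=train_columns, fill_value=0
)
```

With this approach an unseen level such as "purple" ends up as a row of zeros across the dummy columns, which may or may not be the desired behavior for a given model.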
3
votes
1 answer
Using pandas get_dummies() on real world unseen data
I made an ML model, trained and tested it with my data containing categorical variables.
To create dummy variables I used pd.get_dummies() before the split.
I now want to use my model on previously unseen data where, of course, I need to re-create my…
3nomis
- 541
- 7
- 17
3
votes
3 answers
How to obtain original feature names after using one-hot encoding
This question is on an implementation aspect of scikit-learn's DecisionTreeClassifier().
How do I get the feature names ranked in descending order, from the feature_importances_ returned by the scikit-learn DecisionTreeClassifier()?
The problem is…
S Datta
- 51
- 6
3
votes
1 answer
Dummy Variable trap in Linear Regression
The dummy variable trap is a common problem in linear regression with categorical variables: one-hot encoding introduces redundancy, so if we have m categories in our categorical variable, we usually drop one dummy variable to…
AAA
- 45
- 6
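The redundancy behind the dummy variable trap can be checked numerically. This is a minimal sketch, assuming NumPy and pandas; the series `levels` and the helper `with_intercept` are hypothetical names introduced for illustration. It compares the rank of a design matrix that has an intercept plus all m dummies against one that has an intercept plus m-1 dummies.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical variable with m = 3 levels.
levels = pd.Series(["a", "b", "c", "a", "b", "c"], name="cat")

full = pd.get_dummies(levels, dtype=float)                       # m columns
reduced = pd.get_dummies(levels, drop_first=True, dtype=float)   # m - 1 columns


def with_intercept(dummies):
    """Prepend a column of ones, as a linear model with an intercept would."""
    return np.column_stack([np.ones(len(dummies)), dummies.to_numpy()])


X_full = with_intercept(full)
X_reduced = with_intercept(reduced)

# The m dummy columns sum to the intercept column, so X_full is rank deficient.
rank_full = np.linalg.matrix_rank(X_full)
rank_reduced = np.linalg.matrix_rank(X_reduced)
```

Here `X_full` has four columns but rank three, while `X_reduced` has three columns and full column rank, which is why ordinary least squares with an intercept usually drops one level.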
2
votes
1 answer
What exactly is a dummy trap? Is dropping one dummy feature really a good practice?
So I'm going through a Machine Learning course, and this course explains that to avoid the dummy trap, a common practice is to drop one column. It also explains that since the info on the dropped column can be inferred from the other columns, we…
UchuuStranger
- 95
- 5
2
votes
1 answer
Dummy variable only for character value in a column (Neglecting float and integers)
My dataset consists of 3000 rows and 50 columns, of which one column (ESTIMATE_FAMILY_CONTRIBUTION) contains numerical values (around 2000 different values such as 20, 30, 32, ...) plus one string value, 'No_information'.
When I create…
SHUBHAM KUMAR
- 33
- 4
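One possible way to handle a mostly numeric column with a single string marker is to split it into a numeric column and a separate indicator. The sketch below assumes pandas; the sample values are made up, and only the column name and the 'No_information' marker come from the question.

```python
import pandas as pd

# Hypothetical sample of the mixed column (values are illustrative only).
raw = pd.Series(
    ["20", "30", "No_information", "32"], name="ESTIMATE_FAMILY_CONTRIBUTION"
)

# Dummy variable for the string marker only: 1 where the marker appears.
no_info_flag = raw.eq("No_information").astype(int)

# Numeric part: anything that cannot be parsed as a number becomes NaN.
numeric = pd.to_numeric(raw, errors="coerce")
```

How the resulting NaN should be filled (a constant, the median, or left for the model to handle) is a modeling decision that this sketch leaves open.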
2
votes
1 answer
How to interpret dummy variable in ML prediction?
I am working on a binary classification problem where I have a mix of continuous and categorical variables.
I created the categorical variables using the get_dummies function in pandas.
Now my questions are:
1) I see that there is a parameter…
The Great
- 2,725
- 3
- 23
- 49
2
votes
3 answers
How to handle "year" variable for Machine Learning models
I have a "year" variable, but I don't know the best way to handle it for an ML model, as it is a numerical variable that represents a sequence. Should I treat it as a categorical variable?
Thanks in advance,
Luis
- 21
- 1
- 2
2
votes
2 answers
Prediction after one hot encoding
I have a regression model that I want to use to make predictions based on values that I will get from an end user.
In my dataset, I have one categorical variable region which I one-hot encoded, which generated 53 new columns (54 regions).
Now my data has…
IngridX
- 33
- 1
- 4
2
votes
1 answer
How to deal with a potentially multiple categorical variable
I'm building a model that has, as inputs, some categorical variables. I have dealt with this sort of data before, and applied different techniques such as creating dummy variables and factor scoring. However, I now have a different type of problem…
Diogo Santos
- 191
- 1
- 2
1
vote
2 answers
What would be the correct representation of categorical variables like sex?
I have a doubt about the right way to use or represent categorical variables with only two values, like "sex". I have checked different sources, but I was not able to find any solid reference. For example, if I have the…
Lila
- 227
- 2
- 7
1
vote
1 answer
Use dummy variables to create a rank variable in R
I have a series of multiple-response (dummy) variables describing causes for canceled visits. A visit can have multiple reasons for the cancellation. My goal is to create a single mutually exclusive variable using the dummy variables in a…
Mar355
- 37
- 5
1
vote
1 answer
Should I include all dummy variables or N-1 dummy variables (keep one as reference) in neural networks?
I have a categorical variable with N factor levels (e.g. gender has two levels) in a classification problem. I have converted it into dummy variables (male and female).
I have to use a neural network (nnet) to classify. I have two options:
Include any…
SiH
- 125
- 6