Highest Voted 'dummy-variables' Questions - Data Science Stack Exchange

9

votes

2 answers

In which cases shouldn't we drop the first level of categorical variables?

As a beginner in machine learning, I'm looking into the one-hot encoding concept. Unlike in statistics, where you always want to drop the first level to have k-1 dummies (as discussed here on SE), it seems that some models need to keep it and have k…

asked Mar 19 '19 at 19:55

Dan Chaltiel

341
2
10
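As a rough illustration of the k versus k-1 distinction raised in this question, the following sketch encodes a made-up column both ways with pandas. The `color` column and its values are hypothetical; `drop_first` is the `pd.get_dummies` parameter that omits the first level.

```python
import pandas as pd

# Hypothetical toy data: one categorical column with k = 3 levels.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Full one-hot encoding: k columns, one per level.
full = pd.get_dummies(df, columns=["color"], dtype=int)

# Reference-level encoding: k-1 columns. The first level in sorted
# order ("blue") is dropped and becomes the implicit baseline.
reduced = pd.get_dummies(df, columns=["color"], drop_first=True, dtype=int)

print(list(full.columns))
print(list(reduced.columns))
```

In the reduced form, a row of all zeros stands for the dropped level, so no information is lost. Whether the extra column helps or hurts depends on the model.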

7

votes

3 answers

How to give a higher importance to certain features in a (k-means) clustering model?

I am clustering data with numeric and categorical variables. To process the categorical variables for the cluster model, I create dummy variables. However, I feel like this results in a higher importance for these dummy variables because multiple…

machine-learning clustering feature-scaling dummy-variables

asked Apr 16 '19 at 08:33

Eva

81
1
4

6

votes

3 answers

Obtaining consistent one-hot encoding of train / production data

I'm building an app that will require user input. Currently, on the training set, I run the following code, in which data is a pandas DataFrame with a combination of categorical and numerical columns. dummified_data = pd.get_dummies(data) train_data =…

python pandas dummy-variables

asked Jun 18 '19 at 23:26

Andrew Maurer

338
2
9
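One possible way to keep the encoding consistent between training and production, sketched under assumptions: store the training column list and reindex any new frame to it. The `city` and `age` columns and all values are invented for the example, and this is not necessarily the approach taken in the answers.

```python
import pandas as pd

# Hypothetical training and production data; "Lima" never appears in training.
train = pd.DataFrame({"city": ["Paris", "Rome", "Oslo"], "age": [31, 45, 27]})
new = pd.DataFrame({"city": ["Rome", "Lima"], "age": [52, 38]})

# Encode the training set and remember its exact column layout.
train_encoded = pd.get_dummies(train, columns=["city"], dtype=int)
train_columns = train_encoded.columns  # persist this alongside the model

# Encode new data, then force it into the training layout:
# missing dummy columns are added as 0, unseen ones are dropped.
new_encoded = pd.get_dummies(new, columns=["city"], dtype=int)
new_aligned = new_encoded.reindex(columns=train_columns, fill_value=0)

print(new_aligned)
```

With this alignment, an unseen category becomes a row of zeros across the dummy columns, which may or may not be the desired behaviour for a given model.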

3

votes

1 answer

Using pandas get_dummies() on real world unseen data

I built an ML model, then trained and tested it with my data containing categorical variables. To create dummy variables I used pd.get_dummies() before the split. I now want to use my model on previously unseen data where, of course, I need to re-create my…

python pandas preprocessing dummy-variables

asked Mar 12 '19 at 09:33

3nomis

541
7
17
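A hedged sketch of one option for unseen data: fix the category levels seen during training with a pandas categorical dtype, so that `pd.get_dummies` always produces the same columns. The `pet` series, its values, and the `encode` helper are hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical training column; its levels define the encoding.
train = pd.Series(["cat", "dog", "bird"], name="pet")
levels = sorted(train.unique())
fixed_dtype = pd.CategoricalDtype(categories=levels)

def encode(values: pd.Series) -> pd.DataFrame:
    """One-hot encode with the training levels; unseen values become all zeros."""
    known = values.where(values.isin(levels))  # unseen values -> missing
    return pd.get_dummies(known.astype(fixed_dtype), prefix="pet", dtype=int)

# "fish" was not in the training data.
unseen = encode(pd.Series(["dog", "fish"]))
print(unseen)
```

Because the dtype carries every training level, the output has one column per level even when a level is absent from the new batch.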

3

votes

3 answers

How to obtain original feature names after using one-hot encoding

This question is on an implementation aspect of scikit-learn's DecisionTreeClassifier(). How do I get the feature names ranked in descending order, from the feature_importances_ returned by the scikit-learn DecisionTreeClassifier()? The problem is…

feature-selection decision-trees one-hot-encoding dummy-variables

asked Apr 29 '18 at 14:22

S Datta

51
6
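One convention for mapping importances back to the original features, shown here only as a sketch, is to sum the importances of all dummy columns that share a prefix. The column names, the numbers standing in for `feature_importances_`, and the `aggregate_importances` helper are all made up; summing is an assumption, not something the question prescribes.

```python
from collections import defaultdict

def aggregate_importances(columns, importances, categorical, sep="_"):
    """Sum dummy-column importances per original feature, ranked descending."""
    totals = defaultdict(float)
    for col, imp in zip(columns, importances):
        original = col
        for name in categorical:
            # A dummy column is assumed to be named "<feature><sep><level>".
            if col.startswith(name + sep):
                original = name
                break
        totals[original] += imp
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical encoded columns and importances (illustrative values only).
columns = ["age", "city_Oslo", "city_Paris", "income"]
importances = [0.5, 0.125, 0.125, 0.25]

ranked = aggregate_importances(columns, importances, categorical=["city"])
print(ranked)
```

Prefix matching can misfire when one feature name is a prefix of another, so the list of categorical names should be chosen with care.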

3

votes

1 answer

Dummy Variable trap in Linear Regression

The dummy variable trap is a common problem with linear regression when dealing with categorical variables, since one-hot encoding introduces redundancy. If we have m categories in our categorical variable, we usually drop one dummy variable to…

scikit-learn linear-regression one-hot-encoding dummy-variables

asked Oct 02 '22 at 23:25

AAA

45
6
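The redundancy described in this question can be checked numerically. The sketch below, using made-up data with three categories, compares the rank of a design matrix that has an intercept plus all dummies against one with a dummy dropped.

```python
import numpy as np

# Hypothetical categorical variable with m = 3 categories, coded 0..2.
levels = np.array([0, 1, 2, 0, 1, 2])
dummies = np.eye(3)[levels]            # full one-hot matrix, shape (6, 3)
intercept = np.ones((len(levels), 1))

# Intercept + all m dummies: the dummies sum to the intercept column,
# so the columns are linearly dependent.
X_full = np.hstack([intercept, dummies])

# Intercept + (m - 1) dummies: the dependency is removed.
X_dropped = np.hstack([intercept, dummies[:, 1:]])

rank_full = np.linalg.matrix_rank(X_full)
rank_dropped = np.linalg.matrix_rank(X_dropped)
print(X_full.shape[1], rank_full)
print(X_dropped.shape[1], rank_dropped)
```

A rank below the number of columns means the ordinary least squares coefficients are not uniquely determined, which is the practical symptom of the trap.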

2

votes

1 answer

What exactly is a dummy trap? Is dropping one dummy feature really a good practice?

So I'm going through a Machine Learning course, and this course explains that to avoid the dummy trap, a common practice is to drop one column. It also explains that since the info on the dropped column can be inferred from the other columns, we…

one-hot-encoding dummy-variables

asked Nov 23 '20 at 04:17

UchuuStranger

95
5

2

votes

1 answer

Dummy variable only for character value in a column (Neglecting float and integers)

My dataset consists of 3000 rows and 50 columns, of which one column (ESTIMATE_FAMILY_CONTRIBUTION) contains numerical values (around 2000 different values like 20, 30, 32, ...) but also one string value, e.g. 'No_information'. When I create…

logistic-regression dummy-variables

asked Jul 08 '20 at 13:47

SHUBHAM KUMAR

33
4
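Since the question is truncated, the following is only a guess at a workable approach: coerce the column to numeric and add a single indicator for the string value, instead of creating a dummy per distinct value. The sample values are invented; only the column name and the 'No_information' marker come from the excerpt.

```python
import pandas as pd

# Hypothetical sample of the mixed column.
col = pd.Series([20, 30, "No_information", 32],
                name="ESTIMATE_FAMILY_CONTRIBUTION")

# Non-numeric entries become NaN; numeric entries are kept as floats.
numeric = pd.to_numeric(col, errors="coerce")

# A single dummy marks the rows that held the string value.
no_info_flag = numeric.isna().astype(int)

result = pd.DataFrame({
    "ESTIMATE_FAMILY_CONTRIBUTION": numeric,
    "ESTIMATE_FAMILY_CONTRIBUTION_missing": no_info_flag,
})
print(result)
```

How to fill the remaining NaN values (for example with 0 or the median) is a modelling choice left open here.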

2

votes

1 answer

How to interpret dummy variable in ML prediction?

I am working on a binary classification problem where I have a mix of continuous and categorical variables. I created the categorical variables using the get_dummies function in pandas. Now my questions are: 1) I see that there is a parameter…

machine-learning deep-learning data-mining feature-selection dummy-variables

asked Dec 29 '19 at 06:08

The Great

2,725
3
23
49

2

votes

3 answers

How to handle "year" variable for Machine Learning models

I have a "year" variable, but I don't know the best way to handle it for an ML model, as it is a numerical variable that implies an ordering. Should I treat it as a categorical variable? Thanks in advance,

machine-learning data preprocessing data-science-model dummy-variables

asked Nov 26 '19 at 10:21

Luis

21
1
2

2

votes

2 answers

Prediction after one hot encoding

I have a regression model that I want to use to make predictions based on values that I will get from an end user. In my dataset, I have one categorical variable, region, which I one-hot encoded, which generated 53 new columns (54 regions). Now my data has…

dataset regression prediction dummy-variables

asked Jul 24 '19 at 19:33

IngridX

33
1
4

2

votes

1 answer

How to deal with a potentially multiple categorical variable

I'm building a model that has, as inputs, some categorical variables. I have dealt with this sort of data before and applied different techniques, such as creating dummy variables and factor scoring. However, I now have a different type of problem…

feature-engineering categorical-data aggregation dummy-variables

asked Mar 25 '19 at 09:22

Diogo Santos

191
1
2

1

vote

2 answers

What would be the correct representation of categorical variables like sex?

I have a doubt about the right way to use or represent categorical variables with only two values, like "sex". I have checked different sources, but I was not able to find any solid reference. For example, if I have the…

feature-selection dummy-variables

asked Jul 19 '21 at 15:25

Lila

227
2
7

1

vote

1 answer

Use dummy variables to create a rank variable. R

I have a series of multiple-response (dummy) variables describing causes for canceled visits. A visit can have multiple reasons for the cancellation. My goal is to create a single mutually exclusive variable using the dummy variables in a…

r ranking dummy-variables hierarchical-data-format

asked Jul 06 '21 at 15:42

Mar355

37
5

1

vote

1 answer

Should I include all dummy variables or N-1 dummy variables (keep one as reference) in neural networks

I have a categorical variable with N factor levels (e.g. gender has two levels) in a classification problem. I have converted it into dummy variables (male and female). I have to use a neural network (nnet) to classify. I have two options - Include any…

machine-learning neural-network dummy-variables

asked Nov 05 '20 at 17:47

SiH

125
6
