Given a categorical variable that depends on continuous variables, I would like to know how to check whether these continuous variables explain the categorical one.

So:

Y = categorical
X1 = continuous
X2 = continuous
X3 = continuous

I'd start with a correlation, but which one? I've seen "How to get correlation between two categorical variable and a categorical variable and continuous variable?", but that question is about whether differences in categorical variables explain a continuous variable, so I think it's a different topic?

I'm fine with tool recommendations for R as well as Python.

edit: I'm not sure whether "categorical" is correct here. The values of $ Y $ are $ 0, 1, 2, 3 $, but I could also use $ A, B, C, D $. They represent a classification of how clean a room is.

Ben

2 Answers

By saying you want to "explain Y by X", it sounds like you are trying to build a classifier F that maps X values to the expected Y: F(X) --> Y. If so, you don't necessarily have to search for a "correlation". There are various methods for building such a classifier, for example logistic regression, an SVM, or a neural network.
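To make the classifier idea concrete, here is a minimal sketch of multinomial (softmax) logistic regression written with only the Python standard library. The data, the function names (`fit_softmax`, `predict`), and the learning rate and epoch count are all made up for illustration; in practice you would more likely reach for a library implementation.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities (shifted for numerical stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(W, x):
    """Linear score per class; the last weight in each row is the bias."""
    row = list(x) + [1.0]
    return [sum(w * v for w, v in zip(weights, row)) for weights in W]

def fit_softmax(X, y, n_classes, lr=0.5, epochs=500):
    """Fit softmax regression with plain stochastic gradient descent."""
    W = [[0.0] * (len(X[0]) + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        for x, target in zip(X, y):
            row = list(x) + [1.0]
            probs = softmax(class_scores(W, x))
            for k in range(n_classes):
                # Gradient of the log-loss: predicted probability minus one-hot target.
                error = probs[k] - (1.0 if k == target else 0.0)
                for j, v in enumerate(row):
                    W[k][j] -= lr * error * v
    return W

def predict(W, x):
    """Return the class with the highest score."""
    scores = class_scores(W, x)
    return scores.index(max(scores))

# Synthetic, deliberately well-separated data (an assumption for the demo):
# three continuous predictors X1..X3 and a label Y in {0, 1, 2, 3}.
X = [
    (0.1, 0.0, 0.1), (0.0, 0.1, 0.0),   # Y = 0
    (0.9, 0.1, 0.0), (1.0, 0.0, 0.1),   # Y = 1
    (0.1, 0.9, 0.0), (0.0, 1.0, 0.1),   # Y = 2
    (0.0, 0.1, 0.9), (0.1, 0.0, 1.0),   # Y = 3
]
y = [0, 0, 1, 1, 2, 2, 3, 3]

W = fit_softmax(X, y, n_classes=4)
predictions = [predict(W, x) for x in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print("training accuracy:", accuracy)
```

Training accuracy only shows that the predictors can separate the classes in-sample. To judge how well X really explains Y, you would measure accuracy on held-out data.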

Besides, if it makes more sense for you, you can always first discretize the continuous variables into categorical variables and then also use other methods such as decision trees, Naive Bayes, and more.
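Discretizing a continuous variable can be sketched as follows. The bin edges and the helper name `bin_value` are hypothetical; sensible edges depend on your data.

```python
from bisect import bisect_left

def bin_value(value, upper_edges):
    """Return the index of the first bin whose upper edge is >= value.

    Values above the last edge fall into one extra, final bin.
    """
    return bisect_left(upper_edges, value)

# Hypothetical edges for a 0-100 variable: bins 0-25, 26-50, 51-75, 76-100.
upper_edges = [25, 50, 75]

measurements = [3, 25, 26, 50, 74.5, 100]
binned = [bin_value(m, upper_edges) for m in measurements]
print(binned)  # [0, 0, 1, 1, 2, 3]
```

The resulting integer codes can then be fed to methods that expect categorical inputs.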

Oren Razon

So you want to explain the influence of 1-n continuous variables X on one ordinal variable Y. What is the best way to do it?

Correlation

Spearman rank-order correlation is the right approach for correlations involving ordinal variables, even if one of the variables is continuous. Some sources, however, recommend coding the continuous variable as an ordinal one itself (via binning, e.g. a 0-100 variable coded as 0-25, 26-50, 51-75, 76-100) and including that in the correlation, which is a valid approach as well.
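As a rough illustration, Spearman's coefficient is the Pearson correlation of the ranks, with tied values sharing their average rank. The sketch below uses only the standard library; the helper names and the data are invented for the example, and a statistics library would normally do this for you.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # 1-based average position of the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def pearson(a, b):
    """Plain Pearson correlation coefficient."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

# Invented example: ordinal cleanliness class Y and one continuous predictor X1.
y = [0, 0, 1, 1, 2, 2, 3, 3]
x1 = [1.2, 2.5, 3.1, 2.9, 5.0, 6.3, 7.7, 9.1]

rho = spearman(x1, y)
print(round(rho, 3))  # 0.976
```

A value close to 1 or -1 indicates a strong monotonic relationship. Repeating the calculation for X2 and X3 gives one coefficient per predictor.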

Regression

In most regression models we can treat ordinal variables as continuous and probably be okay. Regression models have several key advantages over correlations for your question: they can deal with multiple predictors at once, and they also quantify the magnitude of each predictor's influence.
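A minimal sketch of that idea: ordinary least squares with two predictors, treating the encoded ordinal Y as a number. It uses only the standard library and solves the normal equations directly; the function names are invented, and the data is synthetic and constructed so that Y = 0.5 * X1 + 0.5 * X2 exactly, which lets you check the recovered coefficients.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x

def fit_ols(X, y):
    """Return [intercept, coef_1, ..., coef_p] from the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend a constant column
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * target for r, target in zip(rows, y)) for i in range(p)]
    return solve_linear_system(XtX, Xty)

# Synthetic data (assumption): Y in {0, 1, 2, 3} equals 0.5 * X1 + 0.5 * X2.
X = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (3.0, 1.0),
     (4.0, 2.0), (2.0, 2.0), (1.0, 3.0), (0.0, 2.0)]
y = [0, 1, 1, 2, 3, 2, 2, 1]

intercept, coef_x1, coef_x2 = fit_ols(X, y)
print(round(intercept, 6), round(coef_x1, 6), round(coef_x2, 6))
```

Each coefficient estimates how much the encoded Y changes per unit of that predictor while the other is held fixed. That is the "magnitude of influence" a single correlation does not give you.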

What you always have to do

To deal with ordinal variables in a correlation or a regression, you always have to label-encode them, which means A, B, C, D becomes 0, 1, 2, 3.
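Label encoding can be as simple as a dictionary lookup. The sketch below assumes the order A < B < C < D from the question; the variable names are illustrative.

```python
# The order of this list defines the encoding, so it must follow the real
# ordering of the categories (assumed here: A < B < C < D).
category_order = ["A", "B", "C", "D"]
encoding = {label: code for code, label in enumerate(category_order)}

observed = ["A", "C", "B", "D", "A"]
encoded = [encoding[label] for label in observed]
print(encoded)  # [0, 2, 1, 3, 0]
```

Spelling out the order explicitly avoids relying on alphabetical sorting, which only happens to match the intended order in this example.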

Fnguyen