A similar question is answered here, but I wanted to take a different approach. Assume I have a binary target variable (1/0) and a categorical variable Gender (M/F). From this, I can compute a proportion p1 of M with target 1 and a proportion p2 of F with target 1, where N1 and N2 are the total numbers of M and F respectively. Is it prudent to run a test for H0: p1 = p2? If H0 cannot be rejected, doesn't that indicate that Gender has no correlation with the target?

Arindam

1 Answer

Reading your comments, it appears you can use more than one predictor for your target variable.

If you want to understand whether Gender (M/F) has a significant association with the target, you should run a logistic regression using it together with the other predictors. This lets you estimate the effect of Gender with all else equal, i.e. after controlling for the effects of the other variables. Logistic regression returns a coefficient, a standard error and a significance score (p-value) for each variable. Alternatively, you could look at relative importance scores from tree-based models such as Random Forest or XGBoost. (However, logistic regression is better suited to hypothesis testing.)
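To make the logistic regression idea concrete, here is a minimal sketch in plain Python (no external libraries) that fits the model by Newton-Raphson and reports a coefficient, standard error and Wald p-value per variable. The helper names `solve` and `fit_logistic`, the second predictor `senior` and all the counts are made up for illustration; in practice you would more likely use a statistics package.

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_logistic(X, y, iters=25):
    """Newton-Raphson fit. Each row of X must start with 1.0 (intercept)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += w * xi[a] * xi[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    # Standard errors: sqrt of the diagonal of the inverse information
    # matrix, taken from the last iteration (effectively at convergence).
    se = []
    for a in range(k):
        unit = [1.0 if i == a else 0.0 for i in range(k)]
        se.append(math.sqrt(solve(hess, unit)[a]))
    return beta, se

# Hypothetical toy data: (is_male, senior, group size, count with target 1)
cells = [(1, 0, 40, 10), (1, 1, 60, 30), (0, 0, 60, 12), (0, 1, 40, 16)]
X, y = [], []
for male, senior, n, positives in cells:
    for i in range(n):
        X.append([1.0, float(male), float(senior)])
        y.append(1 if i < positives else 0)

beta, se = fit_logistic(X, y)
for name, b, s in zip(["intercept", "male", "senior"], beta, se):
    z = b / s
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided Wald p-value
    print(f"{name:10s} coef={b:.3f} se={s:.3f} z={z:.2f} p={p_value:.3f}")
```

The row for `male` is the one to read: its p-value tests whether Gender is associated with the target after adjusting for the other predictor in the model.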

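BUT: If you still want to stick with bivariate statistics, i.e. using just one predictor at a time, you can use simpler tools such as a t-test or one-way ANOVA to check whether belonging to one gender or the other makes a statistically significant difference. (With a binary 0/1 target this amounts to comparing the two proportions, so a two-proportion z-test or a chi-square test is the more usual choice.) Keep in mind that failing to reject H0 only means the data show no evidence of a difference; it does not prove that Gender is unrelated to the target.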
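For the H0: p1 = p2 comparison from the question, one simple bivariate option is a pooled two-proportion z-test. Note that this is a different tool from the t-test/ANOVA named above, though on a 0/1 target they address the same comparison. The sketch below uses only the standard library; the function name `two_proportion_ztest` and the counts are made up for illustration.

```python
import math

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided pooled z-test of H0: p1 == p2.

    x1, x2: number of rows with target 1 in each group (e.g. M and F)
    n1, n2: total number of rows in each group (N1 and N2)
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical counts: 30 of 100 M and 20 of 100 F have target 1
z, p_value = two_proportion_ztest(30, 100, 20, 100)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```

The normal approximation behind this test is only reasonable when each group has a fair number of both 1s and 0s; with small counts an exact test would be the safer choice.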

Hope this helps, otherwise let me know.

Leevo