Questions tagged [chi-square-test]

28 questions
2
votes
1 answer

What is the best alternative to Fisher's exact test for contingency tables that are NOT 2x2?

I am a newbie to data mining. I am trying to find associations between two categorical variables. Since more than 20% of my expected frequencies are less than 5, I wanted to use Fisher's exact test, but it turns out it is generally used for contingency…
wilma297
2
votes
1 answer

Interpreting the results based on Granger Causality test

I am trying to use the Granger causality test (https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.grangercausalitytests.html) to assess whether "positivity score" affects value. Here is the code I am using: # Applying…
Darcey BM
2
votes
1 answer

Multiple hypothesis testing in the feature selection process

I am doing feature selection on binary features, i.e. each feature represents the presence or absence of a substructure in a molecule. I have a target variable with two classes. My first step was to check if my feature is…
2
votes
2 answers

Does the chi-square test help us assess a non-linear correlation between two categorical variables?

I have two categorical variables: sports level (1, 2, 3 and 4) and use of supplements (Yes and No). I tested whether they are independent with the χ² test, and their association was significant. I would like to know whether the chi-squared statistic in…
2
votes
2 answers

Does the t-test require the standard deviation of the sample for its calculation?

This might be a novice question, but the main difference between a t-test and a z-test, as far as I understand, is that the z-test calculation requires the SD of the population, whereas in a t-test we work with the SD of the sample itself when…
1
vote
0 answers

Should I remove features such as gender and birth month before drawing the heatmap because they are categorical?

I am working on a dataset that has both categorical and numerical (continuous and discrete) features (26 columns, 30244 rows). Target is categorical (1, 2, 3) and I am performing EDA on this dataset. The categorical features with numerical values…
1
vote
1 answer

How to get the correlation between the categories of two categorical variables?

I have a categorical variable ("Health") with 2 categories ('healthy', 'not_healthy') and another categorical variable ("country") with 5 categories ("english", "eua", "Australia", "spain", "Germany"). I want to check if there is any relation…
bonaqua
1
vote
0 answers

Chi-square goodness-of-fit test

I want to use a chi-square test but I'm unsure if I'm using it right. The Kickstarter website shows the frequency of projects in each main category. It is updated once a day. I got a data set of Kickstarter projects from 2009-2016. I wanted to filter the…
Laurent
1
vote
3 answers

Linear regression with a fixed intercept where everything is in log scale

I have a set of values for a surface (in pixels) that becomes bigger over time (exponentially). The surface consists of cells that divide over time. After doing some modelling, I came up with the following…
1
vote
0 answers

Using a Subset of Categories in a Categorical Column

I have an XGBoost model and I'm going to retrain it by adding new features. One column in my data holds the customers' professions and has 60 categories. I suppose there is no need to convert them to dummy variables because tree…
tkarahan
1
vote
0 answers

Which statistical test is best to compare dichotomous variables?

You have an ML model that outputs 100,000 out of a million observations of a dichotomous variable (0 or 1). You want to see whether the distribution of the 100,000 observations output by the model is similar to that of the full 1 million. You apply the model 3…
IKNv99
1
vote
1 answer

Chi-square test results interpretation

I am using a chi-square test to compare the distributions of two categorical variables. Both have the same number of classes. After counting each class per variable, I obtain very similar counts but the p-value result of the chi-square test is 0 - rejecting…
crbl
1
vote
2 answers

Can chi-square and ANOVA (f_classif) be used to select the best features?

I have a binary classification problem (target 0 or 1), with both continuous and categorical variables as features. I understand that with chi-square I can only evaluate categorical features. What about ANOVA (f_classif)? It's the…
1
vote
1 answer

Why do I get this result with a chi-square test?

I have a question about the chi-squared independence test. I'm working on a dataset and I'm interested in finding the link between product category and gender, so I built my contingency table. contingency_table :- I found that p-value…
1
vote
1 answer

Low p-value in chi-squared test but low coefficient in logistic regression

I ran a chi-squared test on multiple features and also used these features to build a binary classifier using logistic regression. The feature which had the lowest p-value (~0.1) had a low coefficient (=0) whereas the feature which had a higher p-value…
user16584277