
I have reaction time as the dependent variable and age as the independent variable, and I want to fit a linear mixed model. My data are not normally distributed. Do I have to transform the data? I tried multiple lambda values, but the transformed data are still not normal, although the Q-Q plot has improved. Is it OK to rely on the Q-Q plot alone instead of the Shapiro-Wilk test? I have a large dataset.

1 Answer


There is no requirement for the data to be normally distributed in linear mixed models (or in regression models more generally).

Depending on your research question(s), you may want the conditional distribution of the response (that is, the distribution of the errors) to be plausibly normal so that certain statistical inferences are valid, but the raw data certainly do not need to be normally distributed. We never observe the errors themselves. Instead, we inspect the residuals, which can be regarded as estimates of the errors.
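To make the distinction concrete, here is a minimal simulation sketch. It assumes made-up numbers throughout (the age distribution, the intercept of 300, the slope of 4 and the error SD of 20 are all hypothetical), and it uses ordinary simple regression rather than a mixed model to keep things short; the same logic should carry over to the conditional residuals of a mixed model. The raw reaction times come out clearly skewed because age is skewed, while the residuals stay roughly symmetric.

```python
import random
import statistics


def skewness(values):
    """Third standardized moment; close to 0 for symmetric data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def simulate(n=5000, seed=1):
    """Simulate hypothetical ages and reaction times (all values made up)."""
    rng = random.Random(seed)
    # Skewed predictor: mostly younger participants with a long tail of older ones.
    age = [18 + rng.expovariate(1 / 15) for _ in range(n)]
    # Reaction time (ms) = linear effect of age + normally distributed errors.
    rt = [300 + 4 * a + rng.gauss(0, 20) for a in age]
    return age, rt


def ols_residuals(x, y):
    """Fit y = intercept + slope * x by least squares and return the residuals."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


age, rt = simulate()
residuals = ols_residuals(age, rt)

print(f"skewness of raw reaction times: {skewness(rt):.2f}")
print(f"skewness of residuals:          {skewness(residuals):.2f}")
```

In this simulated setting a normality check on the raw reaction times would fail badly even though the model's error assumption holds exactly, which is why diagnostics belong on the residuals of the fitted model.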

I would highly recommend reading the following posts and answers over at Cross Validated for further details and explanations:

Where does the misconception that Y must be normally distributed come from?

What if residuals are normally distributed, but y is not?

How does linear regression use the normal distribution?

https://stats.stackexchange.com/questions/578454/understanding-the-assumptions-of-linear-regression-residuals-or-data-must-come

Does Linear regression needs target variable to be normally distributed. (GLM context)?

Normality Assumptions of the Linear Model

Why we prefer normal distribution of data in linear regression

Robert Long