I have a logistic mixed model (lme4 package in R). I want to assess whether participants' scores on the measures 'sumspq', 'sumpdi', and 'sumcaps' significantly affect the difference in performance between two conditions.
I first run the model:
performance ~ Condition*(sumspq+sumpdi+sumcaps)+ (1|participant)
The results show that none of the interactions is significant (all ps > .05).
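For concreteness, the first model can be fit like this (a runnable sketch on simulated stand-in data; the variable names match my dataset, everything else is made up):

```r
library(lme4)

set.seed(1)
# Simulated stand-in data: 40 participants x 2 conditions x 10 trials,
# with one questionnaire score per participant for each measure
pp  <- data.frame(participant = factor(1:40),
                  sumspq = rnorm(40), sumpdi = rnorm(40), sumcaps = rnorm(40))
dat <- merge(expand.grid(participant = factor(1:40),
                         Condition   = factor(c("A", "B")),
                         trial       = 1:10),
             pp, by = "participant")
dat$performance <- rbinom(nrow(dat), 1, 0.5)  # binary outcome

# Logistic mixed model with Condition-by-score interactions
m1 <- glmer(performance ~ Condition * (sumspq + sumpdi + sumcaps) +
              (1 | participant),
            data = dat, family = binomial)
summary(m1)  # interaction rows of the fixed-effects table hold the p-values
```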
I check variance inflation factors and confirm that there is no multicollinearity. As a double check, separate simple models with each predictor on its own also produce no significant results (e.g., performance ~ Condition*sumspq + (1|participant)).
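The collinearity check was done roughly like this (a sketch on simulated data; check_collinearity() comes from the performance package, and treating low VIFs as unproblematic is my own working rule):

```r
library(lme4)
library(performance)

set.seed(1)
# Same simulated structure: per-participant scores, trial-level binary outcome
pp  <- data.frame(participant = factor(1:40),
                  sumspq = rnorm(40), sumpdi = rnorm(40), sumcaps = rnorm(40))
dat <- merge(expand.grid(participant = factor(1:40),
                         Condition   = factor(c("A", "B")),
                         trial       = 1:10),
             pp, by = "participant")
dat$performance <- rbinom(nrow(dat), 1, 0.5)

m1 <- glmer(performance ~ Condition * (sumspq + sumpdi + sumcaps) +
              (1 | participant), data = dat, family = binomial)
check_collinearity(m1)  # VIFs near 1 = no evidence of multicollinearity

# Double check: each predictor on its own
m_spq <- glmer(performance ~ Condition * sumspq + (1 | participant),
               data = dat, family = binomial)
summary(m_spq)
```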
I also want to add covariates to the model to see whether they influence the results (an exploratory analysis with limited hypotheses). The covariates are Age, IQ, sumsens1, and sumsens2.
I use an automated stepwise procedure for mixed models (the buildmer package) to find the optimal model with the covariates included. In this procedure, the interactions of sumspq, sumpdi, and sumcaps are forced to stay in the model as the variables of interest. The model produced is:
performance ~ Condition*(sumspq+sumpdi+sumcaps+sumsens1)+ Age + IQ + (1|participant)
and has significant effects of Condition*sumsens1 (p = .007), Age (p = .02), and IQ (p = .04).
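The buildmer step, as I set it up, looks roughly like this (a sketch on simulated data; I use the include argument of buildmerControl() to force the interactions of interest to be kept, which is my understanding of how that option works):

```r
library(buildmer)

set.seed(1)
# Simulated stand-in data including the candidate covariates
pp  <- data.frame(participant = factor(1:40),
                  sumspq = rnorm(40), sumpdi = rnorm(40), sumcaps = rnorm(40),
                  sumsens1 = rnorm(40), sumsens2 = rnorm(40),
                  Age = rnorm(40, 30, 8), IQ = rnorm(40, 100, 15))
dat <- merge(expand.grid(participant = factor(1:40),
                         Condition   = factor(c("A", "B")),
                         trial       = 1:10),
             pp, by = "participant")
dat$performance <- rbinom(nrow(dat), 1, 0.5)

# Maximal candidate model for the stepwise procedure
f <- performance ~ Condition * (sumspq + sumpdi + sumcaps +
                                sumsens1 + sumsens2) +
  Age + IQ + (1 | participant)

b <- buildmer(f, data = dat, family = binomial,
              buildmerControl = buildmerControl(
                include = ~ Condition:sumspq + Condition:sumpdi +
                  Condition:sumcaps))
summary(b@model)  # the selected glmer fit lives in the @model slot
```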
THE PROBLEM:
My goal is to ensure that I have selected the optimal model to report. I have noticed that the p-values of the covariates vary greatly depending on which combination of covariates is included in the model, despite there being no evidence of multicollinearity. How do I select the optimal model in terms of confidence in the stability of the estimates/p-values? I have tried train/test cross-validation (caret package in R; 0.62 accuracy), but I realise this assesses the predictive power of the model (generalisation to a new dataset) rather than identifying the optimal model (even among poor-performing models).
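To make the stability concern concrete, this is the kind of check I have in mind (a sketch on simulated data, not a procedure from a package: refit the model on bootstrap resamples of participants and look at the spread of a covariate's estimate; a reduced model with one score and one covariate keeps it fast):

```r
library(lme4)

set.seed(1)
# Simulated stand-in data: per-participant score and covariate
pp  <- data.frame(participant = factor(1:40),
                  sumspq = rnorm(40),
                  Age    = rnorm(40, 30, 8))
dat <- merge(expand.grid(participant = factor(1:40),
                         Condition   = factor(c("A", "B")),
                         trial       = 1:10),
             pp, by = "participant")
dat$performance <- rbinom(nrow(dat), 1, 0.5)

# Resample whole participants with replacement, refit, keep the Age estimate
age_boot <- replicate(30, {
  ids <- sample(levels(dat$participant), replace = TRUE)
  res <- do.call(rbind, lapply(seq_along(ids), function(i) {
    d <- dat[dat$participant == ids[i], ]
    d$participant <- factor(i)  # give each resampled participant a fresh ID
    d
  }))
  fit <- suppressWarnings(suppressMessages(
    glmer(performance ~ Condition * sumspq + Age + (1 | participant),
          data = res, family = binomial)))
  unname(fixef(fit)["Age"])
})
sd(age_boot)  # a large spread relative to the estimate signals instability
```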