You are aiming to assess the impact of newly available variables—categorical and continuous—on a generalized linear model (GLM) already trained to predict claims frequency.
The comment suggests retraining the model by combining new and existing features, excluding highly correlated variables, and possibly engaging in feature engineering. This approach is sound. Rather than testing each variable in isolation, retraining allows you to evaluate their joint contribution in the context of your existing predictors.
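As a starting point for the "excluding highly correlated variables" step, here is a minimal sketch of a pairwise correlation filter. It assumes a pandas DataFrame `X` of numeric candidate features (existing plus new); categorical variables would need to be encoded first, and the threshold of 0.9 is an illustrative choice, not a recommendation from your setup:

```python
import numpy as np
import pandas as pd

def drop_correlated(X: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one variable from each pair whose |correlation| exceeds threshold."""
    corr = X.corr().abs()
    # Keep only the upper triangle so each pair is inspected exactly once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop)
```

A variance inflation factor (VIF) screen is a common alternative when you care about multicollinearity beyond pairwise correlations.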
Note that you are not measuring the marginal predictive power of a new variable on its own, but its conditional usefulness: the incremental improvement it offers given the variables already in the model. Even highly informative variables may not improve predictive accuracy if their information is already captured by other covariates (Kuhn & Johnson, 2013). This is especially relevant in correlated data settings, where new predictors may carry largely redundant information.
Alternative strategies include residual modelling and model offsets. Residual modelling can be valuable when the original model must be preserved for operational or interpretative reasons. By modelling the residuals of the base GLM using the new features, you can capture additional signal not explained by the initial model. Offsets—where the linear predictor from the base model is incorporated as a fixed term in the new model—are commonly used when prior risk or exposure must be retained. This is particularly relevant in actuarial contexts, where models often account for prior exposure through an offset term such as $\log(\text{exposure})$.
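To make the offset idea concrete, here is a hedged sketch using statsmodels. The names `df`, `existing_features`, `new_features`, `claims`, and `exposure` are placeholders for your data, not details from the original question. The base Poisson GLM uses $\log(\text{exposure})$ as its offset; its full linear predictor is then frozen as the offset of a second model, so only the new features can adjust the fit:

```python
import numpy as np
import statsmodels.api as sm

# Base model: existing rating factors, with log(exposure) as the usual offset
X_base = sm.add_constant(df[existing_features])
base = sm.GLM(
    df["claims"], X_base,
    family=sm.families.Poisson(),
    offset=np.log(df["exposure"]),
).fit()

# Freeze the base model's full linear predictor (including the exposure
# offset) and let only the new features adjust it
eta_base = np.asarray(X_base @ base.params) + np.log(df["exposure"])
X_new = sm.add_constant(df[new_features])
augmented = sm.GLM(
    df["claims"], X_new,
    family=sm.families.Poisson(),
    offset=eta_base,
).fit()
```

Significant coefficients in `augmented` then indicate signal beyond what the base model already explains, while the base model itself is left untouched.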
For evaluation, retrain the augmented model using robust validation, ideally repeated cross-validation or bootstrapping, and assess improvements in predictive performance using metrics appropriate for count data: deviance and scaled deviance, mean squared error (MSE), and calibration plots.
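One way to implement this comparison, sketched with scikit-learn under the assumption that `X_base`, `X_aug`, and the claim counts `y` are NumPy arrays you have prepared, is to compare cross-validated Poisson deviance for the two feature sets:

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import mean_poisson_deviance
from sklearn.model_selection import RepeatedKFold

def cv_poisson_deviance(X, y, n_splits=5, n_repeats=3, seed=0):
    """Mean and std of out-of-fold Poisson deviance over repeated K-fold splits."""
    rkf = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    scores = []
    for train_idx, test_idx in rkf.split(X):
        model = PoissonRegressor(alpha=0.0).fit(X[train_idx], y[train_idx])
        scores.append(mean_poisson_deviance(y[test_idx], model.predict(X[test_idx])))
    return np.mean(scores), np.std(scores)

# Evaluate both designs on identical splits (same seed) for a fair comparison
# print(cv_poisson_deviance(X_base, y), cv_poisson_deviance(X_aug, y))
```

Comparing the mean deviance alongside its spread across folds tells you whether any improvement is consistent or could be an artifact of a particular split.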
Metrics like AUC or the Brier score are less appropriate unless you reframe the problem as binary classification (e.g., whether any claim occurs). In that case, transforming count predictions into binary labels can be justified, though it collapses the underlying count distribution into a binary outcome.
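If you do take that route, a small sketch (assuming some fitted frequency model `model` and held-out data `X_test`, `y_test`, all hypothetical names) can convert predicted frequencies into event probabilities via the Poisson identity $P(N \ge 1) = 1 - e^{-\mu}$ and score them directly:

```python
import numpy as np
from sklearn.metrics import brier_score_loss, roc_auc_score

mu = model.predict(X_test)           # predicted claim frequencies
p_any_claim = 1.0 - np.exp(-mu)      # Poisson: P(N >= 1) = 1 - exp(-mu)
y_binary = (y_test > 0).astype(int)  # did any claim occur?

print("Brier score:", brier_score_loss(y_binary, p_any_claim))
print("AUC:        ", roc_auc_score(y_binary, p_any_claim))
```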
In summary, retraining the GLM with all available features (appropriately filtered and engineered) is a statistically sound approach. Assess performance carefully and stay mindful of multicollinearity. If predictive gains are minimal or inconsistent, that is evidence the new variables are conditionally redundant.