I understand from resources like this one that the Population Stability Index (PSI) can be used to test for data drift when a machine learning model is in production. However, the resources I have looked at describe PSI in terms of a single variable. Can PSI be applied when observations include multiple variables? How?
1 Answer
PSI is a special case of a stability index ("SI"), which measures the difference between the distributions of one variable over two groups (here, presumably, the development data and the production data). PSI is usually computed on the model's final score (y_hat_prod), which is essentially a weighted sum of the feature variables (Sum_Beta_Xis). In that sense, a single PSI on the score already reflects all of the variables at once, to the extent that they move the score.
If you want to measure the difference in the distribution of each X individually, you can do that too. The result is normally called the Characteristic Stability Index (CSI) for that variable, and it is computed for that variable alone, irrespective of the other Xs or of y_hat.
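To make the distinction concrete, here is a minimal sketch in Python with NumPy. It uses the usual binned formula, the sum over bins of (actual% - expected%) * ln(actual% / expected%), once on the score (PSI) and once per feature (CSI). The function name `stability_index`, the choice of 10 quantile bins, the small `eps` floor for empty bins, the simulated data, and the `beta` coefficients are all assumptions made for illustration, not part of any particular library.

```python
import numpy as np

def stability_index(expected, actual, n_bins=10, eps=1e-6):
    """Binned stability index between a reference sample and a new sample."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Bin edges are quantiles of the reference (development) sample.
    edges = np.unique(np.quantile(expected, np.linspace(0, 1, n_bins + 1)))
    # Keep interior edges only, so production values outside the
    # development range fall into the first or last bin.
    interior = edges[1:-1]
    k = len(interior) + 1

    e_idx = np.searchsorted(interior, expected, side="right")
    a_idx = np.searchsorted(interior, actual, side="right")
    e_frac = np.bincount(e_idx, minlength=k) / len(expected)
    a_frac = np.bincount(a_idx, minlength=k) / len(actual)

    # Floor the proportions to avoid log(0) and division by zero (assumption).
    e_frac = np.clip(e_frac, eps, None)
    a_frac = np.clip(a_frac, eps, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

# Simulated development and production data (hypothetical).
rng = np.random.default_rng(0)
X_dev = rng.normal(size=(5000, 3))
X_prod = rng.normal(size=(5000, 3))
X_prod[:, 1] += 0.5  # only the second feature drifts

beta = np.array([0.5, -1.0, 2.0])  # hypothetical model coefficients
score_dev = X_dev @ beta    # y_hat on development data
score_prod = X_prod @ beta  # y_hat on production data

psi = stability_index(score_dev, score_prod)  # one number for the score
csi = [stability_index(X_dev[:, j], X_prod[:, j]) for j in range(X_dev.shape[1])]

print("PSI (score):", psi)
print("CSI per feature:", csi)
```

In this setup the CSI of the shifted feature should stand out from the other two, while the PSI on the score only shows the drift as far as that feature's coefficient carries it into y_hat. That is the practical reason to look at both.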
skoh