
Assume that I am building a churn prediction model and collect observational data on customers who registered in the last 12-18 months, of whom 50% churned. Customers who are predicted to churn receive more favorable treatment from the business in an attempt to reduce the overall churn rate. 18 months later, I compare the predictions against reality: whether customers that were predicted to churn actually churned, and vice versa. The model predicted that 50% of the cohort of customers who registered during a particular interval would churn, whereas in reality only 10% did. It may be that the model is not drifting, but rather that the treatment is having an effect and lowering the churn rate.

This difference between the prediction and reality could be caused by the following:

  1. The treatment is positively affecting customer behavior, causing customers who were likely to churn not to. In this case there is no evidence of concept drift; the change in distribution is caused directly by the model's own intervention.

  2. The treatment is not affecting customer behavior; rather, some confounding variable that wasn't accounted for is (e.g., the quality of service offered by competitors has diminished, causing fewer of our customers to churn). In that case, I add the confounder to my data set and rebuild the model under this new assumption. Here there is evidence of concept drift.

How can one distinguish between the two in order to attribute the change in distribution to one cause or the other?


1 Answer


If you can wait and have the resources, you can set up an A/B test in which a group of customers predicted to churn is not given the treatment (a holdout group).
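If you do run that experiment, the readout can be as simple as a two-proportion z-test comparing churn rates between the two arms. A minimal sketch with hypothetical counts:

```python
# Two-proportion z-test comparing churn between treated and held-out
# customers who were all predicted to churn. Counts are hypothetical.
from statsmodels.stats.proportion import proportions_ztest

churned = [120, 310]   # churners in [treated, holdout]
totals = [1000, 1000]  # group sizes

stat, p_value = proportions_ztest(churned, totals)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
# A significantly lower churn rate in the treated arm supports
# option 1: the treatment itself is moving the outcome.
```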

Another reasonable first step would be some kind of error analysis. This could be manual or semi-automated: select a sample of false positives from the past 18 months, list the favorable treatments they did or did not receive, and try to spot patterns.
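A rough sketch of the semi-automated version, assuming a hypothetical frame with one row per customer and 0/1 columns for the prediction, the outcome, and each treatment:

```python
import pandas as pd

# Hypothetical customer-level data; column names are assumptions.
df = pd.read_csv("customers.csv")

# False positives: predicted to churn but retained.
fp = df[(df["predicted_churn"] == 1) & (df["churned"] == 0)]

# How often does each treatment show up among false positives vs. overall?
treatments = ["discount", "priority_support", "account_manager"]
comparison = pd.DataFrame({
    "false_positives": fp[treatments].mean(),
    "all_customers": df[treatments].mean(),
})
print(comparison)  # treatments over-represented among FPs are suspects
```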

(Assuming that your model predicts some kind of score or probability, you can also compare calibration plots for customers that did and did not receive a favorable treatment. If the probability calibration is unchanged for the customers who didn't receive the treatment, that hints at option 1.)
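A sketch of that calibration check using sklearn's `calibration_curve`, again with assumed column names (`treated`, `churned`, `predicted_proba`):

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.calibration import calibration_curve

df = pd.read_csv("customers.csv")  # hypothetical columns as before

# Plot calibration separately for treated vs. untreated customers.
for label, group in df.groupby("treated"):
    frac_pos, mean_pred = calibration_curve(
        group["churned"], group["predicted_proba"], n_bins=10
    )
    plt.plot(mean_pred, frac_pos, marker="o", label=f"treated={label}")

plt.plot([0, 1], [0, 1], "k--", label="perfectly calibrated")
plt.xlabel("Mean predicted churn probability")
plt.ylabel("Observed churn fraction")
plt.legend()
plt.show()
# If the untreated group still tracks the diagonal while the treated
# group sits below it, that points to the treatment (option 1).
```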

You would also want to check whether your performance has degraded over time, which would hint at option 2. (Assuming the treatment was rolled out alongside the model, you should instead see an immediate drop in performance under option 1.)
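One way to operationalize that: track a metric such as AUC per scoring-month cohort and look at the shape of any decline. A sketch, with hypothetical column names:

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

df = pd.read_csv("customers.csv")  # hypothetical columns as before
df["scored_month"] = pd.to_datetime(df["scored_at"]).dt.to_period("M")

# AUC per monthly cohort: a step drop when the treatment launched
# hints at option 1; a gradual decay hints at drift (option 2).
monthly_auc = df.groupby("scored_month").apply(
    lambda g: roc_auc_score(g["churned"], g["predicted_proba"])
)
print(monthly_auc)
```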

Alternatively, if you can quantify or categorize the treatment each customer received, you can build a simple model such as a logistic regression that includes your predicted churn outcome from 18 months ago plus the treatment variables (plus other possible confounders) to predict the realized outcome, and check the coefficients' significance.
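A minimal sketch of that regression using statsmodels; all variable names here are hypothetical:

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("customers.csv")  # hypothetical columns as before

# Predict the realized outcome from the old prediction, the treatment
# variables, and candidate confounders.
X = sm.add_constant(df[[
    "predicted_churn",     # model's prediction from 18 months ago
    "discount",            # treatment variables
    "priority_support",
    "competitor_quality",  # candidate confounder
]])
result = sm.Logit(df["churned"], X).fit()
print(result.summary())
# Significant treatment coefficients support option 1; a significant
# confounder alongside weak treatment effects supports option 2.
```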
