Before launching an A/B test, what methods can ensure that the population split between the control and target groups is random with respect to a particular label, say, purchase rate?
3 Answers
Just split them 50/50 (or another percentage, depending on what you want). When you then test a specific metric, be careful if you can only compute it for a portion of group A because of some specific property (i.e., you need information C for this metric): to avoid bias, compare only against the people in group B who also have information C.
Example: I have 200,000 customers and split them 50/50. I'm testing my new recommendation method against a random advertisement, but my recommender only works for customers who have searched in the last month. When the results come back, I have to compare them against the control group filtered down to only people who have searched in the past month. Those customers are much more likely to have a strong connection to you, so comparing against the whole control group would overestimate the performance of the recommender.
Alternatively, you could split the groups up front based on whether they have information C, but since you might want to test several different things, splitting blindly beforehand keeps your options open, at the small cost that your groups might be slightly unequal; for a big sample size this is not an issue at all.
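A minimal sketch of that filtered comparison, assuming a pandas DataFrame with hypothetical columns `group`, `searched_last_month` (the "information C" property), and `converted`:

```python
import pandas as pd

# Hypothetical customer table: 'group' is the A/B assignment,
# 'searched_last_month' is the eligibility property ("information C"),
# and 'converted' is the outcome being measured.
df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B", "A"],
    "searched_last_month": [True, False, True, True, False, True],
    "converted": [1, 0, 0, 1, 0, 1],
})

# Restrict BOTH groups to the subpopulation the treatment could reach,
# so the control baseline is not diluted by ineligible customers.
eligible = df[df["searched_last_month"]]

treatment_rate = eligible.loc[eligible["group"] == "A", "converted"].mean()
control_rate = eligible.loc[eligible["group"] == "B", "converted"].mean()
print(f"treatment: {treatment_rate:.2%}, control: {control_rate:.2%}")
```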
In some cases it is useful to do stratified sampling. For example, imagine you have 4 different groups in your population, each with a different n (e.g., A = 10k, B = 20k, C = 30k, D = 40k). In creating your split you may want to ensure that each subset maintains the same subgroup proportions. This can ensure better equivalence between the two groups than pure random assignment (which could result in lopsided subsets).
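One way to do this (a sketch, assuming scikit-learn is available and your subgroup labels live in an array) is to pass those labels to `train_test_split`'s `stratify` argument:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical population: subgroup labels with the lopsided sizes above.
subgroup = np.repeat(["A", "B", "C", "D"], [10_000, 20_000, 30_000, 40_000])
customer_ids = np.arange(subgroup.size)

# stratify=subgroup forces each half to preserve the 10/20/30/40 proportions.
control_ids, treatment_ids = train_test_split(
    customer_ids, test_size=0.5, stratify=subgroup, random_state=42
)

# Sanity check: both halves should show the same subgroup counts.
for name, ids in [("control", control_ids), ("treatment", treatment_ids)]:
    print(name, dict(zip(*np.unique(subgroup[ids], return_counts=True))))
```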
You might want to provide some more information here. This is either very straightforward or incredibly difficult, depending on your needs.
@Brandon Loudermilk is dead on in recommending stratification, especially for something like purchase rate, prior product purchased, etc. This will get you a long way, but a few things depend heavily on your situation, and a lot of this boils down to experimental design:
- What are you hoping to test?
- What defines success? A 10% improvement? A 0.001% improvement?
- How many observations do you expect in each group?
- Are you assigning users to groups on the fly, or are you designing this prior?
If you're doing this on the fly, for instance, if your tool can split over a given variable (or a quartile of that variable), it should place users into A or B depending on the stratum they belong to. That way every user coming in has a 50/50 chance of landing in either bucket, and you should never end up with only high spenders in one bucket.
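One common way to implement that on-the-fly assignment (a sketch, not something any particular tool prescribes) is deterministic hashing of a stable user ID together with an experiment name, which gives each arriving user an effectively 50/50 bucket regardless of stratum:

```python
import hashlib

def assign_bucket(user_id: str, experiment: str = "reco_test_v1") -> str:
    """Deterministically map a user to 'A' or 'B' with ~50/50 odds.

    Hashing (experiment, user_id) means a returning user always lands
    in the same bucket, and different experiments split independently.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

# Because assignment is independent of any user attribute, each stratum
# (e.g., high spenders) splits roughly 50/50 on its own.
print(assign_bucket("user_12345"))
```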
That being said, if you can assign groups before the test, ensuring this split is fairly easy. However, you might want to look up blocking/experimental design if you have a complicated experimental setup. As a final note, remember that you cannot test for randomness, only for signs against randomness.
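As an illustration of checking for signs against randomness (a sketch, assuming SciPy and a hypothetical spend-quartile label per user), a chi-square test on the bucket-by-stratum contingency table flags splits that are suspiciously unbalanced:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows are buckets A/B, columns are spend quartiles.
counts = np.array([
    [2_510, 2_480, 2_530, 2_495],  # bucket A
    [2_490, 2_520, 2_470, 2_505],  # bucket B
])

# A small p-value is evidence AGAINST a random split; a large one
# does not prove randomness, it just fails to find imbalance.
chi2, p_value, dof, _ = chi2_contingency(counts)
print(f"chi2={chi2:.2f}, p={p_value:.3f}")
```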