You have an ML model that outputs 100,000 of 1,000,000 observations of a dichotomous (0/1) variable. You want to check whether the distribution of the 100,000 output observations is similar to that of the full 1,000,000.

You run the model three times, call the runs A, B, and C, and get three sets of 100,000 observations. You want to test whether each set has the same distribution as the original 1,000,000. You apply a t-test between each set of 100,000 and the 1,000,000, then a chi-square test between each set and the 1,000,000, and then a binomial test. The t-test and the binomial test agree that B is similar to the full set while A and C are different, but the chi-square test says that only C is similar. So which of the models approximated the 1,000,000 rows best?
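A minimal Python sketch (standard library only) of the three comparisons, assuming each run is compared with the known proportion of 1s in the full 1,000,000 rows, i.e. a one-sample setup; the question does not say which variant was actually used. The function name `proportion_tests`, the simulated population with about 30% ones, and the random subsets standing in for runs A, B and C are all hypothetical. For a 0/1 variable, all three tests check the same null hypothesis: that the proportion of 1s in the run equals the proportion in the full set.

```python
import math
import random


def proportion_tests(sample, p0):
    """Test H0: P(X = 1) = p0 for a sample of 0/1 values.

    Returns test statistics and two-sided p-values for a one-sample t-test,
    a chi-square goodness-of-fit test and an exact binomial test.
    Assumes 0 < mean(sample) < 1 and 0 < p0 < 1 (hypothetical helper).
    """
    n = len(sample)
    k = sum(sample)  # number of 1s observed
    p_hat = k / n

    # One-sample t-test of the mean against p0. For 0/1 data the sample
    # variance (ddof=1) is n * p_hat * (1 - p_hat) / (n - 1). With n this
    # large, t is effectively normal, so the p-value uses the normal tail.
    s = math.sqrt(n * p_hat * (1 - p_hat) / (n - 1))
    t = (p_hat - p0) / (s / math.sqrt(n))
    p_t = math.erfc(abs(t) / math.sqrt(2))

    # Chi-square goodness of fit on the two cells (1s, 0s): 1 degree of freedom.
    observed = (k, n - k)
    expected = (n * p0, n * (1 - p0))
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_chi2 = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df

    # Exact two-sided binomial test: add up the probabilities of every count
    # that is no more likely than the observed count.
    log_norm = math.lgamma(n + 1)
    log_p, log_q = math.log(p0), math.log(1 - p0)

    def log_pmf(j):
        return (log_norm - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                + j * log_p + (n - j) * log_q)

    cutoff = log_pmf(k) + 1e-7  # small relative tolerance for near-ties
    p_binom = 0.0
    for j in range(n + 1):
        lp = log_pmf(j)
        if lp <= cutoff:
            p_binom += math.exp(lp)
    p_binom = min(1.0, p_binom)

    return {"t": t, "p_t": p_t, "chi2": chi2, "p_chi2": p_chi2, "p_binom": p_binom}


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical population: 1,000,000 binary values with about 30% ones.
    population = [1 if random.random() < 0.3 else 0 for _ in range(1_000_000)]
    p0 = sum(population) / len(population)

    # Stand-ins for runs A, B and C: here just random subsets of 100,000 rows.
    for name in "ABC":
        run = random.sample(population, 100_000)
        r = proportion_tests(run, p0)
        print(name, {key: round(v, 4) for key, v in r.items()})
```

With two categories, the chi-square statistic works out to exactly the square of the z statistic (p_hat - p0) / sqrt(p0 (1 - p0) / n), so in this sketch the chi-square and t-test p-values differ only because the t-test uses the sample variance, and the exact binomial p-value is typically close to both. Under these assumptions, a split verdict is more likely to come from p-values sitting near the significance threshold, or from the tests being set up differently (for example, two-sample versus one-sample), than from the tests measuring different things.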

IKNv99
