
The book Hands-On Machine Learning has a section on Out-of-Bag Evaluation related to Decision Trees, where it states:

By default a BaggingClassifier samples m training instances with replacement (bootstrap=True), where m is the size of the training set. This means that only about 63% of the training instances are sampled on average for each predictor.

I am curious how they arrived at 63%. Here's what I have so far:

Let $X$ be a random variable representing the fraction of the training set that gets sampled when drawing $m$ instances with replacement. Hence, $X$ takes values $\frac{1}{m}, \frac{2}{m}, \ldots, 1$. Then, the PMF is as follows:

$$ P\left(X=\frac{i}{m}\right) = {m\choose i} \cdot \frac{1}{m^m} $$

Then, the expected fraction of the training set sampled with replacement will be:

$$ E[X] = \sum_{i=1}^{m} {m\choose i} \cdot \frac{1}{m^m} \cdot \frac{i}{m} $$

Is this the right way to derive the 63%? I am not fully convinced the PMF is correct because it doesn't seem to sum to 1:

$$ \sum_{i=1}^{m} P\left(X=\frac{i}{m}\right) = \left(\frac{2}{m}\right)^m - \frac{1}{m^m} = \frac{2^m-1}{m^m} \quad \text{(using the Binomial Theorem)} $$
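As a quick sanity check on the figure itself, here is a small NumPy simulation sketch (the exact value will vary slightly with the seed) that estimates the expected fraction of distinct instances in a bootstrap sample of size $m$:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10_000        # size of the training set
n_trials = 200    # number of bootstrap samples to average over

fractions = []
for _ in range(n_trials):
    # draw m indices with replacement, as a bootstrap sample does
    sample = rng.integers(0, m, size=m)
    # fraction of the training set that appears at least once
    fractions.append(np.unique(sample).size / m)

print(np.mean(fractions))  # should come out near 1 - 1/e ≈ 0.632
```

This should come out around $0.63$, so the book's figure itself looks right; my doubt is only about the PMF above.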

1 Answer


Here the author explains:

If you randomly draw one instance from a dataset of size $m$, each instance in the dataset obviously has probability $\frac{1}{m}$ of getting picked, and therefore it has a probability $1 - \frac{1}{m}$ of not getting picked.

If you draw $m$ instances with replacement, all draws are independent and therefore each instance has a probability $(1 - \frac{1}{m})^m$ of not getting picked. Now let's use the fact that $e^x$ is equal to the limit of $(1 + \frac{x}{m})^m$ as $m$ approaches infinity.
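Setting $x = -1$ in that limit makes the step explicit:

$$ \lim_{m\to\infty}\left(1 - \frac{1}{m}\right)^m = e^{-1} \approx 0.3679 $$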

So if $m$ is large, the ratio of out-of-bag instances will be about $e^{-1} \approx 0.37$, and roughly $63\%$ ($1 - 0.37$) of the training instances will be sampled.
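To see this in practice, here is a minimal scikit-learn sketch (assuming a recent version, where BaggingClassifier exposes the drawn indices via `estimators_samples_` and defaults to a decision tree as the base estimator) that measures the in-bag fraction per predictor:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# a toy dataset standing in for the training set (m = 2000 instances)
X, y = make_classification(n_samples=2000, random_state=0)

# bootstrap=True is the default: each predictor gets m draws with replacement;
# the default base estimator is a DecisionTreeClassifier
clf = BaggingClassifier(
    n_estimators=50,
    bootstrap=True,
    oob_score=True,   # evaluate each predictor on its out-of-bag instances
    random_state=0,
).fit(X, y)

# fraction of distinct training instances each predictor actually saw in-bag
in_bag = [np.unique(idx).size / X.shape[0] for idx in clf.estimators_samples_]
print(np.mean(in_bag))  # should be close to 1 - 1/e ≈ 0.632
print(clf.oob_score_)   # accuracy estimated on the out-of-bag instances
```

The averaged in-bag fraction should land near $1 - e^{-1} \approx 0.632$, matching the book's 63%.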