The book Hands-On Machine Learning has a section on Out-of-Bag Evaluation related to Decision Trees, where it's stated that:
By default a BaggingClassifier samples m training instances with replacement (bootstrap=True), where m is the size of the training set. This means that only about 63% of the training instances are sampled on average for each predictor.
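Empirically the figure seems right. Here is a quick NumPy simulation I used to sanity-check it (the training-set size and the number of bootstrap samples below are arbitrary choices on my part):

```python
import numpy as np

rng = np.random.default_rng(0)

m = 10_000        # training-set size (arbitrary)
n_trials = 1_000  # number of bootstrap samples to average over

fractions = []
for _ in range(n_trials):
    # Draw m indices with replacement, as bootstrap=True does.
    sample = rng.integers(0, m, size=m)
    # Fraction of distinct training instances that appear in this sample.
    fractions.append(np.unique(sample).size / m)

print(np.mean(fractions))  # comes out at roughly 0.632
```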
I am curious how they arrived at 63%. Here's what I have so far,
Let $X$ be a random variable representing the fraction of distinct training instances that appear in a bootstrap sample of size $m$ drawn with replacement. Hence, $X$ takes values $\frac{1}{m}, \frac{2}{m}, \dots, 1$. Then, the PMF is as follows:
$$ P\left(X=\tfrac{i}{m}\right) = {m\choose i} \cdot \frac{1}{m^m} $$
Then, the average fraction of the training set that gets sampled will be:
$$ E[X] = \sum_{i=1}^{m} {m\choose i} \cdot \frac{1}{m^m} \cdot \frac{i}{m} $$
Is this the right way to derive the 63%? I am not fully convinced the PMF is correct, because it doesn't seem to sum to 1:
$$ \sum_{i=1}^{m}P\left(X=\tfrac{i}{m}\right) = \left(\frac{2}{m}\right)^m - \frac{1}{m^m} = \frac{2^m-1}{m^m} \quad \text{(using the binomial theorem)} $$
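For example, plugging in a small $m$ (an arbitrary choice on my part) shows that the proposed PMF sums to something far below 1, and the corresponding $E[X]$ is nowhere near 0.63:

```python
from math import comb

m = 10  # small, arbitrary training-set size

# Sum of the proposed PMF over i = 1, ..., m.
total = sum(comb(m, i) / m**m for i in range(1, m + 1))

# The corresponding E[X].
expectation = sum(comb(m, i) / m**m * (i / m) for i in range(1, m + 1))

print(total)        # (2**10 - 1) / 10**10, roughly 1e-7, far from 1
print(expectation)  # also tiny, nowhere near 0.63
```

This only confirms my suspicion that something is off with the PMF, but I can't see where the derivation goes wrong.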