Here is a slightly more intuitive/informal explanation, in case the comments are difficult to follow.
- We have an infinite population $F$ whose characteristics we would like to learn.
- If we properly sample the population with a sufficiently large sample size, our data $F_n$ will behave similarly to the whole population. Casually speaking, $F_n\approx F$, i.e., the data approximate the population from which they are sampled. One may ask how well $F_n$ approximates $F$, i.e., what the error properties are. But this is hard to answer, because we only know $F_n$ and do not know the whole infinite $F$.
- Now, repeat the above process, supposing the population of interest is itself $F_n$. In other words, if the initial population is already your sampled large data set $F_n$, then you can take further samples from this new "population" $F_n$. Such a sample is a bootstrap sample, which we may denote $F^*_n$. By the same reasoning, $F^*_n\approx F_n$, i.e., the bootstrapped data approximate the population from which they are sampled. Now the approximation errors can be calculated, because we have both $F_n$ and $F^*_n$ in hand.
- Connecting the above, we believe $F_n^*\approx F_n\approx F$ and, more importantly, $$\text{the way }F^*_n\text{ approximates }F_n\\\text{ is similar to }\\\text{the way }F_n\text{ approximates }F,$$ because both approximation errors come from the same sampling procedure. This explains why we should sample with replacement from $F_n$: that is what we effectively did earlier when sampling from $F$, i.e., when $F$ is an infinite population, $F_n$ is essentially obtained from $F$ through sampling with replacement.
- Thus, error properties when using $F_n$ to approximate $F$ will be similar to the error properties when using $F^*_n$ to approximate $F_n$. So we can estimate the former error properties by using the latter error properties, which are computable since we know both $F^*_n$ and $F_n$.
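The whole argument above can be sketched in a few lines of code. This is a minimal illustration, not a definitive implementation: the statistic (the sample mean), the population distribution (Exp(1)), and all the constants (`n`, `B`, the seed) are arbitrary choices made just for the example. The spread of the bootstrap statistics, computed entirely from $F_n$ and $F^*_n$, estimates the sampling error of the statistic of $F_n$ around that of $F$.

```python
import random
import statistics

random.seed(0)

# One draw of F_n: n iid samples from the "population" F
# (here an Exp(1) distribution, chosen only for illustration).
n = 200
data = [random.expovariate(1.0) for _ in range(n)]  # this is F_n

# Bootstrap: repeatedly resample F_n WITH replacement to get F*_n,
# and record the statistic of interest (here, the sample mean).
B = 2000
boot_means = []
for _ in range(B):
    resample = random.choices(data, k=n)  # sampling with replacement
    boot_means.append(statistics.fmean(resample))

# The spread of the bootstrap means (how F*_n scatters around F_n)
# estimates how the mean of F_n scatters around the mean of F.
boot_se = statistics.stdev(boot_means)

# For the mean there is a textbook formula, SE ≈ sd(data)/sqrt(n),
# so we can check that the bootstrap estimate lands close to it.
print(boot_se)
print(statistics.stdev(data) / n ** 0.5)
```

For the sample mean the bootstrap is unnecessary (the formula above suffices); its value is that the same resampling loop works unchanged for statistics with no simple error formula, such as a median or a trimmed mean.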
PS: Some intuition for why sampling from an infinite $F$ is the same as sampling with replacement (oftentimes, people will just say "iid"):
Suppose we have a hypothetically big bag with an infinite number of balls in $C$ colors, with each color $c$ contributing proportion $p_c$. On each draw, there is probability $p_c$ of observing color $c$. This never changes no matter what balls have been drawn earlier, since the total is infinite: after any draw, the number of remaining balls is still infinite, with the same proportion $p_c$ of color $c$.
Suppose another bag has a finite number of balls, with the same initial proportion $p_c$ for color $c$. If we draw balls with replacement, then each draw still has probability $p_c$ of observing color $c$, exactly as if the bag had an infinite number of balls.
Comparing the two, we see that both share the same characteristic: no matter what happened earlier, the probability distribution of the next draw stays the same.
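The finite-bag-with-replacement case is easy to check by simulation. A minimal sketch, with made-up numbers: a bag of only 10 balls, two colors with $p_{\text{red}}=0.3$ and $p_{\text{blue}}=0.7$. Because each ball is returned before the next draw, the empirical frequencies converge to the fixed per-draw probabilities, just as they would for an infinite bag.

```python
import random
from collections import Counter

random.seed(1)

# A small finite bag: 3 red and 7 blue balls (p_red = 0.3, p_blue = 0.7).
bag = ["red"] * 3 + ["blue"] * 7

# Draw with replacement many times; the bag's composition is identical
# before every draw, so the draws are iid with probabilities (0.3, 0.7).
draws = [random.choice(bag) for _ in range(100_000)]
freq = Counter(draws)

print(freq["red"] / len(draws))   # close to 0.3
print(freq["blue"] / len(draws))  # close to 0.7
```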
In practice, even if the population is finite, as long as it is large enough relative to the sample size (e.g., more than $10\times$ the sample size), the infinite-population assumption approximates the real situation fairly well.
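This "large enough" claim can also be checked numerically. A rough sketch with arbitrary numbers: a finite population of 1000 balls (300 red), from which we repeatedly draw samples of 100, both without replacement (the finite-population situation) and with replacement (the iid idealization). The mean red counts agree, and the variances differ only by the finite-population correction factor $(N-n)/(N-1)\approx 0.9$, which is close to 1 when the population is $10\times$ the sample.

```python
import random
import statistics

random.seed(2)

N, n, trials = 1000, 100, 5000
population = ["red"] * 300 + ["blue"] * 700

# Count of red balls per sample, drawn two ways:
without = [random.sample(population, n).count("red") for _ in range(trials)]
with_repl = [random.choices(population, k=n).count("red") for _ in range(trials)]

# Both sampling schemes give the same expected count (n * p = 30).
print(statistics.fmean(without), statistics.fmean(with_repl))

# The variances differ only by the finite-population correction
# (N - n) / (N - 1) ≈ 0.9, so the iid approximation is already decent.
print(statistics.variance(without) / statistics.variance(with_repl))
```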