Here is a slightly more intuitive/informal explanation, in case the comments are difficult to follow.
- We have an infinite population $F$ whose characteristics we would like to learn.
- If we properly sample the population with a sufficiently large sample size, our data $F_n$ will behave similarly to the whole population. Casually speaking, $F_n\approx F$, i.e., the data approximate the population from which they are sampled. One may ask how well $F_n$ approximates $F$, i.e., what the error properties are. But this is hard to answer, because we only know $F_n$ and do not know the whole infinite $F$.
- Now, repeat the above process, supposing the population of interest is itself $F_n$. In other words, if the initial population is already your sampled large data set $F_n$, then you can take further samples from this new "population" $F_n$. Such a sample is a bootstrap sample, which we may denote $F^*_n$. By the same reasoning, $F^*_n\approx F_n$, i.e., the bootstrapped data approximate the population from which they are sampled. Now the approximation errors can be calculated, because we have both $F_n$ and $F^*_n$ in hand.
- Connecting the above, we believe $F_n^*\approx F_n\approx F$ and, more importantly, $$\text{the way }F^*_n\text{ approximates }F_n\\\text{ is similar to }\\\text{the way }F_n\text{ approximates }F,$$ because both approximation errors come from the same sampling procedure. This explains why we should sample with replacement from $F_n$: that is what we effectively did earlier when sampling from $F$, i.e., when $F$ is an infinite population, $F_n$ is essentially obtained from $F$ through sampling with replacement.
- Thus, error properties when using $F_n$ to approximate $F$ will be similar to the error properties when using $F^*_n$ to approximate $F_n$. So we can estimate the former error properties by using the latter error properties, which are computable since we know both $F^*_n$ and $F_n$.
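The whole argument above can be sketched in a few lines of code. This is a minimal illustration, not a definitive implementation: the statistic (the sample mean), the population distribution (Exp(1)), and all the constants (`n`, `B`, the seed) are arbitrary choices made just for the example. The spread of the bootstrap statistics, computed entirely from $F_n$ and $F^*_n$, estimates the sampling error of the statistic of $F_n$ around that of $F$.

```python
import random
import statistics

random.seed(0)

# One draw of F_n: n iid samples from the "population" F
# (here an Exp(1) distribution, chosen only for illustration).
n = 200
data = [random.expovariate(1.0) for _ in range(n)]  # this is F_n

# Bootstrap: repeatedly resample F_n WITH replacement to get F*_n,
# and record the statistic of interest (here, the sample mean).
B = 2000
boot_means = []
for _ in range(B):
    resample = random.choices(data, k=n)  # sampling with replacement
    boot_means.append(statistics.fmean(resample))

# The spread of the bootstrap means (how F*_n scatters around F_n)
# estimates how the mean of F_n scatters around the mean of F.
boot_se = statistics.stdev(boot_means)

# For the mean there is a textbook formula, SE ≈ sd(data)/sqrt(n),
# so we can check that the bootstrap estimate lands close to it.
print(boot_se)
print(statistics.stdev(data) / n ** 0.5)
```

For the sample mean the bootstrap is unnecessary (the formula above suffices); its value is that the same resampling loop works unchanged for statistics with no simple error formula, such as a median or a trimmed mean.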
PS: Some intuition for why sampling from an infinite $F$ is the same as sampling with replacement (oftentimes, people will just say "iid"):
Suppose we have a hypothetically big bag with an infinite number of balls in $C$ colors, with each color $c$ contributing proportion $p_c$. On each draw, there is probability $p_c$ of observing color $c$. This never changes no matter what balls have been drawn earlier, since the total is infinite: after any draw, the number of remaining balls is still infinite, with the same proportion $p_c$ of color $c$.
Suppose another bag has a finite number of balls, with the same initial proportion $p_c$ for color $c$. If we draw balls with replacement, then each draw still has probability $p_c$ of observing color $c$, exactly as if the bag had an infinite number of balls.
Comparing the two, we see that both share the same characteristic: no matter what happened earlier, the probability distribution of the next draw stays the same.
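The finite-bag-with-replacement case is easy to check by simulation. A minimal sketch, with made-up numbers: a bag of only 10 balls, two colors with $p_{\text{red}}=0.3$ and $p_{\text{blue}}=0.7$. Because each ball is returned before the next draw, the empirical frequencies converge to the fixed per-draw probabilities, just as they would for an infinite bag.

```python
import random
from collections import Counter

random.seed(1)

# A small finite bag: 3 red and 7 blue balls (p_red = 0.3, p_blue = 0.7).
bag = ["red"] * 3 + ["blue"] * 7

# Draw with replacement many times; the bag's composition is identical
# before every draw, so the draws are iid with probabilities (0.3, 0.7).
draws = [random.choice(bag) for _ in range(100_000)]
freq = Counter(draws)

print(freq["red"] / len(draws))   # close to 0.3
print(freq["blue"] / len(draws))  # close to 0.7
```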
In practice, even if the population is finite, as long as it is large enough relative to the sample size (e.g., more than $10\times$ the sample size), the infinite-population assumption approximates the real situation fairly well.
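This "large enough" claim can also be checked numerically. A rough sketch with arbitrary numbers: a finite population of 1000 balls (300 red), from which we repeatedly draw samples of 100, both without replacement (the finite-population situation) and with replacement (the iid idealization). The mean red counts agree, and the variances differ only by the finite-population correction factor $(N-n)/(N-1)\approx 0.9$, which is close to 1 when the population is $10\times$ the sample.

```python
import random
import statistics

random.seed(2)

N, n, trials = 1000, 100, 5000
population = ["red"] * 300 + ["blue"] * 700

# Count of red balls per sample, drawn two ways:
without = [random.sample(population, n).count("red") for _ in range(trials)]
with_repl = [random.choices(population, k=n).count("red") for _ in range(trials)]

# Both sampling schemes give the same expected count (n * p = 30).
print(statistics.fmean(without), statistics.fmean(with_repl))

# The variances differ only by the finite-population correction
# (N - n) / (N - 1) ≈ 0.9, so the iid approximation is already decent.
print(statistics.variance(without) / statistics.variance(with_repl))
```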