
My understanding of the bootstrap is that it gives us a method to understand the distribution of an estimator applied to a dataset.

I've read statements of the form "bootstrapping relies on the closeness of the empirical CDF for a sample of size $n$ to the true CDF".

But I wanted to understand the implication of using the bootstrap in a simple case.

Suppose I have a dataset of $N$ Bernoulli trials, with $n$ successes. I want to have an understanding of my uncertainty in $p$, the probability of success.

My understanding is that the Bayesian approach to this would give us a pdf for $p$ of $$ P(p|N, n) = \frac{p^{n+ \alpha -1} (1-p)^{N-n+\beta-1}}{B(n+ \alpha, N-n+\beta)} $$ where $\alpha$ and $\beta$ define the prior.
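(For concreteness, here is a minimal sketch of that posterior using `scipy.stats.beta`; the numbers $N=20$, $n=14$ and the uniform prior $\alpha=\beta=1$ are made-up illustration values, not from the question.)

```python
# Sketch: the Beta posterior for p under a Beta(alpha, beta) prior.
# N, n, alpha, beta below are example values chosen for illustration.
from scipy.stats import beta

N, n = 20, 14          # trials and successes (assumed numbers)
a, b = 1.0, 1.0        # uniform prior; the Haldane prior would be a = b = 0

posterior = beta(n + a, N - n + b)
lo, hi = posterior.interval(0.95)   # equal-tailed 95% credible interval
print(f"posterior mean = {posterior.mean():.3f}, "
      f"95% credible interval = ({lo:.3f}, {hi:.3f})")
```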

Naively, I guessed that using the bootstrap on this type of data might give the same answer as a Haldane prior ($\alpha=0$, $\beta=0$), since if $n=N$, both this and the bootstrap would require that $p=1$.

But when I wrote down what the bootstrap would predict for probabilities of $k$ successes, I get $$ P(p=k/N) = {N \choose k} (n/N)^{k} (1 - n/N)^{N-k} $$
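(This closed form can be checked by simulation: resampling $N$ draws with replacement from a sample containing $n$ successes makes the bootstrap success count Binomial$(N, n/N)$. A quick sketch, with assumed example values $N=20$, $n=14$:)

```python
# Check that the bootstrap distribution of the success count matches
# Binomial(N, n/N). N and n are assumed example values.
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
N, n = 20, 14
sample = np.array([1] * n + [0] * (N - n))

# Empirical distribution of bootstrap success counts
reps = rng.choice(sample, size=(100_000, N), replace=True).sum(axis=1)
empirical = np.bincount(reps, minlength=N + 1) / len(reps)

# Closed form from above: P(k successes) = C(N,k) (n/N)^k (1 - n/N)^(N-k)
theoretical = binom.pmf(np.arange(N + 1), N, n / N)
print(np.max(np.abs(empirical - theoretical)))  # should be close to zero
```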

This seems to take a totally different form than the Bayesian answer.

How should I understand both of these approaches? Am I totally misunderstanding the interpretation of bootstrapping in this case? Is there some Bayesian prior secretly implied by the use of the bootstrap in this example?

Steve

2 Answers

1

How should I understand both of these approaches?

The Bayesian approach gives you a posterior pdf for $p$. The bootstrap procedure gives you the distribution of the estimate under resampling from the sample you already have.

Am I totally misunderstanding the interpretation of bootstrapping in this case?

Your derivation of the bootstrap distribution is exactly right. The problem is with your assumption that the bootstrap and the Bayesian approach would converge on the same answer - they won't. The bootstrap does not provide any additional information; that's not possible. The only thing it tells you is how tight your estimate is. As you correctly stated: the bootstrap has the specified coverage rate only when there are enough samples that the empirical CDF approximates the original CDF well enough.

The Bayesian approach gives the correct coverage rate only if the prior is correct.

Is there some Bayesian prior secretly implied by the use of the bootstrap in this example?

No.

0

Using the posterior distribution you've estimated with the Bayesian approach, you can compute a credible interval. The equation for the bootstrap which you wrote doesn't make sense to me though. Typically the bootstrap is used to estimate a confidence interval. This is done by resampling, rather than by computing in closed form.
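(A minimal sketch of that resampling approach, a percentile bootstrap confidence interval; the data with $N=20$, $n=14$ are assumed example values:)

```python
# Percentile-bootstrap confidence interval for p, by resampling.
# N and n are assumed example values.
import numpy as np

rng = np.random.default_rng(1)
N, n = 20, 14
data = np.array([1] * n + [0] * (N - n))

# Each bootstrap replicate: resample N observations with replacement,
# then take the mean (the estimate of p for that replicate).
boot_means = rng.choice(data, size=(10_000, N), replace=True).mean(axis=1)
lo, hi = np.percentile(boot_means, [2.5, 97.5])   # 95% percentile CI
print(f"95% bootstrap CI for p: ({lo:.3f}, {hi:.3f})")
```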

dmh
  • Isn't what I wrote for the bootstrap just the expression for sampling with replacement?

    Any confidence interval for $p$ that you'd get should be derivable through that, right?

    – Steve Oct 15 '23 at 18:38
  • I see, yes that's correct. In any case the two expressions are not comparable since the first is a pdf whose values are not probabilities. – dmh Oct 15 '23 at 19:02
  • But suppose I see the data, and I use the bootstrapped distribution as my posterior for the prediction of $k$ successes out of $N$.

    Does this imply something about the implicit prior of the bootstrap?

    – Steve Oct 15 '23 at 20:40