
If $X_n$ is a sequence of iid random variables with expected value $\mu$, and $\overline{X}_n$ is the running average $$ \overline{X}_n := \frac{1}{n} \sum_{i=1}^n X_i, \tag{1} $$ then the weak form of the Law of Large Numbers states that for all $\epsilon > 0$, $$ \lim_{n \to \infty} \Pr \left( \left| \overline{X}_n - \mu \right| > \epsilon \right) = 0, \tag{2} $$ while the strong form states that $$ \Pr \left( \lim_{n \to \infty} \overline{X}_n = \mu \right) = 1. \tag{3} $$
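(For concreteness, here is a small Monte Carlo sketch in Python of the probability appearing in (2); the choice of standard-normal $X_i$, so that $\mu = 0$, is just my arbitrary example.)

```python
# Rough sketch: estimate P(|Xbar_n - mu| > eps) from equation (2) by simulation,
# using standard-normal X_i (an arbitrary choice) so that mu = 0.
import numpy as np

rng = np.random.default_rng(0)
mu, eps, trials = 0.0, 0.1, 10_000

for n in (10, 100, 1000):
    # each row is one independent realization of X_1, ..., X_n
    xbar = rng.standard_normal((trials, n)).mean(axis=1)   # running average (1)
    prob = np.mean(np.abs(xbar - mu) > eps)
    print(f"n = {n:5d}   estimated P(|Xbar_n - mu| > {eps}) = {prob:.4f}")
```

The estimated probability shrinks toward zero as $n$ grows, which is exactly what (2) asserts; it says nothing, however, about whether an individual sample path $\overline{X}_1, \overline{X}_2, \dots$ settles down, which is what (3) is about.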

I'm just an idiot physicist, so I have a very pedestrian view of random variables: I think of a random variable as basically being equivalent to its probability density function. (I know that not all random variables have PDFs. But like I said: idiot physicist.)

Is there an explicit example of a sequence $\overline{X}_n$ of continuous random variables with PDFs that satisfies condition (2) but not condition (3)? (It doesn't need to be a running average of the form (1); any sequence of random variables will do.) If so, what does the sequence of PDFs look like?

This would help make the difference between the weak and strong forms very concrete in my mind. I'm more used to thinking about sequences of functions than about sequences of random variables.

The answers to this question are useful, but (as far as I can tell) they both discuss sequences of binary variables, which seem less natural than continuous variables in the setting of the LLN because you need to average the values.

tparker

1 Answer


With iid random variables with finite mean, the strong law holds. The only way to get the weak law to hold and the strong law to fail in the context of running averages is to break the iid assumption.

If your question is just "what's a sequence of random variables that converges in probability to a constant but doesn't converge almost surely to a constant", the classic example is the "moving block". To construct it, let your probability space be $[0,1]$ with the Lebesgue measure (so the probability of any subinterval is its length). Then define $g_{j,k} : [0,1] \to \mathbb{R}$ by $g_{j,k}(x)=\begin{cases} 1 & x \in [j2^{-k},(j+1)2^{-k}] \\ 0 & \text{otherwise} \end{cases}$ where $k=1,2,\dots$ and $j=0,1,\dots,2^k-1$. Now enumerate the $g_{j,k}$ into a sequence $X_n$: so $X_1=g_{0,1},X_2=g_{1,1},X_3=g_{0,2},\dots$.
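If it helps to experiment numerically, here is a rough Python sketch of this construction (the names `g` and `X` below are just my own labels, not standard notation):

```python
# A sketch of the "moving block" construction on the probability space [0,1].
# g(j, k) returns the indicator function of the dyadic block [j*2^-k, (j+1)*2^-k],
# and X(n) returns the n-th term of the enumeration X_1 = g_{0,1}, X_2 = g_{1,1}, X_3 = g_{0,2}, ...

def g(j, k):
    """Indicator of [j*2**-k, (j+1)*2**-k], viewed as a function of x in [0, 1]."""
    lo, hi = j * 2.0**-k, (j + 1) * 2.0**-k
    return lambda x: 1.0 if lo <= x <= hi else 0.0

def X(n):
    """The n-th random variable, enumerating level k = 1, 2, ... and, within
    each level, j = 0, 1, ..., 2**k - 1."""
    k, count = 1, 0
    while count + 2**k < n:   # skip the complete levels that come before index n
        count += 2**k
        k += 1
    j = n - count - 1         # 0-based position within level k
    return g(j, k)
```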

The sequence $X_n$ is now $1$ on an interval that gets smaller and smaller as $n$ increases, but nevertheless for any particular $x \in [0,1]$ there are infinitely many $n$ such that $X_n(x)=1$. (It may help to draw a picture to see why this is the case.) Consequently $X_n$ converges in probability to zero (since $\Pr(X_n = 1)$ is just the length of the block, which shrinks to zero), even though $X_n(x)$ does not converge for any $x$.
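Continuing the sketch above (it reuses `g` and `X`), both halves of that statement can be checked numerically:

```python
import numpy as np

# P(X_n = 1) under the Lebesgue measure is the block length; approximate it on a grid.
grid = np.linspace(0.0, 1.0, 10_001)
for n in (1, 3, 7, 15, 31, 63):
    p = np.mean([X(n)(x) for x in grid])    # ~ length of the block where X_n = 1
    print(f"n = {n:3d}   P(X_n = 1) ~ {p:.4f}")

# Yet for a fixed sample point x, X_n(x) = 1 happens for infinitely many n
# (here, once per level k), so X_n(x) does not converge.
x = 0.3
hits = [n for n in range(1, 255) if X(n)(x) == 1.0]
print("indices n with X_n(0.3) = 1:", hits)
```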

This example is discrete (each PDF is a weighted combination of two delta functions, one at $0$ and one at $1$), but you can adjust the graph of $g_{j,k}$ (removing any flat regions while still basically maintaining the shape) to make the distribution continuous. Either way, for large $n$ the PDFs look like a unit-weight delta function concentrated at zero, which would equally be the case if $X_n$ converged almost surely, so the PDFs alone don't reveal the distinction.

In general, inspecting the PDFs (or even the CDFs) of the $X_n$ only really lets you see convergence in distribution, not convergence in probability, almost-sure convergence, etc. These other modes necessarily involve joint distributions, so that you can make sense of the subtraction $X_n - X$.

Ian
  • Are you saying that inspecting PDFs and CDFs can't tell you about joint distributions? But what about joint probability distributions? – tparker Dec 23 '20 at 05:20
  • @tparker Sure, those exist, I am saying that you can't just examine the behavior of the sequence of PDFs or CDFs of $X_n$ to see any form of convergence other than convergence in distribution. Other modes of convergence require $X_n$ to have a joint distribution with the candidate for the limit $X$. – Ian Dec 23 '20 at 05:26
  • @tparker That's not quite right, everything's usually defined on a common probability space. The point is just that you can't make sense of the pointwise difference between $X_n$ and $X$ unless you have their joint distribution. You can make sense of the distance between their CDFs (which is what you need for convergence in distribution) without their joint distribution. – Ian Dec 23 '20 at 05:44
  • Sorry, I realized that that was all utter nonsense. I was confusing the probability space with the space of variables on which the probability distribution is defined. – tparker Dec 23 '20 at 11:25