
This question is related to: Probabilistic prime number theorem

The Cramér random model is a random subset ${\mathcal P}$ of the natural numbers such that each natural number ${n > 2}$ has an independent probability ${\frac{1}{\log n}}$ of lying in ${\mathcal P}$. Show that almost surely, the quantity $\frac{1}{x/\log x} |\{n \leq x: n \in {\mathcal P}\}|$ converges to one as ${x \rightarrow \infty}$.
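
For intuition, and not part of the question itself, here is a minimal Python sketch simulating the model numerically; the seed, cutoff, and checkpoints are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
N = 10**7                       # arbitrary cutoff

# Each n > 2 lies in P independently with probability 1/log n.
n = np.arange(3, N + 1)
in_P = rng.random(n.size) < 1.0 / np.log(n)

# S_x = 1 + #{2 < n <= x : n in P}, following the convention used
# below that 2 is counted deterministically; normalise by x / log x.
counts = 1 + np.cumsum(in_P)
for x in (10**4, 10**5, 10**6, 10**7):
    print(x, counts[x - 3] / (x / np.log(x)))
```

The ratio drifts toward one only slowly, consistent with the $O(1/\log x)$ correction in the expansion of $\sum_{2 < n \leq x} \frac{1}{\log n}$ below.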

The hint is to use Chebyshev's inequality to show that $S_x = |\{n \leq x: n \in {\mathcal P}\}|$ is close to $\sum_{n \leq x} \frac{1}{\log n}$, which in turn can be shown to be somewhat close to $x /\log x$.

Question: Let $S_x = |\{n \leq x: n \in {\mathcal P}\}| = 1 + \sum_{2 < n \leq x} 1_{{\mathcal P}}(n)$, where the $1_{{\mathcal P}}(n)$ are independent Bernoulli random variables with parameters $1 / \log n$. By Chebyshev's inequality, we have

$\displaystyle {\bf P}(|\sum_{2 < n \leq x} 1_{{\mathcal P}}(n) - \sum_{2 < n \leq x} \frac{1}{\log n}| \geq \varepsilon) \leq \frac{1}{\varepsilon^2} \sum_{2 < n \leq x} \frac{1}{\log n}(1 - \frac{1}{\log n})$.

Also, it can be shown from integration by parts that

$\displaystyle \sum_{2 < n \leq x} \frac{1}{\log n} = \int_{3}^x \frac{dt}{\log t} + O(1) = \frac{x}{\log x} + \frac{x}{\log^2 x} + O(\frac{x}{\log^3 x}) + O(1)$.
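
For completeness, the second equality in the last display follows from two integrations by parts, a standard computation sketched here:

$\displaystyle \int_{3}^x \frac{dt}{\log t} = \Big[\frac{t}{\log t}\Big]_3^x + \int_3^x \frac{dt}{\log^2 t} = \frac{x}{\log x} + \Big[\frac{t}{\log^2 t}\Big]_3^x + 2\int_3^x \frac{dt}{\log^3 t} + O(1),$

and splitting the last integral at $\sqrt{x}$ bounds it by $O(\sqrt{x}) + O(x / \log^3 x) = O(x / \log^3 x)$, since $\log t \geq \frac{1}{2} \log x$ on $[\sqrt{x}, x]$.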

I am not sure how to proceed with these bounds, though; any help would be appreciated.


2 Answers


You don't need to do any very precise calculations: simply note that the variance of any Bernoulli random variable is at most its expectation, and hence the same applies to any sum of independent Bernoulli random variables.

Thus, if $S_n=\sum_{i=1}^n X_i$ with $X_i$ independent Bernoulli, then $$P(|S_n-E(S_n)|\geq\delta E(S_n))\leq \frac{1}{\delta^2E(S_n)}.$$ In your case this is essentially tight: the ratio $\operatorname{Var}(S_n)/E(S_n)$ tends to $1$, so you can't get a meaningfully better bound from Chebyshev.
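
Indeed, in the problem at hand the parameters $1/\log n$ tend to zero, so, as a quick check of the tightness claim,

$\displaystyle \operatorname{Var}\Big(\sum_{2 < n \leq x} 1_{{\mathcal P}}(n)\Big) = \sum_{2 < n \leq x} \frac{1}{\log n}\Big(1 - \frac{1}{\log n}\Big) = (1 + o(1)) \sum_{2 < n \leq x} \frac{1}{\log n},$

since $\sum_{2 < n \leq x} \frac{1}{\log^2 n} = O(x / \log^2 x)$ is of lower order than $\sum_{2 < n \leq x} \frac{1}{\log n} \sim x / \log x$.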

Unfortunately this isn't good enough: it gives you convergence in probability (since for any fixed $\delta$ the bound tends to $0$) but not almost sure convergence, which is what you were asked for. To get almost sure convergence you want the bound to be summable, so that the Borel-Cantelli lemma applies, but here $\frac{1}{\delta^2 E(S_n)} \asymp \frac{\log n}{\delta^2 n}$ and $\sum_{n=2}^{\infty}\frac{\log n}{\delta^2 n}=\infty$.

Hoeffding's inequality is strong enough to give almost sure convergence.
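
Concretely, here is a sketch using the standard two-sided Hoeffding bound for sums of independent variables in $[0,1]$: taking $t = \delta E(S_n)$ with $E(S_n) \sim n/\log n$ as in this problem,

$\displaystyle P(|S_n - E(S_n)| \geq \delta E(S_n)) \leq 2\exp\Big(-\frac{2t^2}{n}\Big) = 2\exp\Big(-(2 + o(1))\,\frac{\delta^2 n}{\log^2 n}\Big),$

and the right-hand side is summable in $n$, so the Borel-Cantelli lemma gives almost sure convergence.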

  • It just occurred to me that Chebyshev's inequality should suffice for our purpose since the terms $\displaystyle \frac{1}{\log n}(1 - \frac{1}{\log n})$ are summable by the root test. – shark Apr 16 '24 at 00:33
  • @shark the root test doesn't tell you anything since $\lim_{n\to\infty}\sqrt[n]{\frac{1}{\log n}(1-\frac{1}{\log n})}=1$. – Especially Lime Apr 16 '24 at 08:04
  • But they clearly aren't summable by comparison with the harmonic series. – Especially Lime Apr 16 '24 at 08:06
  • You are right, root test conveys no information due to $0^0=1$. The author did state explicitly that Markov’s inequality will show that the number of Cramer random model primes is comparable to $\sum_{3 \leq n \leq x} 1/\log n$, but will not get concentration around the mean; for that one needs a tool such as Chebyshev’s inequality (or an even stronger concentration inequality). I will consult him regarding how Chebyshev suffices in this case. – shark Apr 17 '24 at 05:29
  • @shark Chebyshev does give concentration about the mean, i.e. convergence in probability. You only need something stronger if you want almost sure convergence. – Especially Lime Apr 17 '24 at 08:21
  • Thank you; indeed a stronger concentration-of-measure inequality than Chebyshev's will be needed here. This seems to be resolvable by using the fourth moment. – shark Apr 25 '24 at 05:51

Edit: I managed to write up a solution using the moment method, which I post below; verifications and suggestions are welcome.

For all $n > 2$, let $X_n = 1_{{\mathcal P}}(n)$ be independent Bernoulli random variables with parameters $1 / \log n$. Take a natural number $x > 2$, and let $(Y_{n,x})_{2 < n \leq x}$ be the triangular array given by $Y_{n, x} := \frac{1}{x / \log x} X_n$.

Consider the partial sums $S_x = \frac{\sum_{2 < n \leq x} X_n}{x / \log x} = Y_{3,x} + \dots + Y_{x,x}$. For convenience, we normalise each $X_n$ to have zero mean, by replacing $X_n$ by $X_n - 1 / \log n$, so that $S_x$ also gets replaced by

$\displaystyle S_x - \frac{\sum_{2 < n \leq x} 1 / \log n}{x / \log x} = S_x - \frac{\int_{3}^x dt / \log t + O(1)}{x / \log x} = S_x - (1 + O(1 / \log x))$,

where we use integration by parts (as in the question) for the last equality. We can then expand

$\displaystyle {\bf E}|S_x|^4 = {\bf E}|Y_{3,x} + \dots + Y_{x,x}|^4 = \sum_{2 < i, j, k, l \leq x} {\bf E}\, Y_{i,x}Y_{j,x}Y_{k,x}Y_{l,x}$.

The independence and mean-zero hypotheses leave only a few quadruples ${(i,j,k,l)}$ for which ${{\bf E} Y_{i,x} Y_{j,x} Y_{k,x} Y_{l,x}}$ could be non-zero: if some index, say $i$, differs from the other three, then by independence the expectation factors through ${\bf E} Y_{i,x} = 0$ and the term vanishes. This leaves the three cases ${i=j \neq k=l}$, ${i = k \neq j = l}$, ${i=l \neq j=k}$, where each of the indices ${i,j,k,l}$ is paired up with exactly one other index, and the diagonal case ${i=j=k=l}$. If for instance ${i=j \neq k=l}$, then

$\displaystyle {\bf E}Y_{i,x} Y_{j,x} Y_{k,x} Y_{l,x} = {\bf E}Y_{i,x}^2Y_{k,x}^2 = ({\bf E}Y_{i,x}^2)({\bf E}Y_{k,x}^2) \leq (\frac{\log x}{x})^2 \times (\frac{\log x}{x})^2$,

since $|Y_{n,x}| \leq \frac{\log x}{x}$ after the normalisation.

Similarly for the cases ${i=k \neq j=l}$ and ${i=l \neq j=k}$, which gives a total contribution of at most ${3x(x-1) (\frac{\log x}{x})^4}$ to ${{\bf E} |S_x|^4}$. Finally, when ${i=j=k=l}$, then ${\bf E}Y_{i,x} Y_{j,x} Y_{k,x} Y_{l,x} \leq (\frac{\log x}{x})^4$, and there are at most ${x}$ contributions of this form to ${{\bf E} |S_x|^4}$. We conclude that

$\displaystyle {\bf E} |S_x|^4 \leq {3x(x-1) (\frac{\log x}{x})^4} + x(\frac{\log x}{x})^4 = \frac{(3x - 2)\log^4 x}{x^3}$

and hence by Markov's inequality,

$\displaystyle {\bf P}(|S_x| > \varepsilon) \leq \frac{(3x - 2)\log^4 x}{\varepsilon^4{x^3}}$ for any $\varepsilon > 0$.
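
As a sanity check on the fourth moment bound, and not part of the proof, here is a quick Monte Carlo comparison in Python; the cutoff, seed, and sample size are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
x = 1000        # arbitrary cutoff
trials = 5000   # arbitrary Monte Carlo sample size

n = np.arange(3, x + 1)
p = 1.0 / np.log(n)

# Centered, normalised summands Y_{n,x} = (X_n - 1/log n) / (x / log x).
X = rng.random((trials, n.size)) < p
S = ((X - p) / (x / np.log(x))).sum(axis=1)

empirical = (S**4).mean()
bound = (3 * x - 2) * np.log(x) ** 4 / x**3
print(f"E|S_x|^4 ~ {empirical:.3e}, bound = {bound:.3e}")
```

The empirical fourth moment comes out well below the bound, as expected, since the estimate ${\bf E}Y_{n,x}^2 \leq (\frac{\log x}{x})^2$ discards the small factor $\frac{1}{\log n}(1 - \frac{1}{\log n})$.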

Removing the normalisation, we conclude that

$\displaystyle {\bf P}(|S_x - (1 + O(1 / \log x))| > \varepsilon) \leq \frac{(3x - 2)\log^4 x}{\varepsilon^4{x^3}}$ for any $\varepsilon > 0$.

The right-hand side is $O(\log^4 x / x^2)$, hence summable in $x$ by comparison. Applying the Borel-Cantelli lemma for a countable sequence of $\varepsilon$ tending to zero, and noting that the $O(1/\log x)$ term also tends to zero, we conclude that $S_x$ converges almost surely to one as $x \to \infty$, as desired.
