12

Randomly select $n$ numbers from the universe $\{1,2\dots,m\}$ with or without replacement, and sort the numbers in ascending order.

We can get a list of number $\{a_1,a_2,\dots,a_n\}$, and then we can get the difference between two consecutive numbers and get the gap list: $\{a_1, a_2-a_1,\dots ,a_n-a_{n-1}\}$

So my question is: what is the distribution of the gaps.

Let $A_i$ be the number of gaps which are equal to $i$, what is the distribution of $A_i$?

Update: I clarify that there should two situation to consider: with or without replacement.

Fan Zhang
  • 1,987
  • I suspect the answer will be complicated. It's much simpler in the continuous case, which is treated here. The paper Mariano linked to says something about the relationship to the discrete case. (I'm assuming that by "randomly select" you mean "independently select with uniform distribution".) – joriki Sep 21 '11 at 18:06
  • @joriki: An inclusion-exclusion approach seems to produce something worthwhile. – David Bevan Sep 29 '11 at 17:53
  • @David: Worthwhile, but complicated :-) – joriki Sep 29 '11 at 18:12

2 Answers2

11

Here's a partial solution when sampling without replacement:

In addition to the gaps $G_1=a_1$and $G_j:=a_j-a_{j-1}$ for $2\leq j\leq n$, it is convenient to introduce the final gap $G_{n+1}=(m+1)-a_n$.

Then the random vector $(G_1,G_2,\dots, G_{n+1})$ gives a random composition of the number $m+1$. That is, all outcomes $(g_1,g_2,\dots, g_{n+1})$ with $$g_1+g_2+\cdots+g_{n+1}=m+1,\quad g_j\geq 1$$ are equally likely. There are $m\choose n$ such compositions, as found using stars and bars.

Let's focus on the number $N$ of gaps of size 1. How many compositions of $m+1$ into $n+1$ parts have exactly $k$ ones? The answer is $${n+1\choose k}\,{m-(n+1)\choose n-k}.$$ The left hand binomial coefficient counts how many ways we can place the ones, and the right hand binomial coefficient counts the number of compositions of $m+1-k$ into $n+1-k$ parts where all the values are $\geq 2$, via another stars and bars argument. Thus $$\mathbb{P}(N=k)={{n+1\choose k}\,{m-(n+1)\choose n-k}\over{m\choose n}}.$$

I haven't thought about larger gaps, but maybe this idea will be useful there as well.

6

An inclusion-exclusion argument gives us the following for the distributions of the $A_i$.

We first consider the case of the $n$ numbers being distinct (selection without replacement).

If $1\leqslant n \leqslant m$, $1\leqslant i \leqslant m+1-n$ and $0\leqslant k \leqslant M_i$, then we have:

$$ \textrm{P}(A_i=k)\ =\ P\:(m,n,i,k)\ =\ \frac{1}{T}\: \sum_{j=k}^{M_i} (-1)^{j-k} {j\choose k} {n\choose j} {m-ij\choose n-j} $$

where $T={m\choose n}$ is the total number of configurations, and $M_i=\textrm{min}(n,\lfloor\frac{m-n}{i-1}\rfloor)$ is the maximum possible gap size.

$Q_k={n\choose k} {m-ik\choose n-k}$ is a count of the combinations with at least $k$ gaps of size $i$, determined by first choosing $k$ gaps of size $i$ and then arbitrarily choosing how the remaining 'space' is split up. However $Q_k$ counts combinations with more than $k$ $i$-gaps multiple times; indeed combinations with $j$ $i$-gaps are counted $j \choose k$ times. Thus if $R_k$ is the number of combinations with exactly $k$ gaps of size $i$, we have $R_k=Q_k-\sum_{j=k+1}^{M_i}{j \choose k}R_j$. Inclusion-exclusion gives us that $R_k=T\times P\:(m,n,i,k)$.


If we now consider the case of the $n$ numbers not necessarily being distinct (selection with replacement), then, if we include $0$ in the set of numbers that may be selected (so that the first gap can be $0$ like the others), we have:

$$ \textrm{P}(A_i=k)\ =\ P\:(m+n,n,i+1,k) $$

where $P\:$ is as above.

This follows simply from the fact that selecting $n$ numbers from $m+1$ with replacement is equivalent to selecting $n$ numbers from $m+n$ without replacement, but with each gap being smaller in size by $1$.

David Bevan
  • 5,891
  • @fanzhang: I hope this is what you were looking for. – David Bevan Sep 29 '11 at 17:50
  • It has to be mentioned that the equivalence in the last paragraph only solves the problem for one notion of sampling with replacement, and arguably not a natural one: choosing one of the possible multisets of size $n$ uniformly at random. If by "sampling with replacement" we mean "repeat the experiment of choosing a number $n$ times, then sort the results" then the distribution is different. – Misha Lavrov Apr 11 '24 at 05:41