For $k\in1...n$, each $Y_k$ is sampled uniformly and independently from $X$ with replacement. Define a multinomial distribution with 3 mutually exclusive events whose probabilities sum to 1:
- $Y_k<X_j$, which occurs with probability $\frac{j-1}{n}$
- $Y_k=X_j$, which occurs with probability $\frac{1}{n}$
- $Y_k>X_j$, which occurs with probability $\frac{n-j}{n}$
To illustrate, consider $X=\{1,2,3,4,5\}$. Using $j=3$, a sample resulting in $Y=\{1,5,5,3,5\}$ corresponds to the following multinomial random vector, $z$:
\begin{array}{c|c|c}
Y_k<X_{\{j\}} & Y_k=X_{\{j\}} & Y_k>X_{\{j\}}\\\hline
z_1=1 & z_2=1 & z_3=3
\end{array}
The probability of occurrence of $z=[1,1,3]$ is
$$\frac{n!}{z_1!z_2!z_3!}\bigg(\frac{j-1}{n}\bigg)^{z_1}\bigg(\frac{1}{n}\bigg)^{z_2}\bigg(\frac{n-j}{n}\bigg)^{z_3}$$
$$=\frac{5!}{1!1!3!}\bigg(\frac{3-1}{5}\bigg)^1\bigg(\frac{1}{5}\bigg)^1\bigg(\frac{5-3}{5}\bigg)^3=0.1024$$
Let $Z$ denote the set of multinomial vectors resulting from possible realizations of $Y$ that meet the condition $Y_{\{i\}}=X_{\{j\}}$. The count of $Y_k<X_{\{j\}}$ (or $z_1$) in $Z$ can take the values $0...i-1$, and the count of $Y_k=X_{\{j\}}$ (or $z_2$) in $Z$ can take the values $(i-z_1)...(n-z_1)$, resulting in $|Z|=i\cdot(n-i+1)$.
The probability that the $j^{\text{th}}$ smallest element of $X$ is also the $i^{\text{th}}$ smallest element of $Y$ is
$$P(Y_{\{i\}}=X_{\{j\}})=\sum_{l=0}^{i\cdot(n-i+1)}P(Z_l)$$
Illustrating with $n=5$, $i=2$, and $j=3$ (showing counts of each event):
\begin{array}{c|c|c|c|c}
l & Y_k<X_{\{j\}} & Y_k=X_{\{j\}} & Y_k>X_{\{j\}} & P(Z_l)\\\hline
1 & 0 & 2 & 3 & 2.56\cdot10^{-2} \\\hline
2 & 0 & 3 & 2 & 1.28\cdot10^{-2} \\\hline
3 & 0 & 4 & 1 & 3.20\cdot10^{-3} \\\hline
4 & 0 & 5 & 0 & 3.20\cdot10^{-4} \\\hline
5 & 1 & 1 & 3 & 1.02\cdot10^{-1} \\\hline
6 & 1 & 2 & 2 & 7.68\cdot10^{-2} \\\hline
7 & 1 & 3 & 1 & 2.56\cdot10^{-2} \\\hline
8 & 1 & 4 & 0 & 3.20\cdot10^{-3}
\end{array}
As an R function:
pnij <- function(n, i, j) {
if (j == 1) {
eq <- i:n
gt <- (n - i):0
sum(exp(gt*log(n - j) - lgamma(eq + 1) - lgamma(gt + 1)))*exp(lgamma(n + 1) - n*log(n))
} else if (j == n) {
lt <- 0:(i - 1)
eq <- n:(n - i + 1)
sum(exp(lt*log(j - 1) - lgamma(lt + 1) - lgamma(eq + 1)))*exp(lgamma(n + 1) - n*log(n))
} else {
lt <- rep(0:(i - 1), each = n - i + 1)
gt <- rep((n - i):0, i)
eq <- n - lt - gt
sum(exp(lt*log(j - 1) + gt*log(n - j) - lgamma(lt + 1) - lgamma(eq + 1) - lgamma(gt + 1)))*exp(lgamma(n + 1) - n*log(n))
}
}
The following snippet demonstrates the function with a sanity check using $n=5$. First, enumerate all $n^n$ permutations of 1:5, with repetition. After sorting each permutation, tabulate the count of each value 1:5 over all permutations.
library(RcppAlgos) # for permuteGeneral
library(Rfast) # for rowSort and colTabulate
n <- 5L
get counts of each value 1:n at each position of the full set of sorted
permutations with repetition
x <- t(colTabulate(rowSort(permuteGeneral(n, n, TRUE))))
probabilities
x/n^n
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0.67232 0.24992 0.06752 0.00992 0.00032
#> [2,] 0.26272 0.40032 0.24992 0.08032 0.00672
#> [3,] 0.05792 0.25952 0.36512 0.25952 0.05792
#> [4,] 0.00672 0.08032 0.24992 0.40032 0.26272
#> [5,] 0.00032 0.00992 0.06752 0.24992 0.67232
calculate the same probabilities using pnij
y <- diag(0, n)
for (i in 1:n) for (j in 1:n) y[i, j] <- pnij(n, i, j)
y
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0.67232 0.24992 0.06752 0.00992 0.00032
#> [2,] 0.26272 0.40032 0.24992 0.08032 0.00672
#> [3,] 0.05792 0.25952 0.36512 0.25952 0.05792
#> [4,] 0.00672 0.08032 0.24992 0.40032 0.26272
#> [5,] 0.00032 0.00992 0.06752 0.24992 0.67232
all.equal(x/n^n, y)
#> [1] TRUE
For $n=20$:
n <- 20L
y <- diag(0, n)
system.time(for (i in 1:n) for (j in 1:n) y[i, j] <- pnij(n, i, j))
#> user system elapsed
#> 0 0 0
image(1:n, 1:n, y, xlab = "i", ylab = "j")
