6

The problem:

  • To generate a list of size $n$,
  • Containing unique integers,
  • Sampled uniformly in the range $\left[0,m\right)$,
  • In $O(n)$ time, except that:
    • Assuming $m$ is bounded by some word size $\left|m\right|$ (its length in bits), the precise bound should be $O(n\cdot\left|m\right|)$, as one cannot do better than this.

Apologies if this is a duplicate; if you find one, feel free to point it out.


EDIT: to clarify, the question is concerned with complexity in terms of bit-operations (see the logarithmic cost model).

Realz Slaw
  • 6,251
  • 33
  • 71

4 Answers

4

Here is a solution in $O(n\log n)$ (with high probability). We consider two cases: $n\log n \geq m$ and $n\log n \leq m$. In the first case, we choose a random permutation of $[0,m)$ and take only the first $n$ elements. This takes time $O(m) = O(n\log n)$. In the second case, we maintain a balanced binary search tree (or equivalent), adding random elements from $[0,m)$ to it one by one and checking for duplicates each time. Since each draw collides with probability at most $n/m \leq 1/\log n$, in expectation we need at most $1/\log n = o(1)$ extra tries per element; each tree operation costs $O(\log n)$, so the expected running time of this algorithm is $O(n\log n)$. In fact, the running time is $O(n\log n)$ with high probability as well.

We can obtain an expected $O(n)$ solution by replacing the balanced binary search tree with a hash table, and changing the cutoff to $n \geq m/2$ versus $n \leq m/2$.
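
For concreteness, here is a minimal Java sketch of this strategy (the method name is mine, and RandomGenerator stands in for any source of random bits):

int[] randIntsUniqExpected( RandomGenerator rng, int m, int n )
{
  assert m >= n;
  if( n <= m/2 ) {
    // Sparse case: rejection-sample into a hash set. Each draw collides
    // with probability at most n/m <= 1/2, so the expected number of
    // draws per element is O(1) and the expected total time is O(n).
    var seen = new HashSet<Integer>();
    var out = new int[n];
    for( int i=0; i < n; i++ ) {
      int x;
      do { x = rng.nextInt(m); } while( !seen.add(x) );
      out[i] = x;
    }
    return out;
  }
  // Dense case: partial Fisher-Yates shuffle of [0,m), keeping the
  // first n entries; O(m) = O(n) work, since m <= 2n here.
  var pool = IntStream.range(0,m).toArray();
  for( int i=0; i < n; i++ ) {
    int j = rng.nextInt(i,m);
    int tmp = pool[i]; pool[i] = pool[j]; pool[j] = tmp;
  }
  return Arrays.copyOf(pool,n);
}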

Yuval Filmus
  • 280,205
  • 27
  • 317
  • 514
4

Another approach is to use format-preserving encryption (e.g., a Feistel cipher) to build a pseudorandom permutation on the domain $\{0,1,\dots,m-1\}$, then encrypt the sequence $0,1,\dots,n-1$ and output the encrypted sequence. The randomness of this depends upon cryptographic assumptions, and it might or might not perform as well as the other alternatives in practice.

If we use the Feistel cipher construction, then I would expect the running time of each encryption to be $O(|m|)$, so the running time to generate the full sequence would be $O(n \cdot |m|)$. However, expressing asymptotic runtime in this way might be a bit misleading, as it assumes one can build a PRF on $\{0,1,\dots,m-1\}$ whose running time is $O(|m|)$. That is indeed possible under suitable cryptographic assumptions (e.g., that AES is secure), but it does require those unproven assumptions. So while this is an approach you could try if you want this for a practical purpose, it might not be very useful if your goal is to prove a theorem about computational complexity.
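
To illustrate the idea, here is a toy Java sketch of Feistel-plus-cycle-walking (all names are mine, and the round function is a placeholder mixer, not a real PRF; a serious implementation would derive the round function from a keyed cryptographic primitive such as AES or HMAC):

// Toy sketch: a Feistel network on 2w bits (one round per entry of
// roundKeys), where 2^(2w) >= m, restricted to [0,m) by cycle-walking.
final class FeistelPermutation
{
  private final long m;        // domain size (assumed >= 2)
  private final int w;         // half-width in bits, chosen so 2^(2w) >= m
  private final long halfMask; // low w bits
  private final long[] keys;   // one round key per round

  FeistelPermutation( long m, long[] roundKeys ) {
    this.m = m;
    int bits = 64 - Long.numberOfLeadingZeros(m - 1);
    this.w = (bits + 1) / 2;
    this.halfMask = (1L << w) - 1;
    this.keys = roundKeys.clone();
  }

  // Placeholder round function on w-bit halves; NOT cryptographically secure.
  private long f( long half, long key ) {
    long x = (half ^ key) * 0x9E3779B97F4A7C15L;
    return (x ^ (x >>> 29)) & halfMask;
  }

  // Standard Feistel rounds: a bijection on [0, 2^(2w)) for any f.
  private long feistel( long x ) {
    long l = (x >>> w) & halfMask, r = x & halfMask;
    for( long k : keys ) {
      long t = r;
      r = l ^ f(r, k);
      l = t;
    }
    return (l << w) | r;
  }

  // Cycle-walking: re-encrypt until the value lands in [0,m) again.
  // Since 2^(2w) < 4m, the expected number of iterations is O(1).
  long encrypt( long x ) {
    long y = feistel(x);
    while( y >= m ) y = feistel(y);
    return y;
  }
}

The $n$ outputs are then encrypt(0), encrypt(1), ..., encrypt(n-1), which are distinct because encrypt is a permutation of $[0,m)$.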

D.W.
  • 167,959
  • 22
  • 232
  • 500
4
  1. Make a binary tree/trie, starting with nothing in it.
  2. Pick a uniform random number over $\left[0,m\right)$.
  3. Pad the number to $|m|$ bits, adding leading zeros if necessary.
  4. Insert the number into the binary tree/trie one bit at a time, where a 0 bit means "left" and a 1 bit means "right", creating nodes as necessary.
  5. Every node in the binary tree/trie keeps track of how many inserted numbers lie underneath it; it is easy to maintain these counts during insertion with no added complexity.
  6. From the depth of a node, the height of the tree, and the count from the previous step, one can compute the number of "unused numbers" remaining under each branch.
  7. Using those "unused numbers" counts as weights, compute the next number by a weighted random selection of the branches, one bit at a time. A branch with a weight of 0 has zero chance of being selected, which ensures uniqueness (see the sketch after this list).
  8. Go to step 3 and repeat until $n$ numbers are selected.
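
Here is a minimal Java sketch of the procedure above (class and method names are mine; it uses long arithmetic for readability, so it assumes $m$ fits in a machine word, while the general case would use arbitrary-precision integers):

final class TrieSampler
{
  // One trie node; `used` counts the already-drawn numbers beneath it.
  private static final class Node {
    Node left, right;
    long used;
  }

  private final long m;   // domain size
  private final int bits; // |m|: bit-length of the padded samples
  private final Node root = new Node();
  private final RandomGenerator rng;

  TrieSampler( long m, RandomGenerator rng ) {
    this.m = m;
    this.bits = Math.max(1, 64 - Long.numberOfLeadingZeros(m - 1));
    this.rng = rng;
  }

  // How many values of [0,m) have the given `depth`-bit prefix: the
  // prefix's subtree covers [prefix*2^(bits-depth), (prefix+1)*2^(bits-depth)),
  // clamped to [0,m).
  private long capacity( long prefix, int depth ) {
    long width = 1L << (bits - depth);
    return Math.max(0, Math.min(width, m - prefix * width));
  }

  // Draws the next number, uniform over the not-yet-drawn values, via
  // |m| weighted branch choices (steps 5-7). Call at most m times.
  long next() {
    var node = root;
    long prefix = 0;
    for( int depth = 0; depth < bits; depth++ ) {
      if( node.left  == null ) node.left  = new Node();
      if( node.right == null ) node.right = new Node();
      long freeLeft  = capacity(prefix << 1,       depth + 1) - node.left.used;
      long freeRight = capacity((prefix << 1) | 1, depth + 1) - node.right.used;
      // Weighted selection: P(left) = freeLeft / (freeLeft + freeRight).
      if( rng.nextLong(freeLeft + freeRight) < freeLeft ) {
        node = node.left;
        prefix <<= 1;
      } else {
        node = node.right;
        prefix = (prefix << 1) | 1;
      }
      node.used++; // maintain the subtree counts on the way down
    }
    return prefix;
  }
}

Drawing $n$ numbers costs $n\cdot|m|$ weighted branch choices; with $|m|$-bit weight arithmetic each choice is $O(|m|)$, matching the $O(n\cdot\left|m\right|^2)$ total computed below.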

Complexity (feel free to correct):

  • Each insertion is $O(|m|)$, the depth of the binary tree/trie
  • Maintaining the weights adds no extra complexity
  • Choosing each random branch/bit itself requires a weighted random selection
  • A simple weighted random selection can be done in $O(|m|)$ time, given an $O(1)$ fair coin-tossing method
  • Computing each number therefore takes $|m|$ bit selections at $O(|m|)$ time per selection, so generating $n$ integers takes $O(n\cdot \left|m\right|^2)$ time
Realz Slaw
  • 6,251
  • 33
  • 71
1

I stumbled upon this question looking for a more practical answer, so here is a more hands-on solution using Java code and 32-bit integers as an example. The basic idea is to take an array $\left[0, 1, \dots, m-1\right]$, shuffle it, and return the first $n$ entries. The go-to shuffling algorithm is Fisher-Yates, which can be stopped early, making our solution even more efficient. This leads to the following code:

int[] randIntsUniqDense( RandomGenerator rng, int m, int n )
{
  assert m >= n;
  var samples = IntStream.range(0,m).toArray();
  // Partial Fisher-Yates shuffle
  for( int i=0; i < n; i++ ) {
    int j = rng.nextInt(i,m);
    // swap i <-> j
    int sample = samples[j];
    samples[j] = samples[i];
    samples[i] = sample;
  }
  // Return requested number of samples
  return Arrays.copyOf(samples,n);
}

The implementation above requires $\mathcal{O}(m)$ operations due to the large samples array that we have to build. But it turns out we can do better. Instead of keeping track of the entire samples array, we can keep track of only the changes, using a hash table. The following randIntsUniqSparse() method behaves identically to randIntsUniqDense() while requiring only $\mathcal{O}(n)$ operations:

int[] randIntsUniqSparse( RandomGenerator rng, int m, int n )
{
  assert m >= n;
  var results = new int[n];
  var changes = new HashMap<Integer,Integer>();

  // Partial Fisher-Yates shuffle
  for( int i=0; i < n; i++ ) {
    int j = rng.nextInt(i,m);
    // swap i <-> j
    var sample = changes.remove(j);
    if( sample == null )
      sample = j;

    if( i < j ) {
      var displaced = changes.remove(i);
      if( displaced == null )
        displaced = i;
      changes.put(j,displaced);
    }

    results[i] = sample;
  }

  return results;
}

To accurately answer the question, we now have to check that randIntsUniqSparse can be generalized to arbitrary-length integers, using at most $\mathcal{O}(n\log m)$ operations. The answer is yes, assuming that rng.nextInt(i,m) is an $\mathcal{O}(\log m)$ operation. That assumption may not hold for pseudo-RNGs if the desired number of random bits increases with $m$; instead, you could use a hardware RNG.
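
For completeness, here is one way (a sketch of mine, not part of the code above) to realize such an $\mathcal{O}(\log m)$ draw for arbitrary-length integers, using rejection sampling on $\lceil \log_2 m \rceil$ random bits (note the BigInteger constructor used here requires a java.util.Random):

// Uniform BigInteger in [origin, bound), by rejection sampling.
// Each attempt draws range.bitLength() random bits and succeeds with
// probability > 1/2, so the expected cost per draw is O(log m) bits.
static BigInteger nextBigInt( Random rng, BigInteger origin, BigInteger bound )
{
  var range = bound.subtract(origin);
  BigInteger r;
  do {
    r = new BigInteger(range.bitLength(), rng); // uniform in [0, 2^bits)
  } while( r.compareTo(range) >= 0 );
  return origin.add(r);
}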

DirkT
  • 1,021
  • 2
  • 13