While looking through a few questions on Stack Overflow¹ that all ask for algorithms to select k distinct random numbers out of N, I became confused about how to compare the answers in terms of time and space complexity.

  • each algorithm has two parameters, N and k
  • and returns k elements

In terms of Big O notation, all the algorithms worth speaking of are O(N) overall.

But there are still major differences between them:

  • Some require ~k steps while others require ~N.
  • Some can return elements one by one while others only return them all at the end.
  • Most require all the k (or even N) elements to be present in memory at once, but some are able to return each one in sequence and discard it.

Now, how do I express these distinctions unambiguously when describing each algorithm's complexity? More specifically, I'm interested in the standard way this is done in formal papers, so as not to reinvent the wheel.


¹https://stackoverflow.com/questions/196017/unique-non-repeating-random-numbers-in-o1/16097246, https://stackoverflow.com/questions/158716/how-do-you-efficiently-generate-a-list-of-k-non-repeating-integers-between-0-and-n, https://stackoverflow.com/questions/2394246/algorithm-to-select-a-single-random-combination-of-values, https://stackoverflow.com/questions/54059/efficiently-selecting-a-set-of-random-elements-from-a-linked-list

1 Answer

You can report:

  • The running time (e.g., $O(k)$ vs $O(n)$),

  • The space requirements (e.g., $O(1)$, $O(k)$, etc.), and

  • Whether the algorithm is a streaming (online) algorithm or a batch algorithm. A batch algorithm returns the entire answer at once. A streaming or online algorithm returns numbers one at a time. You might also be interested in the notion of "delay"; for instance, an $O(\log n)$-delay algorithm returns numbers one at a time, taking at most $O(\log n)$ steps (running time) to output each number.
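To make these three axes concrete, here is a minimal Python sketch of two well-known sampling techniques that differ on all of them. The function names `sample_batch` and `sample_streaming` are made up for this illustration and are not taken from the linked questions. The first uses Floyd's algorithm (about $k$ steps, $O(k)$ space, batch); the second uses selection sampling (up to $n$ steps, $O(1)$ extra space, streaming, output in increasing order).

```python
import random


def sample_batch(n, k):
    """Floyd's algorithm: k distinct ints from range(n), returned all at once.

    Roughly k iterations (expected O(k) time with a hash set), O(k) space.
    Assumes 0 <= k <= n.
    """
    chosen = set()
    for j in range(n - k, n):
        t = random.randint(0, j)  # inclusive on both ends
        # If t was already picked, j itself cannot have been picked yet.
        chosen.add(j if t in chosen else t)
    return chosen


def sample_streaming(n, k):
    """Selection sampling: yield k distinct ints from range(n), one at a time.

    Up to n iterations, O(1) extra space; values come out in increasing order.
    Assumes 0 <= k <= n.
    """
    needed = k
    for i in range(n):
        if needed == 0:
            return
        # Select i with probability needed / (n - i).
        if random.random() * (n - i) < needed:
            yield i
            needed -= 1


if __name__ == "__main__":
    print(sorted(sample_batch(1000, 5)))
    for x in sample_streaming(1000, 5):  # each value can be used and discarded
        print(x)
```

With this vocabulary, the first could be described as "an $O(k)$-time, $O(k)$-space batch algorithm" and the second as "an $O(n)$-time, $O(1)$-space streaming algorithm". Note that the delay of the streaming version is not constant: up to $O(n)$ steps may pass between consecutive outputs.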

Since there are two variables here, $n$ and $k$, if you report a running time like $O(k)$, it may be helpful to mention explicitly that the constant is independent of $n$.
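One way to spell this out is sketched below. The symbols $T$, $c$, and $k_0$ are introduced here purely for illustration, and conventions for multi-variable big-O vary between authors. Writing $T(n,k)$ for the running time, the claim "$O(k)$ with a constant independent of $n$" could be stated as: there exist constants $c > 0$ and $k_0$ such that

$$T(n,k) \le c \cdot k \quad \text{for all } n \ge k \ge k_0 .$$

By contrast, a bound of the form $T(n,k) \le c(n) \cdot k$, where $c(n)$ may grow with $n$, is $O(k)$ only for each fixed $n$, which is a much weaker statement.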

Formal papers aren't always as formal as you might think. They'll use standard notation and terminology where they exist, but if standard notions don't cover it, they'll just use the English language to describe the situation, or they may introduce some new terminology and provide a definition for it. That's the great thing about human language: it provides a way to express arbitrary new concepts, even ones we don't already have a pre-existing standard name for.

D.W.