4

Given a list of real numbers $p_1, \dots, p_n$, I am looking for an efficient algorithm that sorts this list in a "probabilistic ascending order", meaning that $p_i < p_j$ makes it likely, but not certain, that $i$ is placed before $j$. In principle, every permutation is a possible output, but the less sorted a permutation is, the less likely it is to occur.

The best solution I could come up with is to modify selection sort. Instead of selecting the minimal element in every step, you select a random remaining element, choosing element $i$ with probability proportional to $\frac{1}{p_i}$. This has quadratic complexity of course, so I was wondering if there are better alternatives.
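A minimal sketch of the modified selection sort described above, assuming all $p_i$ are strictly positive so that the weights $\frac{1}{p_i}$ are well defined. The function name `probabilistic_selection_sort` and the `rng` parameter are illustrative choices, not part of the question.

```python
import random


def probabilistic_selection_sort(values, rng=random):
    """Repeatedly draw one remaining element with probability proportional to 1/value.

    Assumption: every value is strictly positive.
    Each draw costs O(n), so the whole procedure is O(n^2).
    """
    remaining = list(values)
    result = []
    while remaining:
        # Smaller values get larger weights, so they tend to be drawn earlier.
        weights = [1.0 / v for v in remaining]
        k = rng.choices(range(len(remaining)), weights=weights)[0]
        result.append(remaining.pop(k))
    return result
```

Calling `probabilistic_selection_sort([3.0, 1.0, 2.0], random.Random(42))` returns some permutation of the input in which `1.0` is the most likely first element.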

Andreas T

2 Answers

5

One of the popular models for biased permutations is the Mallows model, dating to a paper of Mallows from 1957. Lu and Boutilier, quoting Doignon et al., give the following recipe for sampling a permutation according to the Mallows distribution, given a parameter $0 < \phi \leq 1$:

  1. Start with the permutation 1.
  2. Insert 2 into position 1 with probability $\frac{\phi-\phi^2}{1-\phi^2}$, and into position 2 with probability $\frac{1-\phi}{1-\phi^2}$.
  3. Insert 3 into positions 1,2,3 with probabilities $\frac{\phi^2-\phi^3}{1-\phi^3},\frac{\phi-\phi^2}{1-\phi^3},\frac{1-\phi}{1-\phi^3}$, respectively.
  4. Insert $4,\ldots,n$ in an analogous manner.

Inserting $x$ into position $i$ means shifting the elements currently in positions $i,\ldots,x-1$ one step to the right and placing $x$ in the freed spot.

The probability to obtain a permutation $\pi$ is proportional to $\phi$ raised to the number of inversions in $\pi$ (the Kendall $\tau$ distance between $\pi$ and the identity permutation).
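The insertion recipe can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the name `sample_mallows` and the `rng` parameter are hypothetical. Element $x$ goes into position $i$ with weight $\phi^{x-i}$, and `random.choices` normalizes the weights itself, which also covers $\phi = 1$, where the closed-form fractions become $0/0$ and the intended distribution is uniform.

```python
import random


def sample_mallows(n, phi, rng=random):
    """Sample a permutation of 1..n by repeated insertion with parameter 0 < phi <= 1."""
    if not 0 < phi <= 1:
        raise ValueError("phi must satisfy 0 < phi <= 1")
    perm = []
    for x in range(1, n + 1):
        # Inserting x at 1-based position i puts x in front of x - i smaller
        # elements, i.e. creates x - i inversions, hence weight phi**(x - i).
        weights = [phi ** (x - i) for i in range(1, x + 1)]
        i = rng.choices(range(1, x + 1), weights=weights)[0]
        perm.insert(i - 1, x)
    return perm
```

Each insertion shifts a list, so this sketch takes $O(n^2)$ time in the worst case. To order the original list, one could apply the sampled permutation to the indices of the list after an ordinary sort.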

Another popular model is the Plackett–Luce model from 1959. There are other models, for example Tallis–Dansie.

Yuval Filmus
-1

The expected running time of quicksort with random pivot is actually $O(n\log n)$.

One can show that no randomized comparison-based algorithm that correctly sorts every list of $n$ elements achieves a better expected running time. This follows from Yao's principle, which states that the worst-case expected running time of any randomized algorithm is no better than the expected running time of the best deterministic algorithm on any fixed input distribution.

This gives you the power to fix some input distribution, instead of considering coin-tossing algorithms, and ask for the best expected running time of any deterministic algorithm relative to that distribution (i.e. it relates randomized complexity to distributional complexity).
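In symbols, the inequality can be written as below. The notation is an assumption introduced here for illustration: $c(A,x)$ is the cost of deterministic algorithm $A$ on input $x$, $R$ is a randomized algorithm viewed as a random choice among deterministic algorithms in a class $\mathcal{A}$, and $D$ is any distribution over inputs.

$$\max_{x} \; \mathbb{E}_{R}\big[c(R,x)\big] \;\ge\; \mathbb{E}_{x\sim D}\,\mathbb{E}_{R}\big[c(R,x)\big] \;\ge\; \min_{A\in\mathcal{A}} \; \mathbb{E}_{x\sim D}\big[c(A,x)\big].$$

The first step holds because a maximum is at least an average, and the second because an average over the random choices of $R$ is at least the value for the best single deterministic choice.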

Now, since we can fix any distribution, take the uniform distribution over all possible inputs and ask for the best expected running time a deterministic algorithm achieves relative to it. A lower bound proof in this setting is close in spirit to the standard lower bound proof for deterministic sorting; you can find it in the following lecture notes.

The idea is similar. View a comparison-based sorting algorithm as a binary decision tree in which each internal node is a query of the form $x<y$ and each leaf represents a different permutation. The number of comparisons needed to sort an input $a_1,\dots,a_n$ is the depth of the leaf corresponding to its sorted permutation $a_{i_1},\dots,a_{i_n}$. Since the number of leaves is $n!$, the height of the tree is at least $\log_2 n! = \Omega(n\log n)$. The difference in the distributional case is that you need to bound the average leaf depth of a binary tree with $n!$ leaves, not its height. To do so, show that the average leaf depth is minimized by a balanced binary tree, in which it is $\Omega(\log n!)$. To see why you can focus on balanced trees, show that if the tree is unbalanced somewhere, you can "fix it", and that this operation never increases the average leaf depth.
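For completeness, here is one standard way to see that $\log_2 n! = \Omega(n\log n)$ (for $n \ge 2$), by keeping only the largest half of the factors:

$$\log_2 n! \;=\; \sum_{k=1}^{n} \log_2 k \;\ge\; \sum_{k=\lceil n/2\rceil}^{n} \log_2 k \;\ge\; \frac{n}{2}\,\log_2\frac{n}{2} \;=\; \Omega(n\log n).$$

In a balanced binary tree with $n!$ leaves, every leaf sits at depth $\lfloor \log_2 n! \rfloor$ or $\lceil \log_2 n! \rceil$, so the average leaf depth is at least $\lfloor \log_2 n! \rfloor$, which is again $\Omega(n\log n)$.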

I only dealt with algorithms that correctly sort all inputs, but using Yao's principle for Monte Carlo algorithms, you can extend this to your case.

Ariel