
You are given an array of length $n$. Each element of the array belongs to one of $K$ classes. You are supposed to rearrange the array using the minimum number of swap operations so that all elements of the same class are grouped together, that is, they form a contiguous subarray.
For example: $$ \begin{align*} &[2, 1, 3, 3, 2, 2] \longrightarrow [2, 2, 2, 1, 3, 3], \text{ or} \\ &[2, 1, 3, 3, 2, 2] \longrightarrow [1, 2, 2, 2, 3, 3], \text{ or} \\ &[2, 1, 3, 3, 2, 2] \longrightarrow [3, 3, 2, 2, 2, 1]. \end{align*} $$ Three other valid arrangements remain.

What is this problem called in the literature? Is there an efficient algorithm for it?

D.W.
Marko Bukal

2 Answers


Note: This is a hardness proof; I think there are still practical approaches, such as integer programming.

Given a BIN-PACKING instance in which you want to pack $K$ numbers $n_1,\ldots,n_K$ into $L$ bins of sizes $m_1,\ldots,m_L$, where it is guaranteed that $\sum n_i=\sum m_j=N$, we can construct an instance of your problem as follows:

  • There are $K+(N+1)(L-1)$ classes;
  • The first $K$ classes have sizes $n_1,\ldots,n_K$ respectively, and each of the remaining classes has size $N+1$;
  • The array is partitioned into slots of size: $$m_1,(N+1)^2,m_2,(N+1)^2,m_3,\ldots,(N+1)^2,m_L$$ where each slot of size $(N+1)^2$ is packed with $N+1$ classes, arranged contiguously, and the rest are arbitrarily arranged.
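The construction above can be sketched in code. This is only an illustration under stated assumptions: the function name `build_instance` and the integer class labels are hypothetical, and the "arbitrary" arrangement of the first $K$ classes is taken to be class by class, which is one choice among many.

```python
def build_instance(item_sizes, bin_sizes):
    """Build the array of the reduction from a BIN-PACKING instance.

    item_sizes: n_1..n_K, bin_sizes: m_1..m_L, with equal sums N.
    Returns (array, N); classes are labelled 0, 1, 2, ... (an assumption).
    """
    N = sum(item_sizes)
    if N != sum(bin_sizes):
        raise ValueError("item sizes and bin sizes must have the same sum")
    K = len(item_sizes)

    # Elements of the first K classes, in an arbitrary order
    # (here simply class by class).
    pool = [c for c, size in enumerate(item_sizes) for _ in range(size)]

    arr = []
    next_class = K  # label of the next separator class
    pos = 0         # how much of the pool has been placed so far
    for j, m in enumerate(bin_sizes):
        if j > 0:
            # Separator slot of size (N+1)^2: N+1 classes of size N+1,
            # each already contiguous.
            for _ in range(N + 1):
                arr.extend([next_class] * (N + 1))
                next_class += 1
        # Slot of size m_j, filled with the next m_j pool elements.
        arr.extend(pool[pos:pos + m])
        pos += m
    return arr, N
```

For example, `build_instance([2, 1], [1, 2])` produces an array with $K+(N+1)(L-1) = 2 + 4 \cdot 1 = 6$ classes and length $1 + 16 + 2 = 19$.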

The key observation is that there is no point in moving some classes of a $(N+1)^2$ slot while leaving at least one class of that slot in place, because doing so does not change the size of any 'bin'. Hence the original bin-packing instance is feasible if and only if the minimum number of swaps is at most $N$. Since BIN-PACKING is known to be strongly NP-complete, your problem is NP-hard.

Wei Zhan

I also suspect this is NP-hard, but in the absence of an idea for a proof, here are a couple of quickly computable lower bounds that might be useful for checking optimality of a heuristic solution, or pruning a branch-and-bound search.

Let class $i$ contain $n_i$ elements. In any valid solution, class $i$ must begin at some position $j$. Thus we can compute a lower bound $L_i$ on the cost of "fixing" class $i$ by trying all possible starting positions $j$, counting the number of non-$i$ elements in the length-$n_i$ block beginning at position $j$ (each of these positions will require a swap), and taking the minimum. This $L_i$ can be computed for any $i$ in $O(n)$ time using a sliding window approach, for $O(Kn)$ time overall. Two overall lower bounds are then:

  1. Take the maximum over all $L_i$. Tight for $K=2$, probably very weak for large $K$.
  2. Sum all $L_i$ and divide by 2, rounding up. This is valid because any swap can fix at most 2 incorrect positions.

In your example both bounds give 1 (the 0.5 in the latter case rounds up to 1), which is of course loose: the optimum there is 2 swaps.
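The two bounds can be computed with a sliding window, roughly as follows. This is a minimal sketch; the function names `class_lower_bounds` and `swap_lower_bounds` are made up for illustration.

```python
from collections import Counter


def class_lower_bounds(arr):
    """Return {class: L_i}: the minimum number of foreign elements in
    any window whose length equals the size of that class."""
    n = len(arr)
    bounds = {}
    for c, size in Counter(arr).items():
        # Count of class-c elements in the window starting at position 0.
        inside = sum(1 for x in arr[:size] if x == c)
        best = size - inside
        # Slide the window one step at a time: O(n) per class.
        for j in range(1, n - size + 1):
            inside += (arr[j + size - 1] == c) - (arr[j - 1] == c)
            best = min(best, size - inside)
        bounds[c] = best
    return bounds


def swap_lower_bounds(arr):
    """Return (max bound, half-sum bound) on the number of swaps."""
    per_class = class_lower_bounds(arr)
    max_bound = max(per_class.values(), default=0)
    # Each swap fixes at most 2 positions, so halve the sum, rounding up.
    half_sum_bound = (sum(per_class.values()) + 1) // 2
    return max_bound, half_sum_bound
```

On the example from the question, `swap_lower_bounds([2, 1, 3, 3, 2, 2])` returns `(1, 1)`: only class 2 has a nonzero $L_i$, namely 1, from the window covering the last three positions.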

j_random_hacker