Find an optimal ordering

Question

I came across this problem and am struggling to find a way to approach it. Any thoughts would be greatly appreciated!

Suppose we are given a matrix $\{-1, 0, 1\}^{n\ \times\ k} $, for example,

$$\begin{bmatrix} 1 & 0 & 1 & 0 & -1 \\ -1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 & -1 \\ -1 & -1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 & -1 \end{bmatrix}$$

Without trying every single permutation, find an ordering of columns $c_i$ that maximises the number of rows for which the first non-zero element is $1$.

For the example above, one such ordering (it's not unique!) is $(c_3, c_4, c_1, c_2, c_5)$, i.e.,

$$\begin{bmatrix} 1 & 0 & 1 & 0 & -1 \\ 0 & 0 & -1 & 0 & 1 \\ 1 & 0 & 0 & 1 & -1 \\ 0 & 1 & 1 & -1 & 1 \\ 0 & 0 & 1 & 0 & -1 \end{bmatrix}$$

Here, for $4$ out of $5$ rows the first non-zero element is $1$.

j_random_hacker · Answer 1 · 2019-03-28T14:32:08.007

This problem, which I'll call CO for Column Ordering, is NP-hard. Here's a reduction from the NP-hard problem Vertex Cover (VC) to it:

Decision problem forms of VC and CO

Let the input VC instance be $(V, E, k)$. It represents the question: "Given the graph $(V, E)$, is it possible to choose a set of at most $k$ vertices from $V$ such that every edge in $E$ is incident on at least one chosen vertex?" We will construct an instance $(A, k')$ of CO that represents the question: "Given the matrix $A$ with elements in $\{-1, 0, 1\}$, is it possible to permute the columns of $A$ such that a 1 appears before a -1 on at least $k'$ rows?" These two problems are stated in decision problem form, whereby the answer to each is either YES or NO: formally speaking, it is this form of a problem that is NP-complete (or not). It is not too difficult to see that the more natural optimisation problem form stated in the OP's question is roughly equivalent in terms of complexity: binary search on the threshold parameter can be used to solve the optimisation problem using a decision problem solver, while a single invocation of an optimisation problem solver, followed by a single comparison, is enough to solve the decision problem.

Constructing an instance of CO from an instance of VC

Let $n=|V|$ and $m=|E|$. We will build a matrix $A$ with $(n+1)m + n$ rows and $n+1$ columns. The top $(n+1)m$ rows will be formed of $m$ blocks of $n+1$ rows each, with each block representing an edge that needs to be covered. The bottom $n$ rows contain vertex "flags", which will cause a column (corresponding to a vertex) to incur a fixed cost if it is included in the left-hand side of the CO solution (corresponding to a vertex being included in the vertex cover of the VC solution).

For each vertex $v_i$, create a column in which:

among the top $(n+1)m$ rows, the $j$-th block of $n+1$ rows all contain a +1 when edge $e_j$ is incident on $v_i$, and 0 otherwise, and
the bottom $n$ rows are all 0 except for the $i$-th, which is -1.

Create one more "fence" column that consists of $(n+1)m$ copies of -1, followed by $n$ copies of +1.

Finally, set the threshold $k'$ for the constructed CO instance: $(n+1)m + n - k$. In other words, we allow at most $k$ rows in which a -1 appears before a +1. Let's call this number of violating rows the "cost" of a CO solution.

Proof

The correspondence between a solution to the CO instance and a set of vertices in the original VC instance is: Every column to the left of the fence corresponds to a vertex that is in the set, and every column to the right of the fence corresponds to a vertex that is not.

Intuitively, the -1s in the top of the "fence" column force the selection of a subset of columns to be placed to its left that together contain +1s in all these positions -- corresponding to a subset of vertices that are incident on every edge. Each of these columns that appears to the left of the "fence" has a -1 on a distinct row somewhere in the bottom $n$ rows, incurring a cost of 1; the +1s in the bottom of the "fence" ensure that all columns placed to its right incur no such cost.

Clearly a VC solution using at most $k$ vertices yields a solution to the constructed CO instance with cost at most $k$: Just order the columns corresponding to vertices in the vertex cover arbitrarily, followed by the fence column, followed by all remaining columns in any order.

It remains to show that a solution to the CO instance with cost at most $k$ corresponds to a vertex cover with at most $k$ vertices.

Suppose to the contrary that there exists a solution to the CO instance with cost at most $k$ that leaves some row in the top $(n+1)m$ rows with a -1 before a +1. This row belongs to a block of $(n+1)$ rows corresponding to a particular edge $uv$. Every row in this block in the original instance $A$ is identical by construction; permuting columns may change these rows, but does not affect the fact that they are identical. Thus each of these $n+1$ identical rows has a -1 before a +1 in the solution, implying a cost of at least $n+1$. But $k \le n < n+1$: contradiction.

Since each of the $m$ blocks of rows in the top $(n+1)m$ rows have a +1 before a -1, each of the corresponding edges is covered by a vertex corresponding to a column to the left of the fence: that is, this subset of vertices constitutes a vertex cover. Since none of the top $(n+1)m$ rows have a -1 before a +1, the only place where cost can accrue in the solution is in the bottom $n$ rows, from columns placed to the left of the fence. Each such column has cost exactly 1, so given that the cost is at most $k$, there must be at most $k$ such columns, and hence at most $k$ vertices in the cover.

Finally, it's clear that the CO instance can be constructed in polynomial time from the VC instance, meaning that if a polynomial-time algorithm existed for solving CO, any VC instance could also be solved in polynomial time by first constructing a CO instance as described above and then solving it. Since VC is NP-hard, CO is too.

Optidad · Answer 2 · 2019-03-19T09:46:26.050

I don't know if there is actually a polynomial solution. Nevertheless based on Pål GD's comment, you can build a simplification function. The initial matrix is simplificated as you build the output sequence $S$.

function simplification:
while(true)
    if any row i$ has no 1 or no -1 left, remove it
    if any column j has no -1 then,
       remove it and put j on the leftmost available position in S,
       remove all rows where column j has 1.
    if any column j has no 1 then, 
       remove it and put j on the rightmost available position in S.
    if no modification has been done on this loop, break

Then you have to do a complete exploration of the combinatorics using iteratively the function pick:

function pick(k):
    put column k on the leftmost available position in S
    remove any row where column k is -1 or 1

After every pick you can do a simplification to possibly reduce the number of possibilities to explore. I suggest to explore greedily starting with the column having the less -1, thus you may reach a lower bound making a stop criterion.

On the given example, the first simplification gives (as Pål GD explained in comment)

$S[0] = c3$, remove r1, r3
$S[1] = c4$, remove r4
$S[2] = c2$ this let you with a simple matrix to explore. $$\begin{bmatrix} -1 & 1 \\ 1 & -1 \\ \end{bmatrix}$$

I think a matrix making this method quite unefficient would have exactly one 1, and one -1 per row/column, something like $$\begin{bmatrix} -1 & 1 & 0 & 0 & 0 & 0 \\ 1 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ \end{bmatrix}$$

Nevertheless, the simplification still spare about half the exploration steps. And this type of matrix can be split in several independant sub-matrix.

Find an optimal ordering

2 Answers2

Decision problem forms of VC and CO

Constructing an instance of CO from an instance of VC

Proof