12

Given a symmetric real $n \times n$ matrix $A=(a_{ij})$, is there an algorithm which computes the sum $$\sum_{i,j,k}\max(a_{ij},a_{ik},a_{jk})$$ over all $1\leq i<j<k\leq n$ with time-complexity better than $O(n^3)$?

user89217
  • 123
  • 4

1 Answer

4

There is a quite practical approach that works in $O(n^3 / w)$ time, where $w$ is the number of bits in a processor word. The main idea is to iterate over the entries of the matrix in increasing order (breaking ties arbitrarily) and "switch them on". Consider the moment when the largest element of some triple $a_{ij}, a_{ik}, a_{jk}$ is switched on; for simplicity, say it is $a_{ij}$. It is natural to add the value of the triple to the answer at exactly this moment, when its last element is switched on. So we need to count the number of indices $k$ such that $a_{ik}$ and $a_{jk}$ are already switched on — that is precisely the number of triples whose largest element is $a_{ij}$ and which became complete just now. The naive $O(n)$ count per entry can then be sped up with bit operations.
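The sweep itself can be sketched without any bit tricks (this is our own illustration, not the final algorithm — the inner loop over $k$ still makes it $O(n^3)$ overall; assumes C++17):

```cpp
#include <cstdint>
#include <vector>
#include <algorithm>
#include <utility>

// Switch entries on in increasing order; when a_{ij} turns on,
// every k with a_{ik} and a_{jk} already on completes a triple
// whose maximum is a_{ij}.
int64_t solve_sweep(int n, const std::vector<std::vector<int32_t>> &a) {
    std::vector<std::vector<char>> on(n, std::vector<char>(n, 0));
    std::vector<std::pair<int, int>> order;
    for (int j = 1; j < n; j++)
        for (int i = 0; i < j; i++)
            order.emplace_back(i, j);
    std::sort(order.begin(), order.end(),
              [&](auto l, auto r) {
                  return a[l.first][l.second] < a[r.first][r.second];
              });
    int64_t ans = 0;
    for (auto [i, j] : order) {
        on[i][j] = on[j][i] = 1;
        // on[i][i] and on[j][j] stay 0, so k = i and k = j never count
        for (int k = 0; k < n; k++)
            if (on[i][k] && on[j][k])
                ans += a[i][j];
    }
    return ans;
}
```

Each triple is counted exactly once — at the moment its last (largest, under the chosen tie-breaking order) element is switched on.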

For details, you can refer to the following C++11 implementation, which should work for $n \leqslant 5000$ and $|a_{ij}| \leqslant 10^9$. It is not heavily optimized, but it still beats naive summation for $n = 5000$ by a large margin, at least on my machine.

// The code is not very elegant, but should be understandable.
// The n x n matrix a has to be symmetric!
#include <cstdint>
#include <vector>
#include <algorithm>
#include <tuple>
#include <utility>
#include <boost/dynamic_bitset.hpp>

using std::vector;
using std::pair;

int64_t solve(int n, const vector<vector<int32_t>> &a)
{
    // one bitset per row; the block type must be an unsigned integer
    vector<boost::dynamic_bitset<uint64_t>> mat
        (n, boost::dynamic_bitset<uint64_t>(n));

    vector<pair<int, int>> order;
    for (int j = 1; j < n; j++)
        for (int i = 0; i < j; i++)
            order.emplace_back(i, j);
    std::sort(order.begin(), order.end(),
        [&] (const pair<int, int> &l, const pair<int, int> &r)
        { return a[l.first][l.second] < a[r.first][r.second]; });

    int64_t ans = 0;
    for (const auto &position : order)
    {
        int i, j;
        std::tie(i, j) = position;
        mat[i][j] = mat[j][i] = 1;
        // here it is important that mat[i][i] == 0 and mat[j][j] == 0
        // always hold, so k = i and k = j are never counted
        ans += (mat[i] & mat[j]).count() * int64_t(a[i][j]);
    }

    return ans;
}

If you consider bit optimisations cheating, you can apply the Four Russians method to the same effect, yielding an $O(n^3 / \log n)$ algorithm, which should be less practical (because $w$ is fairly large on most modern hardware) but is theoretically better. Indeed, choose $b \approx \log_2 n$ and store each row of the matrix as an array of $\lceil \frac{n}{b} \rceil$ integers from $0$ to $2^b - 1$, where the $i$-th number in the array encodes the bits of the row in positions from $ib$ inclusive to $\min(n, (i + 1)b)$ exclusive ($0$-indexed). We can precompute the scalar products of every two such blocks in $O(2^{2b} b)$ time. Updating a position in the matrix is fast because only one integer changes. To find the scalar product of rows $i$ and $j$, iterate over the two arrays, look up the scalar products of the corresponding blocks in the table, and sum up the obtained products.

The above paragraph assumes that operations on integers of size $O(n)$ take $O(1)$ time. This is a quite common assumption, because it usually does not change the comparative speed of the algorithms. For example, if we drop it and measure time in bit operations, the brute-force method actually takes $O(n^3 \log n)$ time when the $a_{ij}$ take integer values with absolute values up to $n^{\varepsilon}$ for some constant $\varepsilon > 0$ (and otherwise we can solve the problem with $O(n^{\varepsilon})$ matrix multiplications anyway). The Four Russians method suggested above then performs $O(n^3 / \log n)$ operations on numbers of size $O(\log n)$, i.e. $O(n^3)$ bit operations in total, which is still better than brute force despite the change of model.

The question about the existence of an $O(n^{3 - \varepsilon})$ algorithm is still interesting, though.

The techniques presented in this answer (bit optimisation and the Four Russians method) are by no means original and are included for completeness of the exposition. However, finding a way to apply them here was not trivial.

Kaban-5
  • 529
  • 2
  • 9