
If I have two matrices $A$ and $B$, of dimensions $1000\times2$ and $2\times1000$, respectively, and want to compute $(AB)^{5000}$, it's more efficient to first rewrite the expression as $A(BA)^{4999}B$ and only then evaluate numerically, because $AB$ is of dimension $1000\times1000$ but $BA$ is of dimension $2\times2$.
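To put rough numbers on that (counting scalar multiplications for schoolbook matrix products, and assuming both powers are computed by repeated squaring): forming $AB$ costs $1000 \cdot 2 \cdot 1000 = 2 \times 10^6$ multiplications, and each of the roughly $\lceil \log_2 5000 \rceil = 13$ subsequent $1000 \times 1000$ products costs $10^9$, for on the order of $10^{10}$ in total. By contrast, forming $BA$ costs $2 \cdot 1000 \cdot 2 = 4000$, the $2 \times 2$ squaring chain is negligible, and the boundary products $A \cdot (BA)^{4999}$ followed by the final multiplication by $B$ cost $1000 \cdot 2 \cdot 2 + 1000 \cdot 2 \cdot 1000 \approx 2 \times 10^6$, about four orders of magnitude less.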

I want to solve a generalized version of this problem. Is there a reasonably efficient algorithm (not brute force) to optimize an expression containing:

  • Free matrix variables of known dimensions
  • Products of arbitrary subexpressions
  • Arbitrary subexpressions raised to natural powers

... so that it takes the least amount of work to evaluate numerically, after substituting the free matrix variables with concrete matrix values?

The matrix chain multiplication problem is a special case of my problem.


Edit:

This is a tentative answer. It seems intuitively right to me, but I have no proof that it's correct. If it turns out to be correct, I'm still interested in the proof. (If it's not correct, of course, please do correct me.)

For every product raised to a power, say, $(A_1 A_2 \ldots A_k)^n$, consider every cyclic permutation of the factors:

  • $(A_1 A_2 \ldots A_k)^n$
  • $A_1 (A_2 \ldots A_k A_1)^{n-1} A_2 \ldots A_k$
  • $A_1 A_2 (A_3 \ldots A_k A_1 A_2)^{n-1} A_3 \ldots A_k$
  • ...
  • $A_1 A_2 \ldots A_{k-1} (A_k A_1 A_2 \ldots A_{k-1})^{n-1} A_k$

... recursively. Each power is to be calculated using exponentiation by squaring (obviously), and all other products are to be calculated using the optimal order returned by the matrix chain multiplication algorithm.
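
For concreteness, here is a minimal exponentiation-by-squaring sketch in Julia (the name `matpow` is mine; Julia's built-in `^` already uses repeated squaring for integer matrix powers):

# Minimal exponentiation by squaring for a square matrix A and n ≥ 1.
# Uses O(log n) matrix products instead of the n - 1 of a naive loop.
function matpow(A::AbstractMatrix, n::Integer)
    n ≥ 1 || throw(ArgumentError("n must be positive"))
    R = one(A)                 # identity matrix of A's size and element type
    while n > 0
        isodd(n) && (R = R * A)
        n >>= 1
        n > 0 && (A = A * A)   # skip the final, unused squaring
    end
    return R
end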


Edit:

The idea outlined in my previous edit is still somewhat suboptimal. Exponentiation by squaring can just as well evaluate expressions of the form $KA^n$ or $A^nK$, where $K$ isn't necessarily the identity matrix, but my algorithm doesn't consider the possibility of invoking it with $K$ not equal to the identity matrix.
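
For the record, here is a sketch of that generalized routine in Julia (the name `mul_matpow` is mine): it computes $KA^n$ without ever materializing $A^n$ on its own.

# Compute K * A^n by binary exponentiation, folding each partial power into K.
# The squarings of A still cost full price, but each accumulation product
# (one per set bit of n) costs X·Y² instead of Y³ when K is X×Y, A is Y×Y,
# and X ≪ Y.
function mul_matpow(K::AbstractMatrix, A::AbstractMatrix, n::Integer)
    R = K
    while n > 0
        isodd(n) && (R = R * A)
        n >>= 1
        n > 0 && (A = A * A)
    end
    return R
end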

isekaijin

2 Answers


Disclaimer: The following method has not been rigorously proven to be optimal. An informal proof is provided.

The problem reduces to finding the most efficient ordering when considering the square of the product.

For example, for $(ABC)^{50}$ we only need to solve $(ABC)^2$ optimally, since that expands to $ABCABC$; concatenating $ABC$ a third time adds no useful ordering information. The intuition is that optimal orderings are found bottom-up, so longer repetitions of the same matrices introduce no groupings that the square does not already contain.

Finding the best ordering of $ABCABC$ is exactly the Matrix Chain Multiplication problem. After finding an optimal ordering, apply exponentiation by squaring to the repeating group of factors (a triple here, an n-tuple in general) in that ordering.

For example, if the optimal ordering for the square is $A(B(CA))BC$, the solution to the initial problem is $A(B(CA))^{49}BC$.

In summary (a code sketch follows the list):
1) The first step in solving $(A_1 A_2 \cdots A_n)^m$ is to solve $(A_1 A_2 \cdots A_n)^2$.
2) Solving $(A_1 A_2 \cdots A_n)^2$ is an instance of the Matrix Chain Multiplication problem on the doubled chain $A_1 \cdots A_n A_1 \cdots A_n$.
3) With $G$ the repeating group of $n$ consecutive factors in the ordering found in (2), the solution to (1) takes the form $A_1 \cdots A_i \, G^{m-1} \, A_{i+1} \cdots A_n$ (any other groupings found in (2) should be applied as well).
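
A sketch of this recipe in Julia (my own code, not from any package): the textbook $O(n^3)$ dynamic program, applied to the doubled chain from the case study below.

# dims[i] × dims[i+1] is the size of the i-th matrix in the chain.
# Returns the classic MCM tables: cost[i,j] is the cheapest multiplication
# count for the range i..j, cut[i,j] is where to split that range.
function mcm_tables(dims::Vector{Int})
    n = length(dims) - 1
    cost = zeros(Int, n, n)
    cut  = zeros(Int, n, n)
    for len in 2:n, i in 1:n-len+1
        j = i + len - 1
        cost[i, j] = typemax(Int)
        for k in i:j-1
            c = cost[i, k] + cost[k+1, j] + dims[i] * dims[k+1] * dims[j+1]
            c < cost[i, j] && (cost[i, j] = c; cut[i, j] = k)
        end
    end
    return cost, cut
end

# Render the optimal parenthesization as a string.
paren(cut, names, i, j) = i == j ? names[i] :
    "(" * paren(cut, names, i, cut[i, j]) * paren(cut, names, cut[i, j] + 1, j) * ")"

# Steps 1 and 2 on the matrices of the case study below: solve the squared
# chain ABCDEABCDE.
dims  = [1000, 2, 1000, 100, 1000, 1000]   # A, B, C, D, E
dims2 = vcat(dims, dims[2:end])            # dimension list of the doubled chain
names = string.(collect("ABCDEABCDE"))
cost, cut = mcm_tables(dims2)
println(paren(cut, names, 1, 10), "   cost = ", cost[1, 10])
# Step 3: take the group of 5 consecutive factors that repeats in the printed
# ordering and raise it to the (m-1)-th power by squaring.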

Informal proof
Considering the simplest case using two matrices, $(AB)^n$, we note that $A$ and $B$ have dimension $X \times Y$ and $Y \times X$ respectively. Any product using $A$ and $B$ has one of the following dimensions:

  • $X \times Y$
  • $Y \times X$
  • $Y \times Y$
  • $X \times X$

We have either $X < Y$ or $Y \le X$.

Assumption 1a): $X < Y$
$AB$ has dimension $X \times X$, and this ordering is guaranteed to be optimal from a bottom-up approach. Any other configuration of $A$ and $B$ is either equally good, or worse. Thus, the problem is optimally solved as $(AB)^n$.

Assumption 1b): $Y \le X$
$BA$ has dimension $Y \times Y$. This is the optimal ordering for all products involving $A$ and $B$. Thus, the solution is optimally found as $A(BA)^{n-1}B$.

This concludes the argument; note that we only had to examine the two orderings contained in $ABAB$, the square problem.

Using more matrices, the argument is similar; perhaps an inductive proof is possible? The general idea is that solving the MCM for the square finds the optimal dimensions for the operations with all involved matrices taken into account.

Case study:

julia> a=rand(1000,2);
julia> b=rand(2,1000);
julia> c=rand(1000,100);
julia> d=rand(100,1000);
julia> e=rand(1000,1000);

julia> @time (a*b*c*d*e)^30;
  0.395549 seconds (26 allocations: 77.058 MB, 1.58% gc time)

# Here I use an MCM solver to find out the optimal ordering for the square problem
julia> using MatrixChainMultiply
julia> matrixchainmultiply("SOLVE_SQUARED", a,b,c,d,e,a,b,c,d,e)
Operation: SOLVE_SQUARED(A...) = begin  # none, line 1:
    A[1] * (((((A[2] * A[3]) * (A[4] * (A[5] * A[6]))) * (A[7] * A[8])) * A[9]) * A[10])
  end
Cost: 6800800

# Use the ordering found, note that exponentiation is applied to the group of 5 elements
julia> @time a*(((((b*c)*(d*(e*a)))^29*(b*c))*d)*e);
  0.009990 seconds (21 allocations: 7.684 MB)

# I also tried using the MCM for solving the problem directly
julia> @time matrixchainmultiply([30 instances of a,b,c,d,e]);
  0.094490 seconds (4.02 k allocations: 9.073 MB)
matteyas

If you want to calculate the product of $n$ matrices $A_1$ to $A_n$ in the best possible time, you can easily calculate how many operations are needed to compute the product $A_i \cdots A_j$ for every $1 \le i \le j \le n$ in $O(n^3)$ steps.
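For reference, the recurrence behind that dynamic program, with $A_i$ of dimension $d_{i-1} \times d_i$ and $c(i,j)$ the minimal number of scalar multiplications for $A_i \cdots A_j$, is

$c(i,i) = 0, \qquad c(i,j) = \min_{i \le k < j} \big( c(i,k) + c(k+1,j) + d_{i-1}\, d_k\, d_j \big)$

and filling the table for all $1 \le i \le j \le n$ takes $O(n^3)$ time.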

gnasher729