I want to numerically compute the smallest eigenvalue of an $n \times n$ positive definite matrix $\bf{A}$.
Why I want to avoid working with $\bf{A^{-1}}$
I know that I can apply power iteration to $\bf{A^{-1}}$, but I wish to avoid working with this inverse matrix.
In my application, $\bf{A}$ is implicitly expressed as $\bf{A} = \bf{K'K}$ with $\bf{K} = \bf{L^{-1}R}$, where
- $\bf{L}$ is an $m \times m$ lower triangular band matrix;
- $\bf{R}$ is an $m \times n$ band matrix of full column rank $(m > n)$.
While $\bf{L}$ and $\bf{R}$ are sparse, $\bf{K}$ and $\bf{A}$ are fully dense. Given that both $m$ and $n$ can be large, I do not want to explicitly form either $\bf{K}$ or $\bf{A}$.
An unfortunate complication is that $m \neq n$: $\bf{R}$ is not square, so there is no convenient factored form $\bf{A^{-1}} = \bf{R^{-1}LL'(R^{-1})'}$.
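For concreteness, here is a minimal sketch of the matrix-vector product $\bf{u} = \bf{Av} = \bf{R'(L')^{-1}L^{-1}Rv}$ using only a sparse product and banded triangular solves. It assumes $\bf{L}$ and $\bf{R}$ are stored as sparse matrices from the Matrix package; the function name Av is mine.

library(Matrix)

## u = Av = R'(L')^{-1}L^{-1}Rv, computed without ever forming K or A
Av <- function (L, R, v) {
  w <- R %*% v         ## w = Rv, a sparse matrix-vector product
  w <- solve(L, w)     ## w = L^{-1}Rv = Kv, a banded forward solve
  w <- solve(t(L), w)  ## w = (L')^{-1}Kv, a banded backward solve
  drop(as.matrix(crossprod(R, w)))  ## u = R'w = K'Kv = Av, as a plain vector
}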
What I have tried
With $\bf{A}$ structured as above, it is computationally efficient to compute $\bf{Av}$ for any vector $\bf{v}$ using sparse linear algebra routines (as sketched above). It is then easy to compute $\bf{A}$'s largest eigenvalue $\mu$ using power iteration:
- $\bf{v_0} = (\frac{1}{n}, \ldots, \frac{1}{n})$;
- $\bf{u_0} = \bf{Av_0}$;
- $\lambda_0 = \bf{v_0'u_0}$;
- for $k = 1, 2, \ldots$ until convergence of $\{\lambda_k\}$:
  - $\bf{v_k} = \bf{u_{k - 1}} / \|\bf{u_{k - 1}}\|$;
  - $\bf{u_k} = \bf{Av_k}$;
  - $\lambda_k = \bf{v_k'u_k}$;
- return $\lambda_k$.
To find the smallest eigenvalue, I apply this algorithm to $\bf{B} = \bf{A} - \mu\bf{I}$, inspired by this thread: https://math.stackexchange.com/a/271876/407465. The eigenvalues of $\bf{B}$ are $\lambda_i - \mu \in [\lambda_{\min} - \mu,\ 0]$, so the eigenvalue of $\bf{B}$ with the largest magnitude is $\lambda_{\min} - \mu$; power iteration on $\bf{B}$ finds it, and adding $\mu$ back recovers $\lambda_{\min}$. Here is an R program implementing this method.
PowerIter <- function (A, mu = 0) {
  n <- nrow(A)
  ## spectral shift
  A <- A - diag(mu, n)
  ## power iteration
  v.old <- rep.int(1 / n, n)
  u.old <- A %*% v.old
  d.old <- sum(v.old * u.old)  ## initial estimate lambda_0 = v_0'u_0
  k <- 0L
  repeat {
    v.new <- u.old * (1 / sqrt(sum(u.old ^ 2)))  ## normalize the iterate
    u.new <- A %*% v.new
    d.new <- sum(v.new * u.new)
    if (abs(d.new - d.old) < abs(d.old) * 1e-8) break  ## test relative error
    d.old <- d.new
    u.old <- u.new
    k <- k + 1L
  }
  cat("convergence after", k, "iterations.\n")
  d.new
}
What goes wrong?
I found that this shifted power iteration does not seem numerically stable. I composed two toy examples: one succeeds, the other is problematic.
A successful example
\begin{equation} \bf{A} = \begin{pmatrix} 2 & -1 & 0\\ -1 & 2 & -1\\ 0 & -1 & 2 \end{pmatrix} \end{equation}
With $|\bf{A} - \lambda\bf{I}| = (2 - \lambda)[(2 - \lambda)^2 - 2]$, the three eigenvalues are $2 + \sqrt{2} \approx 3.4142136$, $2$ and $2 - \sqrt{2} \approx 0.5857864$.
The algorithm is a success for this matrix. Here is the output of an R session.
A <- matrix(c(2, -1, 0, -1, 2, -1, 0, -1, 2), nrow = 3)
(d.max <- PowerIter(A))
convergence after 7 iterations.
[1] 3.414214
(d.min <- PowerIter(A, d.max) + d.max)
convergence after 1 iterations.
[1] 0.5857864
A problematic example
Now let's construct $\bf{A} = \bf{Q'DQ}$, where $\bf{D} = \textrm{diag}(d_1, d_2, d_3)$ contains the eigenvalues and $\bf{Q}$ is a rotation matrix
\begin{equation} \bf{Q} = \begin{pmatrix} 0.36 & 0.48 & -0.8\\ -0.8 & 0.6 & 0\\ 0.48 & 0.64 & 0.6 \end{pmatrix} \end{equation}
I fixed $d_1 = 1$, $d_2 = 0.01$ and tried three choices for $d_3$: $10^{-3}$, $10^{-5}$ and $10^{-7}$. Here is the output of an R session.
test <- function (d) {
  Q <- matrix(c(0.36, -0.8, 0.48, 0.48, 0.6, 0.64, -0.8, 0, 0.6), nrow = 3)
  A <- crossprod(Q, d * Q)  ## A = Q'DQ with D = diag(d)
  d.max <- PowerIter(A)
  d.min <- PowerIter(A, d.max) + d.max
  c(min = d.min, max = d.max)
}
test(c(1, 1e-2, 1e-3))
convergence after 3 iterations.
convergence after 298 iterations.
min max
0.001000543 1.000000000
test(c(1, 1e-2, 1e-5))
convergence after 3 iterations.
convergence after 279 iterations.
min max
1.04883e-05 1.00000e+00
test(c(1, 1e-2, 1e-7))
convergence after 3 iterations.
convergence after 279 iterations.
min max
5.860836e-07 1.000000e+00
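For reference, the eigenvalues of $\bf{A} = \bf{Q'DQ}$ are exactly $d_1, d_2, d_3$ by construction, so the true smallest eigenvalues above should be $10^{-3}$, $10^{-5}$ and $10^{-7}$. At this toy size one can confirm with a dense solver, e.g. base R's eigen (a sanity check only, not a solution, since it requires forming $\bf{A}$):

## dense symmetric eigensolver as a reference at this toy size;
## the returned values should reproduce d up to rounding error
Q <- matrix(c(0.36, -0.8, 0.48, 0.48, 0.6, 0.64, -0.8, 0, 0.6), nrow = 3)
A <- crossprod(Q, c(1, 1e-2, 1e-7) * Q)
eigen(A, symmetric = TRUE)$values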
Computation of the smallest eigenvalue is slow and becomes increasingly inaccurate as $\bf{A}$ gets less well conditioned, even though with condition numbers of only $10^3$, $10^5$ and $10^7$ it is still far from ill-conditioned!
My questions
- Is there a mathematical justification for this observation?
- How can I modify this algorithm for better numerical stability?
- If the algorithm cannot be improved, can someone suggest another efficient algorithm?
- If no better algorithm exists, is there any useful lower bound for the smallest eigenvalue? The Gershgorin circle theorem is no good, as its bound is far too loose (see the snippet below).
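To illustrate how loose Gershgorin is, here is a quick check on the $3 \times 3$ matrix from the successful example: the row-wise lower bound $\min_i \big(a_{ii} - \sum_{j \neq i} |a_{ij}|\big)$ comes out as $0$, which says nothing useful about a positive definite matrix.

## Gershgorin lower bound: min over rows of (diagonal entry minus
## the absolute sum of the off-diagonal entries in that row)
A <- matrix(c(2, -1, 0, -1, 2, -1, 0, -1, 2), nrow = 3)
min(diag(A) - (rowSums(abs(A)) - abs(diag(A))))  ## 0, while the true minimum is 2 - sqrt(2)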