
The Conjugate Gradient (CG) algorithm is an iterative scheme, based only on matrix-vector multiplications, for solving linear systems $$Ax = b$$

without having to calculate an explicit inverse $A^{-1}$ or some factorization of $A$.

But what if there are more complicated things we want to calculate, for example $$(A^{-1}+C_1)(B^{-1}+C_2) x?$$

Wherever we encounter $M^{-1}v$ (provided the matrix $M$ fulfills the requirements posed by the CG algorithm), we can calculate it with CG.

While we can be certain that this approach will work, is it efficient or wasteful?

Does there exist some framework or extension of CG which takes such expressions into account?

By linearity, if we encounter $({M_1}^{-1} + {M_2}^{-1})v$, we can expand it as $${M_1}^{-1}v + {M_2}^{-1}v$$ and calculate the two terms independently of each other.

If we encounter ${M_1}^{-1}{M_2}^{-1}v$, we can either calculate $${M_1}^{-1}({M_2}^{-1}v)$$ as a two-step computation, or calculate $(M_2M_1)^{-1}v$, where the matrix to be inverted is either treated lazily in factored form inside the CG algorithm or calculated explicitly beforehand. Which would be most efficient?
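A quick numpy sanity check of the second equivalence, using hypothetical small random matrices (shifted to be safely invertible) and dense solves standing in for CG:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# Hypothetical test matrices, made diagonally dominant so they are safely invertible.
M1 = rng.standard_normal((n, n)) + n * np.eye(n)
M2 = rng.standard_normal((n, n)) + n * np.eye(n)
v = rng.standard_normal(n)

# Option 1: two-step calculation, M1^{-1} (M2^{-1} v).
w_two_step = np.linalg.solve(M1, np.linalg.solve(M2, v))

# Option 2: one solve against the explicit product, (M2 M1)^{-1} v.
w_product = np.linalg.solve(M2 @ M1, v)
```

Both routes give the same vector, since $(M_2M_1)^{-1} = M_1^{-1}M_2^{-1}$; the efficiency question is which route costs less when the solves are iterative.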

mathreadler
  • This is generally wasteful because the inner solves need to be solved to an equal or tighter tolerance than the outer solve at each outer iteration. There are "flexible" Krylov methods such as flexible GMRES (fGMRES) that may work, but will take a lot of memory because they require storing the whole Krylov subspace. Another option is to define auxiliary variables to get rid of the inverses, and solve the resulting larger block system using a Krylov method. The larger system will typically be indefinite, so MINRES is usually best. – Nick Alger Jun 02 '22 at 06:16
  • In which of the examples above will I need to run one whole CG for each CG iteration of the other? If not in the examples, can you give another example where it will be needed? – mathreadler Jun 02 '22 at 16:26
  • Yes, you will have to run the inner CG to a tight tolerance if the outer method is not flexible. I've expanded upon the auxiliary variables/block system idea in an answer below. – Nick Alger Jun 02 '22 at 21:48

1 Answer


If you do an inner-outer CG method, where CG is used for the inner inverses within each CG iteration for the outer inverse, then achieving an overall error less than some tolerance $\epsilon$ requires you to do the inner solves to within tolerance $\epsilon$ (or tighter) at every outer iteration. This is typically quite expensive, and so it is not commonly done.
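A minimal sketch of what such an inner-outer scheme looks like, with a hand-rolled CG and a hypothetical SPD test operator $M = C + A^{-1}$ whose matrix-vector product itself requires an inner CG solve with $A$:

```python
import numpy as np

def cg(matvec, b, tol=1e-12, maxiter=500):
    """Plain conjugate gradient: solve M x = b, where matvec(v) computes M v."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Mp = matvec(p)
        alpha = rs / (p @ Mp)
        x += alpha * p
        r -= alpha * Mp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(1)
n = 8
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)            # SPD test matrix
C = np.diag(rng.uniform(1.0, 2.0, n))  # SPD test matrix
b = rng.standard_normal(n)

# Outer operator M = C + A^{-1}: each outer matvec requires an inner CG solve with A.
def outer_matvec(v):
    return C @ v + cg(lambda w: A @ w, v)  # inner solve to a tight tolerance

x = cg(outer_matvec, b)
# Verify against a direct dense solve.
x_direct = np.linalg.solve(C + np.linalg.inv(A), b)
```

Every outer matvec triggers a full inner solve to a tight tolerance, which is exactly the expense described above.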

There are some "flexible" Krylov methods such as flexible GMRES (fGMRES), which allow you some flexibility in the tolerance for the inner solve. I've had poor results with this and do not recommend it.

I would recommend, instead, defining auxiliary variables to remove inverses, then doing a Krylov method on the overall block system that results. For example, in the equation $$(A^{-1} + C_1)(B^{-1} + C_2)x=b$$ let us define $y:= B^{-1} x$, and $z:=A^{-1}(y + C_2 x)$. Multiplying these auxiliary variables through to clear inverses allows us to write our system as follows: \begin{align*} z + C_1 y + C_1 C_2 x &= b \\ B y &= x \\ A z &= y + C_2 x \end{align*} or equivalently $$ \begin{bmatrix} C_1 C_2 & C_1 & I \\ -I & B & 0 \\ -C_2 & -I & A \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} b \\ 0 \\ 0 \end{bmatrix} $$ and now this bigger $3\times 3$ block system may be solved with a standard Krylov method like GMRES.
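A small numpy check of this construction, with hypothetical random test matrices ($A$ and $B$ shifted to be safely invertible) and a dense solve standing in for GMRES:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
# Hypothetical small dense test matrices; A and B made safely invertible.
A = rng.standard_normal((n, n)) + n * np.eye(n)
B = rng.standard_normal((n, n)) + n * np.eye(n)
C1 = rng.standard_normal((n, n))
C2 = rng.standard_normal((n, n))
b = rng.standard_normal(n)
I = np.eye(n)
Z = np.zeros((n, n))

# Assemble the 3x3 block system from the answer.
K = np.block([
    [C1 @ C2, C1, I],
    [-I,      B,  Z],
    [-C2,     -I, A],
])
rhs = np.concatenate([b, np.zeros(n), np.zeros(n)])
x = np.linalg.solve(K, rhs)[:n]   # in practice: a Krylov method like GMRES

# Compare the x component with the original system, formed with explicit inverses.
M = (np.linalg.inv(A) + C1) @ (np.linalg.inv(B) + C2)
x_direct = np.linalg.solve(M, b)
```

The block solve touches only $A$, $B$, $C_1$, $C_2$ through matrix-vector products, whereas the reference solution needs the explicit inverses.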

Typically, if the original system is symmetric and you are smart about how you define your variables and order the blocks, you can make the expanded block system symmetric, and so you can use MINRES, which saves memory as compared to GMRES. However, the way I have done it in the example above is not symmetric.

Nick Alger
  • Yes, nice. This is perfect. Also, due to the method of constructing the block matrix, we can permute with sign shifts to get $I$ blocks on the diagonal. Thank you, Nick. – mathreadler Jun 03 '22 at 13:45
  • Block Gauss-Seidel might be a good preconditioner as well. I.e., choose an ordering so that the diagonal blocks are invertible, and replace all blocks above (or below) the diagonal with zero, so that the preconditioner is block lower triangular. Applying the inverse of the preconditioner to a vector amounts to solving a block triangular system. – Nick Alger Jun 03 '22 at 16:58
  • I just realized the value of choosing a good preconditioner when I tried CG on the normal equations of this system; the condition numbers were often quite bad. I will check out Gauss-Seidel. – mathreadler Jun 04 '22 at 19:54
  • @mathreadler For the particular example above, you might try solving $(I+A C_1)y = A b$ for $y$, then solving $(I+BC_2)x = B y$, in sequence, each with an iterative method. The identity matrices will help the conditioning. There are many ways to create algebraically equivalent block systems, and it is an art to find a good one for a given problem. – Nick Alger Jun 04 '22 at 23:11
  • I was just about to add a small Tikhonov regularization term to the block system, but maybe it will behave even better if I split it into smaller systems like you propose. – mathreadler Jun 04 '22 at 23:48
  • I could not find how to construct a preconditioner matrix using the Gauss-Seidel scheme. I implemented the scheme itself, and later tried using $L$ and $L^T$ as factors for Octave's pcg, since I thought a pair of triangular factors would be nice for their quick-inverse properties. It gives me better convergence than the same pcg function with no preconditioner at all, but I suspect a better one can be derived. Can you point me in some direction? – mathreadler Jun 10 '22 at 20:24
  • The three main preconditioners in this class are Jacobi, Gauss-Seidel, and successive over-relaxation (SOR). Jacobi uses the diagonal of the matrix as the preconditioner. Gauss-Seidel uses the triangular part of the matrix as the preconditioner, including the diagonal (either the lower or the upper triangle). SOR uses the strict lower triangle, strict upper triangle, and diagonal in a specific combination. Here are some references: https://en.wikipedia.org/wiki/Jacobi_method https://en.wikipedia.org/wiki/Gauss%E2%80%93Seidel_method https://en.wikipedia.org/wiki/Successive_over-relaxation – Nick Alger Jun 10 '22 at 20:48
  • Block versions of these preconditioners are the same, except the fundamental elements are blocks of the matrix instead of entries. For example, if you have a block 2x2 matrix $\begin{bmatrix}A & B \\ C & D\end{bmatrix}$, block Gauss-Seidel would use the lower block triangle $\begin{bmatrix}A & 0 \\ C & D\end{bmatrix}$. To apply the inverse of this lower block triangle to a block vector $(b, c)$, you do triangular backsubstitution, but in blocks: first you solve $Ax=b$ for $x$, then you solve $Dy = c - Cx$ for $y$. The solution is the block vector $(x,y)$. – Nick Alger Jun 10 '22 at 20:52