
I would like to compute the Schur complement $A-B^TC^{-1}B$, where $C = I+VV^T$ (diagonal plus low rank). The matrix $A$ has $10^3$ rows/columns, while $C$ has $10^6$ rows/columns. The Woodbury formula yields the following expression for the Schur complement:

$$A-B^TB+B^TV(I+V^TV)^{-1}V^TB.$$

This expression can be evaluated without storing large matrices, but the result suffers from numerical inaccuracies. Is there a numerically more stable way of computing the Schur complement, without high memory requirements?
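For concreteness, here is a minimal NumPy sketch of evaluating this expression matrix-free; the sizes and random data are hypothetical toy stand-ins for the ones above, not the actual problem data:

```python
import numpy as np

# Hypothetical toy sizes (in the question, A is 1e3 x 1e3 and C is 1e6 x 1e6).
rng = np.random.default_rng(0)
n, m, k = 2000, 50, 10
A = rng.standard_normal((m, m))
A = A @ A.T                      # any symmetric A will do for the sketch
B = rng.standard_normal((n, m))
V = rng.standard_normal((n, k))

# Direct evaluation of A - B^T B + B^T V (I + V^T V)^{-1} V^T B:
# only k x k solves and tall-skinny products, never the n x n matrix C.
VtB = V.T @ B                                            # k x m
S = A - B.T @ B + VtB.T @ np.linalg.solve(np.eye(k) + V.T @ V, VtB)
```

The inaccuracy presumably comes from cancellation between $-B^TB$ and the correction term: when $V$ has large singular values, both terms are large while their difference $-B^TC^{-1}B$ is small.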


1 Answer


Let us decompose the matrix $VV^T$ into its eigenvectors: $VV^T = \sum_i u_iu_i^T$, where $u_i = \sqrt{\lambda_i}\,\alpha_i$, $\lambda_i$ is an eigenvalue of $VV^T$, and $\alpha_i$ is the corresponding unit eigenvector. Since $VV^T$ is symmetric positive semidefinite, its eigenvectors can be chosen orthogonal, so $u_i^Tu_j = \sqrt{\lambda_i}\sqrt{\lambda_j}\,\alpha_i^T\alpha_j = 0$ when $i \neq j$, and $u_i^Tu_i = \lambda_i$.

Applying the Sherman–Morrison formula for $u_1$ we get
$$(I+u_1u_1^T)^{-1} = I - \frac{u_1u_1^T}{1 + u_1^Tu_1} = I - \frac{u_1u_1^T}{1 + \lambda_1},$$
where $\lambda_1$ is the eigenvalue corresponding to $u_1$. Now consider
$$(I + u_1u_1^T + u_2u_2^T)^{-1} = (I+u_1u_1^T)^{-1} - \frac{\bigl(I - \frac{u_1u_1^T}{1 + \lambda_1}\bigr)u_2u_2^T\bigl(I - \frac{u_1u_1^T}{1 + \lambda_1}\bigr)}{1 + u_2^T\bigl(I - \frac{u_1u_1^T}{1 + \lambda_1}\bigr)u_2}.$$
Since $u_1^Tu_2 = 0$, we have $u_2u_2^Tu_1u_1^T = u_1u_1^Tu_2u_2^T = 0$, so the numerator simplifies to $u_2u_2^T$ and the denominator to $1 + u_2^Tu_2 = 1 + \lambda_2$. Thus
$$(I + u_1u_1^T + u_2u_2^T)^{-1} = I - \frac{u_1u_1^T}{1+\lambda_1} - \frac{u_2u_2^T}{1+\lambda_2}.$$
By induction,
$$\Bigl(I + \sum_i u_iu_1^T\Bigr)^{-1} = I - \sum_i \frac{u_iu_i^T}{1 + \lambda_i}.$$
This can be used to compute the inverse of $I + VV^T$ with sufficient numerical stability.

Regarding the final computation, I think the multiplications are inevitable. The time complexity of inversion by this method is still $O(n^3)$ (a full eigendecomposition of the $n \times n$ matrix $VV^T$), so this is the time complexity of the overall operation as well, but the memory required is $O(n^2)$. The net inaccuracy is proportional to $\sum_i \Delta\lambda_i$, where $\Delta\lambda_i$ is the error in computing the eigenvalues.
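A minimal NumPy sketch of this approach, checked against a dense solve at toy sizes. Obtaining the eigenpairs from a thin SVD of $V$ (so $VV^T = U\,\mathrm{diag}(s^2)\,U^T$, i.e. $\lambda_i = s_i^2$ and $\alpha_i = U_{:,i}$) is an assumption beyond the answer's text; it avoids ever forming the $n \times n$ matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 2000, 50, 10           # hypothetical toy sizes
A = rng.standard_normal((m, m))
A = A @ A.T
B = rng.standard_normal((n, m))
V = rng.standard_normal((n, k))

# Thin SVD: V = U diag(s) W^T, hence V V^T = U diag(s^2) U^T,
# i.e. lambda_i = s_i**2 and alpha_i = U[:, i] in the notation above.
U, s, _ = np.linalg.svd(V, full_matrices=False)
lam = s**2

# (I + V V^T)^{-1} = I - sum_i u_i u_i^T / (1 + lambda_i)
#                  = I - U diag(lam / (1 + lam)) U^T, so
# A - B^T C^{-1} B = A - B^T B + (B^T U) diag(lam / (1 + lam)) (B^T U)^T.
BtU = B.T @ U                                            # m x k
S = A - B.T @ B + (BtU * (lam / (1.0 + lam))) @ BtU.T

# Sanity check against a dense solve (feasible only at these toy sizes).
C = np.eye(n) + V @ V.T
S_ref = A - B.T @ np.linalg.solve(C, B)
print(np.linalg.norm(S - S_ref) / np.linalg.norm(S_ref))
```

In the factored form only $n \times k$ and $m \times k$ arrays are ever stored, so the memory footprint stays far below $O(n^2)$.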

Kushal Sharma
  • I calculate the conditional mean and conditional variance of a multivariate normal, conditioned on an $(n-1)$-length random vector, to produce a $1 \times 1$ conditional mean and variance. On each successive instance I need to redo this after one row and column is added to the covariance matrix. The covariance matrix may be EXTREMELY ill-conditioned. I use backsolves instead of matrix inversions, but find I need ~ octuple precision to get one digit correct; the conditional variance can be negative in double precision. Any idea how to do this in double precision stably enough? Software octuple precision is a ~1000x slowdown. – Mark L. Stone Oct 08 '16 at 23:45
  • See https://en.wikipedia.org/wiki/Multivariate_normal_distribution#Conditional_distributions for the "textbook" formulas for the conditional mean and conditional covariance (restated below). The conditional covariance is a Schur complement. The conditional mean shares the first two terms of the three-term product used in both. Numerical stability after adding many rows and columns is very important. One could potentially do marginally stable updates, with periodic or as-needed complete stable redos to limit error accumulation. If you can give a good answer, I'll make it a question with a 150 point bounty. – Mark L. Stone Oct 08 '16 at 23:51
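For reference, the "textbook" formulas from the linked article, for $x = (x_1, x_2)$ with mean $(\mu_1, \mu_2)$ and covariance blocks $\Sigma_{ij}$, are
$$\mu_{1\mid 2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2), \qquad \Sigma_{1\mid 2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21},$$
so the conditional covariance is exactly the Schur complement of $\Sigma_{22}$ in $\Sigma$, and both quantities share the factor $\Sigma_{12}\Sigma_{22}^{-1}$.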