
I have a symmetric matrix $A$ which has zeroes all along the diagonal, i.e. $A_{ii}=0$.

I cannot change the off-diagonal elements of this matrix; I can only change the diagonal elements. I need this matrix to be positive definite.

One way to ensure this is as follows:

Let $\lambda'$ be the absolute value of the most negative eigenvalue of $A$ and transform $A\mapsto A + \lambda'I$. Notice this leaves the off-diagonal elements unchanged, but now it is positive definite:

$(A+\lambda'I)x_{i}=(\lambda_{i}+\lambda')x_{i}=\lambda_{i}^{\text{new}}x_{i},$

where $(x_{i},\lambda_{i})$ is an eigenpair of $A$. The eigenvalues of the new matrix, formed by adding $\lambda'$ to the diagonal, are all positive.
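
For concreteness, here is a minimal MATLAB sketch of this shift (assuming $A$ is stored as a numeric matrix):

lambda_min = min(eig(A));                   % most negative eigenvalue of A
lambda_p   = abs(lambda_min);               % lambda' above
A_shifted  = A + lambda_p*eye(size(A,1));   % off-diagonal entries are untouched
min(eig(A_shifted))                         % inspect the new smallest eigenvalue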

I fear that this solution is sub-optimal: in my application I want to add only as much as I need to and no more. I would like to know whether I can ensure positive definiteness more generally by performing

$A+D$

where $D$ is some diagonal matrix.

[Application: Statistics. $A$ is related to the covariance of some augmented data. I want it to be as small as possible so as to reduce the leverage of the missing data.]

EDIT: Changed $X_{a}^{T}X_{a}$ to $A$. I was getting ahead of myself; once I have my desired positive definite matrix I want to set $A_{pd}=A+D=X^{T}_{a}X_{a}$ and take the Cholesky decomposition to get $X_{a}$.
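
To make that last step concrete, a small MATLAB sketch (assuming the diagonal matrix $D$ has already been found):

Apd = A + D;        % the adjusted, positive definite matrix
R   = chol(Apd);    % upper-triangular R with R'*R = Apd
Xa  = R;            % so Xa'*Xa = Apd, i.e. Xa plays the role of the augmented design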

Mael
  • The smallest (not necessarily diagonal) matrix you can add to make the matrix $X_a^TX_a$ positive semidefinite is $S=\sum -\lambda_ix_ix_i^T$, where $(x_i,\lambda_i)$ range over only the negative eigenpairs of $X_a^TX_a$. What you are looking for is a diagonal matrix $D$ that "dominates" $S$, in the sense that $D-S$ is positive semidefinite. Not sure if there is a closed-form solution, but I believe it can be solved numerically using semidefinite programming. –  Feb 05 '14 at 19:57
  • Any matrix of the form $X^TX$ is already positive semidefinite. Furthermore, if it has a zero on the diagonal, then the whole row and column of that zero can contain only zeroes. Hence, your matrix $X_a^TX_a$ is a zero matrix. Assuming this was not your intention, please amend the question, so we can provide a proper answer. – Vedran Šego Feb 05 '14 at 20:28
  • You are right; please ignore $X^{T}X$ and just consider it a matrix $A$. Please see my edit in the main post. – Mael Feb 05 '14 at 20:55
  • Please, can you also clarify what you mean by "I want it to be as small as possible"? Is it in terms of some norm (in the $2$-norm, for example, the $\lambda$-shift you have described is optimal; see N. J. Higham, "Computing the polar decomposition - with applications", SIAM J. Sci. Statist. Comput., 7(4):1160–1174, Oct. 1986), or how else do you define this? – Vedran Šego Feb 05 '14 at 23:17
  • @VedranŠego thanks for your Higham paper; that provided me with a nice result. Ideally I would like to minimize $\log \det (A+D)$ for $D$ diagonal, subject to $A+D$ positive semidefinite. – Mael Feb 13 '14 at 16:41
  • @Lindon Your shift will produce a singular semidefinite matrix, so $\log \det(A+D) = \log 0 = -\infty$. Can't get smaller than that. – Vedran Šego Feb 13 '14 at 18:33
  • oh of course, thanks, sorry for the dumb question. – Mael Feb 13 '14 at 18:40

2 Answers


Yes, as Rahul stated in the comments, this is a semidefinite program, and a relatively straightforward one at that. In fact, it's very similar to the so-called educational testing problem: $$\begin{array}{ll}\text{maximize} & \textstyle\sum_i D_{ii} \\ \text{subject to} & \Sigma - D \succeq 0 \\ & D \succeq 0 \end{array}$$ In the ETP, $\Sigma$ is already positive semidefinite, and you're subtracting as large a diagonal matrix as possible without changing that. In contrast, your problem is $$\begin{array}{ll}\text{minimize} & \textstyle\sum_i D_{ii} \\ \text{subject to} & A + D \succeq 0 \\ & D \succeq 0\end{array}$$ Not surprisingly, the methods for solving these two problems are extremely similar. Of course, this assumes that you like $\sum_i D_{ii}$ as a measure of how much you are perturbing the matrix. You could also consider $\max_i D_{ii}$, $\sum_i D_{ii}^2$, etc.; as long as the measure remains convex, the problem is readily solved.

Any solver supporting semidefinite programming can handle this. Some software I wrote for MATLAB, CVX, makes this easy; so would the similar packages YALMIP (also for MATLAB) and CVXOPT (for Python). In CVX, your model looks like this:

n = size(A,1);
cvx_begin sdp
    variable d(n)            % diagonal perturbation to be determined
    minimize(sum(d))         % total amount added to the diagonal
    subject to
        A + diag(d) >= 0     % in sdp mode, this is a positive semidefinite constraint
        d >= 0               % elementwise nonnegativity
cvx_end
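
For instance, swapping in one of the other convex measures mentioned above is only a one-line change; a sketch (not part of the model above):

n = size(A,1);
cvx_begin sdp
    variable d(n)
    minimize(max(d))         % or sum_square(d) for the sum-of-squares measure
    subject to
        A + diag(d) >= 0
        d >= 0
cvx_end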
Michael Grant
  • Cheers Michael (+1 btw). A first look at SDP is quite overwhelming and I was trying to find an example very close to my own, so introducing me to the educational testing problem was a big help. I'm going to try your code out and will let you know the results :) – Mael Feb 06 '14 at 17:31
  • Ah, Michael, an interesting result. For my 8x8 matrix there is one negative eigenvalue and that is -922.8676. I ran your code and the output (rounded) I get for D is d = (0, 919, 918, 924, 922, 918, 926, 934), which, except for the 0, is pretty much what I was originally doing by adding the identity multiplied by the absolute value of the most negative eigenvalue. I guess I don't know where the 0 is coming from; that requires some thought. It seems Vedran Šego's response that the lambda-shift is optimal is consistent with the output of your code (except the 0). – Mael Feb 06 '14 at 18:00
  • Any chance you could just offer up your 8x8 matrix values here somewhere? I would be happy to run CVX here on it and see what the deal is. – Michael Grant Feb 06 '14 at 20:23
  • Hey Michael, cheers, I've uploaded a MATLAB matrix file which can be found here: http://lindonslog.com/example_code/Amatrix I haven't tried it out yet on a matrix that has more than one negative eigenvalue. That result I still have yet to look at. Also, I'm wondering why you have that d>=0 as a constraint? All I care about is that $A+D$ is positive definite. – Mael Feb 06 '14 at 23:32
  • Ah, well, of course, your first row and column are all zeros. So of course there's no need to add anything to the $(1,1)$ entry. – Michael Grant Feb 06 '14 at 23:36
  • Well this is interesting. If you just look at the bottom $7\times 7$ sub-matrix, and run the above CVX model, you'll find that sum(d) is very, very close to $-7\cdot\lambda_{\text{min}}(X)$. As precisely as I can compute it, it's within $0.9997$ of it. I really don't have any intuitive reason why it should be the same, but it looks like adding the scaled identity, in this case, really isn't all that suboptimal. – Michael Grant Feb 06 '14 at 23:44
  • Hi Michael, I would like to implement a slight change with cvx. How would I write maximize sum(1./d) subject to the same constraints? – Mael Oct 23 '14 at 02:22
  • Can't be done with CVX, as that's not a convex model. Maximizations require concave objectives. – Michael Grant Oct 23 '14 at 03:19

This is one minimal adjustment to $A$ to make it positive definite, and you get the $LDL^\top$ decomposition in the process:

In computing $L$ and $D$ at the step where you calculate $D_{jj}$, if that value is too small (smaller than some pre-selected $\epsilon>0$), set it to $\epsilon$ and continue.

You need only add just enough to ensure $D_{jj}$ is positive at each step.

Now $LDL^\top=A+E$ where $E$ is a diagonal matrix with positive entries. If you pre-pivot your matrix $A$ to put the least diagonally dominant rows at the bottom (and columns to the right) you may get better results than without that preconditioning.
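
One possible way to do that pre-pivoting, as a rough MATLAB sketch (the ordering heuristic here is only one choice among several):

dd = abs(diag(A)) - (sum(abs(A),2) - abs(diag(A)));  % diagonal-dominance margin of each row
[~, p] = sort(dd, 'descend');                        % most dominant rows first, least dominant last
Ap = A(p,p);                                         % symmetric permutation of rows and columns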

Here is some crude MATLAB code that does the trick:

function [L,D] = modifiedLDLT(A,epsilon)
    % http://mathforcollege.com/nm/mws/gen/04sle/mws_gen_sle_txt_cholesky.pdf

    n = size(A,1);

    % check for valid (square, symmetric) input
    m = size(A,2);
    assert(m==n);
    for i=1:n
        for j=i:n
            assert(A(i,j)==A(j,i), 'not symmetric');
        end
    end
    assert(epsilon>0);

    L = zeros(n,n);
    D = zeros(n,n);

    for j=1:n
        % compute the j-th pivot: D(j,j) = A(j,j) - sum_{k<j} L(j,k)^2 * D(k,k)
        Lsum = 0;
        for k=1:(j-1)
            Lsum = Lsum + L(j,k)*L(j,k)*D(k,k);
        end
        D(j,j) = A(j,j)-Lsum;
        if D(j,j) < epsilon
            % clamp small or negative pivots; this is where the diagonal shift E enters
            D(j,j) = epsilon;
        end

        L(j,j) = 1;
        % fill in the rest of column j of L using the (possibly clamped) pivot
        for i=(j+1):n
            Lsum = 0;
            for k=1:(j-1)
                Lsum = Lsum + L(i,k)*D(k,k)*L(j,k);
            end
            L(i,j) = 1/(D(j,j)) * (A(i,j)-Lsum);
        end
    end
end %function
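
A quick usage sketch (illustrative; the choice of epsilon is up to you):

[L, D] = modifiedLDLT(A, 1e-2);
Apd = L*D*L';                        % equals A + E for a diagonal E, in exact arithmetic
E   = diag(diag(Apd - A));           % how much was added to each diagonal entry
norm(Apd - A - E, 'fro')             % off-diagonal drift; grows when epsilon is very small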
Jeff Snider
  • This is great! No sense in solving an SDP (which would do tens of Choleskys) unless you're not OK with this particular measure of closeness. – Michael Grant Feb 06 '14 at 04:04
  • It's a nice idea, Jeff; I coded it up but it doesn't work. At least, not as far as I can tell. For small epsilon it is not stable, and by small I mean epsilon=0.1 or even epsilon=10. Elements get really large and LL* differs on the off-diagonal elements from A, which fails my criteria. For epsilon=10, the off-diagonal elements of LL* and A are the same, but LL* has a huge diagonal, in which case my initial solution of adding the identity times the absolute value of the most negative eigenvalue is much better. I've provided some code: http://www.lindonslog.com/example_code/stackexchange.r – Mael Feb 06 '14 at 17:28
  • OK, well, if it doesn't work, then it's not as great as I thought :-) I am familiar with "self-correcting" Cholesky code that provides a sort of "incomplete factorization" for matrices that are just barely indefinite, or appear that way due to roundoff errors in the Cholesky. In fact such code can be very useful for solving semidefinite programs. But perhaps the approach fails if you need to make large modifications. – Michael Grant Feb 06 '14 at 20:26
  • I'm unsure why it wouldn't work. Admittedly when I used it I wasn't concerned with keeping the off-diagonals unchanged, but it did give a distinctly diagonal $D$. I will write some code of my own to check that it works, or confirm that it doesn't. – Jeff Snider Feb 07 '14 at 02:43
  • I have confirmed that while it "works" in the sense of giving a diagonal $D$, the values of $D$ seem to grow exponentially as you proceed down the diagonal, and it ends up being much larger than simply adding $-\lambda_i I$ for the most negative eigenvalue $\lambda_i$. Now I will dig into my old records and see if I can recover precisely what I was doing previously. – Jeff Snider Feb 08 '14 at 20:06
  • Hey Jeff, yeah that is the result that I was getting. Were you able to make any progress on this? – Mael Feb 13 '14 at 16:45
  • It appears I was doing an $LDL^T$ decomposition rather than Cholesky. I don't have the details of the algorithm handy, but there are plenty of resources on LDLT available online. The essence is the same as my answer: when you find an element of $D$ that would be smaller than $\epsilon$, just set it to $\epsilon$ and continue. If you need more specific details, please let me know. – Jeff Snider Feb 13 '14 at 19:41
  • I've written some quick MATLAB code to confirm that it works, and using this technique in the LDLT decomposition does perform much better than for the Cholesky. If you pre-pivot your matrix so the least "absolutely diagonally dominant" rows are at the bottom you get pretty good results. There is a trade space in how you add to the diagonal -- adding the most negative eigenvalue to every diagonal element gives a different answer than adding larger values only to some rows, and both can produce P.D. matrices. – Jeff Snider Feb 13 '14 at 20:09
  • Hi Jeff, thank you for providing some code! At first I chose epsilon=0.1 and while MATLAB did not spew out any errors, it did not do what it was supposed to, namely $LDL^T$ was not equal to $A+E$ for some diagonal matrix $E$. I provided an example here: http://www.lindonslog.com/example_code/matlaboutput.txt Choosing epsilon large enough (=10 in the above example) did work. The solution it provided, however, was worse than the lambda-shift or the SDP method. Reducing epsilon in order to get a better solution makes it highly unstable, like the Cholesky version. Can you provide any way to assess stability? – Mael Jun 25 '14 at 20:05