17

I have a (biological) computational system that outputs square matrices. Sometimes, these matrices are diagonal-like, with higher values at and around the diagonal.

I would like to have some summary measure of how "diagonal" a matrix is, so that I can batch-process hundreds of outputs and score them by how strongly the higher entries cluster in and around the diagonal.

Any ideas for a standard approach that I could generalise?

lourencoj
  • 273
  • What do the rows and columns represent? Are the matrix entries counts of events? This affects the choice. – Tad Aug 11 '15 at 01:32
  • They are frequencies of events! – lourencoj Aug 11 '15 at 08:20
  • 1
    Do the rows and columns represent values or just categories? In the first case you get credit for being "near" the diagonal; in the second case you're either on it or you're not. – Tad Aug 11 '15 at 11:15
  • It should get credit for being close to the diagonal – lourencoj Aug 11 '15 at 14:49

2 Answers

20

Given that your entries are frequencies, and you want to give credit for being "close" to the diagonal, a natural approach is to compute the correlation coefficient between the row index and the column index. That is, suppose your matrix is built as follows: repeatedly generate a pair of numbers $x$ and $y$, and increment the count of the matrix entry at position $(x,y)$. If you think of $x$ and $y$ as samples of random variables $X$ and $Y$ respectively, then the sample correlation coefficient $r$ of $X$ and $Y$ lies between $-1$ and $1$: it is $1$ if $X$ and $Y$ are perfectly correlated and $-1$ if they are perfectly anticorrelated. The point is that $X$ and $Y$ are perfectly correlated (in this case, equal) precisely when the matrix is diagonal, and strong correlation means the matrix entries tend to be near the diagonal.

This is robust: the correlation coefficient is unchanged if you scale the matrix (and the formula turns out to make sense even if your entries are nonnegative real numbers).

If you adapt the standard formulas for the sample correlation coefficient (for instance, the computational formula in the Wikipedia article on the Pearson correlation coefficient) to this situation, they take the following form (a short code sketch follows the examples below). Let $A$ be a $d\times d$ matrix, let $j$ be the $d$-long vector of all ones, and let $v=(1,2,\ldots,d)$ and $v_2=(1^2,2^2,\ldots,d^2)$ be the vectors of indices and squared indices. Then:

$$\begin{align} n &= j A j^T \textrm{ (the sum of the entries of $A$)}\\ \Sigma x &= v A j^T\\ \Sigma y &= j A v^T\\ \Sigma x^2 &= v_2 A j^T\\ \Sigma y^2 &= j A v_2^T\\ \Sigma xy &= v A v^T\\ r &= \frac{n\, \Sigma xy -\Sigma x\, \Sigma y}{\sqrt{n\, \Sigma x^2 - (\Sigma x)^2}\sqrt{n\, \Sigma y^2 - (\Sigma y)^2}} \end{align}$$

Some examples:

Diagonal matrix: $\left( \begin{array}{cccc} 1. & 0. & 0. & 0. \\ 0. & 5. & 0. & 0. \\ 0. & 0. & 30.5 & 0. \\ 0. & 0. & 0. & 3.14159 \\ \end{array} \right): \quad r=1.000000$

Diagonally dominant matrix: $\left( \begin{array}{ccc} 6 & 1 & 0 \\ 1 & 5 & 2 \\ 1 & 3 & 6 \\ \end{array} \right): \quad r=0.674149$

Uniformly distributed on $[0,1]$: $\left( \begin{array}{cccc} 0.2624 & 0.558351 & 0.249054 & 0.484223 \\ 0.724561 & 0.797153 & 0.689489 & 0.273023 \\ 0.462727 & 0.119412 & 0.911981 & 0.636588 \\ 0.089544 & 0.160899 & 0.910123 & 0.549202 \\ \end{array} \right): \quad r=0.233509$

Tridiagonal: $\left( \begin{array}{ccccc} 2 & 1 & 0 & 0 & 0 \\ 1 & 3 & 2 & 0 & 0 \\ 0 & 2 & 3 & 4 & 0 \\ 0 & 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 1 & 1 \\ \end{array} \right): \quad r=0.812383$
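
For concreteness, here is a minimal NumPy sketch of the formulas above (the function name `diagonality` is my own); it reproduces the diagonally dominant example and shows that rescaling the matrix leaves $r$ unchanged:

    import numpy as np

    def diagonality(A):
        # Pearson correlation of row index vs. column index,
        # weighted by the entries of A.
        A = np.asarray(A, dtype=float)
        d = A.shape[0]
        j = np.ones(d)                # all-ones vector
        v = np.arange(1, d + 1)       # index vector (1, 2, ..., d)
        v2 = v ** 2
        n   = j @ A @ j               # sum of all entries
        Sx  = v @ A @ j
        Sy  = j @ A @ v
        Sx2 = v2 @ A @ j
        Sy2 = j @ A @ v2
        Sxy = v @ A @ v
        return (n * Sxy - Sx * Sy) / (
            np.sqrt(n * Sx2 - Sx**2) * np.sqrt(n * Sy2 - Sy**2))

    A = np.array([[6, 1, 0], [1, 5, 2], [1, 3, 6]])
    print(diagonality(A))        # 0.674149...
    print(diagonality(10 * A))   # same value: scaling changes nothing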

Tad
  • 6,794
  • Hey Tad. This is great! This will be our approach. Thanks so much. – lourencoj Aug 12 '15 at 21:49
  • @Tad, I've got a similar task, to measure the degree of "diagonalness", but my matrix can be rectangular, not just square. What would you say? Can you extend your solution to that case? – ttnphns Aug 27 '15 at 10:52
  • @ttnphns You need to be clear about what you mean by "diagonalness" in this context. – Tad Aug 27 '15 at 11:03
  • The concentration of values (taking their magnitude into account, too) close to the main diagonal (from the top-left corner), as before. A fully diagonal matrix is one where all nonzero elements lie on the diagonal. – ttnphns Aug 27 '15 at 11:34
  • 2
    If you adapt the standard formulas for the sample correlation coefficient... I must say that your adaptation is not transparent at all. Just to mention, $n$ in the correlation coefficient is the number of observations ($(X,Y)$ pairs); in your formulas it is suddenly the sum of the values in the matrix. To me, your approach remains unclear (albeit it could be perfect). Can you elucidate your computations using the theoretical formula for $r$ from Wikipedia (the first one, the one using the means, not the computational one)? – ttnphns Aug 27 '15 at 12:33
  • @ttnphns My method doesn't extend well to the non-square case. You could do something like compute $s=\sum |a_{ij}|(j-i)^2$, which is zero only for diagonal matrices, and define the diagonalness to be $e^{-s}$ or $1/(1+s)$ or something (see the sketch after this comment thread). As before, you could easily normalize it so that scalar multiplication doesn't change the diagonalness. – Tad Aug 28 '15 at 02:12
  • @Tad, yes. Independently of you, I arrived at a similar general solution. My formula was $s=\sum |a_{ij}||j-i|$, with final coefficient $1-s/s_{max}$, where $s_{max}$ corresponds to the most "far-from-diagonal" arrangement: co-sort, in descending order, the $n$ values $|a|$ (where $n$ is the number of nonzero elements) with the $n$ largest $|i-j|$ values existing in the matrix. – ttnphns Aug 28 '15 at 09:29
  • Hi! This looks very promising, but I don't understand why $n$ becomes the sum of the entries of the matrix. What are the random variables we are calculating the correlation for? Thank you! – milo Jun 08 '16 at 09:05
  • @milo In the original application, the matrix entries were frequencies; they generated a row $X$ and column $Y$ via some process, and incremented a counter. So the $(i,j)$ entry in the matrix represents the number of times a certain outcome $(i,j)$ occurred. So the total number of experiments ($n$) is the sum of all the entries in the matrix. – Tad Jun 08 '16 at 11:59
  • So if the values always fall on the diagonal $(x=y)$ they always co-occur, right? But the idea can be applied to matrices whose entries are not frequencies, right? Particularly, I have square variance-covariance matrices. Thanks! – milo Jun 09 '16 at 11:15
  • @milo yes, that's why I included some examples of matrices where the entries aren't even integers. – Tad Jun 09 '16 at 18:54
  • (+1). I am wondering what you do in the following case: {{-2, 1, 0, 0}, {1, -2, 1, 0}, {0, 1, -2, 1}, {0, 0, 1, -2}}. That SPD matrix gives a 5/3 measure. Any idea? Thanks. – user21 Aug 29 '16 at 16:30
  • The matrix above is not SPD, but the issue remains, I think. – user21 Aug 29 '16 at 17:13
  • @user21 Thank you for underscoring the point that a solution designed for one scenario may not work in another. I verified with my first question to the OP that the matrices in question consist of frequencies or counts of events, and provided a caveat to that effect in my solution. I have not thought about how to interpret $r$ in instances like yours: in some sense by allowing negative numbers you get a matrix which is "more diagonal than possible." – Tad Aug 29 '16 at 17:34
  • A NumPy implementation of the formulas above:

        import numpy as np

        A = np.array([[1, 0, 0], [0, 2, 0], [0, 0, 3]])
        d = A.shape[0]
        j = np.ones(d)            # all-ones vector
        v = np.arange(d) + 1      # index vector (1, 2, ..., d)
        v2 = v ** 2
        n = j.dot(A.dot(j.T))     # sum of the entries of A
        Sx = v.dot(A.dot(j.T))
        Sy = j.dot(A.dot(v.T))
        Sx2 = v2.dot(A.dot(j.T))
        Sy2 = j.dot(A.dot(v2.T))
        Sxy = v.dot(A.dot(v.T))
        r = (n*Sxy - Sx*Sy) / (((n*Sx2 - Sx**2) ** 0.5) * ((n*Sy2 - Sy**2) ** 0.5))

    – e271p314 Feb 21 '23 at 06:48
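
Following up on the rectangular-matrix discussion in the comments above, here is a minimal sketch of the $s=\sum |a_{ij}|(j-i)^2$ idea, assuming the $1/(1+s)$ squashing and normalizing $s$ by the total weight so that scalar multiplication does not change the result (the name `diagonalness` is illustrative):

    import numpy as np

    def diagonalness(A):
        # s = sum of |a_ij| * (j - i)^2, normalized by sum of |a_ij|;
        # returns 1 exactly when all nonzero entries lie on the main
        # diagonal (i == j), and decays toward 0 otherwise.
        A = np.abs(np.asarray(A, dtype=float))
        i, j = np.indices(A.shape)
        s = np.sum(A * (j - i) ** 2) / np.sum(A)
        return 1.0 / (1.0 + s)

    print(diagonalness(np.eye(3, 5)))                       # 1.0
    print(diagonalness([[0, 1, 0], [0, 0, 1], [1, 0, 0]]))  # 1/3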
5

Here's an easy one. Let $M$ be your measured matrix, and let $A$ be the matrix that agrees with $M$ along the diagonal but is zero elsewhere. Then pick your favorite matrix norm (the operator norm probably works well here) and use $\|M-A\|$ as your measurement.

If you want a more finely tuned understanding of 'clustering', instead of zeroing all the entries off the diagonal, weight them by which band they are on; the super- and sub-diagonal might, say, take half the corresponding value in $M$, as in the sketch below.
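
A minimal sketch of this idea, assuming the spectral (operator) norm and the halved super-/sub-diagonal weighting suggested above (the function `band_distance` and its signature are my own):

    import numpy as np

    def band_distance(M, band_weights=(1.0,)):
        # Build A by keeping band k (entries with |i - j| == k) of M
        # scaled by band_weights[k]; all other entries of A are zero.
        # band_weights=(1.0,) keeps only the diagonal, as in the plain
        # version above.  Returns the spectral norm of M - A.
        M = np.asarray(M, dtype=float)
        i, j = np.indices(M.shape)
        A = np.zeros_like(M)
        for k, w in enumerate(band_weights):
            mask = np.abs(i - j) == k
            A[mask] = w * M[mask]
        return np.linalg.norm(M - A, 2)

    M = np.array([[2., 1., 0.], [1., 3., 2.], [0., 2., 3.]])
    print(band_distance(M))              # norm of the off-diagonal part
    print(band_distance(M, (1.0, 0.5)))  # super/sub-diagonal weighted by 1/2

Note that smaller values mean "more diagonal" here, which is the opposite orientation of the correlation score in the first answer.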

Zach Stone
  • 5,812