
I am representing my 3D data by its sample covariance matrix. I want to know what the determinant of the covariance matrix represents. If the determinant is positive, zero, negative, large positive, or large negative, what does it mean or represent?

Thanks

EDIT:

The covariance matrix is being used to represent the variance of 3D coordinates that I have. If the determinant of my covariance matrix A is +100, and the determinant of the other covariance matrix B is +5, which of these values shows more variance? Which value tells that the data points are more dispersed? Which value shows that the readings are further away from the mean?

orange14

4 Answers


I would like to point out that there is a connection between the determinant of the covariance matrix of (Gaussian distributed) data points and the differential entropy of the distribution.

To put it in other words: let's say you have a (large) set of points which you assume to be Gaussian distributed. If you compute the determinant of the sample covariance matrix, then you measure (indirectly) the differential entropy of the distribution, up to constant factors and a logarithm. See, e.g., Multivariate normal distribution.

The differential entropy of a Gaussian density is defined as:

$$H[p] = \frac{k}{2}(1 + \ln(2\pi)) + \frac{1}{2} \ln \vert \Sigma \vert\;,$$

where $k$ is the dimensionality of your space, i.e., in your case $k=3$.
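To make the formula concrete, here is a minimal numpy sketch (numpy is an assumed dependency, and the helper name `gaussian_entropy` is mine) that evaluates it via a numerically stable log-determinant:

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy of a k-variate Gaussian with covariance `cov`."""
    k = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)  # stable log |Sigma|, avoids overflow
    return 0.5 * k * (1 + np.log(2 * np.pi)) + 0.5 * logdet

# For the identity covariance in 3D, |Sigma| = 1 so the log-det term vanishes:
print(gaussian_entropy(np.eye(3)))  # ~4.2568
```

Scaling the covariance up increases the entropy, matching the intuition that a larger $\vert \Sigma \vert$ means a more "spread out" distribution.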

$\Sigma$ is positive semi-definite, which means $\vert \Sigma \vert \geq 0$.

The larger $\vert \Sigma \vert$, the more dispersed your data points are. If $\vert \Sigma \vert = 0$, it means that your data points do not 'occupy the whole space', meaning that they lie, e.g., on a line or a plane within $\mathbb{R}^3$. $\vert \Sigma \vert$ is also called the generalized variance. Alexander Vigodner is right: it captures the volume of your data cloud.

Since the sample covariance matrix is defined as $$\Sigma = \frac{1}{N-1} \sum_{i=1}^N (\vec{x}_i - \vec{\mu})(\vec{x}_i - \vec{\mu})^T\;, $$ it follows that it does not capture any information about the mean. You can verify that easily by adding some large constant vector shift to your data; $\vert \Sigma \vert$ should not change.
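This shift invariance is easy to check numerically; a small numpy sketch with synthetic data (the specific shift values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))            # 1000 points in R^3
shift = np.array([1e3, -5e2, 2e3])        # large constant shift

# rowvar=False: rows are observations, columns are variables
det_before = np.linalg.det(np.cov(X, rowvar=False))
det_after = np.linalg.det(np.cov(X + shift, rowvar=False))
print(det_before, det_after)  # equal up to floating-point noise
```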

I don't want to go too much into detail, but there is also a connection to PCA. Since the eigenvalues $\lambda_1, \lambda_2, \lambda_3$ of $\Sigma$ correspond to the variances along the principal component axes of your data points, $\vert \Sigma \vert$ captures their product, because by definition the determinant of a matrix is equal to the product of its eigenvalues.

Note that the largest eigenvalue corresponds to the maximal variance of your data (direction given by the corresponding eigenvector; see PCA).
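The determinant-equals-product-of-eigenvalues identity can be checked on a synthetic anisotropic cloud (a numpy sketch, not part of the original answer):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) @ np.diag([3.0, 1.0, 0.5])  # stretched cloud
S = np.cov(X, rowvar=False)

eigvals = np.linalg.eigvalsh(S)  # variances along the principal axes
print(np.prod(eigvals), np.linalg.det(S))  # same number
```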

tmp

It cannot be negative, since the covariance matrix is positive semi-definite (not necessarily strictly). So all its eigenvalues are non-negative, and the determinant is the product of these eigenvalues. Its square root defines, in a certain sense, the volume of an $n$-dimensional ($n = 3$ in your case) $\sigma$-cube. It is the analogue of $\sigma$ in the 1-dimensional case.
Notice that the multivariate normal distribution is defined as $$ f_{\mathbf x}(x_1,\ldots,x_k) = \frac{1}{\sqrt{(2\pi)^k|\boldsymbol\Sigma|}} \exp\left(-\frac{1}{2}({\mathbf x}-{\boldsymbol\mu})^T{\boldsymbol\Sigma}^{-1}({\mathbf x}-{\boldsymbol\mu}) \right). $$ Here $|\boldsymbol\Sigma|$ is the determinant of $\boldsymbol\Sigma$.
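For reference, this density transcribes directly into numpy (the helper name `mvn_pdf` is mine; this is a sketch, not a library function):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density, following the formula above."""
    k = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))
    # solve() applies Sigma^{-1} without forming the explicit inverse
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

# At the mean with Sigma = I in 3D, the density is (2*pi)^(-3/2) ~ 0.0635:
print(mvn_pdf(np.zeros(3), np.zeros(3), np.eye(3)))
```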

  • Sorry, but your answer does not help me. Covariance is being used to represent the variance of the 3D coordinates that I have. If the determinant of my covariance matrix A is +100, and the determinant of the other covariance matrix B is +5, which of these values shows more variance? Which value tells that the data points are more dispersed? Which value shows that the readings are further away from the mean? – orange14 Aug 06 '14 at 21:21
  • The determinant is not such a good characteristic for this. Use the eigenvalues. – Alexander Vigodner Aug 06 '14 at 21:28
  • Assume you have, in 3-dimensional space, eigenvalues $\lambda_1=\lambda_2=\lambda_3=1$, and in the second case $\lambda_1=1$, $\lambda_2=100$, $\lambda_3=0.01$. In both cases the determinant will be 1, but these two systems are completely different. This is why comparing determinants is not a good idea unless you have additional properties of your matrices $A$ and $B$ that can help. – Alexander Vigodner Aug 06 '14 at 22:12
  • So with eigenvalues, what can I deduce? Which of these eigenvalues shows whether the variance is greater? Which eigenvalue tells that the data points are more dispersed? Which eigenvalue shows that the readings are further away from the mean? – orange14 Aug 06 '14 at 22:49
  • As tmp explained, the eigenvalues define the variances along the eigenvectors, i.e., the principal components. – Alexander Vigodner Aug 07 '14 at 00:19
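Alexander Vigodner's counterexample from the comments can be reproduced in a few lines of numpy (a sketch; the matrices are the diagonal ones from his comment):

```python
import numpy as np

# Two covariance matrices with the same determinant but very different shapes.
A = np.diag([1.0, 1.0, 1.0])     # isotropic: a sphere of data
B = np.diag([1.0, 100.0, 0.01])  # a long, thin pancake of data

print(np.linalg.det(A), np.linalg.det(B))  # both ~1.0
print(np.linalg.eigvalsh(B))               # the eigenvalues reveal the anisotropy
```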

The determinant of the covariance matrix was termed the generalized variance by Wilks (1932). Comparing the densities of the univariate and multivariate normal distributions, it is easy to see that $|\Sigma|$ plays a role similar to that of $\sigma^2$.

This has several interpretations (see, for example, Anderson 2003, Section 7.5):

  • A geometric interpretation: it is proportional to the volume of the ellipsoid $\left\{u \in \mathcal{R}^{k} \mid(u-\mu)^{\prime} \Sigma^{-1}(u-\mu)=c^{2}\right\}$
  • An entropy interpretation, as discussed by @tmp
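The geometric interpretation can be made concrete: the ellipsoid above is the image of the radius-$c$ ball under $\Sigma^{1/2}$, so its volume is $c^k\sqrt{|\Sigma|}$ times the volume of the unit $k$-ball. A short Python sketch (the helper name `ellipsoid_volume` is mine):

```python
import math
import numpy as np

def ellipsoid_volume(Sigma, c=1.0):
    """Volume of {u : (u - mu)' Sigma^{-1} (u - mu) <= c^2}."""
    k = Sigma.shape[0]
    unit_ball = math.pi ** (k / 2) / math.gamma(k / 2 + 1)  # unit k-ball volume
    return unit_ball * c ** k * math.sqrt(np.linalg.det(Sigma))

# With Sigma = I and c = 1 in 3D this reduces to the unit-sphere volume 4*pi/3:
print(ellipsoid_volume(np.eye(3)))
```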

Relation to generalized correlation?

If $|\Sigma|$ is the generalized variance, is there also a generalized correlation? The quantity $\sqrt{1-\frac{|\Sigma|}{\sigma_1^2\cdot\ldots\cdot\sigma^2_N}}$ is sometimes called the collective correlation coefficient. You can verify that for $N=2$ this is indeed (the absolute value of) the usual correlation coefficient: $\sqrt{1-\frac{\sigma_1^2\sigma_2^2-\rho^2 \sigma_1^2\sigma_2^2}{\sigma_1^2\sigma^2_2}}=\sqrt{1-(1-\rho^2)}=|\rho|$
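A quick numerical sanity check of the $N=2$ case (numpy assumed; `collective_correlation` is an illustrative helper, not a standard function):

```python
import numpy as np

def collective_correlation(Sigma):
    """sqrt(1 - |Sigma| / product of the marginal variances)."""
    variances = np.diag(Sigma)
    return np.sqrt(1 - np.linalg.det(Sigma) / np.prod(variances))

# Bivariate covariance with variances 2 and 3 and correlation rho = 0.6
rho = 0.6
cov12 = rho * np.sqrt(2.0 * 3.0)
Sigma = np.array([[2.0, cov12],
                  [cov12, 3.0]])
print(collective_correlation(Sigma))  # recovers 0.6
```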

References

Anderson, T. W., An introduction to multivariate statistical analysis., Wiley Series in Probability and Statistics. Hoboken, NJ: Wiley (ISBN 0-471-36091-0/hbk). xx, 721 p. (2003). ZBL1039.62044.

Wilks, S. S., Certain generalizations in the analysis of variance., Biometrika 24, 471-494 (1932). ZBL58.1172.02.

Matifou

It might help to break down the parts "determinant" and "covariance".

The determinant generally gives you the magnitude of a matrix transformation: the factor by which it scales volumes. You could think of it as how "big" the transformation is.

The covariance matrix gives you how variables in the matrix vary with each other.

Thus the determinant of the covariance matrix gives you a measure of the magnitude of how much the variables "vary" with each other.

In the case of comparing a determinant of matrix A of 100 vs. a determinant of matrix B of 5, the smaller determinant would suggest that the variables in matrix B are more independent of each other than the variables in matrix A.

jarvis
  • Isn't this answer wrong? If you have two variables (for the sake of simplicity) with given autocovariances, then an increased covariance between the variables (i.e., an increased dependence between them) would reduce the determinant of the covariance matrix. This would mean that a smaller determinant is not equivalent to more independence between variables. – AlphaOmega Jul 02 '20 at 15:38