I'm reading the book Pattern Recognition and Machine Learning by Christopher Bishop, and on page 80, with regard to the multivariate Gaussian distribution:
$$ \mathcal{N}(\mathbf{x} | \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2}}\frac{1}{| \boldsymbol{\Sigma}|^{1/2}}~ \exp \biggl \{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}}~ \boldsymbol{\Sigma}^{-1}~(\mathbf{x} - \boldsymbol{\mu}) \biggr \} $$ it says:
First of all, we note that the matrix $ \boldsymbol{\Sigma} $ can be taken to be symmetric, without loss of generality, because any antisymmetric component would disappear from the exponent.
It's not clear to me what this means. Can someone explain?
When I plot such a distribution (e.g. using Octave) with a non-symmetric matrix, I still get a valid-looking distribution out. E.g. if I use $ \boldsymbol{\Sigma}$ = [1, 0.25; 0.5, 1], I get something that looks roughly halfway between $ \boldsymbol{\Sigma}$ = [1, 0.25; 0.25, 1] and $ \boldsymbol{\Sigma}$ = [1, 0.5; 0.5, 1].
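To make this concrete, here is a small self-contained check (plain Python; `quad_form` and `sym_part` are just helper names I made up) showing that replacing a non-symmetric matrix in the quadratic form by its symmetric part leaves the value unchanged:

```python
import math

# Quadratic form v^T A v for a 2x2 matrix A and vector v.
def quad_form(A, v):
    return sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

# Symmetric part (A + A^T) / 2 of a 2x2 matrix.
def sym_part(A):
    return [[(A[i][j] + A[j][i]) / 2 for j in range(2)] for i in range(2)]

# A deliberately non-symmetric precision-like matrix and a test point.
A = [[1.0, -0.25], [-0.5, 1.0]]
v = [0.7, -1.3]

# The antisymmetric component of A contributes nothing to v^T A v,
# so the two quadratic forms agree.
print(math.isclose(quad_form(A, v), quad_form(sym_part(A), v)))  # True
```

If I'm reading the book right, this is exactly why the exponent cannot "see" any antisymmetric component of $ \boldsymbol{\Sigma}^{-1}$.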
Does the phrase "without loss of generality" here simply mean that for any asymmetric $ \boldsymbol{\Sigma}$ there is an equivalent symmetric one that yields exactly the same Mahalanobis distance (and hence the same density), so we might as well restrict attention to symmetric matrices for mathematical convenience?
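For what it's worth, here is my attempt at unpacking "antisymmetric component" (I may be misreading it): write $\boldsymbol{\Sigma}^{-1}$ as the sum of a symmetric and an antisymmetric part,

$$ \boldsymbol{\Sigma}^{-1} = \underbrace{\tfrac{1}{2}\bigl(\boldsymbol{\Sigma}^{-1} + \boldsymbol{\Sigma}^{-\mathrm{T}}\bigr)}_{\text{symmetric}} + \underbrace{\tfrac{1}{2}\bigl(\boldsymbol{\Sigma}^{-1} - \boldsymbol{\Sigma}^{-\mathrm{T}}\bigr)}_{\text{antisymmetric}}. $$

Then, writing $\mathbf{y} = \mathbf{x} - \boldsymbol{\mu}$ and letting $\mathbf{A}$ denote the antisymmetric part (so $\mathbf{A}^{\mathrm{T}} = -\mathbf{A}$), the scalar $\mathbf{y}^{\mathrm{T}} \mathbf{A} \mathbf{y}$ equals its own transpose:

$$ \mathbf{y}^{\mathrm{T}} \mathbf{A} \mathbf{y} = \bigl(\mathbf{y}^{\mathrm{T}} \mathbf{A} \mathbf{y}\bigr)^{\mathrm{T}} = \mathbf{y}^{\mathrm{T}} \mathbf{A}^{\mathrm{T}} \mathbf{y} = -\mathbf{y}^{\mathrm{T}} \mathbf{A} \mathbf{y}, $$

which forces it to be zero, i.e. only the symmetric part of $\boldsymbol{\Sigma}^{-1}$ survives in the exponent. Is that the intended reading?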