When using spectral clustering methods, we often construct a similarity matrix $S$ between data points and use it to derive the Laplacian matrix $L$ for clustering. But in some recent work, the author directly used the similarity matrix $X^TX$ instead of $L$ for clustering. Why is this approach possible?
- I am not an expert in this domain, but you may also be interested in this question – Jean Marie Oct 13 '22 at 07:19
- Thank you for your recommendation, but that question is about how the eigenvectors of the Laplacian work. For me, the issue is that the eigenvectors of the similarity matrix may not have the same properties. – Haoyi Lei Oct 13 '22 at 08:24
1 Answer
The Laplacian matrix of a graph is the Gram matrix of the signed incidence operator $B$, having entries $$ B(i, e) = \begin{cases} 1 &\text{if } e = (x, i) \\ -1 &\text{if } e = (i, x) \\ 0 &\text{otherwise,} \end{cases} $$ where $i$ ranges over the vertices of the graph and $e$ over the oriented edges. The Laplacian matrix has relatively simple properties:
- $\mathrm{null}(L)$ is spanned by vectors that are constant over the connected components of the graph.
- $L$ is positive semidefinite.
The second property is a consequence of $L = B B^T$, since $x^T L x = \|B^T x\|^2 \geq 0$ for every $x$. The first holds because $B^T x = 0$ exactly when $x$ takes equal values at the two endpoints of every edge, i.e., when $x$ is constant on each connected component.
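These two properties can be checked numerically. The following sketch (with a hypothetical 5-vertex graph chosen for illustration) builds the signed incidence matrix $B$, forms $L = BB^T$, and confirms that $L$ is positive semidefinite with one zero eigenvalue per connected component:

```python
import numpy as np

# Hypothetical example: a 5-vertex graph with two connected components,
# a path {0, 1, 2} and a single edge {3, 4}.
edges = [(0, 1), (1, 2), (3, 4)]
n = 5

# Signed incidence matrix B: +1 at the head of each oriented edge, -1 at the tail.
B = np.zeros((n, len(edges)))
for k, (u, v) in enumerate(edges):
    B[u, k] = -1.0
    B[v, k] = 1.0

L = B @ B.T
eigvals = np.linalg.eigvalsh(L)

# L is positive semidefinite: all eigenvalues are (numerically) nonnegative.
assert np.all(eigvals > -1e-10)

# The nullspace dimension equals the number of connected components.
null_dim = int(np.sum(eigvals < 1e-10))
print(null_dim)  # 2, one dimension per component
```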
To move this machinery toward random matrices, we first generalize our definition of $B$ to weighted graphs, using a function $w: E \to \mathbb{R}_{\geq 0}$ that assigns each edge a nonnegative weight (nonnegativity is needed for the square roots below). The signed incidence operator of the weighted graph is the matrix $$ B(i, e') = \begin{cases} \sqrt{w(e)} &\text{if } e' = (x, i) \\ -\sqrt{w(e)} &\text{if } e' = (i, x) \\ 0 & \text{otherwise,} \end{cases} $$ where $i$ ranges over the vertices and $e'$ is an oriented copy of the unoriented edge $e \in E$. The matrix $L = B B^T$ is positive semidefinite and its nullspace is spanned by vectors constant on the connected components of the graph. Conversely, any symmetric matrix with zero row sums and nonpositive off-diagonal entries is the Laplacian of some undirected, weighted graph $G = (V, E, w)$ with $w(u, v) = -L(u, v)$; such a matrix is automatically positive semidefinite by diagonal dominance.
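Both directions of this correspondence can be verified on a small example. The sketch below (on a hypothetical 4-vertex weighted graph) builds the weighted incidence matrix, checks that $L = BB^T$ has zero row sums, and reads the edge weights back off the off-diagonal entries:

```python
import numpy as np

# Hypothetical weighted graph on 4 vertices: (u, v, weight) triples.
weighted_edges = [(0, 1, 2.0), (1, 2, 0.5), (2, 3, 1.5), (0, 3, 1.0)]
n = 4

# Weighted signed incidence matrix: +/- sqrt(w(e)) at the two endpoints.
B = np.zeros((n, len(weighted_edges)))
for k, (u, v, w) in enumerate(weighted_edges):
    B[u, k] = -np.sqrt(w)
    B[v, k] = np.sqrt(w)

L = B @ B.T

# Each column of B sums to zero, so L has zero row sums and L @ 1 = 0.
assert np.allclose(L.sum(axis=1), 0.0)

# Conversely, the weights can be recovered: L[u, v] = -w(u, v) on each edge.
for u, v, w in weighted_edges:
    assert np.isclose(L[u, v], -w)
print("ok")
```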
In the publication, the matrix $W$ is constructed with entries drawn from a zero-mean normal distribution, and an examination of the spectrum of the random matrix $W^T W$ shows that its eigenvalues are heavily clustered near zero, meaning that $W^T W$ is "almost" a Laplacian matrix. The remainder of the paper is dedicated to showing that the spectra of $W^T W$ and $W W^T$ also satisfy several other "Laplacian-like" properties. In other words, $W^T W$ can be used instead of the Laplacian matrix because it is "sufficiently Laplacian" in the statistical sense, and sampling columns from various classes creates matrices that are "approximately incidence matrices."
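The "approximately incidence matrix" intuition can be illustrated with a toy experiment (a hypothetical construction, not the paper's exact setup): when the columns of $X$ are sampled from well-separated classes, the Gram matrix $X^T X$ is approximately block diagonal, so its leading eigenvectors concentrate on one class each, playing the role that the Laplacian's nullspace vectors play on connected components:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration: sample the columns of X from two well-separated
# classes with different norms, so X^T X is approximately block diagonal.
mu1, mu2 = np.array([5.0, 0.0]), np.array([0.0, 3.0])
cols = [mu1 + 0.1 * rng.standard_normal(2) for _ in range(10)]
cols += [mu2 + 0.1 * rng.standard_normal(2) for _ in range(10)]
X = np.column_stack(cols)

G = X.T @ X  # 20 x 20 similarity (Gram) matrix

# The top eigenvector of the nearly block-diagonal G is concentrated on one
# class -- analogous to a Laplacian nullspace vector supported on one component.
eigvals, eigvecs = np.linalg.eigh(G)
top = eigvecs[:, -1]           # eigenvector of the largest eigenvalue
in_class1 = np.abs(top) > 0.1  # large entries mark membership in one class
print(in_class1[:10].all(), in_class1[10:].any())  # True False
```

Thresholding the top eigenvector recovers the class labels here, which is exactly the spectral-clustering step applied to the similarity matrix directly.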
To sum it up, a similarity matrix can be used instead of a Laplacian matrix for spectral clustering when it is the Gram matrix of an approximate incidence matrix, and hence nearly a Laplacian matrix itself.