
I'm working on understanding the derivation of the solution for principal components analysis.

Let $\mathbf{S} \in \mathbb{R}^{p \times p}$ be a positive semi-definite matrix with rank $d < p$. Let $\mathbf{W} = [\vec{w}_1 \; \cdots \; \vec{w}_d] \in \mathbb{R}^{p \times d}$ be the matrix whose columns are the vectors $\vec{w}_i$. We seek $d$ principal components, so the maximization problem is: $$\text{maximize} \qquad\mathrm{trace}(\mathbf{W}^{T}\mathbf{S}\mathbf{W}) \quad \text{s.t.}\; \mathbf{W}^{T}\mathbf{W} = \mathbf{I}_d$$
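To make the objective concrete, here is a small numerical sketch (assuming NumPy; the $\mathbf{S}$ below is just a random rank-$d$ PSD matrix standing in for a covariance matrix, and the variable names are mine). It evaluates $\mathrm{trace}(\mathbf{W}^{T}\mathbf{S}\mathbf{W})$ at the top-$d$ eigenvectors of $\mathbf{S}$ and at a random feasible $\mathbf{W}$ for comparison:

```python
import numpy as np

rng = np.random.default_rng(0)
p, d = 6, 3

# Random PSD matrix S of rank d < p (stand-in for a covariance matrix).
A = rng.standard_normal((p, d))
S = A @ A.T

def objective(W):
    """trace(W^T S W), the quantity being maximized."""
    return np.trace(W.T @ S @ W)

# Candidate 1: top-d eigenvectors of S (eigh returns eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(S)
W_eig = eigvecs[:, np.argsort(eigvals)[::-1][:d]]

# Candidate 2: a random p x d matrix with orthonormal columns (reduced QR factor).
W_rand, _ = np.linalg.qr(rng.standard_normal((p, d)))

print(objective(W_eig), ">=", objective(W_rand))   # eigenvector basis attains a larger value
print(np.allclose(W_eig.T @ W_eig, np.eye(d)))     # constraint W^T W = I_d holds
```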

So the columns of $\mathbf{W}$ must be orthonormal. Because of this constraint the feasible set is not convex (for $d = 1$, both $\vec{w}$ and $-\vec{w}$ are feasible but their midpoint $\vec{0}$ is not), so the problem is not a convex program.

I tried encoding the constraint as follows to get the Lagrangian:

$$ L(\mathbf{W},\Lambda) = \mathrm{trace}(\mathbf{W}^{T}\mathbf{S}\mathbf{W}) + \mathrm{trace}\left((\mathbf{I}_d - \mathbf{W}^{T}\mathbf{W})\Lambda\right), $$ where $\Lambda \in \mathbb{R}^{d \times d}$ is a symmetric matrix of Lagrange multipliers (the constraint is an equality, so $\Lambda$ needs no sign restriction).
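For reference, the standard trace-derivative identities I am relying on (with the derivative laid out so that $\partial L / \partial \mathbf{W}$ has the same shape as $\mathbf{W}$) are:

$$ \frac{\partial}{\partial \mathbf{W}}\,\mathrm{trace}(\mathbf{W}^{T}\mathbf{S}\mathbf{W}) = (\mathbf{S} + \mathbf{S}^{T})\mathbf{W}, \qquad \frac{\partial}{\partial \mathbf{W}}\,\mathrm{trace}\left((\mathbf{I}_d - \mathbf{W}^{T}\mathbf{W})\Lambda\right) = -\mathbf{W}(\Lambda + \Lambda^{T}). $$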

Taking the derivative with respect to $\mathbf{W}$ and setting it to zero yields:

$$ \frac{\partial L}{\partial \mathbf{W}} = (\mathbf{S} + \mathbf{S}^{T})\mathbf{W} - \mathbf{W}(\Lambda + \Lambda^{T}) = 2\mathbf{S}\mathbf{W} - 2\mathbf{W}\Lambda :=\mathbf{0} \quad \implies \quad \mathbf{S}\mathbf{W}^\star = \mathbf{W}^\star\Lambda, $$ using that $\mathbf{S}$ and $\Lambda$ are symmetric.

This looks like an eigenvector equation for $\mathbf{S}$: when $\Lambda$ is diagonal, the columns of $\mathbf{W}^\star$ are eigenvectors of $\mathbf{S}$ and the diagonal entries of $\Lambda$ are the corresponding eigenvalues.
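As a quick numerical sanity check of this stationarity condition (again just a NumPy sketch with a random rank-$d$ PSD $\mathbf{S}$; the names are mine), the top-$d$ eigenvectors of $\mathbf{S}$ satisfy $\mathbf{S}\mathbf{W}^\star = \mathbf{W}^\star\Lambda$ with $\Lambda$ the diagonal matrix of the corresponding eigenvalues, and the objective there equals the sum of those eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
p, d = 6, 3
A = rng.standard_normal((p, d))
S = A @ A.T                                  # PSD, rank d

eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1][:d]        # indices of the top-d eigenvalues
W_star = eigvecs[:, order]                   # candidate W*
Lam = np.diag(eigvals[order])                # multipliers = top-d eigenvalues

print(np.allclose(S @ W_star, W_star @ Lam))                              # S W* = W* Lambda
print(np.isclose(np.trace(W_star.T @ S @ W_star), eigvals[order].sum()))  # objective = sum of top-d eigenvalues
```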

At this point I'm stuck. I don't know how to show that this stationary point is optimal, or that strong duality holds in this situation, or whether that is even possible to show.

PS: I found a proof of this in *Matrix Analysis* (Horn and Johnson), but they did not take a calculus approach.
