
In these notes, the author formulates the PCA problem as follows. Given a data matrix $X$, the PCA problem is:

$$\min_Y \|Y - X\|_F \qquad \text{rank}(Y) = k$$

This reads to me as obviously correct.

But then the author claims that this problem is the same one as,

$$\min_Z \|X^\top X - Z\|_F \qquad \text{rank}(Z) = k \qquad Z \text{ is a projection}$$

This seems to come out of nowhere, and it is ill-posed as stated because "projection" is never defined.

I found another set of notes that seems to be along the lines of this reformulation of the original PCA problem. On page 25 it states that the PCA problem is the same as

$$\min_Z \|X - XZ\|_F \qquad \text{rank}(Z) = k \qquad Z \text{ is a projection}$$

(Again, projection is not defined).

I am puzzled as to how this projection term appears. It seems that the authors of these notes simply replaced $Y$ with $Y = XZ$. But then wouldn't the condition be $\text{rank}(XZ) = k$? Does $Z$ being a "projection" somehow allow us to argue that $\text{rank}(XZ) = \text{rank}(Z)$ when $Z$ is a projection?

  • In this context, a "projection" is probably an orthogonal projection matrix, which is to say that $Z$ should be a symmetric matrix satisfying $Z^2 = Z$. – Ben Grossmann Mar 03 '25 at 15:50
  • I find the second formulation with $X^\top X$ is completely unfamiliar, so I suspect that there is a typo involved somewhere – Ben Grossmann Mar 03 '25 at 15:52
  • The third formulation that you found in the other notes is a valid alternative formulation of PCA. It so happens that the optimal $Y$ for your first formulation and $XZ$ for the optimal $Z$ in that third problem are the same matrix (which is the "best low rank approximation" of $X$ that you get through PCA). – Ben Grossmann Mar 03 '25 at 15:55
  • Your first variation seems to have a typo (relative to the notes); the condition in the notes is "$\mathrm{rank}(X) = k$". – Eric Towers Mar 03 '25 at 16:57
  • In the first and the third problems, when the rank of $X$ is smaller than $k$, the minimum does not exist (although there is an infimum). The second problem is obviously different from the other two. E.g. when $X=2I_k$, the minimum values in the first and the third problems are zero, but in the second one, the minimum is positive. – user1551 Mar 03 '25 at 17:01
  • The second formulation is "minimize the residual over all $k$-dimensional projections of the given covariance matrix". PCA of a covariance matrix can be implemented by eigendecomposition; the projection is just keeping a rank-$k$ subspace, and the minimization requires that the big-energy PCs are preserved by the projection. Some relevant discussion here: https://en.wikipedia.org/wiki/Principal_component_analysis#Singular_value_decomposition – Eric Towers Mar 03 '25 at 17:18
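user1551's counterexample can be checked numerically. Here is a minimal sketch (my own, not from the thread), assuming "projection" in the second formulation means a $k \times k$ orthogonal projection matrix, in which case the only rank-$k$ choice is the identity:

```python
# Sketch of user1551's example: with X = 2*I_k, the first/third
# formulations attain minimum 0 (take Y = X), but the X^T X
# formulation has a strictly positive minimum.
import numpy as np

k = 3
X = 2 * np.eye(k)

# First/third formulations: Y = X itself is feasible, so the minimum is 0.
assert np.linalg.norm(X - X) == 0

# Second formulation: the only rank-k orthogonal projection in k x k
# space is the identity, so the residual is ||4I - I||_F = 3*sqrt(k) > 0.
Z = np.eye(k)
val = np.linalg.norm(X.T @ X - Z)
assert np.isclose(val, 3 * np.sqrt(k))
```

This supports the comment's point that the second formulation cannot be equivalent to the other two.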

1 Answer


As I explain in my comments, both the first and third optimizations are valid ways of framing PCA. However, I would not say that these are "equivalent" optimization problems, even if they do lead to the same approximation $Y$ of $X$.

The "least constrained" optimization problem that leads you to the approximation of $X$ that you're after is as follows: $$ \min_Y \|X - Y\|_F \quad \text{s.t.} \quad \text{rank}(Y) \leq k. \tag{1} $$ As it so happens, as long as $\text{rank}(X) \geq k$, the optimal $Y$ (which I'll call $Y^*$) will satisfy $\text{rank}(Y^*) = k$. In other words, you never "accidentally" or "luckily" end up with a $Y$ of lower rank than the upper bound you set. Consequently, this formulation gives you the same optimum as the problem
$$ \min_Y \|X - Y\|_F \quad \text{s.t.} \quad \text{rank}(Y) = k. \tag{2} $$
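As a numerical sketch (mine, not part of the answer): by the Eckart–Young theorem, the optimal $Y^*$ for (1)/(2) is given by the truncated SVD, and no other rank-$\leq k$ matrix does better. A quick NumPy experiment is consistent with this:

```python
# Truncated SVD gives the optimal low-rank approximation (Eckart-Young).
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))
k = 2

# Keep only the k largest singular values/vectors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Y_star = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

best = np.linalg.norm(X - Y_star)
# The optimum has rank exactly k, not lower, when rank(X) >= k.
assert np.linalg.matrix_rank(Y_star) == k

# No randomly generated rank-<=k matrix A @ B should beat the truncated SVD.
for _ in range(200):
    A = rng.standard_normal((8, k))
    B = rng.standard_normal((k, 5))
    assert np.linalg.norm(X - A @ B) >= best
```

The variable names here are illustrative; the point is only that the rank constraint in (2) is automatically attained at the optimum of (1).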

Going from (1) to (2) is one way of adding a constraint while arriving at the same solution. Another way is to use a "recipe" for the low-rank matrices $Y$. It turns out that a "natural" way to produce a matrix $Y$ with rank at most $k$ that approximates $X$ is to choose a subspace of dimension at most $k$, then use a projection matrix $Z$ to project the rows of $X$ onto that subspace. That is, set $Y = XZ$, where $Z$ is a projection matrix with $\text{rank}(Z) \leq k$. With that, we end up with the problem $$ \min_Z \|X - XZ\|_F \quad \text{s.t.} \quad \text{rank}(Z) \leq k. \tag{3} $$ As it so happens, as long as $\text{rank}(X) \geq k$, the optimal $Z$ here is always a rank-$k$ projection, and it is such that $\text{rank}(XZ) = k$. As such, this problem has the same solution as the further-constrained optimization $$ \min_Z \|X - XZ\|_F \quad \text{s.t.} \quad \text{rank}(Z) = k. \tag{4} $$ Unfortunately, these facts that "so happen" to hold are difficult to justify directly. The only proofs that I'm aware of use the truncated-SVD formula for the optimal approximation $Y$ in one way or another.

One fact that is straightforward to show, however, is the following:

Claim: If $Y = Y^*$ is the minimizer for $(1)$, then there exists a projection matrix $Z^*$ with rank at most $k$ such that $Y^* = XZ^*$.

If you're interested, I would suggest that you try to prove this yourself.
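If you want a concrete sanity check before attempting the proof, here is a numerical sketch (my own, not from the answer). It uses the natural candidate $Z^* = V_k V_k^\top$ built from the top-$k$ right singular vectors of $X$, which is the choice the truncated-SVD proof produces:

```python
# Check the claim: with X = U S V^T and Z* = V_k V_k^T, the matrix Z*
# is a symmetric idempotent (orthogonal projection) of rank k, and
# X Z* equals the truncated-SVD optimum Y* from problem (1).
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 4))
k = 2

U, s, Vt = np.linalg.svd(X, full_matrices=False)
Vk = Vt[:k, :].T                                 # top-k right singular vectors
Z_star = Vk @ Vk.T                               # candidate projection
Y_star = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # truncated SVD

assert np.allclose(Z_star, Z_star.T)             # symmetric
assert np.allclose(Z_star @ Z_star, Z_star)      # idempotent: Z^2 = Z
assert np.linalg.matrix_rank(Z_star) == k        # rank exactly k
assert np.allclose(X @ Z_star, Y_star)           # X Z* = Y*
```

The last assertion is the content of the claim; the first two match the definition of "orthogonal projection" given in the comments ($Z$ symmetric with $Z^2 = Z$).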

Ben Grossmann