Let's say I have a "data" matrix $X$ with $N$ rows and $p$ columns, where $N \gg p$. PCA with $L$ components can be formulated as $$X_L = \operatorname*{arg\,min}_{Y:\, \operatorname{rank}(Y) = L} \|X - Y\|^2_F,$$ where $Y$ is a rank-$L$ approximation of $X$.
It's known that $X_L = X W_L W^T_L$ (see here e.g.), where $W_L$ is the $p \times L$ matrix of the first $L$ principal components of $X$. However, suppose I instead choose an arbitrary orthonormal basis $w_1, w_2, \dots, w_p$ and construct $W_L = [w_1\, w_2\, \dots\, w_L]$ for $L = 1, 2, \dots, p$. Am I guaranteed that the "unexplained" variance $\|X - X_L\|^2_F$ decreases monotonically in $L$, the number of dimensions I retain, if the matrix $W_L$ is built iteratively (that is, once we fix $w_1$ we may only add $w_2, w_3$, etc., but never throw away $w_1$)? It feels like the answer is "yes", but maybe I'm missing something? It certainly decreases from $L = 1$ to $L = p$.
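For what it's worth, here is a quick numerical sanity check I would run (a minimal NumPy sketch; the random $X$ and the QR-based orthonormal basis are just placeholder choices, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 200, 8
X = rng.standard_normal((N, p))

# An arbitrary orthonormal basis w_1, ..., w_p (columns of Q), fixed in advance,
# deliberately not the PCA basis of X.
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))

errs = []
for L in range(1, p + 1):
    W_L = Q[:, :L]            # keep the first L fixed directions, never dropping earlier ones
    X_L = X @ W_L @ W_L.T     # rank-L reconstruction of X in that basis
    errs.append(np.linalg.norm(X - X_L, "fro") ** 2)

print(np.round(errs, 3))
print("monotone non-increasing:", all(a >= b - 1e-9 for a, b in zip(errs, errs[1:])))
```

In my runs the unexplained variance never increases as $L$ grows, which is consistent with my intuition, but of course a simulation is not a proof.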