
I was going through the derivation of the subgradient of the nuclear norm of a matrix in an old homework from a convex optimization course (CMU Convex Optimization, Homework 2, Problem 2).


The setup is as follows: $A$ is an $m \times n$ matrix ($A \in \mathbb{R}^{m \times n}$) with rank $r$ and thin singular value decomposition $A = U \Sigma V^T$, where $U \in \mathbb{R}^{m \times r}$, $\Sigma \in \mathbb{R}^{r \times r}$, and $V \in \mathbb{R}^{n \times r}$.

The first part of the proof asks us to show that for any $W \in \mathbb{R}^{m \times n}$ with $\Vert W \Vert _{op} \leq 1$, $U^T W = 0$, and $WV = 0$, we have $\Vert UV^T + W \Vert _{op} \leq 1$. The hint suggests looking at the singular value decomposition of $UV^T + W$ to prove this statement.
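
For what it's worth, here is a quick NumPy sanity check of the claimed bound on a random instance; the dimensions, the projection-based construction of $W$, and the target norm $0.8$ are just my own illustration, not part of the problem:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 5, 3

# Random rank-r matrix A and its thin SVD factors U (m x r), V^T (r x n).
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
U, _, Vt = np.linalg.svd(A, full_matrices=False)
U, Vt = U[:, :r], Vt[:r, :]

# Build W with U^T W = 0 and W V = 0 by projecting a random matrix onto
# the orthogonal complements of range(U) and range(V), then shrink it so
# that ||W||_op = 0.8 <= 1.
G = rng.standard_normal((m, n))
W = (np.eye(m) - U @ U.T) @ G @ (np.eye(n) - Vt.T @ Vt)
W *= 0.8 / np.linalg.norm(W, 2)

print(np.linalg.norm(U.T @ W), np.linalg.norm(W @ Vt.T))  # both ~ 0
print(np.linalg.norm(U @ Vt + W, 2))                      # operator norm of UV^T + W
```

On such an instance the last line prints $1.0$ (up to floating point), consistent with the claimed bound.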

In my first attempt, I noticed that $U^T(UV^T + W)V = I$, and from this I wrote $UV^T + W = (U^T)_{\text{left}}^{-1} I V_{\text{right}}^{-1}$, where $(U^T)_{\text{left}}^{-1}$ and $V_{\text{right}}^{-1}$ are a left inverse of $U^T$ and a right inverse of $V$, respectively, so that the singular values of $UV^T + W$ would all be $1$. However, this is wrong because $U^T$ and $V$ need not have such inverses (they exist only when $r = m$ and $r = n$, respectively).

Can anyone point me in the right direction to prove this first part using the singular value decomposition, as the hint suggests? Any help is much appreciated! Apologies if the notation is messy!

Phat Tran

1 Answer


When you have a nice set of orthogonality properties like $U^\mathsf{T}W = 0$, $WV = 0$ and you are asked to analyze a sum of the form $UV^\mathsf{T}+W$, try expressing everything as a block matrix product to see if things simplify from there. Let $W = U_W\Sigma_WV^\mathsf{T}_W$ be the thin SVD of $W$; that is, $\Sigma_W$ is a square, invertible diagonal matrix whose dimension equals the rank of $W$. With enough practice, this suggests the expression
$$ UV^\mathsf{T} + W = UV^\mathsf{T} + U_W\Sigma_WV^\mathsf{T}_W = \begin{bmatrix}U & U_W \end{bmatrix}\begin{bmatrix}I & \\ & \Sigma_W\end{bmatrix}\begin{bmatrix}V^\mathsf{T}\\V^\mathsf{T}_W \end{bmatrix}. $$

The center matrix is diagonal with nonnegative entries, so it is a candidate singular value matrix for $UV^\mathsf{T}+W$. Do the outer factors satisfy the properties needed for this to be a valid thin SVD? This is what the hint suggests checking, after all. By the problem specification, we have
$$ U^\mathsf{T}W = U^\mathsf{T}U_W\Sigma_WV_W^\mathsf{T} = 0, \qquad WV = U_W\Sigma_WV_W^\mathsf{T}V = 0. $$
Multiplying the first equation on the right by $V_W\Sigma_W^{-1}$ and the second on the left by $\Sigma_W^{-1}U_W^\mathsf{T}$ (using $V_W^\mathsf{T}V_W = I$, $U_W^\mathsf{T}U_W = I$, and the invertibility of $\Sigma_W$) gives $U^\mathsf{T}U_W = 0$ and $V^\mathsf{T}_WV = 0$.

The only thing left to check is the orthonormality (left/right inverse) properties:
$$ \begin{bmatrix}U & U_W \end{bmatrix}^\mathsf{T}\begin{bmatrix}U & U_W \end{bmatrix} = \begin{bmatrix}U^\mathsf{T}U & U^\mathsf{T}U_W\\U_W^\mathsf{T}U & U_W^\mathsf{T}U_W \end{bmatrix} = \begin{bmatrix}I & \\ & I \end{bmatrix}, $$
where the off-diagonal blocks vanish by the relation we just derived, and
$$ \begin{bmatrix}V^\mathsf{T}\\V^\mathsf{T}_W \end{bmatrix}\begin{bmatrix}V^\mathsf{T}\\V^\mathsf{T}_W \end{bmatrix}^\mathsf{T} = \begin{bmatrix} V^\mathsf{T}V & V^\mathsf{T}V_W\\ V_W^\mathsf{T}V & V_W^\mathsf{T}V_W \end{bmatrix} = \begin{bmatrix}I & \\ & I \end{bmatrix}, $$
once again by the derived relation.

With all of this in place, we conclude that
$$ UV^\mathsf{T} + W = \begin{bmatrix}U & U_W \end{bmatrix}\begin{bmatrix}I & \\ & \Sigma_W\end{bmatrix}\begin{bmatrix}V^\mathsf{T}\\V^\mathsf{T}_W \end{bmatrix} $$
is a valid (block partitioned) singular value decomposition, so the nonzero singular values of $UV^\mathsf{T}+W$ are $1$ (with multiplicity $r$) together with the singular values of $W$, all of which are at most $1$ because $\Vert W \Vert_{op} \leq 1$. (The diagonal entries may not appear in decreasing order, but reordering does not change the set of singular values.) Hence $\Vert UV^\mathsf{T}+W \Vert_{op} \leq 1$. To me, the block expression and recognizing the orthogonal complements are the key steps, while expressing everything as a thin SVD cleans everything up.
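
If you want to convince yourself numerically, here is a short NumPy sketch of the block construction above on a random instance; the dimensions and the way $W$ is generated are my own choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 6, 5, 2

# Thin SVD factors of a random rank-r matrix A.
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
U, _, Vt = np.linalg.svd(A, full_matrices=False)
U, Vt = U[:, :r], Vt[:r, :]

# W supported on the orthogonal complements of range(U) and range(V),
# rescaled so that ||W||_op = 0.9 < 1.
W = (np.eye(m) - U @ U.T) @ rng.standard_normal((m, n)) @ (np.eye(n) - Vt.T @ Vt)
W *= 0.9 / np.linalg.norm(W, 2)

# Thin SVD of W, keeping only the nonzero singular values.
Uw, sw, Vwt = np.linalg.svd(W, full_matrices=False)
k = int(np.sum(sw > 1e-10))
Uw, sw, Vwt = Uw[:, :k], sw[:k], Vwt[:k, :]

L = np.hstack([U, Uw])                 # [U  U_W]
R = np.vstack([Vt, Vwt])               # [V^T ; V_W^T]
d = np.concatenate([np.ones(r), sw])   # diagonal of the center block

print(np.allclose(L.T @ L, np.eye(r + k)))          # orthonormal columns
print(np.allclose(R @ R.T, np.eye(r + k)))          # orthonormal rows
print(np.allclose(L @ np.diag(d) @ R, U @ Vt + W))  # product reconstructs the sum
sv = np.linalg.svd(U @ Vt + W, compute_uv=False)
print(np.allclose(np.sort(sv[sv > 1e-10]), np.sort(d)))  # same nonzero singular values
```

Each check should print `True`: the stacked factors have orthonormal columns and rows, the block product reconstructs $UV^\mathsf{T}+W$, and its nonzero singular values are exactly the diagonal entries ($1$ with multiplicity $r$, plus the singular values of $W$).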