I'm deriving a gradient for a method in a control system engineering setting, and I have this subproblem in the gradient's derivation.

I have two matrices, $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{B} \in \mathbb{R}^{m \times n}$, and I need to derive the Jacobian $\frac{\partial \text{vec}(\mathbf{B})}{\partial \text{vec}(\mathbf{A})} \in \mathbb{R}^{mn \times mn}$.

I know that these matrices have nearly identical SVDs. For some scalar function $\mathcal{f}(.)$ that is applied identically to every singular value, $\mathbf{A}$ and $\mathbf{B}$ are related by:

$\mathbf{A} = \mathbf{U} \mathbf{S}_{\mathbf{A}} \mathbf{V}^\top, \quad \mathbf{B} = \mathbf{U} \, \mathcal{f}(\mathbf{S}_{\mathbf{A}}) \, \mathbf{V}^\top.$

Thus, $\mathbf{B}$ and $\mathbf{A}$ have the same singular vectors $\mathbf{U}$ and $\mathbf{V}$, and the $k$th singular value of $\mathbf{B}$, denoted by $(s_{\mathbf{B}})_k$, is given by $(s_{\mathbf{B}})_k = \mathcal{f}((s_{\mathbf{A}})_k)$, where $(s_{\mathbf{A}})_k$ is the $k$th singular value of $\mathbf{A}$.
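
To make the relation concrete, here is a minimal NumPy sketch of the construction, assuming a thin SVD and an $\mathcal{f}$ that acts elementwise on the vector of singular values. The helper name `generalized_matrix_function` and the choice of `np.tanh` are illustrative assumptions only; they do not come from the references.

```python
import numpy as np

def generalized_matrix_function(A, f):
    """Return B = U f(S_A) V^T, where A = U S_A V^T is the thin SVD of A.

    f is assumed to act elementwise on the 1-D array of singular values.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scaling the columns of U by f(s) is the same as U @ diag(f(s)).
    return (U * f(s)) @ Vt

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# tanh is an arbitrary illustrative choice of f (increasing, positive on s > 0).
B = generalized_matrix_function(A, np.tanh)

# Singular values of A and B, both in descending order.
s_A = np.linalg.svd(A, compute_uv=False)
s_B = np.linalg.svd(B, compute_uv=False)
```

Because this particular $\mathcal{f}$ is increasing and positive on positive inputs, the singular values of `B` come out as `np.tanh(s_A)` in the same order. For an $\mathcal{f}$ that reorders or negates values, the product $\mathbf{U} \, \mathcal{f}(\mathbf{S}_{\mathbf{A}}) \, \mathbf{V}^\top$ is still well defined, but it is no longer an SVD in canonical form.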

I am trying to use these resources to derive the Jacobian: "Estimating the Jacobian of the Singular Value Decomposition: Theory and Applications", and "An extended collection of matrix derivative results for forward and reverse mode algorithmic differentiation". I thought a good exercise would be to first derive $\frac{\partial \text{vec}(\mathbf{B})}{\partial \text{vec}(\mathbf{A})}$ for the special case $\mathcal{f}((s_{\mathbf{A}})_{k}) = (s_{\mathbf{A}})_{k}$. In that case $\mathbf{B}$ and $\mathbf{A}$ are identical, so if the derivation is correct we should get $\frac{\partial \text{vec}(\mathbf{B})}{\partial \text{vec}(\mathbf{A})} = \mathbf{I}_{mn}$. Starting from this trivial case, I would then modify the derivation to allow any function $\mathcal{f}(.)$ and see what form $\frac{\partial \text{vec}(\mathbf{B})}{\partial \text{vec}(\mathbf{A})}$ takes under this generalization. However, I am struggling to see how to derive this, because the Jacobian of the singular vectors of a matrix with respect to the matrix is confusing to work with.
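
The identity-function sanity check can also be run numerically before attempting the analytic derivation. The sketch below approximates the Jacobian by central finite differences, assuming the column-stacking convention for $\text{vec}$. The helper names (`gmf`, `vec`, `numerical_jacobian`) and the step size are hypothetical choices, and a finite-difference estimate is only a check, not a substitute for the closed form.

```python
import numpy as np

def gmf(A, f):
    """B = U f(S_A) V^T from the thin SVD of A (f acts elementwise)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * f(s)) @ Vt

def vec(M):
    """Column-stacking vectorization."""
    return M.reshape(-1, order="F")

def numerical_jacobian(A, f, eps=1e-6):
    """Central-difference estimate of d vec(B) / d vec(A), shape (mn, mn)."""
    m, n = A.shape
    J = np.zeros((m * n, m * n))
    for j in range(m * n):
        e = np.zeros(m * n)
        e[j] = eps
        E = e.reshape((m, n), order="F")  # perturb one entry of A
        J[:, j] = vec(gmf(A + E, f) - gmf(A - E, f)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# Trivial case f(s) = s: B equals A, so the Jacobian should be I_{mn}.
J_identity = numerical_jacobian(A, lambda s: s)
```

Passing a different `f` to `numerical_jacobian` gives a reference matrix against which a candidate analytic formula can be compared entry by entry, provided $\mathcal{f}$ is smooth and the singular values of $\mathbf{A}$ are distinct and nonzero.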

While I was writing this question, Stack Exchange recommended this post: "Derivative of a matrix function that applies on the singular values". That question is almost exactly the same as mine, except that (1) I am not applying a function $g(.)$ to anything, and (2) I need $\frac{\partial \text{vec}(\mathbf{B})}{\partial \text{vec}(\mathbf{A})} \in \mathbb{R}^{mn \times mn}$, preferably in matrix form rather than as elementwise entries (elementwise entries are OK, but I would ultimately need to assemble $\frac{\partial \text{vec}(\mathbf{B})}{\partial \text{vec}(\mathbf{A})}$ from them).

Edit: I also found some incomplete notes related to this problem, in a document called "notes on the SVD". In those notes, I basically want to derive the Jacobian of eq. (2) w.r.t. eq. (1).

Cal

  • Are $\mathbf U, \mathbf V$ considered to be constant matrices in this context, or should we think of them as being a function of $\mathbf B$ somehow? – Ben Grossmann Jan 18 '25 at 18:24
  • I think $\mathbf{U}$ and $\mathbf{V}$ have to be considered as functions of $\mathbf{B}$, unfortunately. Though I wonder if there's a shortcut in the proof that avoids needing for instance $\frac{\partial \text{vec}(\mathbf{B})}{\partial \text{vec}(\mathbf{U})}$, or $\frac{\partial \text{vec}(\mathbf{U})}{\partial \text{vec}(\mathbf{B})}$, or $\frac{\partial \text{vec}(\mathbf{B})}{\partial \text{vec}(\mathbf{V})}$, or $\frac{\partial \text{vec}(\mathbf{V})}{\partial \text{vec}(\mathbf{B})}$, since $\mathbf{B}$ and $\mathbf{A}$ have the same singular vectors... – Cal Jan 18 '25 at 18:34
  • although if $\mathbf{B}$ and $\mathbf{A}$ have the same singular vectors, the non-reliance on the Jacobian of the singular vectors (if my above comment is right) may allow one to treat $\mathbf{U}$ and $\mathbf{V}$ as constant matrices... – Cal Jan 18 '25 at 18:36
  • This paper by Vanni Noferini may be of interest. – greg Jan 18 '25 at 21:33
  • thank you greg, very helpful! Yes, then $\mathbf{B}$ is the generalized matrix function of $\mathbf{A}$ w.r.t. some $\mathcal{f}(.)$. I read the paper, and I believe that Vanni actually derived $\frac{\partial \text{vec}(\mathbf{B})}{\partial \text{vec}(\mathbf{A})}$, expressed as equation (4.3), but I am not certain as my understanding is weak. Can someone confirm whether this is true? Is the Kronecker form of the Fréchet derivative the same as saying $\frac{\partial \text{vec}(\mathbf{B})}{\partial \text{vec}(\mathbf{A})}$? – Cal Jan 20 '25 at 00:32
  • hi greg, I tested that equation (4.3) in the instance when $(s_{\mathbf{B}})_k = \mathcal{f}((s_{\mathbf{A}})_k) = (s_{\mathbf{A}})_k$, in which case $\mathbf{B} = \mathbf{A}$, and thus $\frac{\partial \text{vec}(\mathbf{B})}{\partial \text{vec}(\mathbf{A})}$ should be the identity matrix. In that case, (4.3) does indeed simplify to $\mathbf{I}_{mn} \in \mathbb{R}^{mn \times mn}$, so (4.3) may in fact be $\frac{\partial \text{vec}(\mathbf{B})}{\partial \text{vec}(\mathbf{A})}$!! But if anyone is reading this, I would greatly appreciate a confirmation as a sanity check! – Cal Jan 20 '25 at 01:09
  • Your understanding is correct: Equation 4.3 in the paper represents $\frac{\partial \text{vec}(\mathbf{B})}{\partial \text{vec}(\mathbf{A})}$. – greg Jan 20 '25 at 05:26
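
On the question raised in the comments: with the column-stacking $\text{vec}$, the Kronecker form of a Fréchet derivative $L(\mathbf{A}, \mathbf{E})$ is the matrix $\mathbf{K}$ satisfying $\text{vec}(L(\mathbf{A}, \mathbf{E})) = \mathbf{K} \, \text{vec}(\mathbf{E})$ for every direction $\mathbf{E}$, which is exactly the Jacobian $\frac{\partial \text{vec}(\mathbf{B})}{\partial \text{vec}(\mathbf{A})}$. The sketch below illustrates this for the special case $\mathcal{f}(s) = s^3$, chosen here only because then $\mathbf{B} = \mathbf{A}\mathbf{A}^\top\mathbf{A}$ has an elementary derivative. It does not reproduce equation (4.3), and the helper names are hypothetical.

```python
import numpy as np

def vec(M):
    """Column-stacking vectorization."""
    return M.reshape(-1, order="F")

def commutation_matrix(m, n):
    """K such that K @ vec(X) == vec(X.T) for any X of shape (m, n)."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[j + i * n, i + j * m] = 1.0
    return K

def frechet_cubic(A, E):
    """Fréchet derivative of A -> A A^T A in direction E (product rule)."""
    return E @ A.T @ A + A @ E.T @ A + A @ A.T @ E

rng = np.random.default_rng(1)
m, n = 4, 3
A = rng.standard_normal((m, n))

# f(s) = s^3 applied to the singular values reproduces A A^T A.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = (U * s**3) @ Vt

# Kronecker form, using vec(P X Q) = (Q^T kron P) vec(X) on each term.
K_f = (np.kron(A.T @ A, np.eye(m))
       + np.kron(A.T, A) @ commutation_matrix(m, n)
       + np.kron(np.eye(n), A @ A.T))

# Jacobian assembled column by column from the Fréchet derivative:
# column j is vec(L(A, E_j)) for the j-th basis matrix E_j.
J = np.zeros((m * n, m * n))
for j in range(m * n):
    e = np.zeros(m * n)
    e[j] = 1.0
    J[:, j] = vec(frechet_cubic(A, e.reshape((m, n), order="F")))

E = rng.standard_normal((m, n))  # arbitrary test direction
```

Here `K_f` and `J` are the same matrix built two ways, which is the sense in which the Kronecker form and the Jacobian coincide.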

0 Answers