
I want to compute the derivative of the function $f(B) = w'B's$ with respect to $B$, where $w,s$ are column vectors. Writing the differential, $df = w'(B+dB)'s - w'B's = w'(dB)'s = s'(dB)w$. I know the answer is $sw'$, but I could not get further. I tried to use the Kronecker product to write $\operatorname{vec}(s'(dB)w) = (w'\otimes s')\operatorname{vec}(dB)$, which I believe is correct. But how do I connect $w'\otimes s'$ with $sw'$?

I have a follow-up question: what about the function $g(B) = w'B's \,[\nabla_{B}f(B)]$? I can use the product rule: $dg = d(w'B's)[\nabla_{B}f(B)]+ w'B's \underbrace{d(\nabla_{B}f(B))}_{=0} = d(w'B's)[\nabla_{B}f(B)]$. I find it difficult to unify these notations, and the Kronecker product cannot be applied directly to this question.


Update: Gradient of $X \mapsto a^T X b$ seems relevant, but my question is really about how to connect the Kronecker-product form with the classic derivative notation, i.e., $\frac{\partial f}{\partial B}$ or $\nabla_Bf$.

Moreover, the second derivative is an $\mathbb{R}^{n\times m\times n\times m}$ tensor, and using the Kronecker product I can write it as the $\mathbb{R}^{n m\times n m}$ matrix $\operatorname{vec}(sw^\top)\,(w^\top\otimes s^\top)$. But how do I use this object? For example, if I want to prove the original function $f$ is convex, can I do so by showing this matrix is PSD?

Update: for my first question, I have figured out an answer: $sw' = \operatorname{vec}^{-1}\bigl((w'\otimes s')^\top\bigr) = \operatorname{vec}^{-1}(w\otimes s)$, provided we remember $B$'s shape.
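This identity is easy to sanity-check numerically. The sketch below (my addition, not from the post) uses NumPy with column-major (`order='F'`) reshaping as the $\operatorname{vec}$ convention, and notes that $w'\otimes s'$ is just the row vector $(w\otimes s)'$:

```python
import numpy as np

# Check: reshaping w ⊗ s back into B's shape recovers s w'.
N, M = 4, 3                          # B would be N x M
rng = np.random.default_rng(1)
s = rng.standard_normal(N)           # s in R^N
w = rng.standard_normal(M)           # w in R^M

kron = np.kron(w, s)                 # w ⊗ s, a vector of length N*M
unvec = kron.reshape(N, M, order='F')  # vec^{-1}, using B's shape (N, M)

assert np.allclose(unvec, np.outer(s, w))  # equals s w'
```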

  • I think what you are trying to get requires index notation instead of the $\rm vec$ operator. If you write $f=w_i B'_{ij} s_j$ then $df=w_i \, dB'_{ij} \, s_j=w_i \, dB_{ji} \, s_j$ and so $\partial f/\partial B'_{ij}=\partial f/\partial B_{ji}=w_i s_j$. You can rewrite this as $\partial f/\partial B'=ws'$ or, equivalently, $\partial f/\partial B=sw'$. – Ted Black Mar 08 '25 at 08:37
  • You can approach the problem using $\lbrace\star,:\rbrace$ to denote the dyadic and double-dot products $$\eqalign{ \def\q{\quad} \def\qiq{\q\implies\q} \def\T{{\sf T}} \def\g#1#2{\frac{\partial#1}{\partial#2}} M &\equiv sw^\T,\q f = M:B &\qiq \g fB = M \\ g &= Mf = M\star M:B &\qiq \g gB = M\star M \\ }$$ – greg Mar 08 '25 at 16:02
  • I voted to close as duplicate before you updated the question. If you think the closure was unfair or unwarranted, please let me know and I will vote for reopening. – Rodrigo de Azevedo Mar 09 '25 at 18:34

1 Answer


For me, the best way to look at derivatives of scalar-valued functions w.r.t. (with respect to) a vector/matrix is to think of them as the collection of all the partial derivatives w.r.t. each entry: $$ \bigl[\nabla_Bf(B)\bigr]_{n,m}=\frac{\partial f(B)}{\partial b_{n,m}} $$ where $b_{n,m}$ is the entry of $B$ in row $n$ and column $m$. Then, we have $$ f(B)=w'B's=\sum_{i,j}w_ib_{j,i}s_j \implies \frac{\partial f(B)}{\partial b_{n,m}}=w_ms_n \implies \nabla_Bf(B) = s w'. $$
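The entrywise derivation above can be verified numerically. The sketch below (my addition, not part of the original answer) compares the claimed gradient $sw'$ against central finite differences; since $f$ is linear in $B$, the two agree up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 3                      # B is N x M, s in R^N, w in R^M
B = rng.standard_normal((N, M))
s = rng.standard_normal(N)
w = rng.standard_normal(M)

f = lambda B: w @ B.T @ s        # f(B) = w' B' s

grad = np.outer(s, w)            # claimed gradient s w'

# central finite differences, one entry of B at a time
eps = 1e-6
fd = np.zeros((N, M))
for n in range(N):
    for m in range(M):
        E = np.zeros((N, M))
        E[n, m] = eps
        fd[n, m] = (f(B + E) - f(B - E)) / (2 * eps)

assert np.allclose(fd, grad, atol=1e-5)
```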

If you are comfortable taking derivatives with respect to vectors, you can get to the same result as follows. Let $\beta = \operatorname{vec}(B)$ and $F(\beta) = \beta'(w\otimes s)$. The derivative of $F(\beta)$ w.r.t. $\beta$ is $$ \nabla_\beta F(\beta) = w\otimes s. $$

Now, it is easy to see that $f(B) = F(\beta) = F(\beta(B))$ and that $$ \bigl[\nabla_B f(B)\bigr]_{n,m}=\frac{\partial f(B)}{\partial b_{n,m}}=\frac{\partial F(\beta(B))}{\partial b_{n,m}} = \frac{\partial F(\beta)}{\partial \beta_{(m-1)N+n}} = \bigl[\nabla_\beta F(\beta)\bigr]_{(m-1)N+n} = \bigl[w\otimes s\bigr]_{(m-1)N+n} = w_m s_n $$ assuming $B\in\mathbb{R}^{N\times M}$ and, in turn, $\beta \in \mathbb{R}^{NM}$.
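The identity $f(B) = F(\beta) = \beta'(w\otimes s)$ and the index bookkeeping above can also be checked numerically (my addition; column-major reshape plays the role of $\operatorname{vec}$):

```python
import numpy as np

N, M = 4, 3
rng = np.random.default_rng(2)
B = rng.standard_normal((N, M))
s = rng.standard_normal(N)
w = rng.standard_normal(M)

f = w @ B.T @ s                    # f(B) = w' B' s
beta = B.reshape(-1, order='F')    # vec(B), stacking columns
F = beta @ np.kron(w, s)           # F(beta) = beta' (w ⊗ s)

assert np.isclose(f, F)
```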

As for the second question, note that $g(B)$ is matrix-valued (since $\nabla_B f(B)$ is a matrix). Then, by the calculations above, $$ \bigl[g(B)\bigr]_{n,m} = w'B's\bigl[\nabla_B f(B)\bigr]_{n,m}=(w_m s_n)w'B's $$ and $$ \frac{\partial \bigl[g(B)\bigr]_{n,m}}{\partial b_{k,l}}=(w_l s_k)(w_m s_n) $$ which means that the "derivative" of $g$ (matrix-valued) w.r.t. $B$ (a matrix) is a four-dimensional object.
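To illustrate how this four-dimensional object relates to the Kronecker-product matrix mentioned in the question, the sketch below (my addition, under the column-major $\operatorname{vec}$ convention) builds the entries $(w_l s_k)(w_m s_n)$ as a 4-D array and checks that its flattening equals $\operatorname{vec}(sw')\,(w'\otimes s') = (w\otimes s)(w\otimes s)'$, a rank-one matrix:

```python
import numpy as np

N, M = 4, 3
rng = np.random.default_rng(3)
s = rng.standard_normal(N)
w = rng.standard_normal(M)

# 4-D derivative: d[g]_{n,m} / d b_{k,l} = (w_l s_k)(w_m s_n)
T = np.einsum('n,m,k,l->nmkl', s, w, s, w)

# Flatten: rows index vec(g), columns index vec(B) (column-major).
ks = np.kron(w, s)                       # vec(s w') = w ⊗ s
J = np.outer(ks, ks)                     # (w ⊗ s)(w ⊗ s)'

# entry T[n,m,k,l] goes to row m*N+n, column l*N+k
T_flat = T.transpose(1, 0, 3, 2).reshape(N * M, N * M)

assert np.allclose(T_flat, J)
```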

dvdgrgrtt
  • Thank you for your answer! It helps a lot. I agree that it is a four-dimensional tensor, and I have written it as a matrix using the Kronecker product: $\operatorname{vec}(sw^\top)\,(w^\top\otimes s^\top)$. My final goal is to prove $f$ is convex (or not), and this 4-dim thing is a Hessian. After writing it as a matrix, can I show this matrix is PSD and then say $f$ is convex w.r.t. $B$? – Leafstar Mar 08 '25 at 17:39
  • @Leafstar, what you call $g$ is not the Hessian. The entries of the Hessian are given by $\partial^2 f / \partial b_{m,n} \partial b_{k,l}$ and are identically zero (we have been assuming all along that $s$ and $w$ are independent of $B$). Indeed, from the $F$ notation it is easy to see that $f$ represents a hyperplane in $\mathbb{R}^{NM+1}$, which is both convex and concave (or neither, if you are interested in strict convexity). – dvdgrgrtt Mar 09 '25 at 08:21