
I have an expression

$$ \mathbf{A}\mathbf{s} $$

where $\mathbf{A}$ is an $n \times n$ matrix and $\mathbf{s}$ is an $n \times 1$ vector. The matrix $\mathbf{A}$ is itself a function of $\mathbf{s}$:

$$ \mathbf{A} = \mathbf{f}(\mathbf{s})$$

I would like to know, first, how to compute the derivative of this expression with respect to $\mathbf{s}$ and, second, how to estimate that derivative numerically at some vector $\mathbf{s}_0$. My attempt has been to apply the product rule in some way:

$$ \frac{d}{d\mathbf{s}} \mathbf{f}(\mathbf{s})\mathbf{s} = \mathbf{f}(\mathbf{s}) + \mathbf{f}'(\mathbf{s})\mathbf{s} $$

noting that $\mathbf{f}'(\mathbf{s})$ is an $(n \times n) \times n$ matrix, so that the derivative above is a matrix of dimension $n \times n$ if we treat the term $\mathbf{f}'(\mathbf{s})\mathbf{s}$ as an $n \times n$ matrix expressed with its columns placed below each other.

I am not sure whether my approach is correct, nor how to proceed from here to evaluate this derivative numerically at a particular vector. I plan to use Matlab to compute the numerical derivative.

Any assistance or input would be greatly appreciated!

2 Answers


Unfortunately, $f'(s)$ is not a matrix; it is a third-order tensor with dimensions $(n\times n\times n)$.

Such tensors are awkward to work with. The easiest way to proceed is to vectorize the matrix by stacking its columns:

$$\eqalign{ \def\bbR#1{{\mathbb R}^{#1}} \def\o{{\tt1}} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vecc#1{\op{vec}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} a &= \vecc{A} \:&\in\:&\bbR{n^2} \\ J &= \grad{a}{s} \:&\in\:&\bbR{n^2\times n}\quad&\{ {\,\rm Jacobian\,} \} \\ da &= J\:ds &&&\{ {\,\rm differential\,} \} \\ }$$

Now the gradient calculation will not involve any tensors:

$$\eqalign{ y &= As \\ dy &= A\:ds + I_n\,dA\:s \\ &= A\:ds + \LR{s^T\otimes I_n}da \\ &= \LR{A + \LR{s^T\otimes I_n}J}ds \\ \grad{y}{s} &= A + \LR{s^T\otimes I_n}J \\ }$$

where $I_n$ is the $n \times n$ identity matrix and $\otimes$ is the Kronecker product.
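Since the question also asks for a numerical estimate at $s_0$, here is a minimal sketch of how the formula above might be evaluated and checked. It is only an illustration under stated assumptions: the matrix function `f` (taken to be $A(s) = ss^T$), the helper names, the step size `h`, and the test vector `s0` are all hypothetical and not part of the original post. $J$ is estimated by central differences, and the product $(s^T\otimes I_n)J$ is formed by index arithmetic on plain Python lists rather than by building the Kronecker product explicitly.

```python
def f(s):
    """Hypothetical example A(s) = s s^T, returned as nested lists (n x n)."""
    return [[si * sj for sj in s] for si in s]

def vec(A):
    """Stack the columns of A into one flat list (column-major vec)."""
    n = len(A)
    return [A[i][k] for k in range(n) for i in range(n)]

def matvec(A, s):
    """Matrix-vector product A s."""
    return [sum(a * x for a, x in zip(row, s)) for row in A]

def jacobian_fd(g, s, h=1e-6):
    """Central-difference Jacobian of g: R^n -> R^m, as m rows of n entries."""
    n = len(s)
    cols = []
    for l in range(n):
        s_plus, s_minus = list(s), list(s)
        s_plus[l] += h
        s_minus[l] -= h
        g_plus, g_minus = g(s_plus), g(s_minus)
        cols.append([(a - b) / (2 * h) for a, b in zip(g_plus, g_minus)])
    m = len(cols[0])
    return [[cols[l][r] for l in range(n)] for r in range(m)]

def product_jacobian(f, s, h=1e-6):
    """Evaluate A + (s^T kron I_n) J, with J = d vec(A)/ds estimated numerically."""
    n = len(s)
    A = f(s)
    J = jacobian_fd(lambda x: vec(f(x)), s, h)  # n^2 rows, n columns
    # Row k*n + i of J holds the derivatives of A[i][k], so
    # ((s^T kron I_n) J)[i][l] = sum_k s[k] * J[k*n + i][l].
    return [[A[i][l] + sum(s[k] * J[k * n + i][l] for k in range(n))
             for l in range(n)] for i in range(n)]

s0 = [1.0, 2.0, -0.5]                       # hypothetical evaluation point
formula = product_jacobian(f, s0)           # A + (s^T kron I_n) J
direct = jacobian_fd(lambda x: matvec(f(x), x), s0)  # differentiate y = A(s) s directly
max_diff = max(abs(a - b)
               for row_a, row_b in zip(formula, direct)
               for a, b in zip(row_a, row_b))
print(max_diff)
```

The two results should agree up to finite-difference error. In Matlab the same product can be formed with `kron(s', eye(n)) * J`, provided `J` is built from `A(:)`, which also stacks columns.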

greg

You can use the matrix product notation from Tom M. Apostol's Calculus book. In this notation the matrices

$$ s= \left\lgroup \begin{array}{c} s_{11}\\ s_{21}\\ \vdots \\ s_{i 1} \\ \vdots \\ s_{n1} \end{array}\right\rgroup \qquad \mbox{ and } \qquad f(s)= \left\lgroup \begin{array}{cc c c c c} f_{11}(s) & f_{12}(s) & \ldots & f_{1j}(s) & \ldots & f_{1n}(s) \\ f_{21}(s) & f_{22}(s) & \ldots & f_{2j}(s) & \ldots & f_{2n}(s) \\ \vdots & \vdots & \ldots & \vdots & \ldots & \vdots \\ f_{i1}(s) & f_{i2}(s) & \ldots & f_{ij}(s) & \ldots & f_{in}(s) \\ \vdots & \vdots & \ldots & \vdots & \ldots & \vdots \\ f_{n1}(s) & f_{n2}(s) & \ldots & f_{nj}(s) & \ldots & f_{nn}(s) \end{array}\right\rgroup $$

are denoted most economically by

$$ s=( s_{i1})_{i=1}^{\;\;\,n} \in \mathbb{R}^{n\times 1} \qquad \mbox{ and } \qquad f(s)= (f_{ij}(s))_{i=1,\;j=1}^{\;\;\,n ,\;\;\; n} \in \mathbb{R}^{n\times n} $$

In the same notation the product $f(s)\cdot s \in \mathbb{R}^{n\times 1}$ is

$$ f(s)\cdot s = \Big( \sum_{k=1}^{n} f_{ik}(s)\cdot s_{k 1} \Big)_{i=1}^{\;\;\, n} $$

Thus the $i$-th coordinate function of $s\mapsto f(s)\cdot s$ is the function

$$ \mathbb{R}^{n\times 1}\ni s \mapsto \Big( f(s)\cdot s \Big)_{i1} =\sum_{k=1}^{n} f_{ik}(s)\cdot s_{k 1}\in \mathbb{R} $$

whose partial derivative with respect to the $\ell$-th coordinate of the variable $s$ is, by the product rule and the fact that $D_{\ell}s_{k1}$ equals $1$ if $k=\ell$ and $0$ otherwise,

\begin{align} D_{\ell}\Big( f(s)\cdot s \Big)_{i1} = & D_{\ell}\Big( \sum_{k=1}^{n} f_{ik}(s)\cdot s_{k 1} \Big) \\ = & \sum_{k=1}^{n}\Big( D_{\ell}f_{ik}(s)\cdot s_{k 1} + f_{ik}(s)\cdot D_{\ell}s_{k 1} \Big) \\ = & \sum_{k=1}^{n}D_{\ell}f_{ik}(s)\cdot s_{k 1} + f_{i\ell}(s) \end{align}

Thus the gradient of $\Big( f(s)\cdot s \Big)_{i1}$ is the row vector

$$ \Big(\sum_{k=1}^{n}D_{\ell}f_{ik}(s)\cdot s_{k 1} + f_{i\ell}(s)\Big)_{\ell =1 }^{\;\;\;\,n} \in \mathbb{R}^{1\times n} $$

Therefore the Jacobian of

$$ \mathbb{R}^{n \times 1} \ni s \longmapsto \Big\lgroup\big( f(s)\cdot s \big)_{i1}\Big\rgroup_{i=1}^{\;\;\, n}= \left\lgroup \begin{array}{c} \big( f(s)\cdot s \big)_{11}\\ \vdots \\ \big( f(s)\cdot s \big)_{i1} \\ \vdots \\ \big( f(s)\cdot s \big)_{n1} \end{array}\right\rgroup $$

is the $n \times n$ matrix

$$ \Big\lgroup\sum_{k=1}^{n}D_{\ell}f_{ik}(s)\cdot s_{k 1}\Big\rgroup_{i=1,\;\ell=1}^{\;\;\,n ,\;\;\; n} + \Big\lgroup f_{i\ell}(s)\Big\rgroup_{i=1,\;\ell=1}^{\;\;\,n ,\;\;\; n} $$
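As a small sanity check of the entrywise formula, the sketch below evaluates it for one hypothetical choice of entries, $f_{ik}(s) = s_i s_k$, for which $f(s)\cdot s = \|s\|^2 s$ and the Jacobian has the closed form $\|s\|^2 I + 2ss^T$. The function names, the example `f`, its hand-derived partial derivatives `D`, and the vector `s0` are assumptions made for illustration only (indices are zero-based in the code).

```python
def f(s):
    """Hypothetical entries f_ik(s) = s_i * s_k."""
    return [[si * sk for sk in s] for si in s]

def D(l, i, k, s):
    """Partial derivative D_l f_ik(s) for the example above:
    d(s_i * s_k)/d s_l = [i == l] * s_k + s_i * [k == l]."""
    return (s[k] if i == l else 0.0) + (s[i] if k == l else 0.0)

def jacobian(s):
    """Entry (i, l) is sum_k D_l f_ik(s) * s_k + f_il(s)."""
    n = len(s)
    A = f(s)
    return [[sum(D(l, i, k, s) * s[k] for k in range(n)) + A[i][l]
             for l in range(n)] for i in range(n)]

s0 = [1.0, 2.0, -0.5]   # hypothetical evaluation point
jac = jacobian(s0)

# Closed form for this particular f: ||s||^2 * I + 2 * s s^T.
norm2 = sum(x * x for x in s0)
expected = [[(norm2 if i == l else 0.0) + 2.0 * s0[i] * s0[l]
             for l in range(len(s0))] for i in range(len(s0))]

max_err = max(abs(a - b)
              for row_a, row_b in zip(jac, expected)
              for a, b in zip(row_a, row_b))
print(max_err)
```

When the partial derivatives $D_\ell f_{ik}$ are not available in closed form, `D` could be replaced by a finite-difference estimate; the rest of the computation is unchanged.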

Elias Costa