
Reference is made to a book appendix [image not available].

From my understanding, $\mathbf{F}=[f_1,\ldots,f_m]^T$ is a column vector function of $\mathbf{x}$ on $\mathbb{R}^n$, then

In the last equation, $\nabla\mathbf{F}(\nabla\mathbf{F})^T$ is a matrix, but isn't $\sum_if_i\nabla^2f_i$ a column vector? Applying the vector identity instead, I would expect $$\nabla\cdot[(\nabla\mathbf{F})\mathbf{F}]=(\nabla^2\mathbf{F})\mathbf{F}+\nabla\mathbf{F}\cdot\nabla\mathbf{F}.$$

Which one is correct, or where did I go wrong?

MathArt

2 Answers

  • The formula involving $\nabla \cdot [(\nabla F) F]$ computes the divergence (denoted "$\nabla \cdot$") which is not what you want here.
  • $f_i(x)$ is a scalar, and $\nabla^2 f_i(x)$ is the $n \times n$ Hessian of a $\mathbb{R}^n \to \mathbb{R}$ function, so $\sum_{i=1}^m f_i(x) (\nabla^2 f_i(x))$ is a $n \times n$ matrix.
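The shape claim can be checked numerically. The following is a minimal sketch, assuming a made-up $F:\mathbb{R}^2\to\mathbb{R}^3$ (not taken from the book) and finite-difference Hessians. The helper names `hessian`, `weighted_hessian_sum` and `laplacian` are hypothetical. It shows that $\sum_i f_i\nabla^2 f_i$ comes out as an $n\times n$ matrix, while the Laplacian, which is the trace of the Hessian, is a single number.

```python
import math

# Hypothetical example F: R^2 -> R^3 (not from the book); n = 2, m = 3.
def F(x):
    x0, x1 = x
    return [x0 * x0 + x1, math.sin(x0) * x1, x0 * x1 * x1]

def hessian(g, x, h=1e-4):
    """Central finite-difference Hessian of a scalar function g at x."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            def shifted(sj, sk):
                y = list(x)
                y[j] += sj * h
                y[k] += sk * h
                return g(y)
            H[j][k] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H

def weighted_hessian_sum(x):
    """sum_i f_i(x) * Hessian(f_i)(x), returned as an n x n nested list."""
    n, fx = len(x), F(x)
    S = [[0.0] * n for _ in range(n)]
    for i, fi in enumerate(fx):
        Hi = hessian(lambda y, i=i: F(y)[i], x)
        for j in range(n):
            for k in range(n):
                S[j][k] += fi * Hi[j][k]
    return S

def laplacian(g, x):
    """Trace of the Hessian of g: a scalar, not a matrix."""
    H = hessian(g, x)
    return sum(H[j][j] for j in range(len(x)))

x = [0.7, -1.3]
S = weighted_hessian_sum(x)
print(len(S), len(S[0]))                  # rows and columns of the Hessian sum
print(laplacian(lambda y: F(y)[0], x))    # Laplacian of f_1 is one number
```

Both objects are commonly written $\nabla^2 f_i$, so which one a text means has to be read off from the shapes the surrounding formula requires.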
angryavian
  • I would appreciate some tips on how to tell the difference between the $n\times n$ Hessian $\nabla^2 f_i(x)$ and the scalar Laplacian operator $\nabla^2 f_i(x)$. – MathArt Feb 12 '21 at 07:08
  • Just found one answer here https://math.stackexchange.com/questions/1353761. – MathArt Feb 12 '21 at 07:19

We have $F(x)=\begin{pmatrix}f_1\\...\\f_m\end{pmatrix}$ and $\nabla F=\begin{pmatrix}\frac d {dx_1}f_1&...&\frac d {dx_1}f_m\\...\\\frac d {dx_n}f_1&...&\frac d {dx_n}f_m\end{pmatrix}$

So

$$\nabla F(x)F(x)=\begin{pmatrix}\sum_{i=1}^m f_i\frac d {dx_1}f_i\\...\\\sum_{i=1}^m f_i\frac d {dx_n}f_i\end{pmatrix}$$

The term in question comes from the part of the product rule that differentiates only the underbraced factors, holding each $f_i$ fixed:

$$\nabla F(x)F(x)=\begin{pmatrix}\sum_{i=1}^m f_i\underbrace{\frac d {dx_1}f_i}\\...\\\sum_{i=1}^m f_i\underbrace{\frac d {dx_n}f_i}\end{pmatrix}$$

Differentiating those factors with respect to each $x_k$ gives

$$\nabla\left(\nabla F(x)\right)F(x)=\begin{pmatrix}\sum_{i=1}^m f_i\frac {d^2} {dx_1^2}f_i&...&\sum_{i=1}^m f_i\frac {d^2} {dx_1dx_n}f_i\\...\\\sum_{i=1}^m f_i\frac {d^2} {dx_ndx_1}f_i&...&\sum_{i=1}^m f_i\frac {d^2} {dx_n^2}f_i\end{pmatrix}$$

Does this match the book appendix's $\sum_{i=1}^m f_i(x)\nabla ^2 f_i(x)$?

See $\nabla f_i(x)=\begin{pmatrix}\frac d {dx_1}f_i(x)\\...\\\frac d {dx_n}f_i(x)\end{pmatrix}$ and $\nabla^2 f_i(x)=\begin{pmatrix}\frac {d^2} {dx_1}f_i(x)&...&\frac {d^2} {dx_1dx_n}f_i(x)\\...\\\frac {d^2} {dx_ndx_1}f_i(x)&...&\frac {d^2} {dx_n^2}f_i(x)\end{pmatrix}$

so after you multiply $\nabla^2 f_i(x)$ by $f_i(x)$ and take the sum, yes, it is correct: $\nabla\left(\nabla F(x)\right)F(x)=\sum_{i=1}^m f_i(x)\nabla ^2 f_i(x)$
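As a sanity check, the full identity can be tested numerically. This is only a sketch, under two assumptions: that the objective is $f=\tfrac12\|F\|^2$, so that $\nabla f=(\nabla F)F$, and that $\nabla F$ is the $n\times m$ matrix whose $i$-th column is $\nabla f_i$. The example $F$ and the helpers `grad` and `hessian` are hypothetical and use finite differences.

```python
import math

def F(x):
    """Hypothetical F: R^2 -> R^3, chosen only for illustration."""
    x0, x1 = x
    return [x0 * x0 + x1, math.sin(x0) * x1, x0 * x1 * x1]

def f(x):
    """Assumed objective f = 1/2 * ||F||^2, so that grad f = (grad F) F."""
    return 0.5 * sum(fi * fi for fi in F(x))

def grad(g, x, h=1e-6):
    """Central finite-difference gradient of a scalar function g."""
    out = []
    for j in range(len(x)):
        yp, ym = list(x), list(x)
        yp[j] += h
        ym[j] -= h
        out.append((g(yp) - g(ym)) / (2 * h))
    return out

def hessian(g, x, h=1e-4):
    """Central finite-difference Hessian of a scalar function g."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            def shifted(sj, sk):
                y = list(x)
                y[j] += sj * h
                y[k] += sk * h
                return g(y)
            H[j][k] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H

x = [0.7, -1.3]
n, m = len(x), len(F(x))
comps = [lambda y, i=i: F(y)[i] for i in range(m)]
grads = [grad(c, x) for c in comps]     # grads[i] is column i of grad F (n x m)
hess = [hessian(c, x) for c in comps]   # hess[i] is the n x n Hessian of f_i
fx = F(x)

# Right-hand side: grad F grad F^T + sum_i f_i * Hessian(f_i), entry by entry.
rhs = [[sum(grads[i][j] * grads[i][k] + fx[i] * hess[i][j][k] for i in range(m))
        for k in range(n)] for j in range(n)]
lhs = hessian(f, x)                     # Hessian of f computed directly
err = max(abs(lhs[j][k] - rhs[j][k]) for j in range(n) for k in range(n))
print(err)                              # small, up to finite-difference error
```

Both sides are $n\times n$ matrices, which is consistent with reading $\nabla^2 f_i$ as a Hessian rather than a Laplacian.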

Vons
  • Thanks for the detailed expressions. Could $\nabla\left(\nabla F(x)\right)$ be another standard notation for the Hessian matrix $\frac{\partial^2F}{\partial x_i \partial x_j}$ that I did not recognize? – MathArt Feb 12 '21 at 07:03
  • The Hessian is the second derivative of a scalar-valued function, so I am not exactly sure that you can call $\nabla(\nabla F(x))$ the Hessian. Certainly $\nabla^2 f_i(x)$ is the Hessian. I am not sure what $\nabla^2 F(x)$ means when $F$ is a vector-valued function, sorry. – Vons Feb 12 '21 at 07:25
  • A minor caveat: a superscript $^2$ is missing on the first element $\partial x_1$ in the last matrix. Otherwise it is perfect. – MathArt Feb 12 '21 at 08:10
  • One more question, would it be correct to express the gradient of the second part $\left(\nabla F(x)\right)\nabla F(x)$ as $\left(\nabla F(x)\right)(\nabla F(x))^T$? – MathArt Feb 12 '21 at 08:14
  • I think it's the product rule? $\nabla^2f(x)=\nabla(\nabla F(x)F(x))=\nabla^2 F(x)F(x)+\nabla F(x)\nabla F(x)^T$ – Vons Feb 12 '21 at 19:34
  • What I was after is to build some connection to the identity $\nabla\cdot[(\nabla\mathbf{F})\mathbf{F}]=(\nabla^2\mathbf{F})\mathbf{F}+\nabla\mathbf{F}\cdot\nabla\mathbf{F}.$ – MathArt Feb 15 '21 at 12:04
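One possible way to connect the two, offered as a sketch rather than as the book's argument: assume $\nabla^2 f_i$ denotes the Hessian, $\Delta f_i$ its trace (the Laplacian), and $\nabla F$ the $n\times m$ matrix with columns $\nabla f_i$. The divergence of a vector field is the trace of its Jacobian, so taking the trace of the matrix identity gives a scalar identity:

$$\nabla\cdot\big[(\nabla F)F\big]=\operatorname{tr}\Big(\nabla F\,\nabla F^T+\sum_{i=1}^m f_i\nabla^2 f_i\Big)=\sum_{i=1}^m\|\nabla f_i\|^2+\sum_{i=1}^m f_i\,\Delta f_i.$$

Under these assumptions, the $\nabla^2$ in the divergence identity plays the role of the Laplacian, while the $\nabla^2$ in the matrix identity is the Hessian.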